Supervised learning problems
Supervised learning problems aim to infer the best mapping from inputs to outputs, based on a provided set of labeled input/output pairs. The labels act as feedback for the algorithm, allowing it to gauge how good its solution is. For example, given the mean yearly crude oil prices from 2010-2018, you may wish to predict the mean yearly crude oil price for 2019. The error that the algorithm makes on the known 2010-2018 values, ideally measured on years it was not fitted to, allows the engineer to estimate its likely error on the target prediction year of 2019.
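The Python sketch below illustrates this workflow under stated assumptions: the price series is synthetic (generated from a formula, not real crude oil data), the model is a simple least-squares straight line, and the last two labeled years are held out to estimate the error on an unseen year. The helper names `fit_line` and `predict` are hypothetical.

```python
# Hypothetical, synthetic "mean yearly price" series; NOT real crude oil data.
years = list(range(2010, 2019))  # 2010..2018 inclusive
prices = [50.0 + 3.0 * (y - 2010) + (1.0 if y % 2 else -1.0) for y in years]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(model, x):
    """Evaluate the fitted line at input x."""
    a, b = model
    return a + b * x


# Hold out the last two labeled years to estimate the error on unseen years.
train_x, train_y = years[:-2], prices[:-2]
test_x, test_y = years[-2:], prices[-2:]
model = fit_line(train_x, train_y)
holdout_mae = sum(abs(predict(model, x) - y) for x, y in zip(test_x, test_y)) / len(test_x)

# Refit on all labeled years, then predict the target year.
final_model = fit_line(years, prices)
forecast_2019 = predict(final_model, 2019)
print(f"hold-out MAE: {holdout_mae:.2f}, 2019 forecast: {forecast_2019:.2f}")
```

The hold-out mean absolute error is the engineer's estimate of how far off the 2019 forecast is likely to be; measuring error only on the years used for fitting would tend to be optimistic.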
Given a labeled collection of handwritten digits, you may wish to predict the label of a previously unseen handwritten digit. Similarly, given a dataset of emails labeled as either spam or not spam, a company building a spam filter would want to predict whether a previously unseen message is spam. All of these are supervised learning problems.
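As a minimal sketch of the spam example, the Python code below stores a handful of hypothetical labeled emails and labels a new message with the label of its most similar training email. The technique (1-nearest neighbour with Jaccard word-overlap similarity), the emails, and the function names are illustrative choices made for brevity, not necessarily the method the book uses.

```python
def words(text):
    """Lower-case a message and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity |a & b| / |a | b|, in the range [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(labeled_emails, message):
    """Return the label of the training email most similar to `message`."""
    target = words(message)
    _, best_label = max(
        labeled_emails, key=lambda pair: jaccard(words(pair[0]), target)
    )
    return best_label


# Hypothetical labeled training pairs: (input text, output label).
labeled_emails = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "not spam"),
    ("please review the attached report", "not spam"),
]

print(classify(labeled_emails, "free prize money"))           # spam
print(classify(labeled_emails, "review the monday meeting"))  # not spam
```

The labeled pairs are the only source of knowledge here: change a label in `labeled_emails` and the classifier's answers change with it, which is what makes the problem supervised.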
Supervised ML problems can be further divided into classification and prediction:
- Classification attempts to label an unseen input sample with one of a fixed set of known output values (classes). For example, you could train an algorithm to recognize breeds of cats; it would classify an unseen cat by labeling it with one of the breeds it was trained on.
- By contrast, prediction algorithms attempt to assign an unseen input sample a numeric output value, which need not be one of the values present in the labeled data. This is also known as estimation or regression. A canonical prediction problem is time series forecasting, where the value of the series is predicted for a time point that was not previously seen.
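To make the distinction concrete, the Python sketch below applies the same k-nearest-neighbours idea to both kinds of problem on hypothetical one-dimensional data. The classifier returns the majority label of the neighbours, so its output is always one of the known classes; the regressor returns the mean of the neighbours' numeric targets, which may be a value never seen in training. The data and function names are illustrative assumptions.

```python
from collections import Counter


def k_nearest(train, x, k):
    """Return the k training pairs whose (1-D) input is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


def knn_classify(train, x, k=3):
    """Classification: majority label among the k nearest neighbours."""
    labels = [label for _, label in k_nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, x, k=3):
    """Regression: mean numeric target of the k nearest neighbours."""
    targets = [target for _, target in k_nearest(train, x, k)]
    return sum(targets) / len(targets)


# Hypothetical labeled data: (input feature, class label) for classification ...
labeled_classes = [(1.0, "small"), (1.5, "small"), (2.0, "small"),
                   (8.0, "large"), (8.5, "large"), (9.0, "large")]
# ... and (input feature, numeric target) for regression.
labeled_values = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]

print(knn_classify(labeled_classes, 2.5))      # small
print(knn_regress(labeled_values, 2.6, k=2))   # 25.0 (not a training target)
```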
We will cover supervised algorithms in more detail in Chapter 3, Supervised Learning.