Training
The most compute-intensive part of the ML development life cycle is the training process. Training an ML algorithm can take seconds in the simplest case, or days when the input dataset is enormous and the algorithm requires many iterations to converge. The latter case is usually observed with deep learning techniques. For example, DeepMind's AlphaGo Zero algorithm took forty days to fully master the game of Go, even though it was proficient after only three days[22]. Many algorithms that operate on smaller datasets, and on problems other than image or sound recognition, will not require such a large amount of time or computational resources.
While the algorithm is training, particularly if the training phase will take a long time, it is useful to have real-time measures of how well the training is going, so that it can be interrupted, reconfigured, and restarted without waiting for training to complete. These measures are typically loss metrics, where loss refers to the notional error that the algorithm makes on either the training or validation subset.
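As a rough illustration of this kind of monitoring, the sketch below fits a toy linear model by gradient descent, prints the training and validation loss after every epoch, and stops early once the validation loss plateaus. The data, learning rate, and patience threshold are invented for the example and are not taken from the text.

```python
import numpy as np

# Illustrative data: a small linear regression problem split into
# training and validation subsets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, X_val = X[:150], X[150:]
y_train, y_val = y[:150], y[150:]

w = np.zeros(3)          # model weights
lr = 0.05                # learning rate (illustrative)
best_val, patience, bad_epochs = np.inf, 5, 0

for epoch in range(200):
    # One gradient-descent step on the training subset (MSE gradient).
    grad = 2 * X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad

    # Real-time loss metrics on both subsets.
    train_loss = np.mean((X_train @ w - y_train) ** 2)
    val_loss = np.mean((X_val @ w - y_val) ** 2)
    print(f"epoch {epoch:3d}  train MSE {train_loss:.4f}  val MSE {val_loss:.4f}")

    # Interrupt training early if the validation loss stops improving.
    if val_loss < best_val - 1e-6:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print("validation loss plateaued; stopping early")
            break
```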
Some of the most common loss metrics in prediction problems are as follows:
- Mean square error (MSE) measures the average of the squared differences between the predicted values and the true values of the output variable.
- Mean absolute error (MAE) measures the average of the absolute differences between the predicted values and the true values of the output variable.
- Huber loss combines MSE and MAE: it is quadratic for small errors and linear for large ones, which makes it more robust to outliers while remaining a good estimator of both the mean and the median loss (all three are sketched in code after this list).
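The following is a minimal NumPy sketch of these three regression losses; the sample arrays and the Huber delta of 1.0 are illustrative choices rather than values from the text.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error: average of the squared differences."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute differences."""
    return np.mean(np.abs(y_true - y_pred))

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones,
    which makes it less sensitive to outliers than MSE."""
    residual = y_true - y_pred
    quadratic = 0.5 * residual ** 2
    linear = delta * (np.abs(residual) - 0.5 * delta)
    return np.mean(np.where(np.abs(residual) <= delta, quadratic, linear))

# Illustrative usage.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.5])
print(mse(y_true, y_pred), mae(y_true, y_pred), huber(y_true, y_pred))
```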
Some of the most common loss metrics in classification problems are as follows:
- Logarithmic loss measures the quality of a classifier's predicted probabilities, penalizing confident misclassifications most heavily. It is closely related to cross-entropy loss.
- Focal loss is a newer loss function aimed at heavily imbalanced problems, where a scarce positive class would otherwise be missed; it down-weights easy, well-classified examples so that training focuses on the hard, rare ones[23] (both losses are sketched in code below).
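Below is one minimal sketch of these two classification losses for the binary case; the example probabilities, and the focal-loss parameters gamma=2.0 and alpha=0.25, are illustrative assumptions rather than values given in the text.

```python
import numpy as np

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary logarithmic (cross-entropy) loss: penalizes confident
    predictions that turn out to be wrong."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss: down-weights easy, well-classified examples so
    that training focuses on the hard, rare ones."""
    p = np.clip(p_pred, eps, 1 - eps)
    p_t = np.where(y_true == 1, p, 1 - p)              # prob. of the true class
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)  # class-balance weight
    return -np.mean(alpha_t * (1 - p_t) ** gamma * np.log(p_t))

# Illustrative usage.
y_true = np.array([1, 0, 1, 1, 0])
p_pred = np.array([0.9, 0.2, 0.4, 0.95, 0.05])
print(log_loss(y_true, p_pred), focal_loss(y_true, p_pred))
```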