Sometimes, you might want to use a metric that summarizes precision and recall instead of prioritizing one over the other. Two popular metrics summarize precision and recall: the F1 score and the Area Under the Curve (AUC).
The F1 score, also known as the F-measure, is the harmonic mean of precision and recall. AUC approximates the area under the precision-recall curve, summarizing the whole curve in a single number.
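The following is a minimal Python sketch, not code from this book, showing both ideas. `f1_from_precision_recall` applies the harmonic-mean formula 2PR / (P + R). `pr_auc_trapezoid` approximates the area under a precision-recall curve with the trapezoidal rule, which is one common approximation among several. Both function names are hypothetical helpers, and the curve points are made-up numbers for illustration.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Hypothetical helper: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when precision and recall are both 0
    return 2 * precision * recall / (precision + recall)


def pr_auc_trapezoid(recalls: list[float], precisions: list[float]) -> float:
    """Hypothetical helper: trapezoidal approximation of the area under a
    precision-recall curve, with recall on the x-axis."""
    points = sorted(zip(recalls, precisions))  # order the points by recall
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area


# Illustrative values only.
print(f1_from_precision_recall(0.8, 0.5))                  # ≈ 0.615
print(pr_auc_trapezoid([0.0, 0.5, 1.0], [1.0, 0.8, 0.5]))  # ≈ 0.775
```

With precision 0.8 and recall 0.5, the arithmetic mean would be 0.65, but the F1 score is about 0.615. The harmonic mean is pulled toward the smaller of the two values, so a model cannot achieve a high F1 score by excelling at only one of them.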
That brings us to the end of this section on classification metrics. Let’s now take a look at the evaluation metrics for regression models.
Regression models are quite different from classification models because their output is a continuous number. Therefore, metrics for regression models measure the difference between actual and predicted values.
The simplest way to check the difference between a predicted value (yhat) and its actual value (y) is a subtraction: the error of a single prediction is the absolute value of yhat - y.
Since you usually have to evaluate many predictions, each indexed by i, you take the mean of these absolute errors. This metric is known as the Mean Absolute Error (MAE). Figure 7.8 depicts the formula that shows how this error can be formally defined:
Figure 7.8 – Formula for error of each prediction
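The following minimal Python sketch computes the MAE as the mean of |yhat_i - y_i| over all predictions. The function name `mean_absolute_error` and the sample values are illustrative assumptions, not taken from the book.

```python
def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Illustrative MAE: mean of the absolute differences |yhat_i - y_i|."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(abs(yhat - y) for y, yhat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values.
y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 230.0]
print(mean_absolute_error(y_true, y_pred))  # (10 + 10 + 30) / 3 ≈ 16.67
```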
Sometimes, you might want to penalize larger errors more heavily than smaller ones. To achieve this, you can use another metric, known as the Mean Squared Error (MSE), which squares each error and then takes the mean.
Squaring makes larger errors weigh disproportionately more: an error of 2 contributes 4 to the sum, while an error of 10 contributes 100. Figure 7.9 depicts the formula that shows how the MSE can be formally defined:
Figure 7.9 – Formula for MSE
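This minimal Python sketch computes the MSE on the same kind of illustrative data. The function name and values are assumptions for demonstration. The single error of 30 accounts for 60% of the summed absolute errors (30 of 50), but about 82% of the summed squared errors (900 of 1,100).

```python
def mean_squared_error(y_true: list[float], y_pred: list[float]) -> float:
    """Illustrative MSE: mean of the squared differences (yhat_i - y_i)^2."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum((yhat - y) ** 2 for y, yhat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values.
y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 230.0]
print(mean_squared_error(y_true, y_pred))  # (100 + 100 + 900) / 3 ≈ 366.67
```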
There is a potential interpretation problem with the MSE. Because it squares the errors, its result is expressed in squared units of the target (for example, squared dollars), which can be difficult to interpret from a business perspective. The Root Mean Squared Error (RMSE) works around this issue by taking the square root of the MSE, which brings the metric back to the original units of the target. Figure 7.10 depicts the RMSE equation:
Figure 7.10 – Formula for RMSE
The RMSE is one of the most widely used metrics for regression models, since it penalizes larger errors while staying in the same units as the target, which keeps it easy to interpret.
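As a minimal sketch (illustrative function name and data, not from the book), the RMSE is just the square root of the MSE. On these values it comes out larger than the MAE of the same predictions (about 16.67), because the squaring step still gives extra weight to the error of 30, yet the result is back in the original units.

```python
import math


def root_mean_squared_error(y_true: list[float], y_pred: list[float]) -> float:
    """Illustrative RMSE: square root of the mean squared error."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mse = sum((yhat - y) ** 2 for y, yhat in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical actual and predicted values.
y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 230.0]
print(root_mean_squared_error(y_true, y_pred))  # sqrt(1100 / 3) ≈ 19.15
```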
Many more metrics are suitable for regression problems besides the ones you have just learned. Most of them are beyond the scope of this section, but a few more are worth knowing.
One of these metrics is known as the Mean Absolute Percentage Error (MAPE). As the name suggests, the MAPE computes the absolute percentage error of each prediction and then takes the average value. Figure 7.11 depicts the formula that shows how this metric is computed:
Figure 7.11 – Formula for MAPE
The MAPE is widely used in forecasting models since it is simple to interpret: it expresses how far (or close) the predictions are from the actual values as a percentage.
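Here is a minimal Python sketch of the MAPE, assuming the result is scaled by 100 so it reads as a percentage. The function name and data are illustrative, not from the book. Each absolute error is divided by the actual value, so the MAPE is undefined whenever an actual value is zero. The sketch raises an error in that case instead of guessing a workaround.

```python
def mean_absolute_percentage_error(y_true: list[float], y_pred: list[float]) -> float:
    """Illustrative MAPE in percent: 100 * mean(|(y_i - yhat_i) / y_i|)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((y - yhat) / y) for y, yhat in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted values.
y_true = [100.0, 150.0, 200.0]
y_pred = [110.0, 140.0, 230.0]
print(mean_absolute_percentage_error(y_true, y_pred))  # (10% + 6.67% + 15%) / 3 ≈ 10.56
```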
You have now completed this section on regression metrics. Next, you will learn about model optimization.