
Extracting metrics from a confusion matrix

The simplest metric that can be extracted from a confusion matrix is known as accuracy, which is given by the equation shown in Figure 7.2:

Figure 7.2 – Formula for accuracy
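In terms of the confusion matrix counts, accuracy is the number of correct predictions divided by the total number of predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Here, TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.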

For the sake of demonstration, Figure 7.3 shows a confusion matrix with data.

Figure 7.3 – A confusion matrix filled with some examples
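From the calculations that follow, the matrix in the figure contains 100 true positives, 90 true negatives, 8 false positives, and 12 false negatives, for a total of 210 predictions.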

According to Figure 7.3, the accuracy would be (100 + 90) / 210, which is approximately 0.90. There is a common issue with the accuracy metric, which is related to how balanced the classes are. Highly imbalanced problems, such as those with 99% positive cases and 1% negative cases, make the accuracy score misleading.

For example, if your training data has 99% positive cases (the majority class), your model is likely to classify most of the positive cases correctly but perform poorly on the negative cases (the minority class). The accuracy will still be very high (driven by the correct classification of the positive cases), regardless of the bad results on the minority class.
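As a minimal sketch of this pitfall, the following Python snippet (using scikit-learn and a made-up dataset of 99 positive and 1 negative examples) scores a model that always predicts the majority class:

import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical, highly imbalanced labels: 99% positive, 1% negative.
y_true = np.array([1] * 99 + [0] * 1)

# A naive "model" that always predicts the majority (positive) class.
y_pred = np.ones_like(y_true)

print(accuracy_score(y_true, y_pred))          # 0.99 -- looks excellent
print(((y_true == 0) & (y_pred == 0)).sum())   # 0 -- not a single negative case is identified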

The point is that in highly imbalanced problems, you are usually more interested in correctly classifying the minority class, not the majority class. That is the case in most fraud detection systems, for example, where the minority class corresponds to the fraudulent cases. For imbalanced problems, you should look at other types of metrics, which you will learn about next.

Another important metric that you can extract from a confusion matrix is known as recall, which is given by the equation shown in Figure 7.4:

Figure 7.4 – Formula for recall
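In terms of the confusion matrix counts:

Recall = TP / (TP + FN)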

In other words, recall is the number of true positives over the total number of positive cases. Recall is also known as sensitivity.

With the values in Figure 7.3, recall is given by 100 / 112, which is approximately 0.89. Precision, on the other hand, is given by the formula shown in Figure 7.5:

Figure 7.5 – Formula for precision
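In terms of the confusion matrix counts:

Precision = TP / (TP + FP)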

In other words, precision is the number of true positives over the total number of predicted positive cases. Precision is also known as positive predictive power.

With the values in Figure 7.3, precision is given by 100 / 108, which is approximately 0.93. In general, you can increase precision at the cost of decreasing recall and vice versa. There is another model evaluation artifact that lets you explore this precision versus recall trade-off, known as a precision-recall curve.
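Before looking at that curve, here is a quick sanity check of the three metrics so far. The following sketch rebuilds a set of labels matching the counts implied by the text (100 true positives, 90 true negatives, 8 false positives, and 12 false negatives) and reproduces the numbers with scikit-learn:

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Labels that match the counts implied by Figure 7.3.
y_true = np.array([1] * 100 + [0] * 90 + [0] * 8 + [1] * 12)
y_pred = np.array([1] * 100 + [0] * 90 + [1] * 8 + [0] * 12)

print(accuracy_score(y_true, y_pred))   # (100 + 90) / 210 ~ 0.90
print(recall_score(y_true, y_pred))     # 100 / 112 ~ 0.89
print(precision_score(y_true, y_pred))  # 100 / 108 ~ 0.93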

Precision-recall curves summarize the precision versus recall trade-off across different probability thresholds. For example, the default threshold is 0.5, meaning that any prediction with a probability above 0.5 is classified as positive; otherwise, it is classified as negative. You can change the default threshold according to your needs so that you can prioritize either recall or precision. Figure 7.6 shows an example of a precision-recall curve:

Figure 7.6 – A precision-recall curve

As you can see in Figure 7.6, increasing precision will reduce recall and vice versa. Figure 7.6 shows the precision and recall at each threshold for a gradient boosting model (the orange line) compared to a no-skill model (the blue dashed line). A perfect model would push the curve toward the point (1, 1), forming a square corner in the top right-hand side of the chart.
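A chart like this can be produced with scikit-learn. The following is a minimal sketch; the synthetic dataset, the model, and the parameters shown here are purely illustrative:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic, moderately imbalanced binary problem (illustrative only).
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# Predicted probabilities for the positive class, not hard 0/1 labels,
# so that every candidate threshold can be evaluated.
y_scores = model.predict_proba(X_test)[:, 1]

# One (precision, recall) pair per candidate threshold.
precision, recall, thresholds = precision_recall_curve(y_test, y_scores)

plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()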

Another visual analysis you can build on top of confusion matrices is known as a Receiver Operating Characteristic (ROC) curve. ROC curves summarize the trade-off between the true positive rate and the false positive rate across different thresholds, just as the precision-recall curve does for precision and recall.

You already know the true positive rate, or sensitivity: it is the same as the recall you have just seen in the precision-recall curve. The other dimension of an ROC curve is the false positive rate, which is the number of false positives over the number of false positives plus true negatives.
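In terms of the confusion matrix counts:

True positive rate (recall) = TP / (TP + FN)
False positive rate = FP / (FP + TN)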

In the literature, you might find the false positive rate referred to as inverted specificity, represented by 1 – specificity, where specificity is the number of true negatives over the number of true negatives plus false positives. The false positive rate and inverted specificity are the same quantity. Figure 7.7 shows what an ROC curve looks like:

Figure 7.7 – ROC curve

A perfect model would push the curve toward the point (0, 1), forming a square corner in the top left-hand side of the chart. The orange line represents the trade-off between the true positive rate and the false positive rate of a gradient boosting classifier. The dashed blue line represents a no-skill model, which cannot separate the classes at all.
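Continuing the scikit-learn sketch from the precision-recall example (and reusing the y_test and y_scores variables defined there), the ROC curve and the area under it can be obtained as follows:

from sklearn.metrics import roc_auc_score, roc_curve

# One (false positive rate, true positive rate) pair per threshold.
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

# Area under the ROC curve: 0.5 for a no-skill model, 1.0 for a perfect one.
print(roc_auc_score(y_test, y_scores))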

To summarize, you can use ROC curves for fairly balanced datasets and precision-recall curves for moderately to highly imbalanced datasets.