It is now time to learn how to evaluate and optimize machine learning models. During the process of modeling, or even after model completion, you might want to understand how your model is performing. Each type of model has its own set of metrics that can be used to evaluate performance, and that is what you are going to study in this chapter.
Apart from model evaluation, as a data scientist, you might also need to improve your model’s performance by tuning the hyperparameters of your algorithm. You will take a look at some nuances of this modeling task.
In this chapter, the following topics will be covered:
Alright, time to rock it!
There are several different scenarios in which you might want to evaluate model performance. Some of them are as follows:
Important note
The term model drift refers to the problem of model deterioration. When you build a machine learning model, you must use data to train the algorithm. This set of data is known as training data, and it reflects the business rules at a particular point in time. If those business rules change over time, your model will probably fail to adapt to the changes, because it was trained on a dataset that reflected a different business scenario. To solve this problem, you must retrain the model so that it takes the rules of the new business scenario into account.
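A common way to catch this deterioration is to keep scoring the model on fresh, labeled data and compare that score against the one measured at training time. The following is a minimal sketch of that idea; the names model, X_recent, y_recent, baseline_score, and the tolerance threshold are all illustrative assumptions, not fixed names used elsewhere in this chapter:

```python
# A minimal sketch of monitoring for model drift, assuming a trained
# classifier `model`, the score it achieved at training time
# (`baseline_score`), and a batch of recent labeled data.
from sklearn.metrics import accuracy_score

def needs_retraining(model, X_recent, y_recent, baseline_score, tolerance=0.05):
    """Flag the model for retraining if its accuracy on recent data drops
    more than `tolerance` below the baseline measured at training time."""
    recent_score = accuracy_score(y_recent, model.predict(X_recent))
    return recent_score < baseline_score - tolerance
```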
Model evaluation is usually performed in the context of testing. You have already learned about holdout validation and cross-validation. Both testing approaches share the same requirement, though: they need a metric to evaluate performance.
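To make this concrete, here is a minimal sketch that applies the same metric (accuracy) under both testing approaches, assuming a scikit-learn estimator and a labeled dataset; the use of the breast cancer toy dataset and logistic regression is purely illustrative:

```python
# A minimal sketch: both holdout validation and cross-validation need an
# explicit metric, passed here through scikit-learn's `scoring` parameter.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Holdout validation: fit on the training split, score on the test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model.fit(X_train, y_train)
holdout_acc = accuracy_score(y_test, model.predict(X_test))

# Cross-validation: the same metric, averaged across five folds
cv_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

print(f"Holdout accuracy: {holdout_acc:.3f}, CV accuracy: {cv_acc:.3f}")
```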
These metrics are specific to the problem domain. For example, there are specific metrics for regression models, classification models, clustering, natural language processing, and more. Therefore, during the design of your testing approach, you have to consider what type of model you are building in order to define the evaluation metrics.
In the following sections, you will take a look at the most important metrics and concepts that you should know to evaluate your models.
Classification is one of the most traditional classes of problems that you might face, either during the exam or during your journey as a data scientist. A very important artifact that you might want to generate during classification model evaluation is known as a confusion matrix.
A confusion matrix compares your model predictions against the real values of each class under evaluation. Figure 7.1 shows what a confusion matrix looks like in a binary classification problem:
Figure 7.1 – A confusion matrix
There are the following components in a confusion matrix:

- True positives (TP): cases where the model predicted the positive class and the real value was also positive
- True negatives (TN): cases where the model predicted the negative class and the real value was also negative
- False positives (FP): cases where the model predicted the positive class but the real value was negative
- False negatives (FN): cases where the model predicted the negative class but the real value was positive
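The following is a minimal sketch of how these components can be computed with scikit-learn, assuming the true labels and model predictions are already available; the small hard-coded label lists are illustrative only:

```python
# A minimal sketch: extracting TP, TN, FP, and FN from a binary
# confusion matrix with scikit-learn.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # real values of each class
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model predictions

# For binary labels, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```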
In a perfect scenario, your confusion matrix will contain only true positive and true negative cases, which means that your model has an accuracy of 100%. In practical terms, if that type of scenario occurs, you should be skeptical rather than happy, since some level of error is expected. A model that makes no errors at all is likely suffering from overfitting issues, so be careful.
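One quick sanity check is to compare training and test performance, since a large gap between them is a classic overfitting signal. This is a minimal sketch, assuming a fitted classifier named model and the train/test splits from the earlier holdout example (illustrative names only):

```python
# A minimal sketch: suspiciously perfect training performance combined
# with a large train/test gap is a sign of overfitting.
from sklearn.metrics import accuracy_score

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

if train_acc > 0.99 and train_acc - test_acc > 0.05:
    print("Suspiciously perfect training performance: check for overfitting.")
```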
Since false negatives and false positives are expected, the best you can do is prioritize one of them. For example, you can reduce the number of false negatives by accepting more false positives, and vice versa. This is known as the precision versus recall trade-off. Let’s take a look at these metrics next.
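Before looking at the formal definitions, the following sketch illustrates the trade-off by moving the decision threshold of a probabilistic classifier; it assumes a fitted model with predict_proba and the held-out data from the earlier example, and the threshold values are arbitrary choices for illustration:

```python
# A minimal sketch of the precision versus recall trade-off: raising the
# decision threshold trades false positives for false negatives.
from sklearn.metrics import precision_score, recall_score

proba = model.predict_proba(X_test)[:, 1]   # probability of the positive class

for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)
    precision = precision_score(y_test, y_pred)
    recall = recall_score(y_test, y_pred)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```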