
Important note

The term weaker is used in this context to describe very simple decision trees.

Although XGBoost is much more robust than a single decision tree, it is important to go into the exam with a clear understanding of what decision trees are and how they are configured. Decision trees are also the base model of many ensemble algorithms, such as AdaBoost, Random Forest, Gradient Boosting, and XGBoost.

Decision trees are rule-based algorithms that organize decisions in the form of a tree, as shown in Figure 6.7.

Figure 6.7 – Example of what a decision tree model looks like

They are formed by a root node (at the very top of the tree), intermediary or decision nodes (in the middle of the tree), and leaf nodes (bottom nodes with no further splits). The depth of the tree is given by the number of levels between the root node and the deepest leaf node. For example, in Figure 6.7, the depth of the tree is 3.

The depth of the tree is one of the most important hyperparameters of this type of model and is often referred to as the max depth: it controls the maximum depth that a decision tree is allowed to reach.

Another very important hyperparameter of decision tree models is the minimum number of samples/observations in the leaf nodes. It is also used to control the growth of the tree.

Decision trees have many other hyperparameters, but these two are especially important for controlling overfitting. Very deep trees, or trees with very few observations in their leaf nodes, are likely to face issues during extrapolation/prediction.

The reason for this is simple: decision trees use the data in the leaf nodes to make predictions, based on the class proportions (for classification tasks) or the average target value (for regression tasks) of the observations that belong to that node. Thus, each leaf node should contain enough data to make good predictions outside the training set.
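As a minimal sketch of these ideas, the snippet below uses scikit-learn (the library and parameter names are an assumption here, not something referenced in this excerpt) to set the max depth and the minimum number of observations per leaf, and then shows that the predicted probabilities come from the class proportions in the leaf nodes:

```python
# A minimal sketch of controlling tree growth with scikit-learn
# (assumed library; the exam-relevant idea is the max depth and
# minimum samples per leaf hyperparameters).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth limits how deep the tree can grow;
# min_samples_leaf forces every leaf to keep enough observations
# to produce reliable class proportions at prediction time.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=20, random_state=42)
tree.fit(X, y)

# For classification, the predicted probabilities are the class
# proportions observed in the leaf node each sample falls into.
print(tree.predict_proba(X[:2]))
print("Actual depth reached:", tree.get_depth())
```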

If you encounter the term CART during the exam, you should know that it stands for Classification and Regression Trees, since decision trees can be used for classification and regression tasks.

To select the best variables to split the data in the tree, the model chooses the ones that maximize the separation of the target variable across the resulting nodes. This can be measured by different criteria, such as Gini impurity and information gain.
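To make these splitting criteria concrete, here is a small hand-rolled sketch of both measures (the function names and the toy split are illustrative, not from the source):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy used by information gain: -sum(p_i * log2(p_i))."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting parent into left/right."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# A pure node has impurity 0; a perfectly mixed node has the maximum value.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]          # a perfect split
print(gini(parent))                            # 0.5
print(information_gain(parent, left, right))   # 1.0
```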

Forecasting models

A time series refers to data points that are collected at regular intervals and have a sequence dependency. Time series have a measure, a fact, and a time unit, as shown in Figure 6.8.

Figure 6.8 – Time series statement

Additionally, time series can be classified as univariate or multivariate. A univariate time series contains just one variable observed across a period of time, while a multivariate time series contains two or more variables observed across the same period. Figure 6.9 shows a univariate time series.

Figure 6.9 – Time series example
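As a quick illustration of the distinction (pandas is an assumption here, and the column names are made up), a univariate series fits in a single indexed column, while a multivariate series needs one column per variable sharing the same time index:

```python
import pandas as pd

# Daily time index for one year
idx = pd.date_range("2023-01-01", periods=365, freq="D")

# Univariate: a single measure (e.g., daily sales) observed over time
univariate = pd.Series(range(365), index=idx, name="sales")

# Multivariate: two or more measures sharing the same time index
multivariate = pd.DataFrame(
    {"sales": range(365), "visitors": range(100, 465)},
    index=idx,
)

print(univariate.head())
print(multivariate.head())
```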

Time series can be decomposed as follows:

  • Observed or level: The average values of the series
  • Trend: An increasing or decreasing pattern (sometimes, there is no trend)
  • Seasonality: Regular peaks at specific periods of time (sometimes, there is no seasonality)
  • Noise: The residual variation that cannot be explained by the other components

Sometimes, you can also find isolated peaks in the series that cannot be captured in a forecasting model. In such cases, you might want to consider those peaks as outliers. Figure 6.10 is a decomposition of the time series shown in Figure 6.9.

Figure 6.10 – Time series decomposition

It is also worth highlighting that you can use additive or multiplicative approaches to decompose time series. Additive models suggest that your time series adds each component to explain the target variable – that is, y(t) = level + trend + seasonality + noise.

Multiplicative models, on the other hand, suggest that your time series multiplies each component to explain the target variable – that is, y(t) = level * trend * seasonality * noise.
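The following is a short sketch of both decomposition approaches, assuming the statsmodels library (not mentioned in the source) and a synthetic monthly series built for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series with a trend and a yearly seasonal pattern
idx = pd.date_range("2018-01-01", periods=60, freq="MS")
trend = np.linspace(100, 200, 60)
seasonality = 10 * np.sin(2 * np.pi * idx.month / 12)
noise = np.random.normal(0, 2, 60)
series = pd.Series(trend + seasonality + noise, index=idx)

# Additive: y(t) = level + trend + seasonality + noise
additive = seasonal_decompose(series, model="additive", period=12)

# Multiplicative: y(t) = level * trend * seasonality * noise
# (requires strictly positive values, which this series satisfies)
multiplicative = seasonal_decompose(series, model="multiplicative", period=12)

print(additive.trend.dropna().head())
print(multiplicative.seasonal.head())
```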

In the next section, you will take a closer look at time series components.