Important note In this example, you have only set two dimensions for each data point (dimensions x and y). In real use cases, you can see far more dimensions, and that is why clustering algorithms play a very important role in identifying groups in the data in a more automated fashion. Hopefully, you have enjoyed […]
Computing K-Means step by step In this example, you will simulate K-Means on a very small dataset with only two columns (x and y) and six data points (A, B, C, D, E, and F), as defined in Table 6.8. Point x y A 1 1 B 2 2 C 5 5 D 5 6 […]
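The assign-and-update loop behind K-Means can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's worked procedure: it uses only the four points visible in the excerpt (A to D; the coordinates of E and F are cut off and are deliberately left out), it assumes k = 2, and the starting centroids are a hypothetical choice (the positions of A and B).

```python
import math

# Only the points visible in the excerpt; E and F are elided there and omitted here.
points = {"A": (1, 1), "B": (2, 2), "C": (5, 5), "D": (5, 6)}

def kmeans(points, initial_centroids, max_iters=100):
    """Plain K-Means: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, until nothing changes."""
    centroids = list(initial_centroids)
    assignments = {}
    for _ in range(max_iters):
        # Assignment step: index of the closest centroid (Euclidean distance).
        new_assignments = {
            name: min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for name, p in points.items()
        }
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: recompute each centroid as the mean of its members.
        for i in range(len(centroids)):
            members = [points[n] for n, c in assignments.items() if c == i]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignments, centroids

# Hypothetical initialization: start the two centroids at A and B.
assignments, centroids = kmeans(points, [points["A"], points["B"]])
print(assignments)
print(centroids)
```

With this initialization, B starts in its own cluster alongside C and D, then moves to A's cluster once the centroids are recomputed. A different starting choice can take a different path, which is why initialization matters in K-Means.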
Unsupervised learning AWS provides several unsupervised learning algorithms for the following tasks: Let us start by talking about clustering and how the most popular clustering algorithm works: K-Means. Clustering Clustering algorithms are very popular in data science. Basically, they aim to identify similar groups in a given dataset, also known as clusters. Clustering algorithms belong […]
Understanding DeepAR The DeepAR forecasting algorithm is a built-in SageMaker algorithm that is used to forecast a one-dimensional time series using a Recurrent Neural Network (RNN). Traditional time series algorithms, such as ARIMA and ETS, are designed to fit one model per time series. For example, if you want to forecast sales per region, you […]
Checking the stationarity of time series Decomposing time series and understanding how their components interact with additive and multiplicative models is a great achievement! However, the more you learn, the more you want to go deeper into the problem. Maybe you have realized that time series without trend and seasonality are easier to predict than […]
Important note The term weaker is used in this context to describe very simple decision trees. Although XGBoost is much more robust than a single decision tree, it is important to go into the exam with a clear understanding of what decision trees are and their main configurations. By the way, they are the base […]
Working with classification models You have been learning what classification models are throughout this book. Now, you are going to look at some algorithms that are suitable for classification problems. Keep in mind that there are hundreds of classification algorithms out there, but since you are preparing for the AWS Certified Machine Learning Specialty […]
Regression modeling on AWS AWS has a built-in algorithm known as linear learner, where you can implement linear regression models. The built-in linear learner uses Stochastic Gradient Descent (SGD) to train the model. Important note You will learn more about SGD when neural networks are discussed. For now, you can look at SGD as an […]
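To give a feel for what training a linear model with SGD means, here is a minimal sketch in plain Python. It is not the linear learner implementation or its API: the function name, learning rate, epoch count, and toy data are all assumptions chosen for illustration.

```python
import random

def sgd_linear_regression(xs, ys, learning_rate=0.02, epochs=2000, seed=0):
    """Fit y = w * x + b by updating w and b after every single sample
    (stochastic gradient descent on the squared error)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new random order each epoch
        for i in indices:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of error**2 with respect to w is 2 * error * x, and 2 * error for b.
            w -= learning_rate * 2 * error * xs[i]
            b -= learning_rate * 2 * error
    return w, b

# Hypothetical toy data generated from y = 2x + 1, with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noise-free data, the estimates should approach a slope of 2 and an intercept of 1. With real data, the learning rate and the number of epochs usually need tuning.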
Important note In Chapter 7, Evaluating and Optimizing Models, you will learn about evaluation metrics. For instance, you will learn that each type of model may have its own set of evaluation metrics. Regression models are commonly evaluated with Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). In other words, apart from R, R […]
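As a quick illustration of the two metrics named above, the following sketch computes MSE and RMSE by hand. The function names and the sample values are hypothetical and only serve the example.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of MSE, expressed in the same unit as the target variable."""
    return math.sqrt(mean_squared_error(y_true, y_pred))

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

mse = mean_squared_error(y_true, y_pred)        # (1 + 0 + 4) / 3
rmse = root_mean_squared_error(y_true, y_pred)  # square root of the MSE
print(mse, rmse)
```

Because RMSE is in the same unit as the target, it is often easier to interpret than MSE.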
Least squares method There are different ways to find the slope and y intercept of a line, but the most commonly used is the least squares method. The principle behind this method is simple: you have to find the line that minimizes the sum of squared errors. In Figure 6.1, you can […]
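The closed-form least squares solution for a single input variable can be sketched as follows. The function names and the toy data are assumptions for illustration; the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the line that minimizes the sum of squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = least_squares_fit(xs, ys)
print(slope, intercept)
print(sum_squared_error(xs, ys, slope, intercept))
```

Any other line, for example one with a slightly different slope, gives a larger sum of squared errors on the same data, which is what makes the fitted line the best one under this criterion.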