Supervised learning – Applying Machine Learning Algorithms – MLS-C01 Study Guide

AWS provides supervised learning algorithms for general purposes (regression and classification tasks) and for more specific purposes (forecasting and vectorization). The list of built-in algorithms that can be found in these sub-categories is as follows: You will start by learning about regression models and the linear learner algorithm. Working with regression models: Looking at […]

Storing the training data – Applying Machine Learning Algorithms – MLS-C01 Study Guide

You can use multiple AWS services to prepare data for machine learning, such as Elastic MapReduce (EMR), Redshift, and Glue. After preprocessing the training data, you should store it in S3, in the format expected by the algorithm you are using. Table 6.1 shows the list […]

Applying Machine Learning Algorithms – MLS-C01 Study Guide

In the previous chapter, you learned about understanding data and visualization. It is now time to move on to the modeling phase and study machine learning algorithms! In the earlier chapters, you learned that building machine learning models requires a lot of knowledge about AWS services, data engineering, data exploration, data architecture, and much more. […]

Summary – Data Understanding and Visualization – MLS-C01 Study Guide

You started this chapter by learning how to visualize relationships in the data. Scatter plots and bubble charts are the most important charts in this category; they show relationships between two and three variables, respectively. Then, you moved to another category of data visualization, which aims to make comparisons in the data. The most […]

Building key performance indicators – Data Understanding and Visualization – MLS-C01 Study Guide

Before you wrap up these data visualization sections, you need to be introduced to key performance indicators, or KPIs for short. A KPI is usually a single value that describes the results of a business indicator, such as the churn rate, net promoter score (NPS), return on investment (ROI), and so […]

Visualizing distributions in your data – Data Understanding and Visualization – MLS-C01 Study Guide

Exploring the distribution of a feature is important for understanding its key characteristics, such as skewness, mean, median, and quantiles. You can easily visualize skewness by plotting a histogram. This type of chart groups your data into bins or buckets and performs counts on top of […]
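To make the binning idea concrete, here is a minimal sketch in plain Python of how a histogram groups values into equal-width bins and counts them. The `histogram` helper and the `sample` values are hypothetical and invented for illustration; they are not taken from the study guide. Comparing the mean with the median is used here only as a rough hint of skewness, not as a formal test.

```python
from statistics import mean, median


def histogram(values, num_bins):
    """Group values into equal-width bins and count how many fall in each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    if width == 0:
        # All values are identical: everything lands in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value belongs to the last bin, not one past the end.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical sample with a long right tail (made up for illustration).
sample = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]
counts = histogram(sample, num_bins=4)

for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")

# A mean well above the median is a rough hint of right skew.
print("mean:", mean(sample), "median:", median(sample))
```

Most of the counts fall in the first bin while a single value sits far to the right, which is the shape a right-skewed feature would typically show in a histogram.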

Visualizing comparisons in your data – Data Understanding and Visualization – MLS-C01 Study Guide

Comparisons are very common in data analysis and there are different ways to present them. Starting with the bar chart, you must have seen many reports that have used this type of visualization. Bar charts can be used to compare one variable among different classes – for example, a car’s […]

Data Understanding and Visualization – MLS-C01 Study Guide

Data visualization is an art! No matter how much effort you and your team put into data preparation and preliminary analysis for modeling, if you don’t know how to show your findings effectively, your audience may not understand the point you are trying to make. Often, such situations may be even worse when you are […]

Exam Readiness Drill – Chapter Review Questions – Data Preparation and Transformation – MLS-C01 Study Guide

Apart from a solid understanding of key concepts, being able to think quickly under time pressure is a skill that will help you ace your certification exam. That is why working on these skills early on in your learning journey is key. Chapter review questions are designed to […]

Important note 7 – Data Preparation and Transformation – MLS-C01 Study Guide

You should be aware that there are many alternatives to co-occurrence matrices with a fixed context window, such as using TF-IDF vectorization or even simpler counters of words per document. The most important message here is that, somehow, you must come up with a numerical representation for each word. The last step is […]
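As a rough illustration of a co-occurrence count with a fixed context window, the sketch below tallies how often each pair of words appears within `window` positions of each other. The `cooccurrence_counts` helper, the example sentence, and the window size are assumptions made for this example and do not come from the study guide.

```python
from collections import Counter


def cooccurrence_counts(tokens, window=2):
    """Count (word, neighbor) pairs that occur within `window` positions."""
    counts = Counter()
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the word itself
                counts[(word, tokens[j])] += 1
    return counts


# Hypothetical one-sentence corpus (made up for illustration).
tokens = "the cat sat on the mat".split()
counts = cooccurrence_counts(tokens, window=1)

# Each row of the matrix is the numerical representation of one word.
vocab = sorted(set(tokens))
for word in vocab:
    row = [counts[(word, other)] for other in vocab]
    print(f"{word:>4}: {row}")
```

Each printed row is one word's vector over the vocabulary, which is one simple way to arrive at a numerical representation for each word.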