Data standardization – Data Preparation and Transformation – MLS-C01 Study Guide

Data standardization is another scaling method that transforms the distribution of the data so that the mean becomes 0 and the standard deviation becomes 1. Figure 4.4 formally describes this scaling technique, where X represents the value to be transformed, µ refers to the mean of X, and σ is the […]
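
The excerpt refers to Figure 4.4, which is not reproduced here. As a minimal sketch, assuming the figure shows the usual z-score formula z = (X - µ) / σ with σ taken as the population standard deviation, the transformation could look like this in Python. The function name `standardize` and the sample `ages` list are hypothetical and chosen only for illustration.

```python
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    mu = statistics.fmean(values)
    # Assumption: population standard deviation (divide by n, not n - 1).
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(x - mu) / sigma for x in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample data
z_scores = standardize(ages)
```

After the transformation, `z_scores` has a mean of 0 and a standard deviation of 1 (up to floating-point error), and the value equal to the original mean (40) maps to 0.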

Important note – Data Preparation and Transformation – MLS-C01 Study Guide

You will learn about these algorithms in detail in later chapters of this book. For instance, entropy and information gain are two metrics that decision trees use to assess feature importance. Knowing the predictive power of each feature helps the algorithm define the […]
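
To make the two metrics concrete, here is a minimal sketch using the standard definitions: Shannon entropy of the class labels, and information gain as the drop in entropy after grouping by a feature. It is an illustration only, not the book's own code; the function names and the tiny dataset are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


labels = ["yes", "yes", "no", "no"]           # hypothetical target
informative_feature = ["a", "a", "b", "b"]    # separates the classes perfectly
uninformative_feature = ["a", "b", "a", "b"]  # tells us nothing about the class

gain_informative = information_gain(informative_feature, labels)
gain_uninformative = information_gain(uninformative_feature, labels)
```

The informative feature yields a gain equal to the full entropy of the labels (1 bit), while the uninformative one yields a gain of 0, which is why a tree would prefer to split on the former.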

Dealing with numerical features – Data Preparation and Transformation – MLS-C01 Study Guide

For numerical features (discrete and continuous), transformations fall into two groups: those that rely on the training data and those that rely purely on the individual observation being transformed. Those that rely on the training data use the training set to learn the necessary parameters during fit, and then […]
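
The distinction can be sketched as follows. The `MinMaxScaler` class here is a hypothetical, hand-written stand-in (not a library API) for a transformation whose parameters are learned from the training set, and the log transform is an assumed example of a transformation that depends only on the observation itself.

```python
import math


class MinMaxScaler:
    """Hypothetical minimal scaler: learns min and max from the training data only."""

    def fit(self, values):
        # Parameters are learned once, from the training set.
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        if span == 0:
            raise ValueError("training data has no range; cannot scale")
        # The learned parameters are reused for any data seen later.
        return [(v - self.min_) / span for v in values]


def log_transform(values):
    """Depends only on each observation, so there is nothing to fit."""
    return [math.log1p(v) for v in values]


train = [10.0, 20.0, 30.0]  # hypothetical training values
test = [15.0, 40.0]         # hypothetical unseen values

scaler = MinMaxScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
test_logged = log_transform(test)
```

Because the scaler reuses the training minimum and maximum, an unseen value outside the training range (40.0 here) lands outside [0, 1]. That is expected behavior when parameters come from the training set alone.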

Applying binary encoding – Data Preparation and Transformation – MLS-C01 Study Guide

For variables with a large number of unique categories, one way to create a numerical representation is binary encoding. The goal is to transform a categorical variable into multiple binary columns while minimizing the number of new columns. This process consists of three […]
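
The excerpt is cut off before it lists the steps, so the following is only a sketch of one common formulation of binary encoding, which may differ from the book's: map each category to an integer, write that integer in binary, and split the binary digits into separate columns. The function name, the alphabetical ordering of categories, and the zero-based numbering are all assumptions made for illustration.

```python
def binary_encode(categories):
    """Encode categories as fixed-width lists of binary digits (one list per row)."""
    # Step 1 (assumed): map each unique category to an integer, starting at 0.
    unique = sorted(set(categories))
    ordinal = {category: index for index, category in enumerate(unique)}

    # Number of bits needed to represent the largest integer code.
    width = max(1, (len(unique) - 1).bit_length())

    # Steps 2 and 3 (assumed): write the integer in binary, then split the digits into columns.
    return [
        [int(bit) for bit in format(ordinal[category], f"0{width}b")]
        for category in categories
    ]


colors = ["red", "green", "blue", "red", "yellow", "black"]  # hypothetical data
encoded = binary_encode(colors)
```

With five unique categories, this produces three binary columns, whereas one-hot encoding would produce five. The saving grows with cardinality, since the number of columns scales with the logarithm of the number of categories.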

Important note – Data Preparation and Transformation – MLS-C01 Study Guide

Before feeding data to any ML algorithm, make sure your feature types have been properly identified. In theory, if you are happy with your features and have properly classified each of them, you should be ready to go into the modeling phase of the CRISP-DM methodology, shouldn’t you? Well, maybe not. There are […]

Data Preparation and Transformation – MLS-C01 Study Guide

You have probably heard that data scientists spend most of their time working on data-preparation-related activities. It is now time to explain why that happens and what types of activities they work on. In this chapter, you will learn how to deal with categorical and numerical features, as well as how to apply different techniques […]

Exam Readiness Drill – Chapter Review Questions – AWS Services for Data Migration and Processing – MLS-C01 Study Guide

Apart from a solid understanding of key concepts, being able to think quickly under time pressure is a skill that will help you ace your certification exam. That is why working on these skills early on in your learning journey is key. Chapter review questions are designed to […]

AWS Batch – AWS Services for Data Migration and Processing – MLS-C01 Study Guide

AWS Batch is a managed batch-processing product. With AWS Batch, jobs can run without end-user interaction or can be scheduled to run. Note: If you get a question in the exam on an event-style workload that requires flexible compute, more disk space, no time limit (more than […]

Processing stored data on AWS – AWS Services for Data Migration and Processing – MLS-C01 Study Guide

Several services are available for processing data stored in AWS. You will learn about AWS Batch and Amazon EMR (Elastic MapReduce) in this section. EMR is an AWS product that primarily runs MapReduce jobs and Spark applications in a managed way. AWS Batch is used for long-running, compute-heavy […]

AWS Database Migration Service – AWS Services for Data Migration and Processing – MLS-C01 Study Guide

An organization might decide to migrate its databases from one system to another for several reasons, such as the need for better performance, enhanced security, or advanced features, or to avoid licensing costs from vendors. If an organization wants to expand its business to a different geolocation, it will need […]