Bootstrapping methods
Cross-validation is a good strategy to validate ML models, and you should try it in your daily activities as a data scientist. However, you should also know about other resampling techniques available out there. Bootstrapping is one of them. While cross-validation works with no replacement, a bootstrapping approach works with replacement. With replacement means […]
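The difference between sampling without and with replacement can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the post's own code; the toy dataset, the seed, and the variable names are assumptions made for the example.

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary) so the sketch is repeatable
data = list(range(10))   # toy dataset of 10 observations (assumption)

# Without replacement: every observation is drawn at most once,
# which mirrors how cross-validation folds partition the data.
without_replacement = rng.sample(data, k=len(data))

# With replacement: the same observation can be drawn several times,
# and some observations may not be drawn at all.
with_replacement = rng.choices(data, k=len(data))

# Observations that never made it into the bootstrap sample.
drawn = set(with_replacement)
left_out = [x for x in data if x not in drawn]

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
print("left out:           ", left_out)
```

The sample drawn without replacement is always a reordering of the full dataset, while the bootstrap sample has the same size but may contain repeats.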
Important note: As per Amazon’s docs, S3 provides read-after-write consistency for PUTs of new objects, which means that if you upload or create a new object and immediately try to read it using its key, you get the exact data that you just uploaded. However, for overwrites and deletes, […]
AWS provides a wide range of services to store your data safely and securely. There are various storage options available on AWS, such as block storage, file storage, and object storage. Managing on-premises data storage is expensive because of the higher investment in hardware, administrative overhead, and system upgrades. With AWS storage […]
Exam Readiness Drill – Chapter Review Questions
Apart from a solid understanding of key concepts, being able to think quickly under time pressure is a skill that will help you ace your certification exam. That is why working on these skills early on in your learning journey is key. Chapter review questions are designed to […]
ML in the cloud
ML has gone to the cloud and developers can now use it as a service. AWS has implemented ML services at different levels of abstraction. ML application services, for example, aim to offer out-of-the-box solutions for specific problem domains. Amazon Lex is a very clear example of an ML application as […]
Introducing ML frameworks
Being aware of some ML frameworks will put you in a much better position to pass the AWS Machine Learning Specialty exam. There is no need to master these frameworks since this is not a framework-specific certification; however, knowing some common terms and solutions will help you to understand the context of […]
Overfitting and underfitting
ML models might suffer from two types of fitting issues: overfitting and underfitting. Overfitting means that your model performs very well on the training data but does not generalize to other datasets, such as testing and, even worse, production data. In other words, if you have an overfitted model, it only works […]
Shuffling your training set
Now that you know what variance and data splitting are, you can go a little deeper into the training dataset requirements. You are very likely to find questions about data shuffling on the exam. This process consists of randomizing your training dataset before you start using it to fit an algorithm. […]
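As a rough illustration of the shuffling step described above, the sketch below randomizes a small training set while keeping each feature row paired with its label. The helper name `shuffle_together`, the toy data, and the seed are hypothetical and not taken from the post.

```python
import random

def shuffle_together(features, labels, seed=0):
    """Return shuffled copies of features and labels, keeping rows aligned.

    Hypothetical helper for illustration; the seed only makes the
    shuffle repeatable.
    """
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # shuffle positions, not the data itself
    return [features[i] for i in indices], [labels[i] for i in indices]

# Toy training set sorted by label, a typical case where shuffling matters.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]

X_shuffled, y_shuffled = shuffle_together(X, y, seed=7)
print(list(zip(X_shuffled, y_shuffled)))
```

Shuffling a list of indices and applying it to both lists is one simple way to avoid breaking the link between a row and its label.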
Data splitting
Training and evaluating ML models are key tasks of the modeling pipeline. ML algorithms need data to find relationships among features in order to make inferences, but those inferences need to be validated before they are moved to production environments. The dataset used to train ML models is commonly called the training set. […]
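A simple holdout split is one way to set aside data for validating those inferences. The sketch below is only an assumption about how such a split might look: the helper name `holdout_split` and the 80/20 ratio are illustrative choices, not values stated in the post.

```python
def holdout_split(rows, train_fraction=0.8):
    """Split rows into a training part and a held-out part.

    Hypothetical helper; assumes the rows were already shuffled.
    """
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

rows = list(range(10))  # stand-in for 10 already-shuffled records
training_set, held_out_set = holdout_split(rows)
print(len(training_set), len(held_out_set))
```

The model would be fitted on `training_set` only, and `held_out_set` would be used to check how well it performs on data it has not seen.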
The CRISP-DM modeling life cycle
Modeling is a very common term used in ML when you want to specify the steps taken to solve a particular problem. For example, you could create a binary classification model to predict whether the transactions from Table 1.1 are fraudulent or not. A model, in this context, represents all […]