AWS Batch – AWS Services for Data Migration and Processing – MLS-C01 Study Guide

AWS Batch

AWS Batch is a managed batch-processing service. With AWS Batch, jobs can run without end-user interaction or on a schedule:

  • Imagine an event-driven application that invokes a Lambda function to process data stored in S3. A Lambda function can run for at most 15 minutes; if processing takes longer, Lambda stops the execution and the invocation fails. For such scenarios, AWS Batch is a better solution, because computation-heavy workloads can be scheduled or triggered by API calls and events.
  • AWS Batch is a good fit for use cases where a longer processing time is required or more computation resources are needed.
  • An AWS Batch job can be a shell script, an executable, or a container image, and one job can depend on another. Each job needs a job definition that describes how it runs: the IAM role that grants its permissions, the compute resources it requires (such as vCPUs and memory), mount points, and other metadata.
  • Jobs are submitted to queues, where they wait for compute environment capacity. These queues are associated with one or more compute environments.
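  • Compute environments do the actual work of executing the jobs. They can be backed by Amazon EC2 instances (On-Demand or Spot) or by AWS Fargate, and AWS Batch runs the jobs as containers on Amazon ECS (or Amazon EKS). You can define the instance types and the minimum and maximum vCPU capacity.
  • Compute environments receive jobs from the queues in order of queue priority and execute them. They can be managed (AWS Batch provisions and scales the capacity for you) or unmanaged (you provision and manage the instances yourself).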
  • Jobs can store their metadata in DynamoDB for further use and write their output to an S3 bucket.
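The job definition, job queue, and job dependency described above map onto requests to the AWS Batch API. The following Python sketch builds keyword arguments for `register_job_definition` and `create_job_queue` and submits two jobs, where the second depends on the first. It rests on several assumptions. All names, ARNs, container images, the `preprocess.py`/`train.py` scripts, and the resource sizes are hypothetical. The host-volume mount assumes an EC2-backed compute environment. `RecordingClient` is an offline stand-in so the sketch runs without AWS credentials; with real access you would pass `boto3.client("batch")` instead. Parameter names follow the boto3 Batch client as we understand it, so check the current API reference before relying on them.

```python
from typing import Any, Dict, List

# Hypothetical names throughout: images, scripts, ARNs, queues, environments.


def job_definition_request(name: str, image: str, script: str,
                           job_role_arn: str) -> Dict[str, Any]:
    """Kwargs for register_job_definition: what runs, with which role and resources."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            # Ref::input_uri is replaced by the `parameters` value given at submit time.
            "command": ["python", script, "Ref::input_uri"],
            "jobRoleArn": job_role_arn,  # IAM permissions the running job receives
            "resourceRequirements": [
                {"type": "VCPU", "value": "4"},
                {"type": "MEMORY", "value": "16384"},  # MiB
            ],
            # Host volume and mount point; assumes an EC2-backed compute environment.
            "volumes": [{"name": "scratch", "host": {"sourcePath": "/mnt/scratch"}}],
            "mountPoints": [{"sourceVolume": "scratch",
                             "containerPath": "/scratch", "readOnly": False}],
        },
        # Per-attempt timeout you choose (6 hours here), not a fixed 15-minute cap.
        "timeout": {"attemptDurationSeconds": 6 * 60 * 60},
        "retryStrategy": {"attempts": 2},
    }


def job_queue_request(name: str, priority: int,
                      compute_envs: List[str]) -> Dict[str, Any]:
    """Kwargs for create_job_queue: a queue tied to ordered compute environments."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        # Higher values are scheduled first when queues share a compute environment.
        "priority": priority,
        "computeEnvironmentOrder": [
            {"order": i + 1, "computeEnvironment": env}
            for i, env in enumerate(compute_envs)
        ],
    }


def submit_pipeline(client: Any, queue: str, preprocess_def: str,
                    train_def: str, input_uri: str) -> List[str]:
    """Submit a preprocessing job and a training job that depends on it."""
    first = client.submit_job(
        jobName="preprocess",
        jobQueue=queue,
        jobDefinition=preprocess_def,
        parameters={"input_uri": input_uri},
    )
    second = client.submit_job(
        jobName="train",
        jobQueue=queue,
        jobDefinition=train_def,
        parameters={"input_uri": input_uri},
        dependsOn=[{"jobId": first["jobId"]}],  # waits for "preprocess"
    )
    return [first["jobId"], second["jobId"]]


class RecordingClient:
    """Offline stand-in for boto3.client("batch"): records submit_job calls."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def submit_job(self, **kwargs: Any) -> Dict[str, str]:
        self.calls.append(kwargs)
        return {"jobId": f"job-{len(self.calls)}", "jobName": kwargs["jobName"]}


client = RecordingClient()
job_ids = submit_pipeline(client, "ml-queue", "preprocess-def", "train-def",
                          "s3://example-bucket/raw/")
print(job_ids)  # ['job-1', 'job-2']
```

Passing the client in as an argument keeps the submission logic testable offline; the same `submit_pipeline` call should work unchanged against a real Batch client once the job definitions and queue exist.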

Note

If an exam question describes an event-style workload that needs flexible compute, more disk space, no time limit (runs longer than 15 minutes), or resource limits beyond what Lambda allows, the answer is likely to be AWS Batch.
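The note's decision rule can be expressed as a small function. This is a study-aid sketch, not AWS guidance. The Lambda limits below (15-minute timeout, 10,240 MB of memory, and up to 10,240 MB of `/tmp` storage) are the documented maximums as we understand them at the time of writing, and the `Workload` type and `pick_service` function are hypothetical names.

```python
from dataclasses import dataclass

# Lambda per-invocation maximums as understood at the time of writing;
# verify against the current Lambda quotas documentation.
LAMBDA_MAX_MINUTES = 15
LAMBDA_MAX_MEMORY_MB = 10_240
LAMBDA_MAX_TMP_MB = 10_240  # ephemeral /tmp storage


@dataclass
class Workload:
    """Hypothetical description of a single processing task."""
    expected_minutes: float
    memory_mb: int
    scratch_disk_mb: int


def pick_service(w: Workload) -> str:
    """Rule of thumb from the note: Lambda if the task fits every limit, else AWS Batch."""
    fits_lambda = (
        w.expected_minutes <= LAMBDA_MAX_MINUTES
        and w.memory_mb <= LAMBDA_MAX_MEMORY_MB
        and w.scratch_disk_mb <= LAMBDA_MAX_TMP_MB
    )
    return "Lambda" if fits_lambda else "AWS Batch"


print(pick_service(Workload(expected_minutes=5, memory_mb=1024, scratch_disk_mb=512)))   # Lambda
print(pick_service(Workload(expected_minutes=90, memory_mb=1024, scratch_disk_mb=512)))  # AWS Batch
```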

Summary

In this chapter, you learned about different ways of processing data in AWS. You also learned how to extend your data centers to AWS, migrate data to AWS, and ingest data. You explored various ways of processing data to make it ready for analysis, and you saw the power of a data catalog, which lets you query your data with AWS Glue and Athena.

In the next chapter, you will learn about various machine learning algorithms and their usage.