Snowball, Snowball Edge, and Snowmobile
These belong to the same product family, which handles the physical transfer of data between business operating locations and AWS. To move a large amount of data into and out of AWS, you can use any of the three.
AWS DataSync
AWS DataSync is designed to move data from […]
AWS Storage Gateway
Storage Gateway is a hybrid storage virtual appliance. It can run in three different modes: File Gateway, Tape Gateway, and Volume Gateway. It can be used to extend, migrate, and back up an on-premises data center to AWS.
Storing and transforming real-time data using Kinesis Data Firehose
Many use cases require data to be streamed and stored for later analytics. To meet this need, you can write a Kinesis consumer that reads the Kinesis stream and stores the data in S3. This solution needs an instance or […]
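As a rough illustration of the managed alternative named in the title, the sketch below sends one event to a Firehose delivery stream using the boto3 `put_record` call. The stream name, the event fields, and the helper names `to_firehose_record` and `send_event` are hypothetical; the client is passed in, and is assumed to be created elsewhere with `boto3.client("firehose")`.

```python
import json


def to_firehose_record(event: dict) -> dict:
    """Encode one event as a newline-terminated JSON record.

    The trailing newline keeps records separable once Firehose
    concatenates them into a single S3 object.
    """
    data = (json.dumps(event, sort_keys=True) + "\n").encode("utf-8")
    return {"Data": data}


def send_event(firehose_client, stream_name: str, event: dict) -> dict:
    """Send one event to a delivery stream.

    firehose_client is assumed to be boto3.client("firehose");
    stream_name is a hypothetical delivery stream that targets S3.
    """
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record=to_firehose_record(event),
    )
```

Passing the client in as a parameter keeps the encoding logic testable without AWS credentials.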
Processing real-time data using Kinesis Data Streams
Kinesis is Amazon’s streaming service and can be scaled based on requirements. It has a level of persistence that retains data for 24 hours by default or optionally up to 365 days. Kinesis Data Streams is used for large-scale data ingestion, analytics, and monitoring.
Note: Amazon Kinesis shouldn’t […]
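The retention window described above (24 hours by default, up to 365 days) can be sketched as a small validation helper around the boto3 `increase_stream_retention_period` call. This is a minimal sketch: the helper names and the stream name in the usage note are hypothetical, and the client is assumed to be `boto3.client("kinesis")`.

```python
MIN_RETENTION_HOURS = 24        # default retention period
MAX_RETENTION_HOURS = 365 * 24  # optional maximum of 365 days


def retention_request(stream_name: str, days: int) -> dict:
    """Build the request parameters for a retention period given in days."""
    hours = days * 24
    if not MIN_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be between 1 and 365 days, got {days}"
        )
    return {"StreamName": stream_name, "RetentionPeriodHours": hours}


def extend_retention(kinesis_client, stream_name: str, days: int) -> dict:
    """Raise a stream's retention period.

    kinesis_client is assumed to be boto3.client("kinesis").
    """
    return kinesis_client.increase_stream_retention_period(
        **retention_request(stream_name, days)
    )
```

For example, `retention_request("clickstream", 7)` asks for 168 hours of retention on a hypothetical stream named `clickstream`.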
Querying S3 data using Athena
Athena is a serverless service designed for querying data stored in S3. It is serverless because the client doesn’t manage the servers that are used for computation. To help you understand this, here’s an example, where you will use the AwsDataCatalog catalog created in AWS Glue on the S3 data and […]
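Outside the console, the same kind of query can be submitted programmatically. The sketch below builds the parameters for the boto3 `start_query_execution` call; the database, table, and S3 output location are hypothetical placeholders, and the client is assumed to be `boto3.client("athena")`.

```python
def athena_query_request(
    sql: str,
    database: str,
    output_location: str,
    catalog: str = "AwsDataCatalog",  # Athena's name for the Glue Data Catalog
) -> dict:
    """Build the parameters for an Athena query over S3 data."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        # Athena writes the result files to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena_client, sql: str, database: str, output_location: str) -> str:
    """Submit a query and return its execution ID.

    athena_client is assumed to be boto3.client("athena"). The query runs
    asynchronously, so the caller still has to poll for completion.
    """
    response = athena_client.start_query_execution(
        **athena_query_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]
```

A call might look like `run_query(client, "SELECT * FROM sales LIMIT 10", "demo_db", "s3://my-bucket/athena-results/")`, where every name is illustrative only.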
Getting hands-on with AWS Glue ETL components
In this section, you will use the Data Catalog components created earlier to build a job. You will start by creating a job: This is optional. Then, click on the Run job button:
Figure 3.6 – A screenshot of the AWS Glue ETL job
Figure 3.7 – A […]
Features of AWS Glue
AWS Glue is a fully managed serverless ETL service on AWS. Its central feature is the Data Catalog, and that’s the secret to its success. It helps with discovering data from data sources and understanding a bit about it. As you now have a brief idea of […]
Technical requirements
You can download the data used in the examples from GitHub, available here: https://github.com/PacktPublishing/AWS-Certified-Machine-Learning-Specialty-MLS-C01-Certification-Guide-Second-Edition/tree/main/Chapter03.
Creating ETL jobs on AWS Glue
In a modern data pipeline, there are multiple stages, such as generating data, collecting data, storing data, performing ETL, analyzing, and visualizing. In this section, you will cover each of these at a […]
Exam Readiness Drill
For the first three attempts, don’t worry about the time limit.
ATTEMPT 1
The first time, aim for at least 40%. Look at the answers you got wrong and read the relevant sections in the chapter again to fix your learning gaps.
ATTEMPT 2
The second time, aim for at least 60%. […]
Amazon DynamoDB for NoSQL Database-as-a-Service
Amazon DynamoDB is a NoSQL database-as-a-service product within AWS. It’s a fully managed key/value and document database that you access through its endpoint. Read and write throughput can be scaled manually or automatically. It also supports data backup, point-in-time recovery, and data encryption. One example where […]
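To make the key/value model concrete, the sketch below converts a plain Python record into the typed attribute-value format that the low-level boto3 `put_item` call expects. The table name and record fields are hypothetical, the converter covers only strings, integers, and booleans, and the client is assumed to be `boto3.client("dynamodb")`.

```python
def to_attribute_value(value) -> dict:
    """Wrap a Python value in DynamoDB's typed attribute format."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, int):
        return {"N": str(value)}  # numbers travel as strings on the wire
    if isinstance(value, str):
        return {"S": value}
    raise TypeError(f"unsupported type for this sketch: {type(value).__name__}")


def to_item(record: dict) -> dict:
    """Convert a flat record into a DynamoDB item."""
    return {key: to_attribute_value(val) for key, val in record.items()}


def put_record(dynamodb_client, table_name: str, record: dict) -> dict:
    """Write one record to a table.

    dynamodb_client is assumed to be boto3.client("dynamodb"), and the
    record must include the table's key attributes.
    """
    return dynamodb_client.put_item(TableName=table_name, Item=to_item(record))
```

The higher-level `boto3.resource("dynamodb")` interface performs this type conversion automatically; the explicit version is shown only to expose the wire format.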