The S3 Glacier Flexible Retrieval storage class provides low-cost, durable archive storage with low data-retrieval fees. Three retrieval tiers determine how quickly archived objects are restored to your S3 bucket, so, unlike S3 Glacier Deep Archive, you do not have to wait up to a day or more for your data to become available. The first tier, Expedited, can return your objects in one to five minutes. The second tier, Standard, restores objects in three to five hours. The third and final tier, Bulk, restores objects in around 12 hours.
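The three retrieval tiers map directly to the `Tier` field of the restore request you send when asking S3 to bring an archived object back. A minimal sketch, kept as a plain Python dict so it runs without AWS credentials; the bucket and key names in the comment are hypothetical:

```python
# The three Glacier Flexible Retrieval tiers and their typical restore times.
RETRIEVAL_TIERS = {
    "Expedited": "1-5 minutes",
    "Standard": "3-5 hours",
    "Bulk": "around 12 hours",
}

def restore_request(days: int, tier: str) -> dict:
    """Build the RestoreRequest payload accepted by s3.restore_object().

    `days` is how long the temporary restored copy stays available.
    """
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}

# With boto3, this payload would be passed as (bucket/key are examples only):
#   s3 = boto3.client("s3")
#   s3.restore_object(Bucket="my-archive-bucket",
#                     Key="logs/2023-01-01.gz",
#                     RestoreRequest=restore_request(2, "Expedited"))
print(restore_request(2, "Expedited"))
```

Expedited costs more per retrieval than Standard, and Bulk the least, so the tier you choose is a trade-off between urgency and cost.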
Key Points to Remember about S3 Glacier Flexible Retrieval
Glacier Deep Archive can be a practical solution for your storage needs if you have items that you rarely access but must archive and retain. A common case is replacing a physical tape backup system with a digital equivalent, where you would retrieve the data only once or twice per year and can tolerate waiting 12 hours for retrieval. These constraints come with deep savings: storage in Glacier Deep Archive costs only about $1 per TB per month.
Key Points to Remember about S3 Glacier Deep Archive
Now that you understand the different storage tiers available in the S3 service and how their cost and access options differ, you will look at how to use Lifecycle policies to move objects between tiers automatically, without user interaction.
S3 Lifecycle policies provide a tool that helps you manage storage costs for objects residing in Amazon S3 for longer than 24 hours. An S3 Lifecycle configuration is a set of rules defining actions for the objects stored in a particular bucket: rules can transition objects down the storage-class waterfall to colder, cheaper tiers (transitions move only one way; a Lifecycle rule cannot move an object back up to a warmer tier) or expire, that is delete, the objects altogether.
Lifecycle policies in Amazon S3 can apply to all objects in a bucket, to objects with a particular key prefix (think of files placed in a logs/ folder), or to objects carrying a specific set of tag values. If a bucket is used only for logs and nothing else, you can craft the policy so that it moves all objects on the same cadence. Suppose instead the bucket is multi-use, as with a development team's bucket that holds static assets (e.g., pictures or images) and code assets, and the team and their service role have both read and write access and store their log files back to this same bucket. In that case, you may need to scope the Lifecycle rule to only the logs/ prefix within the S3 bucket.
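A scoped rule of this kind can be sketched as follows; the configuration is built as a plain dict in the shape that boto3's `put_bucket_lifecycle_configuration` accepts, and the bucket name in the comment is hypothetical:

```python
# A Lifecycle rule scoped to the logs/ prefix, so the team's static and
# code assets in the same bucket are left untouched.
logs_only_rule = {
    "ID": "archive-logs-only",
    "Filter": {"Prefix": "logs/"},  # rule applies only to keys under logs/
    "Status": "Enabled",
    "Transitions": [
        # After 30 days, move log objects to the cheaper Standard-IA tier.
        {"Days": 30, "StorageClass": "STANDARD_IA"},
    ],
    # After a year, delete the log objects entirely.
    "Expiration": {"Days": 365},
}

lifecycle_config = {"Rules": [logs_only_rule]}

# With boto3, this would be applied as (bucket name is an example only):
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="dev-team-bucket",
#       LifecycleConfiguration=lifecycle_config)
print(lifecycle_config["Rules"][0]["Filter"])
```

Omitting the `Prefix` filter (or setting it to an empty string) would make the rule apply to every object in the bucket, which is exactly what you want to avoid in the multi-use case.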
Figure 9.1: Object going through the S3 Lifecycle policy
If you look at the flow depicted in Figure 9.1, you will see that logs are initially ingested from a source into Kinesis Data Firehose. The logs land in the destination S3 bucket, where they are stored in the default S3 Standard tier. The bucket has a Lifecycle policy on it, so after 30 days those log files move from the default Standard tier to Standard-IA. After 60 days in the bucket, the log files move to the S3 Glacier Instant Retrieval tier, adding further cost savings for the customer. Finally, 365 days after being placed in the bucket by Kinesis Data Firehose, the files are deleted by the Lifecycle policy without manual intervention.
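The flow in Figure 9.1 can be expressed as a single Lifecycle rule with two transitions and an expiration. A sketch, again as a plain dict in the shape boto3 accepts (the rule ID is an arbitrary example name):

```python
# One rule implementing the Figure 9.1 flow: Standard -> Standard-IA at
# 30 days, -> Glacier Instant Retrieval at 60 days, deleted at 365 days.
figure_9_1_rule = {
    "ID": "log-lifecycle",
    "Filter": {"Prefix": ""},  # empty prefix: applies to the whole bucket
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 60, "StorageClass": "GLACIER_IR"},
    ],
    "Expiration": {"Days": 365},
}

# Applied the same way as any Lifecycle configuration:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="firehose-destination-bucket",  # example name
#       LifecycleConfiguration={"Rules": [figure_9_1_rule]})
print(figure_9_1_rule["Transitions"])
```

Note that the day counts are measured from each object's creation time, not from the previous transition, which is why the figure's 30/60/365 milestones appear directly in the rule.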
In the following exercise, you will create a new S3 bucket and then add a Lifecycle policy to that bucket that mimics the end of the lifecycle shown in Figure 9.1 by deleting the file after 24 hours.