Introducing AWS Storage Gateway – Designing Networks for Complex Organizations – SAP-C02 Study Guide

Introducing AWS Storage Gateway

AWS Storage Gateway is a service that provides a series of solutions to expand your storage infrastructure into the AWS cloud for purposes such as data migration, file shares, backup, and archiving. It uses standard protocols to access AWS storage services such as Amazon Simple Storage Service (S3), Amazon S3 Glacier, Amazon Elastic Block Store (EBS) snapshots, and Amazon FSx.

There are three different flavors of Storage Gateway as listed here:

  • File Gateway
  • Volume Gateway
  • Tape Gateway

The following sections dive into the details of each.

File Gateway

File Gateway now comes in two distinct types: S3 File Gateway and FSx File Gateway.

S3 File Gateway

Originally the only type of file gateway available, S3 File Gateway lets you store files on S3 and access them transparently from your on-premises environment through the Network File System (NFS) and Server Message Block (SMB) protocols. S3 File Gateway maps your files one-to-one to S3 objects and stores the file metadata (for example, Portable Operating System Interface (POSIX) file access control lists (ACLs)) in the S3 object metadata. Files are written synchronously to the file gateway's local cache and then copied to S3 asynchronously.
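The write path described above (a synchronous write to the local cache, followed by an asynchronous copy to S3) can be pictured with a small toy model. This is only an illustrative sketch, not how the gateway is implemented: the `WriteBackGateway` class, its method names, and the in-memory dictionaries standing in for the local disk cache and the S3 bucket are all hypothetical.

```python
from collections import deque


class WriteBackGateway:
    """Toy model of a file gateway: a local cache in front of an object store."""

    def __init__(self, key_prefix=""):
        self.cache = {}           # file path -> bytes (stands in for the local disk cache)
        self.pending = deque()    # paths written locally but not yet uploaded
        self.object_store = {}    # object key -> bytes (stands in for the S3 bucket)
        self.key_prefix = key_prefix

    def key_for(self, path):
        # One-to-one mapping: each file path corresponds to exactly one object key.
        return self.key_prefix + path.lstrip("/")

    def write(self, path, data):
        # Synchronous step: the write returns once the data is in the local cache.
        self.cache[path] = data
        if path not in self.pending:
            self.pending.append(path)

    def upload_pending(self):
        # Asynchronous step (triggered by hand here): copy cached files to the store.
        uploaded = []
        while self.pending:
            path = self.pending.popleft()
            key = self.key_for(path)
            self.object_store[key] = self.cache[path]
            uploaded.append(key)
        return uploaded


gw = WriteBackGateway()
gw.write("/reports/q1.csv", b"a,b\n1,2\n")
print(len(gw.object_store))     # 0: the file is cached but not yet in the store
gw.upload_pending()
print(sorted(gw.object_store))  # ['reports/q1.csv']
```

The point of the sketch is the ordering: clients see the write complete at cache speed, while the object only appears in the bucket once the background upload has run.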

Concretely, S3 File Gateway comes either as a preconfigured hardware appliance or as a software appliance that you deploy in your on-premises environment. The software appliance is a virtual machine (VM) that can run on a VMware ESXi, Microsoft Hyper-V, or Linux Kernel-based Virtual Machine (KVM) hypervisor (or on an Amazon EC2 instance, should you need to).

See the following diagram for an illustration of how S3 File Gateway works:

Figure 2.7: Amazon S3 File Gateway

Once the gateway is deployed and configured, your on-premises servers can use it like any other file share through the NFS and SMB protocols. Several factors affect the performance of your gateway; the key ones are CPU, local disk size, and network capacity.

The CPU resources and network capacity available to the appliance directly influence the amount of data the gateway can process in parallel. The local disk size assigned to the file gateway determines the cache size. On the hardware appliance, the cache is limited by the amount of physical storage installed, so plan your capacity before ordering the appliance. Size the cache so that it can hold your most frequently accessed files, which then benefit from low-latency access. On the software appliance, you can always add more cache capacity (additional storage volumes) later if you realize that your cache is undersized.
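As a rough illustration of the sizing reasoning above, the following sketch adds up the size of the frequently accessed files and applies some headroom. The function name, the access-count threshold, and the 20% headroom are assumptions chosen for the example, not AWS guidance; in practice you would derive these inputs from your own access statistics.

```python
def estimate_cache_size_gib(files, min_accesses=5, headroom=0.2):
    """Estimate a cache size in GiB from (size_in_bytes, access_count) pairs.

    A file counts as "hot" when it was accessed at least min_accesses times
    over the observation period (a hypothetical threshold).
    """
    hot_bytes = sum(size for size, accesses in files if accesses >= min_accesses)
    return hot_bytes * (1 + headroom) / 2**30


GIB = 2**30
observed = [
    (10 * GIB, 12),  # hot
    (5 * GIB, 1),    # cold, excluded from the estimate
    (20 * GIB, 7),   # hot
]
print(f"{estimate_cache_size_gib(observed):.1f} GiB")  # 36.0 GiB
```

Here the two hot files total 30 GiB, and the 20% headroom brings the estimate to 36 GiB.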

In terms of security, it remains your responsibility to control and manage access to the S3 bucket(s) sitting behind the gateway and to follow best practices. Therefore, remember to set up the right permissions (Identity and Access Management (IAM) role identity-based policies and/or S3 bucket policies) so that they follow a least-privilege approach.
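To make the least-privilege idea more concrete, here is a minimal sketch that builds an identity-based policy document scoped to a single bucket. The helper name `gateway_bucket_policy` and the bucket name are hypothetical, and the action lists are an illustrative subset of real S3 actions rather than the exact set the gateway requires; check the AWS Storage Gateway documentation for the authoritative list before using anything like this.

```python
import json


def gateway_bucket_policy(bucket_name):
    """Build a policy limited to one bucket (illustrative action subset)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


# "example-gateway-bucket" is a placeholder name.
print(json.dumps(gateway_bucket_policy("example-gateway-bucket"), indent=2))
```

The design choice worth noting is that no statement uses a wildcard action or a wildcard resource: the role can only touch the one bucket behind the gateway, and only with the listed operations.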

Because the files are ultimately stored as objects on S3, you are also free to use the rich set of capabilities Amazon S3 provides to manage their lifecycle, such as lifecycle policies, versioning, replication rules (for example, Cross-Region Replication), and so on.
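As an example of such a capability, the sketch below builds a lifecycle configuration in the dictionary shape accepted by the boto3 S3 client's `put_bucket_lifecycle_configuration` call. The helper name, rule ID, and day counts are assumptions picked for illustration; the code only constructs the dictionary and does not call AWS.

```python
def build_lifecycle_configuration(prefix="", ia_after_days=30, noncurrent_expire_days=365):
    """Return a lifecycle configuration with one rule (illustrative values)."""
    return {
        "Rules": [
            {
                "ID": "gateway-files-lifecycle",  # hypothetical rule name
                "Filter": {"Prefix": prefix},     # empty prefix = whole bucket
                "Status": "Enabled",
                # Move objects to the Standard-IA storage class after N days.
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"}
                ],
                # With versioning enabled, expire old versions after N days.
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": noncurrent_expire_days
                },
            }
        ]
    }


config = build_lifecycle_configuration(prefix="reports/")
print(config["Rules"][0]["Transitions"])

# Applying it would look roughly like this (requires boto3 and AWS credentials):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-gateway-bucket", LifecycleConfiguration=config)
```

The bucket name in the commented call is a placeholder. Before applying any transition rule to a bucket behind a gateway, verify how the target storage class affects access through the file share.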

Finally, plan for backups. AWS Backup integrates with AWS Storage Gateway, although that integration covers Volume Gateway volumes, whose backups are stored on Amazon S3 as EBS snapshots that can later be restored either on-premises or on AWS. With S3 File Gateway, your files already reside as objects in your S3 bucket, so you protect them with S3 capabilities such as versioning and replication.