First things first, it’s essential to consider your foundations, that is, your AWS environment, which must be able to accommodate the workload requirements. Two elements in particular must be addressed because they can affect the reliability of any workload: resource constraints and network topology.
Resource constraints can be further split into two types: service quotas and environmental constraints.
As mentioned previously, service quotas are predefined default values applied to each AWS account, both to protect you from over-provisioning AWS resources and to protect the AWS cloud from abuse. Each service has its own quotas, which can cover very different items and quantities. Some quotas are adjustable and represent soft limits, while others cannot be changed and represent hard limits. To illustrate this, the VPC service has a number of quotas for various features. For instance, at the time of writing, each VPC supports by default up to five IPv4 Classless Inter-Domain Routing (CIDR) blocks and a single IPv6 CIDR block. The former limit is adjustable, but the latter isn’t. Thus, when designing your solution, you must take service quotas into account: be mindful of which limits are hard and which are soft, and put in place a mechanism to monitor your usage of AWS services so that you detect when you are approaching any relevant quota. For soft limits, you can request an increase at any time through the Service Quotas console or API.
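The monitoring mechanism described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch: the `QuotaUsage` record, the `review_quotas` function, the 80% threshold, and the usage figures are all illustrative assumptions, not part of any AWS SDK. It flags each quota whose utilization reaches the threshold, then suggests requesting an increase for a soft limit or revisiting the design for a hard limit.

```python
from dataclasses import dataclass


@dataclass
class QuotaUsage:
    """Usage of one service quota (hypothetical record, for illustration only)."""
    name: str
    used: float
    limit: float
    adjustable: bool  # True for a soft limit, False for a hard limit


def review_quotas(quotas, threshold=0.8):
    """Return (name, action) pairs for quotas whose utilization reaches the threshold."""
    findings = []
    for q in quotas:
        utilization = q.used / q.limit
        if utilization >= threshold:
            # Soft limits can be raised on request; hard limits call for a design change.
            action = "request increase" if q.adjustable else "revisit design"
            findings.append((q.name, action))
    return findings


# Hypothetical usage figures for illustration.
quotas = [
    QuotaUsage("IPv4 CIDR blocks per VPC", used=4, limit=5, adjustable=True),
    QuotaUsage("IPv6 CIDR blocks per VPC", used=1, limit=1, adjustable=False),
    QuotaUsage("Example quota", used=10, limit=100, adjustable=True),
]

for name, action in review_quotas(quotas):
    print(f"{name}: {action}")
```

In a real environment, the quota values and their adjustable flag could be retrieved through the Service Quotas API (for example, the `GetServiceQuota` operation, whose response includes the quota `Value` and whether it is `Adjustable`), while current usage could come from Amazon CloudWatch usage metrics for the services that publish them.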
Now that you understand why service quotas must be monitored and managed, you can review the second type of constraint mentioned earlier: environmental constraints. These are the constraints imposed by the physical resources that support the AWS infrastructure, such as the amount of storage available on a physical disk used by Amazon EC2 instances, or the network bandwidth available between your AWS environment and your on-premises environment. Because these environmental constraints can affect your solution, it is key to bear them in mind. Imagine, for instance, that you are building an application on AWS that relies on data stored in an operational data store in your on-premises environment. The bandwidth and latency of the network connection between your on-premises and AWS environments will constrain which use cases are feasible.
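To get a feel for how bandwidth and latency constrain such a use case, you can run some back-of-the-envelope arithmetic. The following Python sketch assumes hypothetical figures (a 1 TB dataset, a 1 Gbps link, 80% usable throughput, and 1,000 sequential queries with a 20 ms round-trip time); the function names are illustrative only, and real throughput also depends on protocol overhead and link contention.

```python
def transfer_time_seconds(data_bytes, bandwidth_bps, efficiency=1.0):
    """Lower bound on bulk transfer time over a link of the given bandwidth."""
    # Convert bytes to bits, then divide by the usable share of the bandwidth.
    return (data_bytes * 8) / (bandwidth_bps * efficiency)


def sequential_wait_seconds(requests, rtt_seconds):
    """Time spent waiting on round trips when requests are issued one after another."""
    return requests * rtt_seconds


# Hypothetical figures: 1 TB over a 1 Gbps link, and 1,000 queries at a 20 ms RTT.
print(transfer_time_seconds(1e12, 1e9))       # ideal case, full bandwidth
print(transfer_time_seconds(1e12, 1e9, 0.8))  # assuming 80% usable throughput
print(sequential_wait_seconds(1000, 0.020))
```

Under these assumptions, moving 1 TB takes at least 8,000 seconds (roughly 2.2 hours) even with the whole link available, and an application issuing 1,000 sequential queries spends about 20 seconds just waiting on round trips. Estimates like these help you judge whether a given use case can rely on the on-premises data store directly.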
It’s also paramount to plan your network topology when you architect for reliability. Several aspects of networking were discussed in Chapter 2, Designing Networks for Complex Organizations. You can review that chapter if essential networking concepts on AWS, such as VPCs, VPN, Direct Connect (DX), and Transit Gateway (TGW), are not entirely clear to you. Here, networking is considered from a resiliency perspective.
When you lay out the foundations of your AWS environment, you must prepare for the foreseeable future and be ready for the unknown. Having to rework your entire network topology on AWS after a couple of projects, because of decisions made without careful planning and forward thinking, can be quite painful.
As a solutions architect, an essential part of your job consists of making decisions backed by a rationale, rather than light-heartedly choosing to turn right or left. As much as possible, you want to use two-way doors, that is, reversible decisions. One-way doors, that is, irreversible decisions (or decisions that are reversible only at a very high cost), should be avoided or at least limited, and in any case deferred until the choice can no longer be delayed. Picking an EC2 instance type to deploy an application is an example of a two-way-door decision. You should make that decision without further ado, as soon as you have sufficient information to make an educated choice. You should then check, ideally before going into production, whether the selected instance type is the optimal choice. If it isn’t, changing the instance type is in most cases straightforward and painless, even more so if you have put in place the proper mechanisms, such as infrastructure as code, automated build and deployment, and rolling updates. By contrast, choosing between a hub-and-spoke and a mesh network topology is the type of one-way-door decision you want to get right the first time. You could object that there is always a way to migrate from one such topology to the other and that the decision is therefore not entirely one-way, but the migration is likely to be so painful that you will regret not having chosen correctly in the first place.
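One way to see why this choice is hard to reverse is to count the connections each topology requires as the number of networks grows. The following Python sketch is a simplified model, assuming that a full mesh needs one point-to-point connection per pair of networks (as with non-transitive VPC peering, for example) and that a hub-and-spoke layout needs one attachment per network to a central hub (such as a transit gateway). It deliberately ignores routing, cost, and quotas, and the function names are illustrative.

```python
def mesh_connections(n):
    """Point-to-point links needed to connect every pair of n networks (full mesh)."""
    return n * (n - 1) // 2


def hub_and_spoke_connections(n):
    """Links needed when each of n networks attaches once to a central hub."""
    return n


# Compare both topologies for a growing number of networks.
for n in (3, 10, 50):
    print(n, mesh_connections(n), hub_and_spoke_connections(n))
```

With 10 networks, a full mesh already needs 45 connections versus 10 hub attachments, and with 50 networks the mesh needs 1,225. This quadratic growth is one reason the topology decision is best made with future growth in mind.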
Among the many questions you will have to answer as a solutions architect is which network topology is right for your organization. You need to lay out a future-proof topology that can accommodate not just your first workload but also future growth, and you must make sure that it can cope with failures.
The following subsections present some general recommendations, based on best practices, that will help you avoid having to change a topology soon after implementing it.