After you design all the instance layers to be scalable, you should take advantage of the AWS AutoScaling service to automate the scale-in and scale-out operations for your application layers based on performance metrics such as EC2 CPU utilization, network throughput, and other metrics captured in the CloudWatch service.
The AutoScaling service can scale the following AWS services (a sketch of configuring one of the non-EC2 targets follows the list):
EC2: Add or remove instances from an EC2 AutoScaling group.
EC2 Spot Fleets: Add or remove instances from a Spot Fleet request.
ECS: Increase or decrease the number of containers in an ECS service.
DynamoDB: Increase or decrease the provisioned read and write capacity.
RDS Aurora: Add or remove Aurora read replicas from an Aurora DB cluster.
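For the non-EC2 targets in this list, scaling is configured through the Application Auto Scaling API rather than through an EC2 AutoScaling group. The following is a minimal boto3 sketch that registers a hypothetical DynamoDB table as a scalable target and attaches a target-tracking policy; the table name, capacity limits, and target value are illustrative assumptions, not values from this chapter.

```python
import boto3

# Application Auto Scaling manages non-EC2 targets (DynamoDB, ECS, Aurora, Spot Fleet).
aas = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target (table name is a placeholder).
aas.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/MyAppTable",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Track 70 percent consumed read capacity; the service adjusts provisioned RCUs to match.
aas.put_scaling_policy(
    PolicyName="my-table-read-tracking",
    ServiceNamespace="dynamodb",
    ResourceId="table/MyAppTable",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```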
To create an autoscaling configuration on EC2, you need the following components (see the sketch after this list):
EC2 launch template: Specifies the instance type, AMI, key pair, block device mappings, and other properties with which instances should be launched.
Scaling policy: Defines a trigger that specifies a metric ceiling (for scaling out) and floor (for scaling in). Any breach of the floor or ceiling for a certain period of time triggers autoscaling.
EC2 AutoScaling group: Defines the scaling limits: the minimum, maximum, and desired numbers of instances. You attach a launch template (or launch configuration) and a scaling policy to apply during a scaling event.
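A minimal boto3 sketch of the first and third components follows: it creates a launch template and an AutoScaling group that references it. The AMI ID, key pair, subnet IDs, and group sizes are placeholder assumptions; a scaling policy is wired up in a later sketch.

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# Launch template: what each instance should look like (AMI, type, key pair, block devices).
# The AMI ID, key name, and subnet IDs below are placeholders.
ec2.create_launch_template(
    LaunchTemplateName="web-tier",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",
        "InstanceType": "t3.micro",
        "KeyName": "my-key-pair",
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvda", "Ebs": {"VolumeSize": 20, "VolumeType": "gp3"}}
        ],
    },
)

# AutoScaling group: scaling limits plus the launch template to use during scaling events.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-tier-asg",
    LaunchTemplate={"LaunchTemplateName": "web-tier", "Version": "$Latest"},
    MinSize=1,
    MaxSize=10,
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
)
```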
Traditionally, scaling policies have been designed with dynamic scaling in mind. For example, a common setup would include
An AutoScaling group with a minimum of 1 and a maximum of 10 instances
A CPU % ceiling of 70 percent for scale-out
A CPU % floor of 30 percent for scale-in
A breach duration of 10 minutes
A scaling definition of +/− 33 percent capacity on each scaling event
The application is now designed to operate between 30 and 70 percent aggregate CPU utilization across the AutoScaling group. After the ceiling is breached for 10 minutes, the AutoScaling service adds a third more instances to the group. If you are running one instance, it adds another, because the increase needs to be at least 33 percent of current capacity. If you are running two or three instances, it likewise adds one more; at four instances, however, it needs to add two more instances to meet the rules set out in the scaling policy. When the aggregate CPU usage falls below 30 percent for 10 minutes, the AutoScaling group is reduced by 33 percent, and the appropriate number of instances is removed each time the floor threshold is breached. Figure 4.4 illustrates dynamic scaling.
FIGURE 4.4 Dynamic scaling
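The dynamic scaling setup described above might be wired up roughly as follows: two simple scaling policies that adjust the group by plus or minus 33 percent of current capacity, and two CloudWatch alarms for the 70 percent ceiling and the 30 percent floor, each requiring two consecutive 5-minute periods (10 minutes) in breach. The group name carries over from the earlier placeholder sketch, the thresholds are the example values, and AWS rounds fractional instance counts when it applies a percentage adjustment.

```python
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

ASG = "web-tier-asg"  # hypothetical group from the earlier sketch

# Scale-out policy: grow the group by 33 percent of current capacity.
scale_out = autoscaling.put_scaling_policy(
    AutoScalingGroupName=ASG,
    PolicyName="cpu-scale-out",
    PolicyType="SimpleScaling",
    AdjustmentType="PercentChangeInCapacity",
    ScalingAdjustment=33,
    Cooldown=600,
)

# Scale-in policy: shrink the group by 33 percent of current capacity.
scale_in = autoscaling.put_scaling_policy(
    AutoScalingGroupName=ASG,
    PolicyName="cpu-scale-in",
    PolicyType="SimpleScaling",
    AdjustmentType="PercentChangeInCapacity",
    ScalingAdjustment=-33,
    Cooldown=600,
)

# Ceiling alarm: aggregate CPU above 70 percent for two 5-minute periods (10 minutes).
cloudwatch.put_metric_alarm(
    AlarmName="web-tier-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": ASG}],
    Period=300,
    EvaluationPeriods=2,
    Threshold=70,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[scale_out["PolicyARN"]],
)

# Floor alarm: aggregate CPU below 30 percent for two 5-minute periods (10 minutes).
cloudwatch.put_metric_alarm(
    AlarmName="web-tier-cpu-low",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Statistic="Average",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": ASG}],
    Period=300,
    EvaluationPeriods=2,
    Threshold=30,
    ComparisonOperator="LessThanThreshold",
    AlarmActions=[scale_in["PolicyARN"]],
)
```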
The AutoScaling group also has a desired instance count. This setting enables you to scale manually and override what the scaling policy would otherwise dictate. You can set the desired count to any value between the minimum and maximum at any time, and the AutoScaling group resizes accordingly. This capability is useful when you know of an upcoming event that will increase traffic to your site: you can grow the AutoScaling group preemptively so the environment is already sized to meet the demand.
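For example, ahead of a known traffic spike you might override the policy-driven size directly. This sketch reuses the hypothetical group name and a placeholder capacity from the earlier examples.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Preemptively grow the group ahead of an expected traffic increase.
# The desired capacity must stay within the group's min/max bounds.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="web-tier-asg",
    DesiredCapacity=8,
    HonorCooldown=False,  # apply immediately, ignoring any scaling cooldown
)
```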
You can also scale on a schedule if you have a very predictable application. Perhaps it is a service used only from 9 a.m. to 5 p.m. each day. You simply set a scheduled scale-out action at 8 a.m., in anticipation of the application being used, and a scheduled scale-in action at 6 p.m., once the application is no longer in use. This way you can reduce the cost of operating intermittently used applications by more than 50 percent. Figure 4.5 illustrates scheduled scaling.
FIGURE 4.5 Scheduled scaling
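A sketch of the 9-to-5 scenario using scheduled actions might look like the following. The group name, sizes, and weekday-only recurrence are assumptions, and the cron expressions are evaluated in UTC unless a time zone is specified.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale out at 8 a.m. on weekdays, ahead of the working day.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-tier-asg",
    ScheduledActionName="business-hours-scale-out",
    Recurrence="0 8 * * 1-5",
    MinSize=4,
    MaxSize=10,
    DesiredCapacity=6,
)

# Scale in at 6 p.m. on weekdays, once the application is no longer in use.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-tier-asg",
    ScheduledActionName="after-hours-scale-in",
    Recurrence="0 18 * * 1-5",
    MinSize=1,
    MaxSize=10,
    DesiredCapacity=1,
)
```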