External load balancing is determined by the VPC subnet in which the ELB is placed. If you create an Internet-facing load balancer, a public IP address is assigned to each load balancer node.
External load balancers are accessible from the public Internet and always have public IP addresses assigned to their nodes, as shown in Figure 4.3.
FIGURE 4.3 External load balancing
An external load balancer accepts connections from the public Internet on its public IP address and then distributes traffic to backend target groups that can use either public or private IP addresses. For example, an EC2 instance does not require a public IP address to work with an external (public) load balancer and can reside in a private subnet in the VPC.
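As a sketch of this arrangement, the following boto3-style request parameters show an Internet-facing load balancer placed in public subnets while its targets are registered by private IP address. The names, subnet IDs, ARN, and addresses are placeholders chosen for illustration, not values from the text.

```python
# Hypothetical request parameters, boto3-style, for illustration only.
# Subnet IDs, the ARN, and IP addresses are placeholders.

# Internet-facing load balancer placed in public subnets;
# its nodes receive public IP addresses:
create_load_balancer_params = {
    "Name": "web-alb",
    "Scheme": "internet-facing",
    "Subnets": ["subnet-public-a", "subnet-public-b"],
}

# Backend targets registered by private IP address; the EC2
# instances need no public IPs and can sit in private subnets:
register_targets_params = {
    "TargetGroupArn": "arn:aws:elasticloadbalancing:example",
    "Targets": [
        {"Id": "10.0.2.15", "Port": 80},
        {"Id": "10.0.2.16", "Port": 80},
    ],
}
```

The point of the sketch is the split: the "Scheme" makes the listener public, while every target "Id" is a private VPC address.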
Autoscaling is an ELB feature that enables you to dynamically add and remove capacity based on your workload. It matches compute capacity to the current workload, saving costs by using only the compute services you need at any given point in time. If your server workload grows as the site becomes more heavily used, autoscaling automatically adds servers to meet the demand. Later, when the load drops, those servers are removed, saving you the cost of paying for unused capacity.
Autoscaling groups can have a minimum and maximum number of servers, which allows for redundancy. For example, if a web deployment requires a minimum of four EC2 instances to meet the average workload, you can set the minimum capacity at four and, to control costs, set a maximum of, for example, eight servers. If a server fails and the group drops below the minimum capacity, the failed server is automatically replaced.
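A minimal sketch of the example above, using hypothetical boto3-style group settings (the group name is an assumption) and a small helper showing how the minimum bound drives replacement:

```python
# Hypothetical EC2 Auto Scaling group settings for the example
# above: minimum of four instances, maximum of eight.
create_auto_scaling_group_params = {
    "AutoScalingGroupName": "web-asg",  # placeholder name
    "MinSize": 4,          # never run fewer than four instances
    "MaxSize": 8,          # cap costs at eight instances
    "DesiredCapacity": 4,  # start at the minimum
}

def replacement_target(healthy_count, minimum=4):
    """Sketch of replacement behavior: if failures drop the group
    below MinSize, the service launches replacements back up to
    the minimum; a healthy group is left alone."""
    return max(healthy_count, minimum)
```

For instance, if one of the four instances fails, `replacement_target(3)` returns 4, reflecting the launch of one replacement.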
You set up scaling plans in the AWS console to monitor your applications and automatically add or remove capacity based on the current workload; the console also gives you scaling recommendations. When a scaling plan is created, it combines predictive and dynamic scaling to match your workload. Autoscaling can also scan your deployment and automatically discover resources that are candidates for scaling, consolidating them into one centralized view and saving you the time of searching your deployment yourself. The service lets you optimize for performance, for cost, for a combination of the two, or for a custom policy that you define.
Predictive scaling uses machine learning to scale based on expected future traffic: daily and weekly load patterns are recorded, and the ML algorithms predict demand and deploy compute capacity based on those calculations, saving you research time and cost. The capacity planning and provisioning are refined over time by the machine learning algorithms that AWS includes with the service. This self-adjusting feature calculates the optimal resources and immediately adds or removes capacity based on your workloads.
CloudWatch alarms can be defined to monitor metrics such as CPU utilization or connection counts and adjust capacity accordingly. For example, you can use dynamic scaling to monitor the CPU utilization of your EC2 instances: when the web servers reach a sustained 80 percent utilization, two servers are added, and when utilization drops below 45 percent, servers are removed one at a time. This matches the compute power to the load on your web servers. Figure 4.4 shows the autoscaling configuration in the web console.
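The step-scaling behavior described above can be sketched as a simple decision function. The thresholds and capacity limits come from the examples in this section; the function itself is an illustration of the arithmetic, not an AWS API.

```python
def scaling_decision(cpu_percent, current_capacity, minimum=4, maximum=8):
    """Sketch of the step-scaling policy described above:
    add two servers at sustained 80% CPU, remove one server
    below 45%, and stay within the group's min/max bounds."""
    if cpu_percent >= 80:
        return min(current_capacity + 2, maximum)  # scale out by two
    if cpu_percent < 45:
        return max(current_capacity - 1, minimum)  # scale in by one
    return current_capacity  # utilization in range: no change
```

In the real service, the CloudWatch alarm fires and EC2 Auto Scaling applies the policy; this sketch only shows the resulting capacity calculation, clamped by the group's minimum and maximum.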
Autoscaling is available at no charge; you pay only for CloudWatch and the underlying services being used.