At the top layer of the monitoring and alerting stack are security issues. These issues also encompass a wide range of aspects that need to be determined for each application beforehand. A range of different alerts can be configured for security issues, including but not limited to the following:
Large numbers of failed login attempts: These could indicate brute-force break-in attempts to the application.
Sudden spikes in data transfer out: These could indicate a breach or data leak.
Attempts to assume roles from unknown locations: These could indicate a breach of credentials.
Large numbers of failed access attempts: These could indicate reconnaissance by a rogue actor.
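To make the first alert type more concrete, the following is a minimal sketch of a sliding-window check for failed login attempts. The class name `FailedLoginAlert`, the threshold of five failures, and the 60-second window are all assumptions chosen for illustration. In practice such alerts would typically be configured in a monitoring service rather than written into application code.

```python
from collections import deque


class FailedLoginAlert:
    """Hypothetical helper: flags a source once it exceeds a failure threshold."""

    def __init__(self, threshold=5, window_seconds=60):
        self.threshold = threshold            # assumed value, tune per application
        self.window_seconds = window_seconds  # assumed value, tune per application
        self._events = {}                     # source -> deque of failure timestamps

    def record_failure(self, source, timestamp):
        """Record one failed login and return True if an alert should fire."""
        events = self._events.setdefault(source, deque())
        events.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```

The same pattern (count events per source inside a time window and compare against a threshold) applies to the other alert types in the list, such as failed access attempts.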
The AWS infrastructure exposes a public HTTP API. Every call receives either a 200-type HTTP response, meaning the action is accepted and will be processed, or a 400-type or 500-type HTTP response, meaning the request failed. All 400-type responses indicate that there is an issue with the request. All 500-type responses indicate that there is an issue with the AWS infrastructure. In case of infrastructure issues, always make sure to repeat the request with an exponential back-off approach, meaning that you wait for an increasingly longer period of time before reissuing the request.
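The exponential back-off approach described above can be sketched as follows. This is an illustration under stated assumptions, not an AWS-provided implementation: `TransientError`, `backoff_delays`, and `call_with_backoff` are hypothetical names, the default delays are arbitrary, and the random jitter is an addition that the text does not mention.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for a failure worth retrying (e.g. a 500-type response)."""


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5):
    """Yield the wait before each retry: base, base*factor, ... limited to cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(request, sleep=time.sleep, jitter=random.random, **kwargs):
    """Call request(), retrying on TransientError with increasingly long waits."""
    for delay in backoff_delays(**kwargs):
        try:
            return request()
        except TransientError:
            # Assumption: scale the wait by a random factor in [0.5, 1.0)
            # so that many clients do not retry at the same instant.
            sleep(delay * (0.5 + jitter() / 2))
    return request()  # final attempt; any error now propagates to the caller
```

Here `request` is any zero-argument callable that raises `TransientError` on a retryable failure. Capping the delay keeps the wait bounded during a long outage.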
Here are some examples of HTTP 400-type and 500-type responses:
400 – bad request: Any 400 error includes a message like InvalidAction, MessageRejected, or RequestExpired. Specific responses by some services also indicate throttling. In case of throttling, you should retry the requests with exponential back-off.
403 – access denied: All applicable IAM policies are evaluated together, and an explicit deny in any one policy overrides an allow in any other. Check all the policies attached to the user, group, or role. Check any inline policies and resource policies attached to buckets, queues, and so on.
404 – page not found: This error indicates the object, instance, or resource specified in the query does not exist.
500 – internal failure: This error indicates an internal error on an operational service on the AWS side. You can retry the request immediately, and it will probably succeed on the second try. If not, retry with exponential back-off.
503 – service unavailable: These errors are rare because they indicate a major failure in an AWS service. You can retry your request using exponential back-off, so that the request can succeed once the issue is resolved.
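The handling advice for these response types can be summarized in a small decision function. This is a sketch only: `retry_action`, its `throttled` flag, and the returned labels are hypothetical names introduced here, and the caller is assumed to detect throttling from the error message of the specific service.

```python
def retry_action(status_code, throttled=False):
    """Map an HTTP status code to the client action suggested in the list above."""
    if 200 <= status_code < 300:
        return "done"                # accepted, nothing to retry
    if 400 <= status_code < 500:
        # A 400-type error is a problem with the request itself,
        # unless the service is signalling throttling.
        return "backoff" if throttled else "fix_request"
    if status_code == 500:
        return "retry_then_backoff"  # one immediate retry, then back off
    if 500 < status_code < 600:
        return "backoff"             # e.g. 503: wait for the service to recover
    return "unknown"
```

For example, `retry_action(403)` returns `"fix_request"`, because retrying will not help until the policies are corrected, while `retry_action(503)` returns `"backoff"`.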