The previous chapter showed you how the CloudWatch service can help collect and store logs from a myriad of services in AWS. You are now ready to turn your attention to the most cost-effective ways to retain those log files for long-term storage, along with the methods to pull out the necessary data from them.
One of the critical duties of a security professional is to assimilate all the information coming in from different sources and distinguish the relevant bits from what is just noise. Services and applications in any environment (not just the cloud) constantly produce logs. Knowing which services and techniques can gather, collect, and then help you quickly sift through and analyze these logs is an essential skill for real-life situations as well as for the AWS Security competency exam.
Several services can help you with this task. This chapter covers these services, including storing logs on the Amazon S3 storage service, using Amazon Kinesis Data Firehose to move the logs to other storage options, and using Amazon Athena and Amazon OpenSearch Service to search through the log files.
The following main topics will be covered in this chapter:
- Log storage options and their cost implications
- Amazon OpenSearch Service
- Using Amazon Kinesis to ship logs
- Running queries with Amazon Athena
Technical Requirements
You will require access to an active AWS account, the AWS Management Console, the CLI, and a text editor for this chapter.
Log Storage Options and Their Cost Implications
As you think about storing all the logs generated in your account, there are a few factors you should consider when designing a long-term log storage solution for your organization:
- Building a storage solution that is both secure and resilient: Your logs should be stored in a secure manner that includes, at a minimum, default encryption on that log storage. The storage you provision should also be able to ingest all of your logs in real time, without delays in processing or delivery (see the encryption sketch after this list).
- Central storage for the log files: You need a single location to which you can direct internal or external auditors should they need access to the log files generated for your account. The same applies to records of configuration changes across the accounts you manage: storing these logs centrally serves both auditors and incident responders who need quick access.
- Establishing log file integrity when storing log files: You need to be able to refer to your raw log files as the source of truth for what actions have occurred in the accounts you are accountable for. Therefore, it is crucial to ensure that those log files have not been tampered with in any way and that their integrity remains intact. Using tools such as IAM access controls to prevent modification of the log files, combined with generated checksum values for the logs, can help establish log file integrity (see the checksum sketch after this list).
- Understanding how long logs need to be retained according to organizational policy: Working with the leaders of your organization to establish how long logs should be retained, based on both the company’s needs and any regulatory guidelines, will drive the retention process for log files. Once these timelines have been set, you can create automated (or manual) workflows to remove older log files that are no longer needed, saving both storage space and cost. Log files that must be retained strictly for compliance purposes can also be moved to lower-cost, infrequent-access storage classes (see the lifecycle sketch after this list).
- Defining a process for adding new logs to the log storage: As new logs appear in your environment, whether from new services, new accounts, or new applications added to existing services, you and your organization should set standards to ensure that all defined logs are captured and stored in your centralized log storage. Ideally, this is an automated process so that no logs are lost accidentally.
- Granting access to the log storage and files: As you define your log storage and think about ensuring its integrity, some users (for example, development team members) will need access to some or all of the log files to perform their day-to-day duties. Access should be granted on a per-role or per-user basis using the principle of least privilege, allowing only read access to the logs necessary for each job function (see the policy sketch after this list). Going a step further, you could provide time-based access, allowing read-only access only when requested and only for a short period, such as 24 hours, which limits access to those who temporarily need specific files.
- Monitoring the log storage: If you are the responsible party for the log storage, you need to ensure both the health of the log storage and the successful delivery of files to it. Having a plan in place for automated alerts on errors such as low-space warnings, delivery failures, or deletions of log files helps you rectify issues before it is too late, such as when you are trying to retrieve those log files at the moment you need them most (see the notification sketch after this list).
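To make the first point concrete, here is a minimal boto3 sketch that enables default server-side encryption and versioning on a log bucket. The bucket name example-central-log-bucket is a placeholder; substitute your own, and note that SSE-KMS is also available if you need customer-managed keys.

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-central-log-bucket"  # placeholder bucket name

# Default encryption at rest (SSE-S3) for every object written to the bucket
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Versioning adds resilience against accidental overwrites and deletions
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)
```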
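One way to generate checksum values for stored logs is simply to hash each object and keep the digests in a separate, tightly controlled location. The sketch below assumes a hypothetical bucket and object key; note that CloudTrail also provides built-in log file validation with signed digest files, which covers this need for CloudTrail logs specifically.

```python
import hashlib
import boto3

s3 = boto3.client("s3")

# Hypothetical object key; in practice you would iterate over the log prefix
obj = s3.get_object(
    Bucket="example-central-log-bucket",
    Key="logs/2023/10/01/example-log.json.gz",
)

# Compare this digest against the checksum recorded when the log was delivered;
# a mismatch indicates the file has been altered since it was stored.
digest = hashlib.sha256(obj["Body"].read()).hexdigest()
print(digest)
```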
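Retention timelines translate naturally into an S3 lifecycle configuration. The rule below is only an illustration, assuming 30-day and 90-day transitions and a one-year expiry on a hypothetical logs/ prefix; your own thresholds should come from the organizational policy discussed above.

```python
import boto3

s3 = boto3.client("s3")

# Move logs to infrequent access after 30 days, archive to Glacier after 90,
# and delete them once the agreed retention period (here, one year) has passed.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-central-log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "log-retention",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```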
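A least-privilege grant for a development team might look like the following sketch, which creates a managed policy allowing read-only access to a single hypothetical prefix (logs/app-team/) in the central log bucket.

```python
import json
import boto3

iam = boto3.client("iam")

# Read-only access scoped to the prefix this team actually needs
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-central-log-bucket/logs/app-team/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-central-log-bucket",
            "Condition": {"StringLike": {"s3:prefix": ["logs/app-team/*"]}},
        },
    ],
}

iam.create_policy(
    PolicyName="app-team-log-read-only",
    PolicyDocument=json.dumps(policy_document),
)
```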
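For monitoring, one lightweight option is an S3 event notification that alerts you whenever a log object is removed. The SNS topic ARN and account ID below are placeholders; delivery failures from services such as Kinesis Data Firehose can be alarmed on separately through their own CloudWatch metrics.

```python
import boto3

s3 = boto3.client("s3")

# Publish a message whenever an object is deleted from the log bucket so that
# unexpected removals can be investigated quickly. The SNS topic policy must
# allow S3 to publish to it.
s3.put_bucket_notification_configuration(
    Bucket="example-central-log-bucket",
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:log-bucket-alerts",
                "Events": ["s3:ObjectRemoved:*"],
            }
        ]
    },
)
```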
After looking at the foundational aspects of what goes into building the log storage solution for your organization, the next step is to examine the details of storing logs on the Amazon S3 service.