Storing and transforming real-time data using Kinesis Data Firehose
Many use cases require streaming data to be stored for future analytics. One way to meet this requirement is to write a Kinesis consumer that reads from the Kinesis data stream and stores the records in S3. This solution needs an instance or machine to run the code, with permissions to read from the stream and write to S3. Another option is a Lambda function that is triggered as new records arrive in the stream (that is, after putRecord or putRecords calls land data on the stream) and writes those records to the S3 bucket.
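The following is a minimal sketch of the Lambda-based approach, assuming the function is attached to the Kinesis data stream through an event source mapping; the bucket name and key prefix are placeholders, not values from the text:

import base64
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-analytics-bucket"  # hypothetical destination bucket


def lambda_handler(event, context):
    """Decode incoming Kinesis records and persist the batch to S3."""
    records = []
    for record in event["Records"]:
        # Kinesis delivers each payload base64-encoded inside the event.
        payload = base64.b64decode(record["kinesis"]["data"])
        records.append(payload.decode("utf-8"))

    # Write the whole batch as a single newline-delimited object.
    key = f"stream-data/{uuid.uuid4()}.json"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body="\n".join(records).encode("utf-8"),
    )
    return {"statusCode": 200, "recordsProcessed": len(records)}

Note that with either approach you own the batching, error handling, and delivery logic yourself, which is exactly the undifferentiated work that Kinesis Data Firehose takes over.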
In the next section, you will learn about different AWS services used for ingesting data from on-premises servers to AWS.
Different ways of ingesting data from on-premises into AWS
With the increasing demand for data-driven use cases, managing data on on-premises servers has become difficult, and taking backups is not easy when you are dealing with huge volumes of data. Once this data lands in a data lake, it can be used to train deep neural networks, build a data warehouse that extracts meaningful information, run analytics, and generate reports.
However, the available options for migrating data into AWS come with challenges of their own. For example, if you want to send data to S3, you have to write code to push the data to AWS, manage that code and the servers it runs on, ensure that the data travels over HTTPS, and verify that each transfer completed successfully. This adds complexity, time, and effort to the process. To avoid these scenarios, AWS provides services that address such use cases by letting you design a hybrid infrastructure in which data is shared between your on-premises data centers and AWS. You will learn about these services in the following sections.
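To make the "do it yourself" effort concrete, here is a hedged sketch of uploading a file from an on-premises server to S3 over HTTPS and verifying the transfer; the bucket name, object key, and file path are hypothetical placeholders:

import os

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")  # boto3 uses HTTPS endpoints by default
BUCKET = "my-onprem-landing-bucket"        # hypothetical bucket
KEY = "backups/export.csv"                 # hypothetical object key
LOCAL_PATH = "/data/exports/export.csv"    # hypothetical local file


def upload_and_verify(local_path: str, bucket: str, key: str) -> bool:
    """Upload a local file and confirm the object landed with the expected size."""
    try:
        s3.upload_file(local_path, bucket, key)
        # Verify the transfer by comparing the local size with the stored object size.
        head = s3.head_object(Bucket=bucket, Key=key)
        return head["ContentLength"] == os.path.getsize(local_path)
    except ClientError as err:
        print(f"Transfer failed: {err}")
        return False


if __name__ == "__main__":
    ok = upload_and_verify(LOCAL_PATH, BUCKET, KEY)
    print("Upload verified" if ok else "Upload could not be verified")

Even this simple script has to be scheduled, monitored, and kept running somewhere on premises, which is the operational burden the AWS ingestion services described next are designed to remove.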