Getting hands-on with Amazon Textract – AWS Application Services for AI/ML – MLS-C01 Study Guide
Getting hands-on with Amazon Textract
In this section, you will use the Amazon Textract API to read an image file from our S3 bucket and print its FORM details to CloudWatch Logs. Alternatively, the output could be stored in S3 in whatever format you need for further use, or in DynamoDB as key-value pairs. Let's get started:
First, create an IAM role called textract-use-case-role, with Lambda as the trusted entity, and attach the following policies. They allow the Lambda function to read from S3, call Amazon Textract, and write its output to CloudWatch Logs:
CloudWatchFullAccess
AmazonTextractFullAccess
AmazonS3ReadOnlyAccess
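If you prefer scripting this step over using the IAM console, the following is a minimal boto3 sketch. It assumes you have AWS credentials with IAM permissions; the helper names (lambda_trust_policy, create_role) are hypothetical, while the role name and the three AWS managed policies come from the step above.

```python
import json

ROLE_NAME = "textract-use-case-role"

# The three AWS managed policies listed above, expressed as ARNs.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/CloudWatchFullAccess",
    "arn:aws:iam::aws:policy/AmazonTextractFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
]


def lambda_trust_policy() -> dict:
    """Trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_role(iam_client, role_name: str = ROLE_NAME) -> str:
    """Create the role, attach each managed policy, and return the role ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    for arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=arn)
    return response["Role"]["Arn"]


if __name__ == "__main__":
    import boto3  # imported here so the helpers above run without boto3 installed

    print(create_role(boto3.client("iam")))
```

A newly created role can take a few seconds to propagate, so a Lambda function created immediately afterwards may briefly fail to assume it.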
Next, create an S3 bucket called textract-document-analysis and upload the receipt.png image file to an input folder inside it. This image contains the FORM details that will be extracted. The image file is available at https://github.com/PacktPublishing/AWS-Certified-Machine-Learning-Specialty-MLS-C01-Certification-Guide-Second-Edition/tree/main/Chapter08/Amazon%20Textract%20Demo/input_doc:
Figure 8.20 – An S3 bucket with an image (.png) file uploaded to the input folder
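The bucket creation and upload can also be scripted. The sketch below is one way to do it with boto3, under a few assumptions: the object key input/receipt.png is inferred from the input folder in Figure 8.20, receipt.png sits in your working directory, and the helper names are hypothetical. S3 bucket names are globally unique, so you may need a variant of textract-document-analysis.

```python
BUCKET = "textract-document-analysis"
# Assumed object key, based on the "input" folder shown in Figure 8.20.
KEY = "input/receipt.png"


def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Build create_bucket arguments; us-east-1 must omit LocationConstraint."""
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_and_upload(s3_client, region: str, local_path: str = "receipt.png") -> None:
    """Create the bucket, then copy the local image into it under KEY."""
    s3_client.create_bucket(**create_bucket_kwargs(BUCKET, region))
    s3_client.upload_file(local_path, BUCKET, KEY)


if __name__ == "__main__":
    import boto3

    session = boto3.session.Session()
    region = session.region_name or "us-east-1"
    create_and_upload(session.client("s3", region_name=region), region)
```

The region check exists because S3 rejects an explicit us-east-1 LocationConstraint, while every other region requires one.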
The next step is to create a Lambda function called read-scanned-doc, as shown in Figure 8.21, with an existing execution role called textract-use-case-role:
Figure 8.21 – The AWS Lambda Create function dialog
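The console step above can also be reproduced with boto3, and the 40-second timeout can be set at creation time. This is a sketch under stated assumptions: the python3.12 runtime is an assumption (the text does not name one), lambda_function.lambda_handler is the console's default Python handler, and the placeholder handler is hypothetical. It is only a stand-in until you paste in the Textract code.

```python
import io
import zipfile

FUNCTION_NAME = "read-scanned-doc"
# Hypothetical placeholder handler, to be replaced by the Textract code.
PLACEHOLDER_SOURCE = (
    "def lambda_handler(event, context):\n"
    "    return {'statusCode': 200}\n"
)


def build_zip(source: str) -> bytes:
    """Package the handler as lambda_function.py inside an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("lambda_function.py", source)
    return buffer.getvalue()


def create_function(lambda_client, role_arn: str) -> dict:
    """Create the function with the existing role and a 40-second timeout."""
    return lambda_client.create_function(
        FunctionName=FUNCTION_NAME,
        Runtime="python3.12",  # assumed runtime; any supported Python version works
        Role=role_arn,
        Handler="lambda_function.lambda_handler",
        Code={"ZipFile": build_zip(PLACEHOLDER_SOURCE)},
        Timeout=40,  # seconds; the Lambda default is 3 seconds
    )


if __name__ == "__main__":
    import boto3

    iam = boto3.client("iam")
    role_arn = iam.get_role(RoleName="textract-use-case-role")["Role"]["Arn"]
    print(create_function(boto3.client("lambda"), role_arn)["FunctionArn"])
```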
Once the function has been created, paste the following code and deploy it. Then scroll down to Basic settings and raise the default timeout to a higher value (40 seconds) to prevent timeout errors. The code uses the analyze_document API from Amazon Textract, requesting table and form details through the API's FeatureTypes parameter (TABLES and FORMS):