Classifying and Extracting Mortgage Loan Data with Amazon Textract
Mortgage loan applications, at least in the United States, comprise around 500 or more pages of diverse documents. In order for applications to be reviewed, all these documents need to be classified, and the data on each form extracted. This isn’t as easy as it might sound! Besides different data structures in each document, the…
Mortgage loan applications, at least in the United States, comprise around 500 or more pages of diverse documents. In order for applications to be reviewed, all these documents need to be classified, and the data on each form extracted. This isn’t as easy as it might sound! Besides different data structures in each document, the same data element may have different names on different documents—for example, SSN, or Social Security Number, or Tax ID. These three all refer to the same data.
Today, a new Analyze Lending API, for analyzing and classifying the documents contained in mortgage loan application packages, and extracting the data they contain, is available for Amazon Textract. The new API was created in response to requests from major lenders in the industry to help them process applications faster and reduce errors, which improves the end-customer experience and lower operating costs.
Until now, classification and extraction of data from mortgage loan application packages have been human-intensive tasks, although some lenders have used a hybrid approach, using technology such as Amazon Textract. However, customers told us that they needed even greater workflow automation to speed up automation efforts and reduce human error so that their staff could focus on higher-value tasks.
The new API also provides additional value-add services. It’s able to perform signature detection in terms of which documents have signatures and which don’t. It also provides a summary output of the documents in a mortgage application package and identifies select important documents such as bank statements and 1003 forms that would normally be present. The new workflow is powered by a collection of machine learning (ML) models. When a mortgage application package is uploaded, the workflow classifies the documents in the package before routing them to the right ML model, based on their classification, for data extraction.
Test-Driving the New Analyze Lending API Although the new API is intended for lenders to incorporate into their business process workflows and applications, anyone can actually try it using the Amazon Textract console. This enables you to see how the API classifies documents and extracts the data elements they contain. If you’re interested in the application of machine learning and artificial intelligence, this may be of interest to you even if you’re not processing a mortgage application package.
I start by opening the Amazon Textract console, expanding Analyze Lending in the navigation panel, and then selecting Demo. The demo console immediately analyzes a set of synthetic test files, and outputs the results shown below (you can always restart the demo by clicking the Reset demo button). I get a summary of the analysis results and a document carousel for each of the documents in the package. The demo console also has a handy help panel containing (among other things) a summary of terminology related to the documents.
In the carousel I can see that one document has a signature badge, indicating a signature was detected, but, before taking a look, if I scroll the carousel, I can see that one document was labeled Unclassified:
Returning in the carousel to the document marked with a signature badge, I can see that it’s a check. Signature detection is usually a highly manual process so having the document analysis automatically mark when one is detected is a significant time saver.
Payslips are another document type that customers have told us can be difficult and time-consuming to handle. Selecting the detected payslip in the carousel shows the data extracted from it.
The synthetic data in the demo console provides an overview of how the API is able to analyze, classify, and extract data from the documents in a mortgage application package. However, I can also use my own documents. To do this in the demo console, I click the Upload package button and provide a single file, up to 5 MB, and 10 pages maximum for testing in the demo console, containing documents to analyze. Outside use in the demo console, the API supports documents with up to 3000 pages.
The results, for both the synthetic and your own data, can be downloaded by clicking the Download results button. This provides a .zip file containing four files—two are the raw JSON responses from the API. The other two are CSV-format files containing the summary (summary.csv) and the extracted data (extractions.csv). Both files are in key-value format.
The contents of the summary data file, for the synthetic test data, are below.
Below is an example of the data contained in the extractions file.
‘key,’value “‘PAY PERIOD END DATE”,”‘7/18/2008″ “‘PAY DATE”,”‘7/25/2008″ “‘BORROWER NAME”,”‘JOHN STILES” “‘BORROWER ADDRESS”,”‘101 MAIN STREET ANYTOWN, USA 12345″ “‘COMPANY NAME”,”‘ANY COMPANY CORP.” “‘COMPANY ADDRESS”,”‘475 ANY AVENUE ANYTOWN, USA 10101″ “‘FEDERAL FILING STATUS”,”‘Married” “‘STATE FILING STATUS”,”‘2″ “‘CURRENT GROSS PAY”,”‘$ 452.43″ “‘YTD GROSS PAY”,”‘23,526.80″ “‘CURRENT NET PAY”,”‘$ 291.90″ “‘REGULAR HOURLY RATE”,”‘10.00″ “‘HOLIDAY HOURLY RATE”,”‘10.00″ “‘WARNINGS MESSAGES NOTES”,”‘EFFECTIVE THIS PAY PERIOD YOUR REGULAR HOURLY RATE HAS BEEN CHANGED FROM $8.00 TO $10.00 PER HOUR.” “‘CURRENT REGULAR PAY”,”‘320″ …
Try the Analyze Lending API Yourself The new API is available in all Regions where Amazon Textract is offered but do be aware that the workflow and processing are focused on US-centric documents. Pricing for the new API is the same as for the existing table, form, and queries. You can find more details on the service pricing page. Finally, you can read more on the API in the Developer Guide.
Search enterprise data assets using LLMs backed by knowledge graphs
In this post, we present a generative AI-powered semantic search solution that empowers business users to quickly and accurately find relevant data assets across various enterprise data sources. In this solution, we integrate large language models (LLMs) hosted on Amazon Bedrock backed by a knowledge base that is derived from a knowledge graph built on…
In this post, we present a generative AI-powered semantic search solution that empowers business users to quickly and accurately find relevant data assets across various enterprise data sources. In this solution, we integrate large language models (LLMs) hosted on Amazon Bedrock backed by a knowledge base that is derived from a knowledge graph built on Amazon Neptune to create a powerful search paradigm that enables natural language-based questions to integrate search across documents stored in Amazon Simple Storage Service (Amazon S3), data lake tables hosted on the AWS Glue Data Catalog, and enterprise assets in Amazon DataZone.
Getting started with Amazon Bedrock Agents custom orchestrator
In this post, we explore how Amazon Bedrock Agents simplify the orchestration of generative AI workflows, particularly with the introduction of the custom orchestrator feature. You can use the custom orchestrator to fine-tune and optimize agentic workflows that align more closely with specific business and operational needs. We outline the feature’s key benefits, including full…
In this post, we explore how Amazon Bedrock Agents simplify the orchestration of generative AI workflows, particularly with the introduction of the custom orchestrator feature. You can use the custom orchestrator to fine-tune and optimize agentic workflows that align more closely with specific business and operational needs. We outline the feature’s key benefits, including full control over orchestration, real-time adjustments, and reusability, followed by a breakdown of how it manages state transitions and contract-based interactions between Amazon Bedrock Agents and AWS Lambda.
Amazon FSx for Lustre increases throughput to GPU instances by up to 15x
Amazon FSx for Lustre now features Elastic Fabric Adapter and NVIDIA GPUDirect Storage for up to 15x higher throughput to GPUs, unlocking new possibilities in deep learning, autonomous vehicles, and HPC workloads. Source
Amazon FSx for Lustre now features Elastic Fabric Adapter and NVIDIA GPUDirect Storage for up to 15x higher throughput to GPUs, unlocking new possibilities in deep learning, autonomous vehicles, and HPC workloads.