Connect with us


Smart city traffic anomaly detection using Amazon Lookout for Metrics and Amazon Kinesis Data Analytics Studio

Cities across the world are transforming their public services infrastructure with the mission of enhancing the quality of life of its residents. Roads and traffic management systems are part of the central nervous system of every city. They need intelligent monitoring and automation in order to prevent substantial productivity loss and in extreme cases life-threatening…



Cities across the world are transforming their public services infrastructure with the mission of enhancing the quality of life of its residents. Roads and traffic management systems are part of the central nervous system of every city. They need intelligent monitoring and automation in order to prevent substantial productivity loss and in extreme cases life-threatening situations such as obstruction to free movement of emergency services.

Today, in most city traffic operations centers, monitoring video feeds from roadside cameras is a fairly manual activity. It requires operations center engineers to mentally correlate and apply institutional knowledge to determine if an ongoing situation is an anomaly, which makes this activity error-prone and susceptible to delays.

Where AI/ML based solutions have been applied to analyze video feeds, there is a lot of complexity involved in ingesting, curating, and preparing data in the right format and then optimizing and maintaining the effectiveness of these machine learning (ML) models over long periods of time. This has been one of the barriers to quickly implementing and scaling the adoption of ML capabilities and in turn realizing the automation outcomes at city traffic operation centers.

This post shows you how to use an integrated solution with Amazon Lookout for Metrics and Amazon Kinesis Data Analytics Studio (among other AWS services) to break these barriers by quickly and easily ingesting streaming data, aggregating and curating it, and subsequently detecting anomalies in the key performance indicators of your interest.

Lookout for Metrics automatically detects and diagnoses anomalies (outliers from the norm) in business and operational data. It’s a fully managed ML service, which uses specialized ML models to detect anomalies based on the characteristics of your data. You don’t need ML experience to use Lookout for Metrics.

Kinesis Data Analytics Studio provides an interactive notebook experience powered by Apache Zeppelin and Apache Flink to analyze streaming data. It also helps productionize your analytics application by building and deploying code as a Kinesis data analytics application straight from the notebook.

We demonstrate one of the most common traffic management scenarios, in which we detect anomalies in the number of vehicles and persons passing the view of ML-capable video infrastructure deployed at the roadside in order to enable capabilities such as automated traffic light pattern optimization, dynamic utilization of reserved lanes, and rapid deployment of emergency services. By the end of this post, you’ll learn how to use these managed services from AWS to meet the outcomes of smart cities to increase safety and reduce traffic. This solution can be equally applied to accelerate other outcomes for smart city management such as detecting anomalies in water supply (water pipe leaks), crowd density (safe campuses in the context of the pandemic), non-green energy usage, and community Wi-Fi traffic patterns, to name a few.

Solution architecture

The architecture consists of three functional blocks:

  • Smart roadside cameras
  • Streaming data ingestion, transformation, and storage
  • Anomaly detection and notification

The solution provides a fully automated data path from the smart cameras all the way to a notification being raised to the user. You can also interact with the solution using the Lookout for Metrics UI in order to analyze the identified anomalies.

The following diagram illustrates our solution architecture.

Smart roadside cameras using AWS Panorama

Data can be ingested from traffic cameras in several ways. The most optimal way is to analyze the video feed using computer vision ML algorithms instead of transporting thousands of video streams to the cloud and then running ML algorithms there. AWS released a computer vision appliance called AWS Panorama at AWS re:Invent 2020; it’s an ML appliance and software development kit (SDK) that allows you to bring computer vision to on-premises cameras to make predictions locally with high accuracy and low latency. AWS Panorama is well suited to address the use case of traffic monitoring. One AWS Panorama Appliance can ingest and analyze video streams from multiple video cameras. You can deploy a multi-object tracking ML model on the AWS Panorama Appliance to identify and track the vehicles passing by the camera. You can install AWS Panorama Appliances in sheltered cabinets in large road junctions or in cabinets by the roadside to cover a section of the road.

You can also send the inference results to AWS IoT Core to further process the data and make business decisions based on your requirements.

Because this post focuses on anomaly detection of traffic patterns, we assume that the AWS Panorama Appliance is already sending inference results to AWS IoT Core. In the subsequent sections, we focus on how the streaming data is processed from AWS IoT Core and analyzed to detect anomalies.

For more information, see AWS IoT and Building and deploying an object detection computer vision application at the edge with AWS Panorama.

Streaming data ingestion and transformation using Amazon Kinesis

Data from IoT devices is usually in the form of a continuous time series data stream. This is data that usually must be processed sequentially and incrementally on a record-by-record basis or over sliding time windows, and can be used for a variety of analytics, including correlations, aggregations, filtering, and sampling.

AWS IoT integrates with Amazon Kinesis Data Streams, which is a fully managed streaming data integration service. By default, the streaming data is available for 24 hours. This streaming data is then queried and transformed using a Kinesis data analytics application, which is built and deployed using Kinesis Data Analytics Studio. For more information, see Introducing Amazon Kinesis Data Analytics Studio – Quickly Interact with Streaming Data Using SQL, Python, or Scala.

We show you in the subsequent sections how to use Kinesis Data Analytics Studio, Kinesis Data Streams, and Amazon Kinesis Data Firehose to ingest traffic data from the IoT-powered smart cameras, transform it in real time using Flink SQL, and then deliver the data to an Amazon Simple Storage Service (Amazon S3) bucket in a custom prefix pattern that is compatible with Lookout for Metrics. It’s easy to build this data pipeline with minimal code. Also, because all the AWS services involved in the data pipeline are managed services, you can focus on enhancing functionality rather than running and maintaining infrastructure.

Anomaly detection using Lookout for Metrics

Lookout for Metrics automatically inspects and prepares the data dropped into Amazon S3 to detect anomalies with greater speed and accuracy than traditional methods used for anomaly detection. As anomalies are detected, you can provide feedback on detected anomalies to tune the results and improve accuracy over time. Lookout for Metrics makes it easy to diagnose detected anomalies by grouping anomalies that are related to the same event and sending an alert that includes a summary of the potential root cause. It also ranks anomalies in order of severity so you can prioritize your attention to what matters most to your business.

In the subsequent sections, we dive deep into configuring Amazon Kinesis services and Lookout for Metrics.


To follow along and test this solution yourself, you need the following prerequisites:

Data ingestion and transformation using AWS IoT and Kinesis

We use Kinesis Data Streams to ingest the streaming data from AWS IoT Core. We create two streams: one for the source data and another for the transformed data.

The following AWS CLI command creates the input stream:

$ aws kinesis create-stream –stream-name traffic-data-stream –shard-count 1 –region eu-central-1

The following AWS CLI command creates the output stream:

$ aws kinesis create-stream –stream-name processed-traffic-stream –shard-count 1 –region eu-central-1

Let’s look at the data coming in from the AWS Panorama Appliance. You can view the inferred data from the multi-object tracking model by running a test on the AWS IoT console by providing the IoT topic where the inferences of the model are published. The data the you receive is in JSON format and shows the object detected and a unique ID for the appearance of a particular person or car in the region of interest of the camera. A snippet of the data is as follows:

{ “person”: } { “car”: }

Now we create an AWS IoT Core rule to send the incoming data from the AWS Panorama Appliance to the Kinesis data stream. We limit the data to persons and cars, and include a timestamp using the following query in the AWS IoT rule definition. Inclusion of the timestamp is mandatory for anomaly detection.

SELECT person,car,timestamp() as event_time FROM

We also add an action as part of the rule definition to send the data selected to the input data stream we created previously.

At this point, we have data from the AWS Panorama Appliance streaming into our traffic-data-stream stream.

Now let’s create a Kinesis Data Analytics studio to analyze the data streaming in. We create the notebook of type Apache Flink, which allows us to analyze the data using SQL. You need to either create a new AWS Glue database or use an existing one to store the table definitions for the incoming and outgoing data streams. For detailed steps for creating an Apache Zeppelin notebook, see Introducing Amazon Kinesis Data Analytics Studio – Quickly Interact with Streaming Data Using SQL, Python, or Scala or Using a Studio notebook with Kinesis Data Analytics for Apache Flink.

After we create the notebook, we create a new note and call it traffic-anomaly-data-transformer. This should provide you an interactive environment to write your code (see the following screenshot).

Enter the following SQL statement to create a table for the traffic data source:

%flink.ssql create TABLE traffic_data_source ( person FLOAT, car FLOAT, event_time AS PROCTIME() ) PARTITIONED BY (person) WITH ( ‘connector’ = ‘kinesis’, ‘stream’ = ‘traffic-data-stream’, ‘aws.region’ = ‘eu-central-1’, ‘’ = ‘LATEST’, ‘format’ = ‘json’ );

The first part of the SQL statement uses %flink.ssql to tell Apache Zeppelin to provide a stream SQL environment for the Apache Flink interpreter.

The second part describes the connector used to receive data in the table (for example, Kinesis or Kafka), the name of the stream, the AWS Region, and the overall data format of the stream (such as JSON or CSV). We can also choose the starting position to process the stream; we use LATEST to read the most recent data first.

Now let’s create a table for the traffic data destination as follows:

%flink.ssql CREATE TABLE traffic_data_dest ( person BIGINT, car BIGINT, hop_time TIMESTAMP(3) ) WITH ( ‘connector’ = ‘kinesis’, ‘stream’ = ‘processed-traffic-stream’, ‘aws.region’ = ‘eu-central-1’, ‘’ = ‘LATEST’, ‘format’ = ‘json’ );

Next, we run the following query, which counts the number of persons and cars seen in a window of 5 minutes and inserts that data into the traffic_data_dest data stream:

%flink.ssql(type=update) INSERT INTO traffic_data_dest SELECT COUNT(person) AS person,COUNT(car) AS car, TUMBLE_END(event_time, INTERVAL ‘5’ minute) as tumble_time FROM traffic_data_source GROUP BY TUMBLE(event_time, INTERVAL ‘5’ minute);

In the preceding code, we use the TUMBLE_END function to record the timestamp on the record sent to the data stream as the end of the 5-minute window rather than the start of the window. This is important later in the post, when we assign custom prefix names in Amazon S3 based on time intervals—using TUMBLE_END ensures that the timestamp on the record and the prefix name are for the same 5-minute interval.

After this is successful, deploy this query as a Kinesis data analytics application. To do this, we add a new note called tda-nb1 (short for “traffic data application, notebook 1”) and copy only the preceding SQL statement that queries data from the source stream and inserts it into the destination stream. We don’t need to copy the create table SQL statements because the tables have already been created in AWS Glue.

The Apache Zeppelin notebook provides a fully automated deployment capability at the push of a button. You perform two steps: build and export the code to an S3 bucket of your choice, and deploy the code as a Kinesis data analytics application.

One important step is to update the AWS Identity and Access Management (IAM) roles of the Kinesis data analytics application in order to access the source data stream, destination data stream, and the AWS Glue Data Catalog.

We now run the application. We should see the application graph (as in the following screenshot) showing the data flow from source to destination, and you should be able to open the Apache Flink dashboard to get more information about the application run.

We now create a Kinesis Data Firehose delivery stream that reads from the processed-traffic-data stream (output stream) and delivers the data to Amazon S3. One of the key points to note is the configuration of the custom prefix that is configured for the Amazon S3 destination. This prefix pattern ensures that the data is created in the S3 bucket as per the prefix hierarchy expected by Lookout for Metrics. (More on this later in this post.) For more information about custom prefixes for S3 objects, see Custom Prefixes for Amazon S3 Objects.

As shown in the following screenshot, the data is delivered to the specified S3 bucket in the prefix structure.

The data within one of the files is as follows:

{“person”:12,”car”:56,”hop_time”:”2021-06-05T16:55:00″} {“person”:15,”car”:121,”hop_time”:”2021-06-05T17:00:00″}

The timestamps show that each file contains data for two 5-minute intervals.

With minimal code, we have now ingested the data from the camera, created a durable input stream from the ingested data, transformed the data based on the metrics we want to measure (the number of cars and persons), and stored the data in an S3 bucket based on the requirements for Lookout for Metrics.

In the following section, we take a deeper look at the constructs within Lookout for Metrics.

Lookout for Metrics deep dive

Let’s look at the terms and concepts within Lookout for Metrics, how they apply to this use case, and how easy it is to configure these concepts using the Lookout for Metrics console.


A detector is a Lookout for Metrics resource that monitors a dataset and identifies anomalies at a predefined frequency. Detectors use ML to find patterns in data and distinguish between expected variations in data and legitimate anomalies. To improve its performance, a detector learns more about your data over time.

In our use case, the detector analyzes aggregated data from the camera every 5 minutes. To create the detector, navigate to the Lookout for Metrics console and choose Create Detector. Provide the name and description (optional) for the detector, along with the interval of 5 minutes.

Your data is encrypted by default with a key that AWS owns and manages for you. You can also configure if you want to use a different encryption key from the one that is used by default.

Now, let’s point this detector to the data that you want it to run anomaly detection on.


A dataset tells the detector where to find your data and which metrics to analyze for anomalies.

We create the dataset on the Amazon Lookout for Metrics console, and provide a name (for this post, traffic-data-set), description (optional), and time zone.

In our use case, we choose Amazon S3 as our data source. With Amazon S3, you can create a detector in two modes:

  • Backtest – This mode is used to find anomalies in historical data. It needs all records to be consolidated in a single file.
  • Continuous – This mode is used to detect anomalies in live data. We use this mode with our use case because we want to detect anomalies as we receive traffic data from the roadside camera. The rest of this post talks about configuring continuous mode.

The path where the live data lands every 5 minutes is configured. As explained in the data ingestion and transformation section, the data is stored in the following folder structure.

The details of the S3 prefix path and the folder structure is configured as shown in the following screenshot.

If there is a delay in the data ingestion into Amazon S3, you can define an offset in seconds that defines the time the detector waits before it runs the anomaly analysis for a particular interval.

Also, if you have historical data from which the detector can learn patterns, you can provide it during this configuration. The data here is expected to be in the same format that you use to perform a backtest. Providing historical data speeds up the ML model training process. If this isn’t available, the continuous detector waits for sufficient data to be available before making inferences.

You can specify additional configurations to help Lookout for Metrics understand the format of the data that you are analyzing. Also, you must specify the IAM role that is used by Lookout for Metrics to access the data source.

At this point, Lookout for Metrics accesses the data source and validates whether it can parse the data. If the parsing is successful, it gives you a “Validation successful” message and takes you to the next screen, where you configure measures, dimensions, and timestamps.

Measures, dimensions and timestamps

Measures define KPIs that you want to track anomalies for. You can add up to five measures per detector. The fields that are used to create KPIs from your source data must be of numeric format. The KPIs can be currently defined by aggregating records within the time interval by doing a SUM or AVERAGE.

In our use case, we add one measure, which does the SUM of the objects seen in the 5-minute interval.

Dimensions give you the ability to slice and dice your data by defining categories or segments. This allows you to track anomalies for a subset of the whole set of data for which a particular measure is applicable.

In our use case, we have only one camera feeding data to the solution. Imagine if we had all the cameras from the city connected. Then we could use the camera ID or other metadata such as locality name as dimensions.

Every record in the dataset must have a timestamp. The following configuration allows you to choose the field that represents the timestamp value and also the format of the timestamp.

The next screen allows you to review all the details you have added and then save and activate the detector.

The detector then begins learning the data streaming into the data source. At this stage, the status of the detector changes to Initializing.

It’s important to note the minimum amount of data that is required before which Lookout for Metrics can start detecting anomalies. For more information about requirements and limits, see Lookout for Metrics quotas.

Fantastic! With minimal configuration, you have created your detector, pointed it at a dataset, and defined the metrics that you want Lookout for Metrics to find anomalies in.

Anomaly visualization

Lookout for Metrics provides a rich UI experience for users who want to use the AWS Management Console to analyze the anomalies being detected. It also provides the capability to query the anomalies via API.

Let’s look at an example anomaly detected from our traffic data use case.

The following screenshot shows an anomaly detected in the count of cars at the said time and date with a severity score of 100. It also shows the percentage contribution of the dimension towards the anomaly. In this case, 100% contribution comes from the car dimension.

Further down the page, you also get a line graph of the metric, along with the ability to view the metric data across time periods of the previous 30 minutes, 1 hour, 2 hours, 6 hours, or the maximum number of intervals. The anomaly is highlighted in blue on the graph.

As the service detects anomalies, it also allows you to provide human feedback to specify whether the detected anomaly is relevant. In our scenario, because the number of cars suddenly increased to 87 from 1, it was highlighted as an anomaly as compared to previous data, and we confirmed that by choosing Yes.


Lookout for Metrics allows you to send alerts using a variety of channels. You can configure the anomaly severity score threshold at which the alerts must be triggered.

In our use case, we configure alerts to be sent to an Amazon Simple Notification Service (Amazon SNS) channel, which in turn sends an SMS.

You can also use an alert to trigger automations using AWS Lambda functions in order to drive API-driven operations on AWS IoT Core. You can use IoT device shadows to control display devices in order to control traffic signals or traffic lane sign boards. If a rapid action team of personnel needs to be deployed, we could imagine the alert triggering a workforce management system that selects and deploys available personnel to the site to deal with the emergency.


Cities have the opportunity to derive insights from traffic management systems and make better decisions to improve public safety, make daily commutes less frustrating, and improve the overall quality of life of its residents.

In this post, we showed you how to use Lookout for Metrics and Kinesis to remove the undifferentiated heavy lifting involved in managing the end-to-end lifecycle of building ML-powered anomaly detection applications.

We believe that this solution will truly help you accelerate your ability to find anomalies in key business metrics and allow you focus your efforts on growing and improving your business.

We encourage you to learn more by visiting the Amazon Lookout for Metrics Developer Guide and Using a Studio notebook with Kinesis Data Analytics for Apache Flink, and also try out the end-to-end solution enabled by these services with the dataset relevant to your business KPIs.

We’re eager and excited to see what you’ll build using Amazon Lookout for Metrics and Amazon Kinesis Data Analytics Studio in the context of smart cities and beyond.

About the Authors

Ajay Ravindranathan is a Sr Partner Solutions Architect with Amazon Web Services, and is passionate about helping customers build modern applications using AWS services. Ajay has a keen interest in building AI/ML powered cloud-native products that can help telecommunications service providers on their digital transformation journey.



Bala KP is a Sr Partner Solutions Architect with Amazon Web Services. He works with partners and customers in the financial services and insurance domain to provide them with architecture guidance on building scalable and secure applications in AWS.


Continue Reading
Click to comment

Leave a Reply

Your email address will not be published.


AWS Week in Review – May 16, 2022

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS! I had been on the road for the last five weeks and attended many of the AWS Summits in Europe. It was great to talk to so many of you…




This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

I had been on the road for the last five weeks and attended many of the AWS Summits in Europe. It was great to talk to so many of you in person. The Serverless Developer Advocates are going around many of the AWS Summits with the Serverlesspresso booth. If you attend an event that has the booth, say “Hi ” to my colleagues, and have a coffee while asking all your serverless questions. You can find all the upcoming AWS Summits in the events section at the end of this post.

Last week’s launches
Here are some launches that got my attention during the previous week.

AWS Step Functions announced a new console experience to debug your state machine executions – Now you can opt-in to the new console experience of Step Functions, which makes it easier to analyze, debug, and optimize Standard Workflows. The new page allows you to inspect executions using three different views: graph, table, and event view, and add many new features to enhance the navigation and analysis of the executions. To learn about all the features and how to use them, read Ben’s blog post.

Example on how the Graph View looks

Example on how the Graph View looks

AWS Lambda now supports Node.js 16.x runtime – Now you can start using the Node.js 16 runtime when you create a new function or update your existing functions to use it. You can also use the new container image base that supports this runtime. To learn more about this launch, check Dan’s blog post.

AWS Amplify announces its Android library designed for Kotlin – The Amplify Android library has been rewritten for Kotlin, and now it is available in preview. This new library provides better debugging capacities and visibility into underlying state management. And it is also using the new AWS SDK for Kotlin that was released last year in preview. Read the What’s New post for more information.

Three new APIs for batch data retrieval in AWS IoT SiteWise – With this new launch AWS IoT SiteWise now supports batch data retrieval from multiple asset properties. The new APIs allow you to retrieve current values, historical values, and aggregated values. Read the What’s New post to learn how you can start using the new APIs.

AWS Secrets Manager now publishes secret usage metrics to Amazon CloudWatch – This launch is very useful to see the number of secrets in your account and set alarms for any unexpected increase or decrease in the number of secrets. Read the documentation on Monitoring Secrets Manager with Amazon CloudWatch for more information.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other launches and news that you may have missed:

IBM signed a deal with AWS to offer its software portfolio as a service on AWS. This allows customers using AWS to access IBM software for automation, data and artificial intelligence, and security that is built on Red Hat OpenShift Service on AWS.

Podcast Charlas Técnicas de AWS – If you understand Spanish, this podcast is for you. Podcast Charlas Técnicas is one of the official AWS podcasts in Spanish. This week’s episode introduces you to Amazon DynamoDB and shares stories on how different customers use this database service. You can listen to all the episodes directly from your favorite podcast app or the podcast web page.

AWS Open Source News and Updates – Ricardo Sueiras, my colleague from the AWS Developer Relation team, runs this newsletter. It brings you all the latest open-source projects, posts, and more. Read edition #112 here.

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

You can register for re:MARS to get fresh ideas on topics such as machine learning, automation, robotics, and space. The conference will be in person in Las Vegas, June 21–24.

That’s all for this week. Check back next Monday for another Week in Review!

— Marcia


Continue Reading


Personalize your machine translation results by using fuzzy matching with Amazon Translate

A person’s vernacular is part of the characteristics that make them unique. There are often countless different ways to express one specific idea. When a firm communicates with their customers, it’s critical that the message is delivered in a way that best represents the information they’re trying to convey. This becomes even more important when…




A person’s vernacular is part of the characteristics that make them unique. There are often countless different ways to express one specific idea. When a firm communicates with their customers, it’s critical that the message is delivered in a way that best represents the information they’re trying to convey. This becomes even more important when it comes to professional language translation. Customers of translation systems and services expect accurate and highly customized outputs. To achieve this, they often reuse previous translation outputs—called translation memory (TM)—and compare them to new input text. In computer-assisted translation, this technique is known as fuzzy matching. The primary function of fuzzy matching is to assist the translator by speeding up the translation process. When an exact match can’t be found in the TM database for the text being translated, translation management systems (TMSs) often have the option to search for a match that is less than exact. Potential matches are provided to the translator as additional input for final translation. Translators who enhance their workflow with machine translation capabilities such as Amazon Translate often expect fuzzy matching data to be used as part of the automated translation solution.

In this post, you learn how to customize output from Amazon Translate according to translation memory fuzzy match quality scores.

Translation Quality Match

The XML Localization Interchange File Format (XLIFF) standard is often used as a data exchange format between TMSs and Amazon Translate. XLIFF files produced by TMSs include source and target text data along with match quality scores based on the available TM. These scores—usually expressed as a percentage—indicate how close the translation memory is to the text being translated.

Some customers with very strict requirements only want machine translation to be used when match quality scores are below a certain threshold. Beyond this threshold, they expect their own translation memory to take precedence. Translators often need to apply these preferences manually either within their TMS or by altering the text data. This flow is illustrated in the following diagram. The machine translation system processes the translation data—text and fuzzy match scores— which is then reviewed and manually edited by translators, based on their desired quality thresholds. Applying thresholds as part of the machine translation step allows you to remove these manual steps, which improves efficiency and optimizes cost.

Machine Translation Review Flow

Figure 1: Machine Translation Review Flow

The solution presented in this post allows you to enforce rules based on match quality score thresholds to drive whether a given input text should be machine translated by Amazon Translate or not. When not machine translated, the resulting text is left to the discretion of the translators reviewing the final output.

Solution Architecture

The solution architecture illustrated in Figure 2 leverages the following services:

  • Amazon Simple Storage Service – Amazon S3 buckets contain the following content:
    • Fuzzy match threshold configuration files
    • Source text to be translated
    • Amazon Translate input and output data locations
  • AWS Systems Manager – We use Parameter Store parameters to store match quality threshold configuration values
  • AWS Lambda – We use two Lambda functions:
    • One function preprocesses the quality match threshold configuration files and persists the data into Parameter Store
    • One function automatically creates the asynchronous translation jobs
  • Amazon Simple Queue Service – An Amazon SQS queue triggers the translation flow as a result of new files coming into the source bucket

Solution Architecture Diagram

Figure 2: Solution Architecture

You first set up quality thresholds for your translation jobs by editing a configuration file and uploading it into the fuzzy match threshold configuration S3 bucket. The following is a sample configuration in CSV format. We chose CSV for simplicity, although you can use any format. Each line represents a threshold to be applied to either a specific translation job or as a default value to any job.

default, 75 SourceMT-Test, 80

The specifications of the configuration file are as follows:

  • Column 1 should be populated with the name of the XLIFF file—without extension—provided to the Amazon Translate job as input data.
  • Column 2 should be populated with the quality match percentage threshold. For any score below this value, machine translation is used.
  • For all XLIFF files whose name doesn’t match any name listed in the configuration file, the default threshold is used—the line with the keyword default set in Column 1.

Auto-generated parameter in Systems Manager Parameter Store

Figure 3: Auto-generated parameter in Systems Manager Parameter Store

When a new file is uploaded, Amazon S3 triggers the Lambda function in charge of processing the parameters. This function reads and stores the threshold parameters into Parameter Store for future usage. Using Parameter Store avoids performing redundant Amazon S3 GET requests each time a new translation job is initiated. The sample configuration file produces the parameter tags shown in the following screenshot.

The job initialization Lambda function uses these parameters to preprocess the data prior to invoking Amazon Translate. We use an English-to-Spanish translation XLIFF input file, as shown in the following code. It contains the initial text to be translated, broken down into what is referred to as segments, represented in the source tags.

Consent Form CONSENT FORM FORMULARIO DE CONSENTIMIENTO Screening Visit: Screening Visit Selección

The source text has been pre-matched with the translation memory beforehand. The data contains potential translation alternatives—represented as tags—alongside a match quality attribute, expressed as a percentage. The business rule is as follows:

  • Segments received with alternative translations and a match quality below the threshold are untouched or empty. This signals to Amazon Translate that they must be translated.
  • Segments received with alternative translations with a match quality above the threshold are pre-populated with the suggested target text. Amazon Translate skips those segments.

Let’s assume the quality match threshold configured for this job is 80%. The first segment with 99% match quality isn’t machine translated, whereas the second segment is, because its match quality is below the defined threshold. In this configuration, Amazon Translate produces the following output:

Consent Form FORMULARIO DE CONSENTIMIENTO CONSENT FORM FORMULARIO DE CONSENTIMIENTO Screening Visit: Visita de selección Screening Visit Selección

In the second segment, Amazon Translate overwrites the target text initially suggested (Selección) with a higher quality translation: Visita de selección.

One possible extension to this use case could be to reuse the translated output and create our own translation memory. Amazon Translate supports customization of machine translation using translation memory thanks to the parallel data feature. Text segments previously machine translated due to their initial low-quality score could then be reused in new translation projects.

In the following sections, we walk you through the process of deploying and testing this solution. You use AWS CloudFormation scripts and data samples to launch an asynchronous translation job personalized with a configurable quality match threshold.


For this walkthrough, you must have an AWS account. If you don’t have an account yet, you can create and activate one.

Launch AWS CloudFormation stack

  1. Choose Launch Stack:
  2. For Stack name, enter a name.
  3. For ConfigBucketName, enter the S3 bucket containing the threshold configuration files.
  4. For ParameterStoreRoot, enter the root path of the parameters created by the parameters processing Lambda function.
  5. For QueueName, enter the SQS queue that you create to post new file notifications from the source bucket to the job initialization Lambda function. This is the function that reads the configuration file.
  6. For SourceBucketName, enter the S3 bucket containing the XLIFF files to be translated. If you prefer to use a preexisting bucket, you need to change the value of the CreateSourceBucket parameter to No.
  7. For WorkingBucketName, enter the S3 bucket Amazon Translate uses for input and output data.
  8. Choose Next.

    Figure 4: CloudFormation stack details

  9. Optionally on the Stack Options page, add key names and values for the tags you may want to assign to the resources about to be created.
  10. Choose Next.
  11. On the Review page, select I acknowledge that this template might cause AWS CloudFormation to create IAM resources.
  12. Review the other settings, then choose Create stack.

AWS CloudFormation takes several minutes to create the resources on your behalf. You can watch the progress on the Events tab on the AWS CloudFormation console. When the stack has been created, you can see a CREATE_COMPLETE message in the Status column on the Overview tab.

Test the solution

Let’s go through a simple example.

  1. Download the following sample data.
  2. Unzip the content.

There should be two files: an .xlf file in XLIFF format, and a threshold configuration file with .cfg as the extension. The following is an excerpt of the XLIFF file.

English to French sample file extract

Figure 5: English to French sample file extract

  1. On the Amazon S3 console, upload the quality threshold configuration file into the configuration bucket you specified earlier.

The value set for test_En_to_Fr is 75%. You should be able to see the parameters on the Systems Manager console in the Parameter Store section.

  1. Still on the Amazon S3 console, upload the .xlf file into the S3 bucket you configured as source. Make sure the file is under a folder named translate (for example, /translate/test_En_to_Fr.xlf).

This starts the translation flow.

  1. Open the Amazon Translate console.

A new job should appear with a status of In Progress.

Auto-generated parameter in Systems Manager Parameter Store

Figure 6: In progress translation jobs on Amazon Translate console

  1. Once the job is complete, click into the job’s link and consult the output. All segments should have been translated.

All segments should have been translated. In the translated XLIFF file, look for segments with additional attributes named lscustom:match-quality, as shown in the following screenshot. These custom attributes identify segments where suggested translation was retained based on score.

Custom attributes identifying segments where suggested translation was retained based on score

Figure 7: Custom attributes identifying segments where suggested translation was retained based on score

These were derived from the translation memory according to the quality threshold. All other segments were machine translated.

You have now deployed and tested an automated asynchronous translation job assistant that enforces configurable translation memory match quality thresholds. Great job!


If you deployed the solution into your account, don’t forget to delete the CloudFormation stack to avoid any unexpected cost. You need to empty the S3 buckets manually beforehand.


In this post, you learned how to customize your Amazon Translate translation jobs based on standard XLIFF fuzzy matching quality metrics. With this solution, you can greatly reduce the manual labor involved in reviewing machine translated text while also optimizing your usage of Amazon Translate. You can also extend the solution with data ingestion automation and workflow orchestration capabilities, as described in Speed Up Translation Jobs with a Fully Automated Translation System Assistant.

About the Authors

Narcisse Zekpa is a Solutions Architect based in Boston. He helps customers in the Northeast U.S. accelerate their adoption of the AWS Cloud, by providing architectural guidelines, design innovative, and scalable solutions. When Narcisse is not building, he enjoys spending time with his family, traveling, cooking, and playing basketball.

Dimitri Restaino is a Solutions Architect at AWS, based out of Brooklyn, New York. He works primarily with Healthcare and Financial Services companies in the North East, helping to design innovative and creative solutions to best serve their customers. Coming from a software development background, he is excited by the new possibilities that serverless technology can bring to the world. Outside of work, he loves to hike and explore the NYC food scene.


Continue Reading


Enhance the caller experience with hints in Amazon Lex

We understand speech input better if we have some background on the topic of conversation. Consider a customer service agent at an auto parts wholesaler helping with orders. If the agent knows that the customer is looking for tires, they’re more likely to recognize responses (for example, “Michelin”) on the phone. Agents often pick up…




We understand speech input better if we have some background on the topic of conversation. Consider a customer service agent at an auto parts wholesaler helping with orders. If the agent knows that the customer is looking for tires, they’re more likely to recognize responses (for example, “Michelin”) on the phone. Agents often pick up such clues or hints based on their domain knowledge and access to business intelligence dashboards. Amazon Lex now supports a hints capability to enhance the recognition of relevant phrases in a conversation. You can programmatically provide phrases as hints during a live interaction to influence the transcription of spoken input. Better recognition drives efficient conversations, reduces agent handling time, and ultimately increases customer satisfaction.

In this post, we review the runtime hints capability and use it to implement verification of callers based on their mother’s maiden name.

Overview of the runtime hints capability

You can provide a list of phrases or words to help your bot with the transcription of speech input. You can use these hints with built-in slot types such as first and last names, street names, city, state, and country. You can also configure these for your custom slot types.

You can use the capability to transcribe names that may be difficult to pronounce or understand. For example, in the following sample conversation, we use it to transcribe the name “Loreck.”

Conversation 1

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Checking

IVR: What is the account number?

Caller: 1111 2222 3333 4444

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Loreck

IVR: Thank you. The balance on your checking account is 123 dollars.

Words provided as hints are preferred over other similar words. For example, in the second sample conversation, the runtime hint (“Smythe”) is selected over a more common transcription (“Smith”).

Conversation 2

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Checking

IVR: What is the account number?

Caller: 5555 6666 7777 8888

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Smythe

IVR: Thank you. The balance on your checking account is 456 dollars.

If the name doesn’t match the runtime hint, you can fail the verification and route the call to an agent.

Conversation 3

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Savings

IVR: What is the account number?

Caller: 5555 6666 7777 8888

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Jane

IVR: There is an issue with your account. For support, you will be forwarded to an agent.

Solution overview

Let’s review the overall architecture for the solution (see the following diagram):

  • We use an Amazon Lex bot integrated with an Amazon Connect contact flow to deliver the conversational experience.
  • We use a dialog codehook in the Amazon Lex bot to invoke an AWS Lambda function that provides the runtime hint at the previous turn of the conversation.
  • For the purposes of this post, the mother’s maiden name data used for authentication is stored in an Amazon DynamoDB table.
  • After the caller is authenticated, the control is passed to the bot to perform transactions (for example, check balance)

In addition to the Lambda function, you can also send runtime hints to Amazon Lex V2 using the PutSession, RecognizeText, RecognizeUtterance, or StartConversation operations. The runtime hints can be set at any point in the conversation and are persisted at every turn until cleared.

Deploy the sample Amazon Lex bot

To create the sample bot and configure the runtime phrase hints, perform the following steps. This creates an Amazon Lex bot called BankingBot, and one slot type (accountNumber).

  1. Download the Amazon Lex bot.
  2. On the Amazon Lex console, choose Actions, Import.
  3. Choose the file that you downloaded, and choose Import.
  4. Choose the bot BankingBot on the Amazon Lex console.
  5. Choose the language English (GB).
  6. Choose Build.
  7. Download the supporting Lambda code.
  8. On the Lambda console, create a new function and select Author from scratch.
  9. For Function name, enter BankingBotEnglish.
  10. For Runtime, choose Python 3.8.
  11. Choose Create function.
  12. In the Code source section, open and delete the existing code.
  13. Download the function code and open it in a text editor.
  14. Copy the code and enter it into the empty function code field.
  15. Choose deploy.
  16. On the Amazon Lex console, select the bot BankingBot.
  17. Choose Deployment and then Aliases, then choose the alias TestBotAlias.
  18. On the Aliases page, choose Languages and choose English (GB).
  19. For Source, select the bot BankingBotEnglish.
  20. For Lambda version or alias, enter $LATEST.
  21. On the DynamoDB console, choose Create table.
  22. Provide the name as customerDatabase.
  23. Provide the partition key as accountNumber.
  24. Add an item with accountNumber: “1111222233334444” and mothersMaidenName “Loreck”.
  25. Add item with accountNumber: “5555666677778888” and mothersMaidenName “Smythe”.
  26. Make sure the Lambda function has permissions to read from the DynamoDB table customerDatabase.
  27. On the Amazon Connect console, choose Contact flows.
  28. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flow.
  29. Download the contact flow to integrate with the Amazon Lex bot.
  30. Choose the contact flow to load it into the application.
  31. Make sure the right bot is configured in the “Get Customer Input” block.
  32. Choose a queue in the “Set working queue” block.
  33. Add a phone number to the contact flow.
  34. Test the IVR flow by calling in to the phone number.

Test the solution

You can now call in to the Amazon Connect phone number and interact with the bot.


Runtime hints allow you to influence the transcription of words or phrases dynamically in the conversation. You can use business logic to identify the hints as the conversation evolves. Better recognition of the user input allows you to deliver an enhanced experience. You can configure runtime hints via the Lex V2 SDK. The capability is available in all AWS Regions where Amazon Lex operates in the English (Australia), English (UK), and English (US) locales.

To learn more, refer to runtime hints.

About the Authors

Kai Loreck is a professional services Amazon Connect consultant. He works on designing and implementing scalable customer experience solutions. In his spare time, he can be found playing sports, snowboarding, or hiking in the mountains.

Anubhav Mishra is a Product Manager with AWS. He spends his time understanding customers and designing product experiences to address their business challenges.

Sravan Bodapati is an Applied Science Manager at AWS Lex. He focuses on building cutting edge Artificial Intelligence and Machine Learning solutions for AWS customers in ASR and NLP space. In his spare time, he enjoys hiking, learning economics, watching TV shows and spending time with his family.


Continue Reading


Copyright © 2021 Today's Digital.