Connect with us

Amazon

Use a SageMaker Pipeline Lambda step for lightweight model deployments

With Amazon SageMaker Pipelines, you can create, automate, and manage end-to-end machine learning (ML) workflows at scale. SageMaker Projects build on SageMaker Pipelines by providing several MLOps templates that automate model building and deployment pipelines using continuous integration and continuous delivery (CI/CD). To help you get started, SageMaker Pipelines provides many predefined step types, such…

Published

on

With Amazon SageMaker Pipelines, you can create, automate, and manage end-to-end machine learning (ML) workflows at scale. SageMaker Projects build on SageMaker Pipelines by providing several MLOps templates that automate model building and deployment pipelines using continuous integration and continuous delivery (CI/CD). To help you get started, SageMaker Pipelines provides many predefined step types, such as steps for data processing, model training, model tuning, and batch scoring. You can use a callback step to add functionality to your pipelines using other AWS services. With the new Lambda step that enables you to add an AWS Lambda function to your pipelines, you now have even more flexibility to create custom pipeline functionality.

You can use the Lambda step to run serverless tasks or jobs on Lambda such as splitting datasets or sending custom notifications. There are many other possibilities. For example, although you generally should use SageMaker Projects for robust model deployments with CI/CD best practices, there may be circumstances where it makes sense to use a Lambda step for lightweight model deployments to SageMaker hosting services. By “lightweight,” we mean deployments to development or test endpoints, or to internal endpoints that aren’t customer-facing or serving high volumes of traffic.

In this post, we examine the details of the Lambda step and how you can use it to add custom functionality to your pipelines. Next, we go into the specifics of using the Lambda step for lightweight model deployments. Our example use case involves using SageMaker Pipelines to orchestrate training a Hugging Face natural language processing (NLP) model on the IMDb movie reviews dataset, and deploying it to SageMaker Asynchronous Inference. You can follow along with sample code on GitHub.

Create a Lambda step

Lambda is a serverless compute service that lets you run code without provisioning or managing servers. With Lambda, you can run code in your preferred language for virtually any type of application or backend service—all with minimal administration. In the context of SageMaker Pipelines, a Lambda step enables you to add a Lambda function to your pipelines to support arbitrary compute operations, especially lightweight operations that have short duration. Keep in mind that in a SageMaker Pipelines Lambda step, a Lambda function is limited to 10 minutes maximum runtime, with a modifiable default timeout of 2 minutes.

You have two ways to add a Lambda step to your pipelines. First, you can supply the ARN of an existing Lambda function that you created with the AWS Cloud Development Kit (AWS CDK), AWS Management Console, or otherwise. Second, the high-level SageMaker Python SDK has a Lambda helper convenience class that allows you to create a new Lambda function along with your other code defining your pipeline. Just provide three parameters: a script with your function code, a function name, and an AWS Identity and Access Management (IAM) role for the Lambda function. The role specifies permissions for the Lambda function, such as permissions to access the SageMaker Model Registry or other features and services. As always, follow the principle of least privilege. Given these parameters, SageMaker automatically creates a new Lambda function and runs it as part of the Lambda step. For more information, see the Lambda step documentation.

The Lambda function script for a Lambda step is written the same way as for any other Lambda function. Like all Lambda functions, it must implement a handler method to respond to events, in this case events from the pipeline. To create the actual Lambda function, use the Lambda helper convenience as shown in the following code snippet. You can change the default timeout and specify the Lambda function size.

from sagemaker.lambda_helper import Lambda function_name = “sagemaker-demo-hf-lambda-step” + current_time # Lambda helper class can be used to create the Lambda function func = Lambda( function_name=function_name, execution_role_arn=lambda_role, script=”lambda_deployer.py”, handler=”lambda_deployer.lambda_handler”, timeout=600, memory_size=10240, )

To create a Lambda step, supply the Lambda function created with the helper, and specify the inputs and outputs.

from sagemaker.workflow.lambda_step import ( LambdaStep, LambdaOutput, LambdaOutputTypeEnum, ) step_deploy_lambda = LambdaStep( name=”LambdaStepHuggingFaceDeploy”, lambda_func=func, inputs={ “model_name”: model_name, “endpoint_config_name”: endpoint_config_name, “endpoint_name”: endpoint_name, “model_package_arn”: step_register.steps[0].properties.ModelPackageArn, “role”: role, }, outputs=[output_param_1, output_param_2, output_param_3], )

A lightweight deployment pipeline

Sometimes you need to deploy models to development or test endpoints, or to internal endpoints that aren’t customer-facing or serving high volumes of traffic. These kinds of endpoints may be suitable for lightweight deployment with a Lambda step. The Lambda step can perform tasks such as looking up the latest approved model registered in the SageMaker Model Registry after model building is complete, and then updating an endpoint with that model (or creating a new endpoint if one doesn’t already exist). The overall architecture is shown in the following diagram.

In the sample code for this post, we use SageMaker Pipelines to orchestrate training a Hugging Face NLP model on the IMDb movie reviews dataset and deploying it. Libraries such as the transformers library published by Hugging Face make it easy to add state-of-the-art NLP models to your applications. SageMaker offers a direct integration for Hugging Face models: the HuggingFace estimator in the high-level SageMaker Python SDK enables training, fine-tuning, and optimizing Hugging Face models built with PyTorch and TensorFlow, all with minimal setup. There also is a Hugging Face Inference Toolkit for SageMaker, as described in Announcing managed inference for Hugging Face models in Amazon SageMaker.

Besides a Lambda step, SageMaker Asynchronous Inference is another helpful feature for lightweight deployments. Asynchronous inference is suitable for handling large inference payloads that require a substantial amount of processing time, and where latency requirements are flexible. This may be the case for complex models such as some of the larger Hugging Face NLP models. Asynchronous inference creates a persistent endpoint where the number of instances behind the endpoint can be scaled down to zero. This helps with cost-optimizing workloads where data arrives at the endpoint intermittently, such as our example use case where new movie reviews may occasionally trickle in for predictions.

To create an Asynchronous Inference endpoint, we define an AsyncInferenceConfig object that configures the endpoint to receive asynchronous invocations. This is part of the overall EndpointConfiguration that defines the endpoint’s parameters, such as the model to be deployed and the initial instance count and type. In the AsyncInferenceConfig object, we provide an output Amazon Simple Storage Service (Amazon S3) location for predictions. You also can optionally specify an Amazon Simple Notification Service (Amazon SNS) topic on which to send notifications about predictions. This is shown in the following code snippet.

import boto3 sm_client = boto3.client(‘sagemaker-dev’) endpoint_config_name = resource_name.format(“EndpointConfig”) create_endpoint_config_response = sm_client.create_endpoint_config( EndpointConfigName=endpoint_config_name, ProductionVariants=[ { “VariantName”: “variant1”, “ModelName”: model_name, “InstanceType”: “ml.m5.xlarge”, “InitialInstanceCount”: 1 } ], AsyncInferenceConfig={ “OutputConfig”: { “S3OutputPath”: f”s3://{s3_bucket}/{bucket_prefix}/output”, # Optionally specify Amazon SNS topics “NotificationConfig”: { “SuccessTopic”: “arn:aws:sns:::AsyncInference”, “ErrorTopic”: “arn:aws:sns:::AsyncInference”, } }, “ClientConfig”: { “MaxConcurrentInvocationsPerInstance”: 4 } } )

The complete pipeline includes a processing step, training step, evaluation step and model quality condition. If the model quality condition threshold is met, then additional pipeline steps are run: a model registration step, and deployment using the Lambda step we defined. To begin a pipeline execution, simply call the start method of the pipeline.

from sagemaker.workflow.pipeline import Pipeline import json pipeline = Pipeline( name=f”HuggingFaceDemoPipeline” + “-” + current_time, parameters=[ processing_instance_type, processing_instance_count, training_instance_type, training_instance_count, ], steps=[step_process, step_train, step_eval, step_cond], sagemaker_session=sagemaker_session, ) json.loads(pipeline.definition()) pipeline.upsert(role_arn=role) execution = pipeline.start()

After the endpoint is deployed, you can use the endpoint name to create a SageMaker Predictor object to wrap the endpoint so you can get predictions from it. In the pipeline details page in SageMaker Studio, the pipeline graph will appear as follows after a successful pipeline execution.

Conclusion

SageMaker Pipelines offers extensive flexibility to customize your pipelines to orchestrate virtually any ML workflow. A convenient way to customize a pipeline is to add a Lambda step, which enables you to add serverless compute operations to your pipelines for many different kinds of tasks such as lightweight deployments. To get started with the Lambda step, see the Lambda step documentation and the sample code for this post on GitHub.

About the Authors

Brent Rabowsky focuses on data science at AWS, and leverages his expertise to help AWS customers with their own data science projects.

 

 

 

Kirit Thadaka is an ML Solutions Architect working in the SageMaker Service SA team. Prior to joining AWS, Kirit spent time working in early stage AI startups followed by some time in consulting in various roles in AI research, MLOps, and technical leadership.



Source

Continue Reading
Click to comment

Leave a Reply

Your email address will not be published.

Amazon

Enhance the caller experience with hints in Amazon Lex

We understand speech input better if we have some background on the topic of conversation. Consider a customer service agent at an auto parts wholesaler helping with orders. If the agent knows that the customer is looking for tires, they’re more likely to recognize responses (for example, “Michelin”) on the phone. Agents often pick up…

Published

on

By

We understand speech input better if we have some background on the topic of conversation. Consider a customer service agent at an auto parts wholesaler helping with orders. If the agent knows that the customer is looking for tires, they’re more likely to recognize responses (for example, “Michelin”) on the phone. Agents often pick up such clues or hints based on their domain knowledge and access to business intelligence dashboards. Amazon Lex now supports a hints capability to enhance the recognition of relevant phrases in a conversation. You can programmatically provide phrases as hints during a live interaction to influence the transcription of spoken input. Better recognition drives efficient conversations, reduces agent handling time, and ultimately increases customer satisfaction.

In this post, we review the runtime hints capability and use it to implement verification of callers based on their mother’s maiden name.

Overview of the runtime hints capability

You can provide a list of phrases or words to help your bot with the transcription of speech input. You can use these hints with built-in slot types such as first and last names, street names, city, state, and country. You can also configure these for your custom slot types.

You can use the capability to transcribe names that may be difficult to pronounce or understand. For example, in the following sample conversation, we use it to transcribe the name “Loreck.”

Conversation 1

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Checking

IVR: What is the account number?

Caller: 1111 2222 3333 4444

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Loreck

IVR: Thank you. The balance on your checking account is 123 dollars.

Words provided as hints are preferred over other similar words. For example, in the second sample conversation, the runtime hint (“Smythe”) is selected over a more common transcription (“Smith”).

Conversation 2

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Checking

IVR: What is the account number?

Caller: 5555 6666 7777 8888

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Smythe

IVR: Thank you. The balance on your checking account is 456 dollars.

If the name doesn’t match the runtime hint, you can fail the verification and route the call to an agent.

Conversation 3

IVR: Welcome to ACME bank. How can I help you today?

Caller: I want to check my account balance.

IVR: Sure. Which account should I pull up?

Caller: Savings

IVR: What is the account number?

Caller: 5555 6666 7777 8888

IVR: For verification purposes, what is your mother’s maiden name?

Caller: Jane

IVR: There is an issue with your account. For support, you will be forwarded to an agent.

Solution overview

Let’s review the overall architecture for the solution (see the following diagram):

  • We use an Amazon Lex bot integrated with an Amazon Connect contact flow to deliver the conversational experience.
  • We use a dialog codehook in the Amazon Lex bot to invoke an AWS Lambda function that provides the runtime hint at the previous turn of the conversation.
  • For the purposes of this post, the mother’s maiden name data used for authentication is stored in an Amazon DynamoDB table.
  • After the caller is authenticated, the control is passed to the bot to perform transactions (for example, check balance)

In addition to the Lambda function, you can also send runtime hints to Amazon Lex V2 using the PutSession, RecognizeText, RecognizeUtterance, or StartConversation operations. The runtime hints can be set at any point in the conversation and are persisted at every turn until cleared.

Deploy the sample Amazon Lex bot

To create the sample bot and configure the runtime phrase hints, perform the following steps. This creates an Amazon Lex bot called BankingBot, and one slot type (accountNumber).

  1. Download the Amazon Lex bot.
  2. On the Amazon Lex console, choose Actions, Import.
  3. Choose the file BankingBot.zip that you downloaded, and choose Import.
  4. Choose the bot BankingBot on the Amazon Lex console.
  5. Choose the language English (GB).
  6. Choose Build.
  7. Download the supporting Lambda code.
  8. On the Lambda console, create a new function and select Author from scratch.
  9. For Function name, enter BankingBotEnglish.
  10. For Runtime, choose Python 3.8.
  11. Choose Create function.
  12. In the Code source section, open lambda_function.py and delete the existing code.
  13. Download the function code and open it in a text editor.
  14. Copy the code and enter it into the empty function code field.
  15. Choose deploy.
  16. On the Amazon Lex console, select the bot BankingBot.
  17. Choose Deployment and then Aliases, then choose the alias TestBotAlias.
  18. On the Aliases page, choose Languages and choose English (GB).
  19. For Source, select the bot BankingBotEnglish.
  20. For Lambda version or alias, enter $LATEST.
  21. On the DynamoDB console, choose Create table.
  22. Provide the name as customerDatabase.
  23. Provide the partition key as accountNumber.
  24. Add an item with accountNumber: “1111222233334444” and mothersMaidenName “Loreck”.
  25. Add item with accountNumber: “5555666677778888” and mothersMaidenName “Smythe”.
  26. Make sure the Lambda function has permissions to read from the DynamoDB table customerDatabase.
  27. On the Amazon Connect console, choose Contact flows.
  28. In the Amazon Lex section, select your Amazon Lex bot and make it available for use in the Amazon Connect contact flow.
  29. Download the contact flow to integrate with the Amazon Lex bot.
  30. Choose the contact flow to load it into the application.
  31. Make sure the right bot is configured in the “Get Customer Input” block.
  32. Choose a queue in the “Set working queue” block.
  33. Add a phone number to the contact flow.
  34. Test the IVR flow by calling in to the phone number.

Test the solution

You can now call in to the Amazon Connect phone number and interact with the bot.

Conclusion

Runtime hints allow you to influence the transcription of words or phrases dynamically in the conversation. You can use business logic to identify the hints as the conversation evolves. Better recognition of the user input allows you to deliver an enhanced experience. You can configure runtime hints via the Lex V2 SDK. The capability is available in all AWS Regions where Amazon Lex operates in the English (Australia), English (UK), and English (US) locales.

To learn more, refer to runtime hints.

About the Authors

Kai Loreck is a professional services Amazon Connect consultant. He works on designing and implementing scalable customer experience solutions. In his spare time, he can be found playing sports, snowboarding, or hiking in the mountains.

Anubhav Mishra is a Product Manager with AWS. He spends his time understanding customers and designing product experiences to address their business challenges.

Sravan Bodapati is an Applied Science Manager at AWS Lex. He focuses on building cutting edge Artificial Intelligence and Machine Learning solutions for AWS customers in ASR and NLP space. In his spare time, he enjoys hiking, learning economics, watching TV shows and spending time with his family.



Source

Continue Reading

Amazon

The Intel®3D Athlete Tracking (3DAT) scalable architecture deploys pose estimation models using Amazon Kinesis Data Streams and Amazon EKS

This blog post is co-written by Jonathan Lee, Nelson Leung, Paul Min, and Troy Squillaci from Intel.  In Part 1 of this post, we discussed how Intel®3DAT collaborated with AWS Machine Learning Professional Services (MLPS) to build a scalable AI SaaS application. 3DAT uses computer vision and AI to recognize, track, and analyze over 1,000…

Published

on

By

This blog post is co-written by Jonathan Lee, Nelson Leung, Paul Min, and Troy Squillaci from Intel. 

In Part 1 of this post, we discussed how Intel®3DAT collaborated with AWS Machine Learning Professional Services (MLPS) to build a scalable AI SaaS application. 3DAT uses computer vision and AI to recognize, track, and analyze over 1,000 biomechanics data points from standard video. It allows customers to create rich and powerful biomechanics-driven products, such as web and mobile applications with detailed performance data and three-dimensional visualizations.

In Part 2 of this post, we dive deeper into each stage of the architecture. We explore the AWS services used to meet the 3DAT design requirements, including Amazon Kinesis Data Streams and Amazon Elastic Kubernetes Service (Amazon EKS), in order to scalably deploy the necessary pose estimation models for this software as a service (SaaS) application.

Architecture overview

The primary goal of the MLPS team was to productionalize the 2D and 3D pose estimation model pipelines and create a functional and scalable application. The following diagram illustrates the solution architecture.

The complete architecture is broken down into five major components:

  • User application interface layers
  • Database
  • Workflow orchestration
  • Scalable pose estimation inference generation
  • Operational monitoring

Let’s go into detail on each component, their interactions, and the rationale behind the design choices.

User application interface layers

The following diagram shows the application interface layers that provide user access and control of the application and its resources.

These access points support different use cases based on different customer personas. For example, an application user can submit a job via the CLI, whereas a developer can build an application using the Python SDK and embed pose estimation intelligence into their applications. The CLI and SDK are built as modular components—both layers are wrappers of the API layer, which is built using Amazon API Gateway to resolve the API calls and associated AWS Lambda functions, which take care of the backend logic associated with each API call. These layers were a crucial component for the Intel OTG team because it opens up a broad base of customers that can effectively use this SaaS application.

API layer

The solution has a core set of nine APIs, which correspond to the types of objects that operate on this platform. Each API has a Python file that defines the API actions that can be run. The creation of new objects is automatically assigned an object ID sequentially. The attributes of these objects are stored and tracked in the Amazon Aurora Serverless database using this ID. Therefore, the API actions tie back to functions that are defined in a central file that contains the backend logic for querying the Aurora database. This backend logic uses the Boto3 Amazon RDS DataService client to access the database cluster.

The one exception is the /job API, which has a create_job method that handles video submission for creating a new processing job. This method starts the AWS Step Functions workflow logic for running the job. By passing in a job_id, this method uses the Boto3 Step Functions client to call the start_execution method for a specified stateMachineARN (Amazon Resource Name).

The eight object APIs have the methods and similar access pattern as summarized in the following table.

Method Type Function Name Description
GET list_[object_name]s Selects all objects of this type from the database and displays.
POST create_[object] Inserts a new object record with required inputs into the database.
GET get_[object] Selects object attributes based on the object ID from the database and displays.
PUT update_[object] Updates an existing object record with the required inputs.
DELETE delete_[object] Deletes an existing object record from the database based on object ID.

The details of the nine APIs are as follows:

  1. /user – A user is the identity of someone authorized to submit jobs to this application. The creation of a user requires a user name, user email, and group ID that the user belongs to.
  2. /user_group – A user group is a collection of users. Every user group is mapped to one project and one pipeline parameter set. To have different tiers (in terms of infrastructural resources and pipeline parameters), users are divided into user groups. Each user can belong to only one user group. The creation of a user group requires a project ID, pipeline parameter set ID, user group name, and user group description. Note that user groups are different from user roles defined in the AWS account. The latter is used to provide different level of access based on their access roles (for example admin).
  3. /project – A project is used to group different sets of infrastructural resources together. A project is associated with a single project_cluster_url (Aurora cluster) for recording users, jobs, and other metadata, a project_queue_arn (Kinesis Data Streams ARN), and a compute runtime environment (currently controlled via Cortex) used for running inference on the frame batches or postprocessing on the videos. Each user group is associated to one project, and this mechanism is how different tiers are enabled in terms of latency and compute power for different groups of users. The creation of a project requires a project name, project cluster URL, and project queue ARN.
  4. /pipeline – A pipeline is associated with a single configuration for a sequence of processing containers that perform video processing in the Amazon EKS inference generation cluster coordinated by Cortex (see the section on video processing inference generation for more details). Typically, this consists of three containers: preprocessing and decoding, object detection, and pose estimation. For example, the decode and object detection step are the same for the 2D and 3D pipelines, but swapping out the last container using either HRNet or 3DMPPE results in the parameter set for 2D vs. 3D processing pipelines. You can create new configurations to define possible pipelines that can be used for processing, and it requires as input a new Python file in the Cortex repo that details the sequence of model endpoints call that define that pipeline (see the section on video processing inference generation). The pipeline endpoint is the Cortex endpoint that is called to process a single frame. The creation of a pipeline requires a pipeline name, pipeline description, and pipeline endpoint.
  5. /pipeline_parameter_set – A pipeline parameter set is a flexible JSON collection of multiple parameters (a pipeline configuration runtime) for a particular pipeline, and is added to provide flexibility for future customization when multiple pipeline configuration runtimes are required. User groups can be associated with a particular pipeline parameter set, and its purpose is to have different groups of parameters per user group and per pipeline. This was an important forward-looking addition for Intel OTG to build in customization that supports portability as different customers, particularly ISVs, start using the application.
  6. /pipeline_parameters – A single collection of pipeline parameters is an instantiation of a pipeline parameter set. This makes it a 1:many mapping of a pipeline parameter set to pipeline parameters. This API requires a pipeline ID to associate with the set of pipeline parameters that enables the creation of a pipeline for a 1:1 mapping of pipeline parameters to the pipeline. The other inputs required by this API are a pipeline parameter set ID, pipeline parameters value, and pipeline parameters name.
  7. /video – A video object is used to define individual videos that make up a .zip package submitted during a job. This file is broken up into multiple videos after submission. A video is related to the job_id for the job where the .zip package is submitted, and Amazon Simple Storage Service (Amazon S3) paths for the location of the raw separated videos and the postprocessing results of each video. The video object also contains a video progress percentage, which is consistently updated during processing of individual frame batches of that video, as well as a video status flag for complete or not complete. The creation of a video requires a job ID, video path, video results path, video progress percentage, and video status.
  8. /frame_batch – A frame_batch object is a mini-batch of frames created by sampling a single video. Separating a video into regular-sized frame batches provides a lever to tune latency and increases parallelization and throughput. This is the granular unit that is run through Kinesis Data Streams for inference. A creation of a frame batch requires a video ID, frame batch start number, frame batch end number, frame batch input path, frame batch results path, and frame batch status.
  9. /job – This interaction API is used for file submission to start a processing job. This API has a different function from other object APIs because it’s the direct path to interact with the video processing backend Step Functions workflow coordination and Amazon EKS cluster. This API requires a user ID, project ID, pipeline ID, pipeline parameter set ID, job parameters, and job status. In the job parameters, an input file path is specified, which is the location in Amazon S3 where the .zip package of videos to be processed is located. File upload is handled with the upload_handler method, which generates a presigned S3 URL for the user to place a file. A WORKFLOW_STATEMACHINE_ARN is an environment variable that is passed to the create_job API to specify where a video .zip package with an input file path is submitted to start a job.

The following table summarizes the API’s functions.

Method Type Function Description
GET list_jobs Selects all jobs from the database and displays.
POST create_ job Inserts a new job record with user ID, project ID, pipeline ID, pipeline parameter set ID, job results path, job parameters, and job status.
GET get_ job Selects job attributes based on job ID from the database and displays.
GET upload_handler Generates a presigned S3 URL as the location for the .zip file upload. Requires an S3 bucket name and expects an application/zip file type.

Python SDK layer

Building upon the APIs, the team created a Python SDK client library as a wrapper to make it easier for developers to access the API methods. They used the open-source Poetry, which handles Python packaging and dependency management. They created a client.py file that contains functions wrapping each of the APIs using the Python requests library to handle API requests and exceptions.

For developers to launch the Intel 3DAT SDK, they need to install and build the Poetry package. Then, they can add a simple Python import of intel_3dat_sdk to any Python code.

To use the client, you can create an instance of the client, specifying the API endpoint:

client = intel_3dat_sdk.client(endpoint_url=”API Endpoint Here”)

You can then use the client to call the individual methods such as the create_pipeline method (see the following code), taking in the proper arguments such as pipeline name and pipeline description.

client.create_pipeline({ “pipeline_name”: “pipeline_1”, “pipeline_description”: “first instance of a pipeline” })

CLI layer

Similarly, the team built on the APIs to create a command line interface for users who want to access the API methods with a straightforward interface without needing to write Python code. They used the open-source Python package Click (Command Line Interface Creation Kit). The benefits of this framework are the arbitrary nesting of commands, automatic help page generation, and support of lazy loading of subcommands at runtime. In the same client.py file as in the SDK, each SDK client method was wrapped using Click and the required method arguments were converted to command line flags. The flag inputs are then used when calling the SDK command.

To launch the CLI, you can use the CLI configure command. You’re prompted for the endpoint URL:

$ intel-cli configure Endpoint url:

Now you can use the CLI to call different commands related to the API methods, for example:

$ intel-cli list-users [ { “user_id”: 1, “group_id”: 1, “user_name”: “Intel User”, “user_email”: Example_Intel_User@intel.com } ]

Database

As a database, this application uses Aurora Serverless to store metadata associated with each of the APIs with MYSQL as the database engine. Choosing the Aurora Serverless database service adheres to the design principle to minimize infrastructural overhead by utilizing serverless AWS services when possible. The following diagram illustrates this architecture.

The serverless engine mode meets the intermittent usage pattern because this application scales up to new customers and workloads are still uncertain. When launching a database endpoint, a specific DB instance size isn’t required, only a minimum and maximum range for cluster capacity. Aurora Serverless handles the appropriate provisioning of a router fleet and distributes the workload amongst the resources. Aurora Serverless automatically performs backup retention for a minimum of 1 day up to 35 days. The team optimized for safety by setting the default to the maximum value of 35.

In addition, the team used the Data API to handle access to the Aurora Serverless cluster, which doesn’t require a persistent connection, and instead provides a secure HTTP endpoint and integration with AWS SDKs. This feature uses AWS Secrets Manager to store the database credentials so there is no need to explicitly pass credentials. CREATE TABLE scripts were written in .sql files for each of the nine tables that correspond to the nine APIs. Because this database contained all the metadata and state of objects in the system, the API methods were run using the appropriate SQL commands (for example select * from Job for the list_jobs API) and passed to the execute_statement method from the Amazon RDS client in the Data API.

Workflow orchestration

The functional backbone of the application was handled using Step Functions to coordinate the workflow, as shown in the following diagram.

The state machine consisted of a sequence of four Lambda functions, which starts when a job is submitted using the create_job method from the job API. The user ID, project ID, pipeline ID, pipeline parameter set ID, job results path, job parameters, and job status are required for job creation. You can first upload a .zip package of video files using the upload_handler method from the job API to generate a presigned S3 URL. During job submission, the input file path is passed via the job parameters, to specify the location of the file. This starts the run of the workflow state machine, triggering four main steps:

  • Initializer Lambda function
  • Submitter Lambda function
  • Completion Check Lambda function
  • Collector Lambda function

Initializer Lambda function

The main function of the Initializer is to separate the .zip package into individual video files and prepare them for the Submitter. First, the .zip file is downloaded, and then each individual file, including video files, is unzipped and extracted. The videos, preferably in .mp4 format, are uploaded back into an S3 bucket. Using the create_video method in the video API, a video object is created with the video path as input. This inserts data on each video into the Aurora database. Any other file types, such as JSON files, are considered metadata and similarly uploaded, but no video object is created. A list of the names of files and video files extracted is passed to the next step.

Submitter Lambda function

The Submitter function takes the video files that were extracted by the Initializer and creates mini-batches of video frames as images. Most current computer vision models in production have been trained on images so even when video is processed, they’re first separated into image frames before model inference. This current solution using a state-of-the-art pose estimation model is no different—the frame batches from the Submitter are passed to Kinesis Data Streams to initiate the inference generation step.

First, the video file is downloaded by the Lambda function. The frame rate and number of frames is calculated using the FileVideoStream module from the imutils.video processing library. The frames are extracted and grouped according to a specified mini-batch size, which is one of the key tunable parameters of this pipeline. Using the Python pickle library, the data is serialized and uploaded to Amazon S3. Subsequently, a frame batch object is created and the metadata entry is created in the Aurora database. This Lambda function was built using a Dockerfile with dependencies on opencv-python, numpy, and imutils libraries.

Completion Check Lambda function

The Completion Check function continues to query the Aurora database to see, for each video in the .zip package for this current job, how many frame batches are in the COMPLETED status. When all frame batches for all videos are complete, then this check process is complete.

Collector Lambda function

The Collector function takes the outputs of the inferences that were performed on each frame during the Consumer stage and combines them across a frame batch and across a video. The combined merged data is then upload to an S3 bucket. The function then invokes the Cortex postprocessing API for a particular ML pipeline in order to perform any postprocessing computations, and adds the aggregated results by video to the output bucket. Many of these metrics are calculated across frames, such as speed, acceleration, and joint angle, so this calculation needs to be performed on the aggregated data. The main outputs include body key points data (aggregated into CSV format), BMA calculations (such as acceleration), and visual overlay of key points added to each frame in an image file.

Scalable pose estimation inference generation

The processing engine that powers the scaling of ML inference happens in this stage. It involves three main pieces, each with have their own concurrency levers that can be tuned for latency tradeoffs (see the following diagram).

This architecture allows for experimentation in testing latency gains, as well as flexibility for the future when workloads may change with different mixes of end-user segments that access the application.

Kinesis Data Streams

The team chose Kinesis Data Streams because it’s typically used to handle streaming data, and in this case is a good fit because it can handle frame batches in a similar way to provide scalability and parallelization. In the Submitter Lambda function, the Kinesis Boto3 client is used, with the put_record method passing in the metadata associated with a single frame batch, such as the frame batch ID, frame batch starting frame, frame batch ending frame, image shape, frame rate, and video ID.

We defined various job queue and Kinesis data stream configurations to set levels of throughput that tie back to the priority level of different user groups. Access to different levels of processing power is linked by passing a project queue ARN when creating a new project using the project API. Every user group is then linked to a particular project during user group creation. Three default stream configurations are defined in the AWS Serverless Application Model (AWS SAM) infrastructure template:

  • Standard – JobStreamShardCount
  • Priority – PriorityJobStreamShardCount
  • High priority – HighPriorityJobStreamShardCount

The team used a few different levers to differentiate the processing power of each stream or tune the latency of the system, as summarized in the following table.

Lever Description Default value
Shard A shard is native to Kinesis Data Streams; it’s the base unit of throughput for ingestion. The default is 1MB/sec, which equates to 1,000 data records per second. 2
KinesisBatchSize The maximum number of records that Kinesis Data Streams retrieves in a single batch before invoking the consumer Lambda function. 1
KinesisParallelizationFactor The number of batches to process from each shard concurrently. 1
Enhanced fan-out Data consumers who have enhanced fan-out activated have a dedicated ingestion throughput per consumer (such as the default 1MB/sec) instead of sharing throughput across consumers. Off

Consumer Lambda function

From the perspective of Kinesis Data Streams, a data consumer is an AWS service that retrieves data from a data stream shard as data is generated in a stream. This application uses a Consumer Lambda function, which is invoked when messages are passed from the data stream queues. Each Consumer function processes one frame batch by performing the following steps. First, a call is made to the Cortex processor API synchronously, which is the endpoint that hosts the model inference pipeline (see the next section regarding Amazon EKS with Cortex for more details). The results are stored in Amazon S3, and an update is made to the database by changing the status of the processed frame batch to Complete. Error handling is built in to manage the Cortex API call with a retry loop to handle any 504 errors from the Cortex cluster, with number of retries set to 5.

Amazon EKS with Cortex for ML inference

The team used Amazon EKS, a managed Kubernetes service in AWS, as the compute engine for ML inference. A design choice was made to use Amazon EKS to host ML endpoints, giving the flexibility of running upstream Kubernetes with the option of clusters both fully managed in AWS via AWS Fargate, or on-premises hardware via Amazon EKS Anywhere. This was a critical piece of functionality desired by Intel OTG, which provided the option to hook up this application to specialized on-premises hardware, for example.

The three ML models that were the building blocks for constructing the inference pipelines were a custom Yolo model (for object detection), a custom HRNet model (for 2D pose estimation), and a 3DMPPE model (for 3D pose estimation) (see the previous ML section for more details). They used the open-source Cortex library to handle deployment and management of ML inference pipeline endpoints, and launching and deployment of the Amazon EKS clusters. Each of these models were packaged up into Docker containers—model files were stored in Amazon S3 and model images were stored in Amazon Elastic Container Registry (Amazon ECR)—and deployed as Cortex Realtime APIs. Versions of the model containers that run on CPU and GPU were created to provide flexibility for the type of compute hardware. In the future, if additional models or model pipelines need to be added, they can simply create additional Cortex Realtime APIs.

They then constructed inference pipelines by composing together the Cortex Realtime model APIs into Cortex Realtime pipeline APIs. A single Realtime pipeline API consisted of calling a sequence of Realtime model APIs. The Consumer Lambda functions treated a pipeline API as a black box, using a single API call to retrieve the final inference output for an image. Two pipelines were created: a 2D pipeline and a 3D pipeline.

The 2D pipeline combines a decoding preprocessing step, object detection using a custom Yolo model to locate the athlete and produce bounding boxes, and finally a custom HRNet model for creating 2D key points for pose estimation.

The 3D pipeline combines a decoding preprocessing step, object detection using a custom Yolo model to locate the athlete and produce bounding boxes, and finally a 3DMPPE model for creating 3D key points for pose estimation.

After generating inferences on a batch of frames, each pipeline also includes a separate postprocessing Realtime Cortex endpoint that generates three main outputs:

  • Aggregated body key points data into a single CSV file
  • BMA calculations (such as acceleration)
  • Visual overlay of key points added to each frame in an image file

The Collector Lambda function submits the appropriate metadata associated with a particular video, such as the frame IDs and S3 locations of the pose estimation inference outputs, to the endpoint to generate these postprocessing outputs.

Cortex is designed to be integrated with Amazon EKS, and only requires a cluster configuration file and a simple command to launch a Kubernetes cluster:

cortex cluster up –config cluster_config_file.yaml

Another lever for performance tuning was the instance configuration for the compute clusters. Three tiers were created with varying mixes of M5 and G4dn instances, codified as .yaml files with specifications such as cluster name, Region, and instance configuration and mix. M5 instances are lower-cost CPU-based and G4dn are higher cost GPU-based to provide some cost/performance tradeoffs.

Operational monitoring

To maintain operational logging standards, all Lambda functions include code to record and ingest logs via Amazon Kinesis Data Firehose. For example, every frame batch processed from the Submitter Lambda function is logged with the timestamp, name of action, and Lambda function response JSON and saved to Amazon S3. The following diagram illustrates this step in the architecture.

Deployment

Deployment is handled using AWS SAM, an open-source framework for building serverless applications in AWS. AWS SAM enables infrastructure design, including functions, APIs, databases, and event source mappings to be codified and easily deployed in new AWS environments. During deployment, the AWS SAM syntax is translated into AWS CloudFormation to handle the infrastructure provisioning.

A template.yaml file contains the infrastructure specifications along with tunable parameters, such as Kinesis Data Streams latency levers detailed in the preceding sections. A samconfig.toml file contains deployment parameters such as stack name, S3 bucket name where application files like Lambda function code is stored, and resource tags for tracking cost. A deploy.sh shell script with the simple commands is all that is required to build and deploy the entire template:

sam build -t template.yaml sam deploy –config-env “env_name” –profile “profile_name”

User work flow

To sum up, after the infrastructure has been deployed, you can follow this workflow to get started:

  1. Create an Intel 3DAT client using the client library.
  2. Use the API to create a new instance of a pipeline corresponding to the type of processing that is required, such as one for 3D pose estimation.
  3. Create a new instance of a project, passing in the cluster ARN and Kinesis queue ARN.
  4. Create a new instance of a pipeline parameter set.
  5. Create a new instance of pipeline parameters that map to the pipeline parameter set.
  6. Create a new user group that is associated with a project ID and a pipeline parameter set ID.
  7. Create a new user that is associated with the user group.
  8. Upload a .zip file of videos to Amazon S3 using a presigned S3 URL generated by the upload function in the job API.
  9. Submit a create_job API call, with job parameters that specify location of the video files. This starts the processing job.

Conclusion

The application is now live and ready to be tested with athletes and coaches alike. Intel OTG is excited to make innovative pose estimation technology using computer vision accessible for a variety of users, from developers to athletes to software vendor partners.

The AWS team is passionate about helping customers like Intel OTG accelerate their ML journey, from the ideation and discovery stage with ML Solutions Lab to the hardening and deployment stage with AWS ML ProServe. We will all be watching closely at the 2021 Tokyo Olympics this summer to envision all the progress that ML can unlock in sports.

Get started today! Explore your use case with the services mentioned in this post and many others on the AWS Management Console.

About the Authors

Han Man is a Senior Manager- Machine Learning & AI at AWS based in San Diego, CA. He has a PhD in engineering from Northwestern University and has several years of experience as a management consultant advising clients in manufacturing, financial services, and energy. Today he is passionately working with customers from a variety of industries to develop and implement machine learning & AI solutions on AWS. He enjoys following the NBA and playing basketball in his spare time.

Iman Kamyabi is an ML Engineer with AWS Professional Services. He has worked with a wide range of AWS customers to champion best practices in setting up repeatable and reliable ML pipelines.

Jonathan Lee is the Director of Sports Performance Technology, Olympic Technology Group at Intel. He studied the application of machine learning to health as an undergrad at UCLA and during his graduate work at University of Oxford. His career has focused on algorithm and sensor development for health and human performance. He now leads the 3D Athlete Tracking project at Intel.

Nelson Leung is the Platform Architect in the Sports Performance CoE at Intel, where he defines end-to-end architecture for cutting-edge products that enhance athlete performance. He also leads the implementation, deployment and productization of these machine learning solutions at scale to different Intel partners.

Troy Squillaci is a DecSecOps engineer at Intel where he delivers professional software solutions to customers through DevOps best practices. He enjoys integrating AI solutions into scalable platforms in a variety of domains.

Paul Min is an Associate Solutions Architect Intern at Amazon Web Services (AWS), where he helps customers across different industry verticals advance their mission and accelerate their cloud adoption. Previously at Intel, he worked as a Software Engineering Intern to help develop the 3D Athlete Tracking Cloud SDK. Outside of work, Paul enjoys playing golf and can be heard singing.



Source

Continue Reading

Amazon

Intelligently search your Jira projects with Amazon Kendra Jira cloud connector

Organizations use agile project management platforms such as Atlassian Jira to enable teams to collaborate to plan, track, and ship deliverables. Jira captures organizational knowledge about the workings of the deliverables in the issues and comments logged during project implementation. However, making this knowledge easily and securely available to users is challenging due to it…

Published

on

By

Organizations use agile project management platforms such as Atlassian Jira to enable teams to collaborate to plan, track, and ship deliverables. Jira captures organizational knowledge about the workings of the deliverables in the issues and comments logged during project implementation. However, making this knowledge easily and securely available to users is challenging due to it being fragmented across issues belonging to different projects and sprints. Additionally, because different stakeholders such as developers, test engineers, and project managers contribute to the same issue by logging it and then adding attachments and comments, traditional keyword-based search is rendered ineffective when searching for information in Jira projects.

You can now use the Amazon Kendra Jira cloud connector to index issues, comments, and attachments in your Jira projects, and search this content using Amazon Kendra intelligent search, powered by machine learning (ML).

This post shows how to use the Amazon Kendra Jira cloud connector to configure a Jira cloud instance as a data source for an Amazon Kendra index, and intelligently search the contents of the projects in it. We use an example of Jira projects where team members collaborate by creating issues and adding information to them in the form of descriptions, comments, and attachments throughout the issue lifecycle.

Solution overview

A Jira instance has one or more projects, where each project has team members working on issues in that project. Each team member has set of permissions about the operations they can perform with respect to different issues in the project they belong to. Team members can create new issues, or add more information to the issues in the form of attachments and comments, as well as change the status of an issue from its opening to closure throughout the issue lifecycle defined for that project. A project manager creates sprints, assigns issues to specific sprints, and assigns owners to issues. During the course of the project, the knowledge captured in these issues keeps evolving.

In our solution, we configure a Jira cloud instance as a data source to an Amazon Kendra search index using the Amazon Kendra Jira connector. Based on the configuration, when the data source is synchronized, the connector crawls and indexes the content from the projects in the Jira instance. Optionally, you can configure it to index the content based on the change log. The connector also collects and ingests access control list (ACL) information for each issue, comment, and attachment. The ACL information is used for user context filtering, where search results for a query are filtered by what a user has authorized access to.

Prerequisites

To try out the Amazon Kendra connector for Jira using this post as a reference, you need the following:

Jira instance configuration

This section describes the Jira configuration used to demonstrate how to configure an Amazon Kendra data source using the Jira connector, ingest the data from the Jira projects into the Amazon Kendra index, and make search queries. You can use your own Jira instance for which you have admin access or create a new project and carry out the steps to try out the Amazon Kendra connector for Jira.

In our example Jira instance, we created two projects to demonstrate that the search queries made by users return results from only the projects to which they have access. We used data from the following public domain projects to simulate the use case of real-life software development projects:

The following is a screenshot of our Kanban-style board for project 1.

Create an API token for the Jira instance

To get the API token needed to configure the Amazon Kendra Jira connector, complete the following steps:

  1. Log in to https://id.atlassian.com/manage/api-tokens.
  2. Choose Create API token.
  3. In the dialog box that appears, enter a label for your token and choose Create.
  4. Choose Copy and enter the token on a temporary notepad.

You can’t copy this token again, and you need it to configure the Amazon Kendra Jira connector.

Configure the data source using the Amazon Kendra connector for Jira

To add a data source to your Amazon Kendra index using the Jira connector, you can use an existing index or create a new index. Then complete the following steps. For more information on this topic, refer to Amazon Kendra Developer Guide.

  1. On the Amazon Kendra console, open your index and choose Data sources in the navigation pane.
  2. Choose Add data source.
  3. Under Jira, choose Add connector.
  4. In the Specify data source details section, enter the details of your data source and choose Next.
  5. In the Define access and security section, for Jira Account URL, enter the URL of your Jira cloud instance.
  6. Under Authentication, you have two options:
    1. Choose Create to add a new secret using the Jira API token you copied from your Jira instance and use the email address used to log in to Jira as the Jira ID. (This is the option we choose for this post.)
    2. Use an existing AWS Secrets Manager secret that has the API token for the Jira instance you want the connector to access.
  7. For IAM role, choose Create a new role or choose an existing IAM role configured with appropriate IAM policies to access the Secrets Manager secret, Amazon Kendra index, and data source.
  8. Choose Next.
  9. In the Configure sync settings section, provide information about your sync scope and run schedule.
  10. Choose Next.
  11. In the Set field mappings section, you can optionally configure the field mappings, or how the Jira field names are mapped to Amazon Kendra attributes or facets.
  12. Choose Next.
  13. Review your settings and confirm to add the data source.
  14. After the data source is added, choose Data sources in the navigation pane, select the newly added data source, and choose Sync now to start data source synchronization with the Amazon Kendra index.
    The sync process can take about 10–15 minutes. Let’s now enable access control for the Amazon Kendra index.
  15. In the navigation pane, choose your index.
  16. In the middle pane, choose the User access control tab.
  17. Choose Edit settings and change the settings to look like the following screenshot.
  18. Choose Next and then choose Update.

Perform intelligent search with Amazon Kendra

Before you try searching on the Amazon Kendra console or using the API, make sure that the data source sync is complete. To check, view the data sources and verify if the last sync was successful.

  1. To start your search, on the Amazon Kendra console, choose Search indexed content in the navigation pane.
    You’re redirected to the Amazon Kendra Search console.
  2. Expand Test query with an access token and choose Apply token.
  3. For Username, enter the email address associated with your Jira account.
  4. Choose Apply.

Now we’re ready to search our index. Let’s use the query “where does boto3 store security tokens?”

In this case, Kendra provides a suggested answer from one of the cards in our Kanban project on Jira.

Note that this is also a suggested answer pointing to an issue discussing AWS security tokens and Boto3. You may also build search experience with multiple data sources including SDK documentation and wikis with Amazon Kendra, and present results and related links accordingly. The following screenshot shows another search query made against the same index.

Note that when we apply a different access token (associate the search with a different user), the search results are restricted to projects that this user has access to.

Lastly, we can also use filters relevant to Jira in our search. First, we navigate to our index’s Facet definition page and check Facetable for j_status, j_assignee, and j_project_name. For every search, we can then filter by these fields, as shown in the following screenshot.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Jira, delete that data source.

Conclusion

With the Amazon Kendra Jira connector, your organization can make invaluable knowledge in your Jira projects available to your users securely using intelligent search powered by Amazon Kendra.

To learn more about the Amazon Kendra Jira connector, refer to the Amazon Kendra Jira connector section of the Amazon Kendra Developer Guide.

For more information on other Amazon Kendra built-in connectors to popular data sources, refer to Unravel the knowledge in Slack workspaces with intelligent search using the Amazon Kendra Slack connector and Search for knowledge in Quip documents with intelligent search using the Quip connector for Amazon Kendra.

About the Authors

Shreyas Subramanian is an AI/ML specialist Solutions Architect, and helps customers by using Machine Learning to solve their business challenges on the AWS Cloud.

Abhinav JawadekarAbhinav Jawadekar is a Principal Solutions Architect focused on Amazon Kendra in the AI/ML language services team at AWS. Abhinav works with AWS customers and partners to help them build intelligent search solutions on AWS.



Source

Continue Reading

Trending

Copyright © 2021 Today's Digital.