Annotate DICOM images and build an ML model using the MONAI framework on Amazon SageMaker

DICOM (Digital Imaging and Communications in Medicine) is the standard format for medical images such as X-Rays and MRIs, storing both the image data and any associated metadata. Medical professionals and healthcare researchers rely on it to visualize and interpret these images. The purpose of this post is to solve two problems:

  • Visualize and label DICOM images using a custom data labeling workflow on Amazon SageMaker Ground Truth, a fully managed data labeling service supporting built-in or custom data labeling workflows
  • Develop a DenseNet image classification model using the MONAI framework on Amazon SageMaker, a comprehensive and fully managed data science platform with purpose-built tools to prepare, build, train, and deploy machine learning (ML) models on the cloud

For this post, we use a chest X-Ray DICOM image dataset from the MIMIC Chest X-Ray (MIMIC-CXR) Database, a publicly available database of chest X-Ray images in DICOM format with the associated radiology reports as free-text files. To access the files, you must be a registered user and sign the data use agreement.

We label the images through the Ground Truth private workforce. AWS can also provide professional managed workforces with experience labeling medical images.

Solution overview

The high-level workflow has the following key components:

  1. The DICOM images are stored in a third-party picture archiving and communication system (PACS) or vendor neutral archive (VNA), and retrieved through DICOMweb™.
  2. An input manifest.json file is uploaded to Amazon Simple Storage Service (Amazon S3). The file contains the DICOM instance ID as the data source and the potential labels that annotators choose from when they perform the labeling jobs.
  3. Two AWS Lambda functions are essential for creating labeling jobs on Ground Truth: a pre-labeling task function and a post-labeling task function.
  4. An HTML template with Crowd elements is used to submit the labeling jobs and process the output object. The output of the labeling jobs is then saved in an output label S3 bucket.
  5. A SageMaker notebook can retrieve the outputs of the labeling jobs and use them to train a supervised ML model.

We have built an HTML template that you can use to classify the chest X-Ray DICOM images into one or more categories, out of 13 possible acute and chronic cardiopulmonary conditions. The template is built on top of cornerstone.js and supports DICOM image retrieval and interactive visualization, and it includes several annotation and general Cornerstone tools as examples.
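Because each image can be assigned more than one condition, the annotations map naturally onto a multi-hot target vector for multi-label training. The following is a minimal sketch of that encoding, not the post's own training code. The helper name `to_multi_hot` is hypothetical, and the condition order is simply the order used in the manifest labels later in this post, with No Finding treated as "no condition selected".

```python
# The 13 conditions, in the order they appear in the manifest labels.
CONDITIONS = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Enlarged Cardiomediastinum", "Fracture", "Lung Lesion", "Lung Opacity",
    "Pleural Effusion", "Pneumonia", "Pneumothorax", "Pleural Other",
    "Support Devices",
]


def to_multi_hot(selected, conditions=CONDITIONS):
    """Encode the conditions chosen by an annotator as a list of 0/1 flags.

    Assumption: "No Finding" means no condition applies, so it maps to all zeros.
    """
    chosen = set(selected) - {"No Finding"}
    unknown = chosen - set(conditions)
    if unknown:
        raise ValueError(f"Unknown labels: {sorted(unknown)}")
    return [1 if name in chosen else 0 for name in conditions]


print(to_multi_hot(["Edema", "Pneumonia"]))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

A vector like this can serve as the target for a classifier with one output per condition.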

In the following sections, we walk through building the DICOM data labeling workflow and performing ML model training using the output of the labeling jobs.

Deploy a third-party PACS on AWS

We use Orthanc, an open-source, lightweight PACS, for this post, although any PACS or VNA that supports DICOMweb™ can be used. You can deploy the Orthanc for Docker container on AWS by launching the following AWS CloudFormation stack:

Fill in the required information during the deployment, including the Amazon Elastic Compute Cloud (Amazon EC2) key pair used to access the hosting EC2 instance and the network infrastructure (VPC and subnets). An NGINX server in the container proxies HTTPS traffic to the Orthanc server on port 8042 and adds Access-Control-Allow-Origin headers for cross-origin resource sharing (CORS). The Orthanc container is deployed on Amazon Elastic Container Service (Amazon ECS) and connected to an Amazon Aurora PostgreSQL database.

After the CloudFormation stack is successfully created, take note of the Orthanc endpoint URL on the Outputs tab.

Create the Lambda functions, S3 bucket, and SageMaker notebook instance

The following CloudFormation stack creates the required Lambda functions with appropriate AWS Identity and Access Management (IAM) roles, an S3 bucket for input and output files, and a SageMaker notebook instance with a sample Jupyter notebook:

For the parameter PreLabelLambdaSourceEndpointURL, enter the Orthanc endpoint URL from the previous step. The pre-labeling task Lambda function uses it to generate the WADO-URI for a given DICOM instance ID. We recommend a notebook instance type of ml.m5.xlarge to carry out the ML modeling after the images are annotated.
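The URL construction performed by the pre-labeling function can be sketched as follows. This is an illustration under stated assumptions rather than the deployed function's code: it assumes the viewer loads the raw DICOM file from Orthanc's `/instances/{id}/file` REST route, prefixed with the `wadouri:` scheme used by the Cornerstone WADO image loader. The function name `build_wado_uri` and the example endpoint are hypothetical.

```python
from urllib.parse import quote


def build_wado_uri(endpoint_url: str, instance_id: str) -> str:
    """Return a Cornerstone-style image ID for one Orthanc instance.

    Assumption: the image is fetched from Orthanc's /instances/{id}/file route.
    """
    base = endpoint_url.rstrip("/")  # tolerate a trailing slash on the endpoint
    safe_id = quote(instance_id, safe="-")  # percent-encode anything unexpected
    return f"wadouri:{base}/instances/{safe_id}/file"


# Hypothetical endpoint; use the value from the stack's Outputs tab instead.
image_id = build_wado_uri(
    "https://orthanc.example.com/",
    "502b0a4b-5cb43965-7f092716-bd6fe6d6-4f7fc3ce",
)
print(image_id)
```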

After the stack is deployed, take note of the outputs, including the SMGTLabelingExecutionRole and SageMakerAnnotationS3Bucket values.

The IAM role SMGTLabelingExecutionRole is used to create the Ground Truth labeling job. For more information about the policies this role requires, as well as a step-by-step tutorial on how to run the Ground Truth labeling job after launching the CloudFormation stack, see Build a custom data labeling workflow with Amazon SageMaker Ground Truth.

Upload DICOM images and prepare the input manifest

You can upload the DICOM images to the Orthanc server either through its web UI or through the DICOMweb™ STOW-RS REST API. After the images are uploaded, you can retrieve their DICOM instance IDs and generate a manifest file from them. Each JSON object in the manifest file, separated by a standard line break, represents an input task sent to the workforce for labeling. The data object in this case contains the instance ID and the potential labels: 13 possible conditions indicated by the image, plus No Finding. Assuming one DICOM instance ID is 502b0a4b-5cb43965-7f092716-bd6fe6d6-4f7fc3ce, the corresponding JSON object in the manifest file looks like the following:

{"source": "502b0a4b-5cb43965-7f092716-bd6fe6d6-4f7fc3ce", "labels": "Atelectasis,Cardiomegaly,Consolidation,Edema,Enlarged Cardiomediastinum,Fracture,Lung Lesion,Lung Opacity,Pleural Effusion,Pneumonia,Pneumothorax,Pleural Other,Support Devices,No Finding"}

After the manifest.json file is compiled, upload it to the S3 bucket created earlier: SageMakerAnnotationS3Bucket.
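One way to compile the manifest described above is sketched below. It assumes you already have a list of Orthanc instance IDs; the helper name `build_manifest` is hypothetical. The sketch only builds the JSON Lines text, so writing it to manifest.json and uploading it to the bucket are left to your preferred tooling.

```python
import json

# Label choices offered to annotators, as listed in the manifest example.
LABELS = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Enlarged Cardiomediastinum", "Fracture", "Lung Lesion", "Lung Opacity",
    "Pleural Effusion", "Pneumonia", "Pneumothorax", "Pleural Other",
    "Support Devices", "No Finding",
]


def build_manifest(instance_ids, labels=LABELS):
    """Return manifest text with one JSON object per line, one per instance ID."""
    label_field = ",".join(labels)  # the manifest stores labels as one comma-separated string
    return "".join(
        json.dumps({"source": instance_id, "labels": label_field}) + "\n"
        for instance_id in instance_ids
    )


manifest_text = build_manifest(["502b0a4b-5cb43965-7f092716-bd6fe6d6-4f7fc3ce"])
print(manifest_text, end="")
```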

Build a custom data labeling job

You should now be able to create a custom labeling job on Ground Truth:

  1. Create a private work team and add members to the team.

     The workers receive an email with the labeling portal sign-in URL, which is also available on the Amazon SageMaker console.

     AWS can also provide medical imaging experts to label your data. Contact your AWS team for details.

  2. Specify the input and output data locations using the SageMakerAnnotationS3Bucket bucket created earlier.
  3. Specify SMGTLabelingExecutionRole as the IAM role for the labeling job.
  4. For Task category, choose Custom.
  5. Enter the content in liquid.html in the Custom Template text field.
  6. Configure the gt-prelabel-task-lambda and gt-postlabel-task-lambda functions created earlier.
  7. Choose Create.
  8. After you configure the custom labeling task, choose Preview.

The following video shows our preview.

If you created a private workforce, you can go to the Labeling Workforces tab and find the annotation console link.
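The post-labeling function configured in the steps above receives the workers' answers and returns one consolidated record per data object. The sketch below shows that consolidation step only, as a pure function that simply keeps every worker's answer. It assumes the payload has already been read from the S3 location passed in the Lambda event and parsed from JSON; the function name `consolidate` and the sample values are hypothetical, and the actual gt-postlabel-task-lambda function may consolidate differently.

```python
import json


def consolidate(payload, label_attribute_name):
    """Build the post-annotation response from parsed annotation records."""
    results = []
    for item in payload:
        # Each worker's answer arrives as a JSON string; parse it for downstream use.
        worker_answers = [
            {
                "workerId": annotation["workerId"],
                "annotationData": json.loads(annotation["annotationData"]["content"]),
            }
            for annotation in item["annotations"]
        ]
        results.append(
            {
                "datasetObjectId": item["datasetObjectId"],
                "consolidatedAnnotation": {
                    "content": {
                        label_attribute_name: {"annotationsFromAllWorkers": worker_answers}
                    }
                },
            }
        )
    return results


# Hypothetical payload with a single worker answer for one image.
sample_payload = [
    {
        "datasetObjectId": "0",
        "annotations": [
            {
                "workerId": "private.worker-1",
                "annotationData": {"content": json.dumps({"disease": ["Edema"]})},
            }
        ],
    }
]
print(json.dumps(consolidate(sample_payload, "chest-xray-label"), indent=2))
```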

You can get started with a base HTML template, or modify the sample Ground Truth task UIs for image, text, and audio data labeling jobs. The basic building blocks for the custom template are Crowd HTML elements. The crowd-form and crowd-button elements are essential for submitting the annotations to Ground Truth. In addition, you need Liquid objects in the template for job automation, in particular the task input object produced by the pre-labeling Lambda function.
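To show where the task input object comes from, here is a minimal sketch of a pre-labeling Lambda handler. It follows the Ground Truth pre-annotation contract: the manifest line arrives under `dataObject`, and the response carries `taskInput` and `isHumanAnnotationRequired`. The environment variable name `SOURCE_ENDPOINT_URL`, the `taskInput` keys `image` and `labels`, and the Orthanc file route are assumptions for illustration, not the deployed function's code.

```python
import os


def lambda_handler(event, context):
    """Turn one manifest line into the task input exposed to the template."""
    data = event["dataObject"]  # the JSON object from one line of manifest.json

    # Hypothetical variable name; the fallback endpoint is a placeholder.
    endpoint = os.environ.get("SOURCE_ENDPOINT_URL", "https://orthanc.example.com")
    endpoint = endpoint.rstrip("/")

    return {
        "taskInput": {
            # Assumed image ID format: Cornerstone "wadouri:" scheme plus Orthanc file route.
            "image": f"wadouri:{endpoint}/instances/{data['source']}/file",
            # Split the comma-separated string so the template can loop over the labels.
            "labels": data["labels"].split(","),
        },
        "isHumanAnnotationRequired": "true",
    }


sample_event = {
    "version": "2018-10-16",
    "dataObject": {
        "source": "502b0a4b-5cb43965-7f092716-bd6fe6d6-4f7fc3ce",
        "labels": "Atelectasis,Cardiomegaly,No Finding",
    },
}
print(lambda_handler(sample_event, None))
```

With a response shaped like this, the template would read the values as task.input.image and task.input.labels inside Liquid tags.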


Copyright © 2021 Today's Digital.