Build machine learning at the edge applications using Amazon SageMaker Edge Manager and AWS IoT Greengrass V2
Running machine learning (ML) models at the edge can be a powerful enhancement for Internet of Things (IoT) solutions that must perform inference without a constant connection back to the cloud. Although there are numerous ways to train ML models for countless applications, effectively optimizing and deploying these models for IoT devices can present many obstacles.
Some common questions include: How can ML models be packaged for deployment across a fleet of devices? How can an ML model be optimized for specific edge device hardware? How can we efficiently get inference feedback back to the cloud? What ML libraries do we need to install on our storage-constrained IoT devices?
In this post, we show how you can integrate Amazon SageMaker Edge Manager and AWS IoT Greengrass to build robust ML applications that are targeted specifically for edge use cases. AWS IoT Greengrass is an open-source edge runtime that we can use to build, deploy, and manage edge applications across a fleet of devices. Edge Manager can optimize and package our ML models for specific device targets, and provides an integration capability for inference in edge applications via gRPC.
Solution overview
We take a use case in the agriculture industry as an example. Today’s agriculture customers are looking for solutions to monitor, track, and count livestock in the most remote areas as the animals are transferred between nursery, weaning, grow-to-finish, and market locations. These solutions must run at the edge for connectivity, latency, and cost reasons. To solve this problem, we show you how to use the Amazon SageMaker built-in object detection model, deploy it to edge devices such as the NVIDIA Jetson Nano, TX2, and Xavier via AWS IoT Greengrass, and run inference on them. You can then use these detections as input to tracking algorithms that help track and monitor animals.
One application of tracking animals is counting them. Counting pigs can be hard; they move quickly, they turn around, and they all look the same! Three AWS experts tried to count pigs manually, and all three got different answers. Computer vision and ML at the edge can increase efficiency and accuracy for livestock management. The most important customer benefit is getting consistent, accurate, near-real-time livestock counts to support sound economic decisions, such as feed and weight management, that can increase revenue or reduce losses. Second, reducing or eliminating manual counting tasks allows workers to focus on higher-value tasks such as animal care. This increases operational efficiency and product quality. Beyond the agriculture industry, the same approach applies to monitoring wildlife.
We look at how to set up an edge device (in this case, an NVIDIA Jetson Xavier) with AWS IoT Greengrass. After we have trained the model, we deploy it to this device. We then run inference at the edge to count how many animals are in a given image. You can feed a live camera stream to the system and combine the object detection outputs with a tracker to count animals in real time. We go through the following steps to create this application:
Prepare your dataset.
Use the Amazon SageMaker built-in object detection model.
Optimize the model for the edge device.
Package the model for the edge.
Deploy the models to the edge.
Build an AWS IoT Greengrass application for running inference and counting using an AWS Lambda function.
Set up the edge device.
Run the application at the edge to perform inference.
The following diagram shows a high-level architecture of the components that reside on the farm and how they interact with the AWS services in the cloud.
Prepare the dataset
For this post, we gather videos of livestock from multiple farms with adequate lighting and a variety of floor types. When you collect videos for training your model, use a ceiling-mounted camera that covers the whole alley through which the livestock are transferred. Split those videos into frames, and use Amazon SageMaker Ground Truth to create annotations with the help of Amazon Mechanical Turk. The following is an example of an annotated image of pigs.
Example notebooks for dataset creation are available in the GitHub repo. You can use data augmentation techniques to increase the size of your training dataset.
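If you prefer to split the source videos into frames yourself before sending them to Ground Truth, a few lines of OpenCV are enough. The following is a minimal sketch; the video path, output directory, and sampling interval are placeholders you would adjust for your footage:

import os
import cv2

def extract_frames(video_path, output_dir, every_n=15):
    """Save every n-th frame of a video as a JPEG for annotation."""
    os.makedirs(output_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    frame_idx, saved = 0, 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % every_n == 0:
            out_path = os.path.join(output_dir, "frame_%06d.jpg" % frame_idx)
            cv2.imwrite(out_path, frame)
            saved += 1
        frame_idx += 1
    cap.release()
    return saved

# Example usage (placeholder file names):
# extract_frames("barn_camera_01.mp4", "frames/barn_camera_01", every_n=15)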
Build a livestock detection model
We use the SageMaker built-in object detection model and train it on a dataset of pigs. One of the biggest challenges in livestock detection is crowding. To make the model learn such scenes, we recommend experimenting with bounding boxes around the heads instead of entire bodies.
When we have the annotated dataset, we can use the built-in object detection model, which uses the Single Shot MultiBox Detector (SSD) algorithm. The following example notebook illustrates how to do this, and you can pass in your livestock dataset to create an ML model.
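As a rough sketch of what the notebook sets up with the SageMaker Python SDK, the training job looks something like the following. The role ARN, S3 paths, instance type, and hyperparameter values are placeholders, and the exact channel configuration depends on whether you provide RecordIO files or a Ground Truth augmented manifest:

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

# Container image for the built-in object detection (SSD) algorithm
container = image_uris.retrieve("object-detection", session.boto_region_name)

od_model = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    volume_size=50,
    max_run=36000,
    input_mode="File",
    output_path="s3://my-bucket/pig-detector/output",  # placeholder
    sagemaker_session=session,
)

od_model.set_hyperparameters(
    base_network="resnet-50",
    use_pretrained_model=1,
    num_classes=1,               # single class: PigHead
    image_shape=512,             # matches the 512 x 512 input used at the edge
    epochs=30,
    mini_batch_size=16,
    num_training_samples=10000,  # set to the size of your dataset
)

# Placeholder channel locations; see the example notebook for the exact input format
od_model.fit({
    "train": "s3://my-bucket/pig-detector/train",
    "validation": "s3://my-bucket/pig-detector/validation",
})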
Optimize the model for the edge
We use Amazon SageMaker Neo to optimize the model for the target device—in this case, the Jetson Xavier. Neo automatically optimizes ML models for inference on cloud instances and edge devices to run faster with no loss in accuracy. You start with an ML model already built with DarkNet, Keras, MXNet, PyTorch, TensorFlow, TensorFlow-Lite, ONNX, or XGBoost and trained in SageMaker or anywhere else. Then you choose your target hardware platform, which can be a SageMaker hosting instance or an edge device based on processors from Ambarella, Apple, ARM, Intel, MediaTek, Nvidia, NXP, Qualcomm, RockChip, Texas Instruments, or Xilinx. With a single click, Neo optimizes the trained model and compiles it into an executable. The compiler uses an ML model to apply the performance optimizations that extract the best available performance for your model on the cloud instance or edge device. You then deploy the model as a SageMaker endpoint or on supported edge devices and start making predictions.
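As a sketch of what this step looks like through the API rather than the console, you can start a compilation job for the Jetson Xavier with the boto3 SageMaker client. The job name, role ARN, and S3 locations below are placeholders; the data input shape matches the 1 x 3 x 512 x 512 tensor the inference code uses later in this post:

import boto3

sm_client = boto3.client("sagemaker")

sm_client.create_compilation_job(
    CompilationJobName="pig-detector-jetson-xavier",             # placeholder
    RoleArn="arn:aws:iam::111122223333:role/SageMakerNeoRole",   # placeholder
    InputConfig={
        "S3Uri": "s3://my-bucket/pig-detector/output/model.tar.gz",  # trained model artifact
        "DataInputConfig": '{"data": [1, 3, 512, 512]}',
        "Framework": "MXNET",  # the built-in object detection algorithm is MXNet-based
    },
    OutputConfig={
        "S3OutputLocation": "s3://my-bucket/pig-detector/compiled/",
        "TargetDevice": "jetson_xavier",
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)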
Package the model for the edge
After the model is optimized for the edge device, we can use Edge Manager to optimize, secure, monitor, and maintain ML models on fleets of smart cameras, robots, personal computers, and mobile devices.
Edge Manager provides a software agent that runs on edge devices. The agent runs models optimized with Neo automatically, so you don’t need the Neo runtime installed on your devices to take advantage of the model optimizations. The agent can also collect prediction data and send a sample of the data to the cloud for monitoring, labeling, and retraining so you can keep models accurate over time. You can view all the data on the Edge Manager dashboard, which reports on the operation of deployed models.
Because Edge Manager enables you to manage models separately from the rest of the application, you can update the model and the application independently, reducing costly downtime and service disruptions. Edge Manager also cryptographically signs your models so you can verify that they weren’t tampered with as they move from the cloud to edge devices.
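An Edge Manager packaging job takes the Neo compilation output and produces the signed model artifact that the agent loads on the device. The following boto3 sketch uses the same placeholder names as the compilation example above; the model name matches the one the inference code loads later in this post:

import boto3

sm_client = boto3.client("sagemaker")

sm_client.create_edge_packaging_job(
    EdgePackagingJobName="pig-detector-edge-packaging",          # placeholder
    CompilationJobName="pig-detector-jetson-xavier",             # Neo job from the previous step
    ModelName="demo-SageMaker-ssd-resnet",                       # name used by the edge application
    ModelVersion="1.0",
    RoleArn="arn:aws:iam::111122223333:role/SageMakerEdgeRole",  # placeholder
    OutputConfig={"S3OutputLocation": "s3://my-bucket/pig-detector/packaged/"},
)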
Deploy the models to the edge
You can then deploy the packaged models and the business applications that use them with AWS IoT Greengrass, an open-source IoT edge runtime and cloud service that lets you quickly and easily build intelligent device software.
AWS IoT Greengrass Version 2 is a new major version release of AWS IoT Greengrass. You can add or remove prebuilt software components based on your use cases, configured specifically for your target device’s CPU, GPU, and memory resources. For example, you can choose to include only prebuilt AWS IoT Greengrass components, such as stream manager, when you need to process data streams with your application. When you want to perform ML inference locally on your devices, you can also include ML components, such as the public Edge Manager component provided by AWS, or a custom component containing your ML model. By decoupling your ML model and inference client code, you can quickly swap out different model versions without having to update application code or redeploy your entire solution. The following GitHub repository shows an example of how to easily deploy an ML model, Edge Manager, and an AWS IoT Greengrass Lambda function using AWS IoT Greengrass V2 custom components.
In this example, we deploy three components, as illustrated in the following diagram: the public Edge Manager component, a custom component that contains our Python inference client code, and another custom component wrapping our ML model.
The Edge Manager component downloads, installs, and runs an Edge Manager binary agent specific to our OS and platform architecture. When the agent is running, a gRPC-based service enables clients to manage models through a collection of APIs. With requests to the APIs, the client can load, unload, and describe models; run predictions with raw bytes or a SharedMemoryHandle of a multi-dimensional tensor array; and upload input and output tensors to the cloud. Clients can easily communicate with these APIs using the proto file available as part of the Edge Manager release artifacts.
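To push these three components to a device or thing group, you can create a deployment with the AWS SDK. The following boto3 sketch is illustrative: the thing group ARN, the inference component name, and the component versions are placeholders, the model component name matches the path used by the inference code in the next section, and the public Edge Manager component is referenced by its published name, aws.greengrass.SageMakerEdgeManager.

import boto3

gg_client = boto3.client("greengrassv2")

gg_client.create_deployment(
    targetArn="arn:aws:iot:us-east-1:111122223333:thinggroup/LivestockEdgeDevices",  # placeholder
    deploymentName="pig-counting-deployment",  # placeholder
    components={
        # Public Edge Manager agent component provided by AWS
        "aws.greengrass.SageMakerEdgeManager": {"componentVersion": "1.0.0"},
        # Custom component wrapping the packaged ML model
        "com.model.SageMaker.resnet": {"componentVersion": "1.0.0"},
        # Custom component containing the Python inference client code (placeholder name)
        "com.example.PigCountingInference": {"componentVersion": "1.0.0"},
    },
)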
Build an AWS IoT Greengrass V2 application using a Lambda function
To run our business logic at the edge, we package the code as an AWS IoT Greengrass Lambda function that runs indefinitely on the edge device. For example, the following Lambda function counts the number of livestock in a still image. You can also extend this to do object tracking in videos using tracking algorithms like correlation tracker, CSRT, GOTURN, KCF, and so on. See the following code:
import grpc
import cv2
import numpy as np
import agent_pb2_grpc
from agent_pb2 import (ListModelsRequest, LoadModelRequest, PredictRequest,
                       UnLoadModelRequest, DescribeModelRequest, Tensor,
                       TensorMetadata)
import logging
import random
import sys
import os
import traceback
import time

# Set up logging to stdout
logger = logging.getLogger(__name__)
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

# Use connected Camera 1
cap = cv2.VideoCapture(1, cv2.CAP_V4L)
# Or pass in a video file instead:
# cap = cv2.VideoCapture("")

model_url = '/greengrass/v2/work/com.model.SageMaker.resnet/'  # Path to the ML model component
model_name = 'demo-SageMaker-ssd-resnet'
tensor_name = 'data'
SHAPE = 512
tensor_shape = [1, 3, SHAPE, SHAPE]
object_categories = ['PigHead']
NMS_THRES = 0.2

print("This requires SageMaker Edge Manager Agent to be running. We will wait for the component to be spun up!")
time.sleep(10)
print("Waking up..")

channel = grpc.insecure_channel('unix:///tmp/SageMaker_edge_agent_example.sock')
print('getting stubs!')
edge_manager_client = agent_pb2_grpc.AgentStub(channel)

print('calling LoadModel!')
try:
    response = edge_manager_client.LoadModel(
        LoadModelRequest(url=model_url, name=model_name))
    time.sleep(5)
except Exception as e:
    print(e)
    print('model already loaded!')

print('calling ListModels!')
response = edge_manager_client.ListModels(ListModelsRequest())

print('calling DescribeModel')
response = edge_manager_client.DescribeModel(
    DescribeModelRequest(name=model_name))


def greengrass_pig_counting_application_run():
    global edge_manager_client
    print('running now!')
    try:
        if not cap.isOpened():
            print("Cannot open camera")
            exit(1)
        while True:
            ret, img = cap.read()
            print('calling PredictRequest on frames from Video stream!')
            # Resize input before serving
            frame = resize_short_within(img, short=512)
            nn_input_size = 512
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            nn_input = cv2.resize(frame, (nn_input_size, nn_input_size))
            copy_frame = nn_input[:]
            nn_input = np.swapaxes(nn_input, 0, 2)
            nn_input = np.swapaxes(nn_input, 1, 2)
            nn_input = nn_input[np.newaxis, :]
            nn_input = nn_input.astype('float32')
            scaled_frame = nn_input
            print("SHAPE: " + str(img.shape))

            # Create PredictRequest
            request = PredictRequest(
                name=model_name,
                tensors=[Tensor(
                    tensor_metadata=TensorMetadata(
                        name=tensor_name, data_type=5, shape=tensor_shape),
                    byte_data=scaled_frame.tobytes())])

            # Call Predict
            response = edge_manager_client.Predict(request)

            # Parse outputs
            i = 0
            test_detections = []
            for t in response.tensors:
                print("Flattened RAW Output Tensor: " + str(i + 1))
                i += 1
                deserialized_bytes = np.frombuffer(t.byte_data, dtype=np.float32)
                test_detections.append(np.asarray(deserialized_bytes))
            test_detections = np.array(test_detections)
            # Resize the output flattened tensor based on the output shape of the model
            test_detections.resize(6132, 6)
            dets = test_detections
            width, height = 512, 512
            detections = []
            for i in range(len(dets)):
                cls_id = int(dets[i, 0])
                if cls_id >= 0:
                    score = dets[i, 1]
                    if score > NMS_THRES:
                        xmin = int(dets[i, 2] * width)
                        ymin = int(dets[i, 3] * height)
                        xmax = int(dets[i, 4] * width)
                        ymax = int(dets[i, 5] * height)
                        class_name = str(cls_id)
                        detections.append((class_name, xmin, ymin, xmax, ymax))
                        # Draw the bounding box on the resized frame so the
                        # coordinates (in 512 x 512 space) line up
                        cv2.rectangle(copy_frame, (xmin, ymin), (xmax, ymax),
                                      (0, 255, 0), 2)
            print("Animals detected in this frame: " + str(len(detections)))

            # Save outputs
            save_path = os.path.join(os.getcwd(), "output.jpg")
            cv2.imwrite(save_path, copy_frame,
                        [int(cv2.IMWRITE_JPEG_QUALITY), 100])

            # Visualize if necessary
            cv2.namedWindow("Results", cv2.WINDOW_AUTOSIZE)
            cv2.imshow("Results", copy_frame)
            keyCode = cv2.waitKey(30) & 0xFF
            # Stop the program on the ESC key
            if keyCode == 27:
                break
        cap.release()
        cv2.destroyAllWindows()
    except Exception as e:
        traceback.print_exc()
        channel.close()
        logger.error("Failed to Run: " + repr(e))


def _get_interp_method(interp, sizes=()):
    """Get the interpolation method for resize functions.

    Wraps a random interp method selection and an auto-estimation method.

    Parameters
    ----------
    interp : int
        Interpolation method for all resizing operations. Possible values:
        0: Nearest Neighbors interpolation.
        1: Bilinear interpolation.
        2: Area-based (resampling using pixel area relation). Preferred for
           image decimation; when enlarging, it behaves like Nearest Neighbors.
        3: Bicubic interpolation over a 4x4 pixel neighborhood.
        4: Lanczos interpolation over an 8x8 pixel neighborhood.
        9: Cubic for enlarging, area for shrinking, bilinear otherwise.
        10: Random selection from methods 0-4.
        When shrinking an image, area-based interpolation generally looks best;
        when enlarging, bicubic (slow) or bilinear (faster) generally look best.
        See the OpenCV documentation:
        http://docs.opencv.org/master/da/d54/group__imgproc__transform.html
    sizes : tuple of int
        (old_height, old_width, new_height, new_width). If not provided,
        auto (9) returns area (2).

    Returns
    -------
    int
        Interp method from 0 to 4.
    """
    if interp == 9:
        if sizes:
            assert len(sizes) == 4
            oh, ow, nh, nw = sizes
            if nh > oh and nw > ow:
                return 2
            elif nh < oh and nw < ow:
                return 3
            else:
                return 1
        else:
            return 2
    if interp == 10:
        return random.randint(0, 4)
    if interp not in (0, 1, 2, 3, 4):
        raise ValueError('Unknown interp method %d' % interp)
    return interp


def resize_short_within(img, short=512, max_size=1024, mult_base=32, interp=2):
    """Resize the short side of the image, preserving the aspect ratio, so the
    short side matches the convolutional layer of the network.

    Args:
        img (np.array): image to resize
        short (int): target size for the short side
        max_size (int): maximum size of the long side
        mult_base (int): size scale to readjust the resizer
        interp (int): see '_get_interp_method'

    Returns:
        np.array: the resized image
    """
    h, w, _ = img.shape
    im_size_min, im_size_max = (h, w) if w > h else (w, h)
    scale = float(short) / float(im_size_min)
    if np.round(scale * im_size_max / mult_base) * mult_base > max_size:
        # Fit within max_size
        scale = float(np.floor(max_size / mult_base) * mult_base) / float(im_size_max)
    new_w, new_h = (
        int(np.round(w * scale / mult_base) * mult_base),
        int(np.round(h * scale / mult_base) * mult_base))
    img = cv2.resize(img, (new_w, new_h),
                     interpolation=_get_interp_method(interp, (h, w, new_h, new_w)))
    return img


# Start executing the function above
greengrass_pig_counting_application_run()


# This is a dummy handler and will not be invoked
# Instead, the code above runs in an infinite loop for our example
def function_handler(event, context):
    return
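The function above reports a per-frame count. To turn per-frame detections from a video stream into a running count of unique animals, you can feed them to a tracker. The following is a minimal, illustrative centroid-based stand-in for the correlation, CSRT, GOTURN, or KCF trackers mentioned earlier; the distance threshold is an assumption you would tune for your camera setup:

import numpy as np

class CentroidTracker:
    """Toy tracker: matches each detection to the nearest existing track by
    centroid distance, so an animal keeps its ID between frames and the
    running total counts each animal only once."""

    def __init__(self, max_distance=50.0):
        self.next_id = 0
        self.tracks = {}            # track id -> (cx, cy)
        self.max_distance = max_distance

    def update(self, detections):
        # detections: list of (class_name, xmin, ymin, xmax, ymax) tuples,
        # as built in the inference loop above
        updated = {}
        for _, xmin, ymin, xmax, ymax in detections:
            cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
            best_id, best_dist = None, self.max_distance
            for tid, (tx, ty) in self.tracks.items():
                if tid in updated:
                    continue  # a track absorbs at most one detection per frame
                dist = float(np.hypot(cx - tx, cy - ty))
                if dist < best_dist:
                    best_id, best_dist = tid, dist
            if best_id is None:
                best_id = self.next_id  # unmatched detection starts a new track
                self.next_id += 1
            updated[best_id] = (cx, cy)
        self.tracks = updated
        return updated

    @property
    def total_count(self):
        # Number of unique animals seen so far
        return self.next_id

# Example: call tracker.update(detections) once per frame inside the loop,
# then read tracker.total_count for the running count.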
Set up the edge device
For this demonstration, we use an NVIDIA Jetson Xavier NX development kit as our target edge device. To get the device communicating with AWS, we installed the AWS IoT Greengrass V2 runtime software over SSH, which doesn’t require the device to have local AWS credentials or the AWS Command Line Interface (AWS CLI) installed. Another major benefit of AWS IoT Greengrass V2 is that device provisioning to AWS IoT Core is built in.
From our development workstation (which has the AWS CLI installed along with a configured named profile), we can use the provided script to SSH into the target device, install necessary prerequisites (like OpenJDK), download the latest release of the AWS IoT Greengrass V2 runtime software, and kick off the AWS IoT Greengrass installer.
The installer first installs the AWS IoT Greengrass nucleus component, the only mandatory component and the minimum requirement to run AWS IoT Greengrass V2 on a device. Next, the installer registers the device as an AWS IoT thing and AWS IoT Greengrass core device, and downloads a root CA certificate, private key, and X.509 device certificate used to communicate with the AWS Cloud over a TLS connection. The installer then associates the AWS IoT Greengrass core device with an AWS IoT thing group, a group of AWS IoT things that we use as the target for our AWS IoT Greengrass deployments.
Finally, the installer creates a device role alias and associated AWS Identity and Access Management (IAM) role, which the AWS IoT Greengrass core device can use to request temporary credentials for accessing permitted AWS resources not accessible through MQTT.
Run inference
After installation is complete on the device, we can configure a deployment of the specified components to our target device running the AWS IoT Greengrass V2 runtime software. To try this out, follow the instructions in the README.md. Example code and setup scripts are available in the GitHub repo.
Conclusion
In this post, we shared an art-of-the-possible computer vision solution that you can build using a combination of AWS services. There is great potential for using such a solution to improve yield, throughput, unit margin, and customer satisfaction in other industries as well, such as retail, automotive, manufacturing, and supply chain.
If your use case requires, or is able to use, computer vision in the cloud, check out Amazon Lookout for Vision and Amazon Rekognition Custom Labels. Reach out to us if you have any feedback or want to share how this helped your ML and IoT journey on AWS.
About the Authors
Pavan Kumar Sunder is a Senior Solutions Architect with the Envision Engineering team at Amazon Web Services. He provides technical guidance and helps customers accelerate their ability to innovate through showing the art of the possible on AWS. He has built multiple prototypes around AI/ML, IoT, and robotics for our customers.
Jon Slominski is a Sr. Prototyping Architect with the Americas Envision Engineering team at AWS. Building prototypes focused on IoT, AI/ML, and robotics, Jon helps customers innovate and envision the art of the possible. Outside of work, Jon enjoys spending time and traveling with his wife and daughters.