New – Additional Checksum Algorithms for Amazon S3
Amazon Simple Storage Service (Amazon S3) is designed to provide 99.999999999% (11 9s) of durability for your objects and for the metadata associated with your objects. You can rest assured that S3 stores exactly what you PUT, and returns exactly what is stored when you GET. In order to make sure that the object is transmitted back-and-forth properly, S3 uses checksums, basically a kind of digital fingerprint.
S3’s PutObject function already allows you to pass the MD5 checksum of the object, and only accepts the operation if the value that you supply matches the one computed by S3. While this allows S3 to detect data transmission errors, it does mean that you need to compute the checksum before you call PutObject or after you call GetObject. Further, computing checksums for large (multi-GB or even multi-TB) objects can be computationally intensive, and can lead to bottlenecks. In fact, some large S3 users have built special-purpose EC2 fleets solely to compute and validate checksums.
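For reference, here is a minimal sketch of that pre-existing MD5 flow using boto3 (bucket, key, and file_path are placeholder names); S3 rejects the PUT with an error if the supplied digest does not match the one it computes:

import base64
import hashlib
import boto3

s3 = boto3.client('s3')

# Compute the MD5 digest locally, then pass it as the base64-encoded
# Content-MD5 value that S3 checks against its own computation.
with open(file_path, 'rb') as file:
    body = file.read()
content_md5 = base64.b64encode(hashlib.md5(body).digest()).decode('ascii')

r = s3.put_object(Bucket=bucket, Key=key, Body=body, ContentMD5=content_md5)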
New Checksum Support
Today I am happy to tell you about S3’s new support for four checksum algorithms. It is now very easy for you to calculate and store checksums for data stored in Amazon S3 and to use the checksums to check the integrity of your upload and download requests. You can use this new feature to implement the digital preservation best practices and controls that are specific to your industry. In particular, you can specify the use of any one of four widely used checksum algorithms (SHA-1, SHA-256, CRC-32, and CRC-32C) when you upload each of your objects to S3.
Here are the principal aspects of this new feature:
Object Upload – The newest versions of the AWS SDKs compute the specified checksum as part of the upload, and include it in an HTTP trailer at the conclusion of the upload. You also have the option to supply a precomputed checksum. Either way, S3 will verify the checksum and accept the operation if the value in the request matches the one computed by S3. In combination with the use of HTTP trailers, this feature can greatly accelerate client-side integrity checking.
Multipart Object Upload – The AWS SDKs now take advantage of client-side parallelism and compute checksums for each part of a multipart upload. The checksums for all of the parts are themselves checksummed, and this checksum-of-checksums is transmitted to S3 when the upload is finalized (see the low-level sketch after this list).
Checksum Storage & Persistence – The verified checksum, along with the specified algorithm, is stored as part of the object’s metadata. If Server-Side Encryption with KMS Keys is requested for the object, then the checksum is stored in encrypted form. The algorithm and the checksum stick to the object throughout its lifetime, even if it changes storage classes or is superseded by a newer version. They are also transferred as part of S3 Replication.
Checksum Retrieval – The new GetObjectAttributes function returns the checksum for the object and (if applicable) for each part; a retrieval sketch follows the download example below.
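To make the multipart behavior concrete, here is a minimal low-level sketch using boto3 (the high-level SDK transfer helpers handle all of this for you). It assumes a recent boto3 release that supports the checksum parameters; bucket, key, and file_path are placeholders, and read_chunks is a hypothetical helper:

import boto3

s3 = boto3.client('s3')

def read_chunks(path, size=8 * 1024 * 1024):
    # Hypothetical helper; every part except the last must be at least 5 MB.
    with open(path, 'rb') as f:
        while chunk := f.read(size):
            yield chunk

# Start a multipart upload that uses SHA-256 checksums for each part.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key, ChecksumAlgorithm='SHA256')

parts = []
for part_number, chunk in enumerate(read_chunks(file_path), start=1):
    part = s3.upload_part(
        Bucket=bucket,
        Key=key,
        UploadId=mpu['UploadId'],
        PartNumber=part_number,
        Body=chunk,
        ChecksumAlgorithm='SHA256')
    parts.append({
        'PartNumber': part_number,
        'ETag': part['ETag'],
        'ChecksumSHA256': part['ChecksumSHA256']})

# S3 verifies each part's checksum and computes the checksum-of-checksums
# when the upload is completed.
r = s3.complete_multipart_upload(
    Bucket=bucket,
    Key=key,
    UploadId=mpu['UploadId'],
    MultipartUpload={'Parts': parts})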
Checksums in Action
You can access this feature from the AWS Command Line Interface (CLI), AWS SDKs, or the S3 Console. In the console, I enable the Additional Checksums option when I prepare to upload an object:
Then I choose a Checksum function:
If I have already computed the checksum I can enter it; otherwise, the console will compute it.
After the upload is complete I can view the object’s properties to see the checksum:
The checksum function for each object is also listed in the S3 Inventory Report.
From my own code, the SDK can compute the checksum for me:
with open(file_path, 'rb') as file:
    r = s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=file,
        ChecksumAlgorithm='sha1')
Or I can compute the checksum myself and pass it to put_object:
with open(file_path, 'rb') as file:
    r = s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=file,
        ChecksumSHA1='fUM9R+mPkIokxBJK7zU5QfeAHSy=')
When I retrieve the object, I specify checksum mode to indicate that I want the returned object validated:
r = s3.get_object(Bucket=bucket, Key=key, ChecksumMode='ENABLED')
The actual validation happens when I read the object from r['Body'], and an exception will be raised if there's a mismatch.
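I can also read the checksum back without downloading the object at all by calling GetObjectAttributes. A minimal sketch, again using the bucket and key placeholders from the examples above:

r = s3.get_object_attributes(
    Bucket=bucket,
    Key=key,
    ObjectAttributes=['Checksum', 'ObjectParts'])

print(r.get('Checksum'))      # e.g. {'ChecksumSHA1': '...'}
print(r.get('ObjectParts'))   # per-part checksums for multipart uploads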
Watch the Demo
Here’s a demo (first shown at re:Invent 2021) of this new feature in action:
Available Now
The four additional checksums are now available in all commercial AWS Regions and you can start using them today at no extra charge.
Reducing hallucinations in LLM agents with a verified semantic cache using Amazon Bedrock Knowledge Bases
This post introduces a solution to reduce hallucinations in Large Language Models (LLMs) by implementing a verified semantic cache using Amazon Bedrock Knowledge Bases, which checks if user questions match curated and verified responses before generating new answers. The solution combines the flexibility of LLMs with reliable, verified answers to improve response accuracy, reduce latency, and lower costs while preventing potential misinformation in critical domains such as healthcare, finance, and legal services.
Orchestrate an intelligent document processing workflow using tools in Amazon Bedrock
This intelligent document processing solution uses Amazon Bedrock FMs to orchestrate a sophisticated workflow for handling multi-page healthcare documents with mixed content types. The solution uses the FM’s tool use capabilities, accessed through the Amazon Bedrock Converse API. This enables the FMs to not just process text, but to actively engage with various external tools and APIs to perform complex document analysis tasks.