Preview: Amazon Security Lake – A Purpose-Built Customer-Owned Data Lake Service
To identify potential security threats and vulnerabilities, customers should enable logging across their various resources and centralize these logs for easy access and use within analytics tools. Some of these data sources include logs from on-premises infrastructure, firewalls, and endpoint security solutions, and when utilizing the cloud, services such as Amazon Route 53, AWS CloudTrail,…
To identify potential security threats and vulnerabilities, customers should enable logging across their various resources and centralize these logs for easy access and use within analytics tools. Some of these data sources include logs from on-premises infrastructure, firewalls, and endpoint security solutions, and when utilizing the cloud, services such as Amazon Route 53, AWS CloudTrail, and Amazon Virtual Private Cloud (Amazon VPC).
The Amazon Simple Storage Service (Amazon S3) and AWS Lake Formation simplify the creation and management of a data lake on AWS. But, some customers’ security teams still struggle to define and implement security domain–specific aspects, such as data normalization, which requires them to analyze each log source’s structure and fields, define schemas and mappings, and pull in data enrichment such as threat intelligence.
Today we are announcing the preview release of Amazon Security Lake, a purpose-built service that automatically centralizes an organization’s security data from cloud and on-premises sources into a purpose-built data lake stored in your account. Amazon Security Lake automates the central management of security data, normalizing from integrated AWS services and third-party services and managing the lifecycle of data with customizable retention and also automates storage tiering.
Here are the key features of Amazon Security Lake:
Data transformation and normalization – Security Lake automatically partitions and converts incoming log data to a storage and query-efficient Apache Parquet and OCSF format, making the data broadly and immediately usable for security analytics without the need for post-processing. Security Lake supports integrations with analytics partners such as IBM, Splunk, Sumo Logic, and more to address a variety of security use cases such as threat detection, investigation, and incident response.
Customizable data access levels – You can configure the level of subscribers consuming data stored in the Security Lake, such as specific data sources for data access to all new objects or directly querying data stored. You can also specify a rollup Region that the Security Lake is available in and multiple AWS accounts across your AWS Organizations. This can help you comply with data residency compliance requirements.
By reducing the operational overhead of security data management, you can make it easier to gather more security signals from across your organization and analyze that data to improve the protection of your data, applications, and workloads.
Configure Your Security Lake for Collection Data To get started with Amazon Security Lake, choose Get started in the AWS console. You can enable log and event sources for all Regions and all accounts.
You can select log and event sources such as CloudTrail logs, VPC flow logs, and Route53 resolver logs into your data lake. Select Regions will contribute their data to your data lake with the Amazon S3-managed encryption that Amazon S3 will create and manage all encryption keys, as well as the specific AWS accounts in your organizations.
Next, you can select rollup and contributing Regions. All aggregated data from contributing Regions reside in the rollup Region. You can create multiple rollup Regions, which can help you comply with data residency compliance requirements. Optionally, you can define the Amazon S3 storage classes and the retention period you want the data to transition from the standard Amazon S3 storage classes used in Security Lake.
After initial configuration, choose Sources in the left pane of the console if you can add or remove log sources in your Regions or account.
You can also collect data from custom sources, such as Bind DNS logs, endpoint telemetry logs, on-premise Netflow logs, and so on. Before adding a custom source, you need to create AWS IAM role to grant permissions for AWS Glue.
To create a custom data source, choose Create custom source in the left menu of Custom sources.
It requires you to enter the IAM role Amazon Resource Names (ARNs) to write data to Security Lake and invoke AWS Glue on your behalf. Then, you can provide details about your custom source.
For efficient data processing and querying, objects from your custom sources should be partitioned by AWS Region, AWS account, year, month, day, and hour with a Parquet-formatted object.
Consume Your Data from Security Lake Now you can create a subscriber, a service that consumes logs and events from Security Lake. To add or see your subscribers, choose Subscribers in the left pane of the console.
The Security Lake supports two types of subscriber data access methods:
Data access (Amazon S3) – Subscribers are notified of new objects for a source as the data is written to your Security Lake S3 bucket. You can choose to notify subscribers of new objects with an Amazon Simple Queue Service (Amazon SQS) queue or through messaging to an HTTPS endpoint provided by the subscriber. This type is useful to ingest selected data in your analytics application—good for use cases that require frequent access to data.
Query access (Lake Formation) – Subscribers can consume data by directly querying AWS Lake Formation tables in your S3 bucket through services like Amazon Athena. This type is useful to provide on-demand query access to data without the need to pre-ingest anything and for use cases that require infrequent access or on large volume sources too expensive to ingest upfront or retain in analytics tools.
When you add a subscriber, you can choose Amazon S3 to create data access for the subscriber. If you select the default method of notification, you can receive the following object notification message in either an HTTPS endpoint or Amazon SQS.
Subscribers with query access can directly query data that is stored in Security Lake by using services like Amazon Athena and other services that can read from AWS Lake Formation. The following are sample queries of CloudTrail data.
SELECT time, api.service.name, api.operation, api.response.error, api.response.message, src_endpoint.ip FROM ${athena_db}.${athena_table} WHERE eventHour BETWEEN ‘${query_start_time}’ and ‘${query_end_time}’ AND api.response.error in ( ‘Client.UnauthorizedOperation’, ‘Client.InvalidPermission.NotFound’, ‘Client.OperationNotPermitted’, ‘AccessDenied’) ORDER BY time desc LIMIT 25
Subscribers only have access to source data in the AWS Region that you’ve selected when you create the subscriber. To give a subscriber access to data from multiple Regions, you can set the Region where you create your subscriber as a rollup Region.
Third-Party Integrations For supported third-party integrations, there are a number of sources as well as subscribing services integrated with Amazon Security Lake.
You can also use third-party security, automation, and analytics tools supporting Security Lake, including Datadog, IBM, Rapid7, Securonix, SentinelOne, Splunk, Sumo Logic, and Trellix. There are also service partners such as Accenture, Atos, Deloitte, DXC, Kyndryl, PWC, Rackspace, and Wipro that can work with you and Amazon Security Lake.
Join the Preview The preview release of Amazon Security Lake is now available in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland) Regions.
Create your fashion assistant application using Amazon Titan models and Amazon Bedrock Agents
In this post, we implement a fashion assistant agent using Amazon Bedrock Agents and the Amazon Titan family models. The fashion assistant provides a personalized, multimodal conversational experience. Source
In this post, we implement a fashion assistant agent using Amazon Bedrock Agents and the Amazon Titan family models. The fashion assistant provides a personalized, multimodal conversational experience.
Implement model-independent safety measures with Amazon Bedrock Guardrails
In this post, we discuss how you can use the ApplyGuardrail API in common generative AI architectures such as third-party or self-hosted large language models (LLMs), or in a self-managed Retrieval Augmented Generation (RAG) architecture. Source
In this post, we discuss how you can use the ApplyGuardrail API in common generative AI architectures such as third-party or self-hosted large language models (LLMs), or in a self-managed Retrieval Augmented Generation (RAG) architecture.
Visier’s data science team boosts their model output 10 times by migrating to Amazon SageMaker
In this post, we learn how Visier was able to boost their model output by 10 times, accelerate innovation cycles, and unlock new opportunities using Amazon SageMaker. Source
In this post, we learn how Visier was able to boost their model output by 10 times, accelerate innovation cycles, and unlock new opportunities using Amazon SageMaker.