In this blog post, we explore how centralized AWS Security Lake data can be transformed and streamed into Microsoft Sentinel using a Lambda-based pipeline for near real-time ingestion.
A story of cost, control, and custom engineering at cloud scale
Every organization strives to design a security architecture that manages complexity while remaining cost-aware and scalable, with the right approach depending on workload characteristics and implementation choices.
Across multiple customer environments, we increasingly see a consistent pattern emerge—security telemetry from various AWS services is not ingested in isolation, but is deliberately centralized into Amazon Security Lake. This approach reflects a maturity in design, where organizations move beyond service-level integrations and instead adopt a unified data strategy.
Amazon Security Lake enables this by aggregating security data from services such as Route 53, AWS WAF, Amazon EKS, and others into a centralized, governed repository. The data is normalized and stored in an open, analytics-friendly format, typically Apache Parquet, a columnar storage format optimized for large-scale processing and cost-efficient storage. This allows organizations to retain high volumes of security data while maintaining query performance, though the storage-efficiency gains depend on data volume, retention policies, and analytics patterns.
However, this architectural choice introduces a new consideration.
Microsoft Sentinel, when integrated into such environments, typically expects ingestion through connector-driven pipelines and streaming event models. In contrast, Security Lake represents a batch-oriented, schema-driven data platform. Rather than treating this as a constraint, it becomes an opportunity to rethink how data should flow between these systems.
In this blog, we explore how a streaming bridge architecture can be implemented to align Amazon Security Lake with Microsoft Sentinel’s ingestion model.
The approach leverages a combination of AWS Lambda and event-driven patterns to process data as it lands in Amazon S3, transforms it into a Sentinel-compatible format, and streams it through Azure Event Hub into Microsoft Sentinel using Data Collection Rules (DCRs) and Data Collection Endpoints (DCEs).
Compared to batch-only processing, this approach can support lower-latency ingestion when configured appropriately, while preserving the lake-first architecture: organizations can continue to run analytics, visualization, and threat hunting on the ingested data.
For demonstration, this implementation focuses on ingesting the following data sources from Amazon Security Lake:
- Amazon EKS audit and runtime events
- Route 53 DNS query logs
- AWS WAF access logs
- AWS Lambda execution activity
- Amazon S3 access events
Note: The solution and code provided in this blog are not an officially supported Microsoft solution and do not guarantee performance, reliability, availability, or support. No service-level agreements (SLAs) are included. Readers are responsible for validating suitability for their environment.
Before we begin, let us briefly discuss Amazon Security Lake and the Parquet format.
Amazon Security Lake and the Parquet Constraint
Amazon Security Lake provides a centralized, S3-backed repository for security telemetry, addressing fragmentation across services, accounts, and regions. Instead of service-level ingestion pipelines, logs from sources such as EKS, Route 53, WAF, Lambda, and S3 are aggregated into a single, governed data layer, enabling consistent visibility, separation of duties, and cost-efficient storage at scale.
This data is stored in Apache Parquet, a columnar format optimized for analytics—delivering high compression, schema evolution, and efficient, selective reads across engines like Athena and Spark.
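To make the columnar benefit concrete, here is a minimal sketch of reading a Security Lake Parquet object with PyArrow while touching only a handful of columns; the file name and column names are illustrative placeholders rather than the exact Security Lake schema:

```python
import pyarrow.parquet as pq

# Read only the columns a query needs; the columnar layout means the remaining
# columns are never deserialized, which is what keeps large scans cheap.
table = pq.read_table(
    "route53_sample.parquet",                              # placeholder file name
    columns=["time", "src_endpoint", "query", "answers"],  # illustrative column names
)
print(table.num_rows, table.schema)
```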
Microsoft Sentinel operates on a streaming ingestion model expecting JSON payloads, source-specific pipelines, and continuous event flows.
In lake-first architectures, reintroducing service-level ingestion is neither practical nor efficient. The requirement, therefore, is to bridge the two models—preserving Parquet for storage while enabling event-driven ingestion at the point of consumption.
In the following sections, we will create an Event Hub and configure the Lambda function. Once events are streaming into Event Hub, we will configure the Data Collection Endpoint, Data Collection Rules, and Data Collection Rule associations that complete the ingestion pipeline from Event Hub to Sentinel.
Pre-requisites:
- Log Analytics workspace where you have at least contributor rights.
- Your Log Analytics workspace needs to be linked to a dedicated cluster or to have a commitment tier.
- Event Hubs namespace that permits public network access. If public network access is disabled, ensure that "Allow trusted Microsoft services to bypass this firewall" is set to "Yes."
- An Event Hub with events flowing in. In this implementation, events are sent to Event Hubs by the AWS Lambda function configured in the steps below; no manual event sending is required.
- Appropriate roles in AWS accounts to configure SQS queue, Lambda function, IAM policies, etc.
The Event Hub
To begin, we need to create an Event Hub.
Azure Event Hubs is a fully managed, high-throughput event ingestion and streaming platform, designed to support high event volumes within documented service limits, subject to configuration and tier selection. Azure Event Hubs Documentation
SKUs (Standard vs Premium)
- Standard tier is a throughput-unit (TU) based model, where capacity is explicitly controlled and shared across the namespace.
- Premium tier provides isolated compute and memory via processing units (PUs), which can deliver more consistent performance and higher throughput capacity than the shared model, depending on workload. Event Hubs Scalability, Event Hub Tiers
Additionally, Azure Event Hubs offers a Dedicated tier, which is a fully isolated, single-tenant cluster for enterprise-scale workloads with higher throughputs (at significantly higher cost). Event hub Dedicated Tier
Throughput Characteristics (Standard Tier)
- A single Throughput Unit (TU) provides:
  - Ingress: up to 1 MB/s
  - Egress: up to 2 MB/s (Event Hubs Scalability)
- A Standard namespace can scale to a maximum of 40 TUs, giving:
  - Max ingress: 40 MB/s
  - Max egress: 80 MB/s (Event Hubs Scalability)
All event hubs, partitions, and consumers within the namespace share this TU capacity, making it a central ingestion buffer for streaming pipelines rather than a per-source scaling model. Event Hubs Scalability
Which SKU to choose: For ingress/egress up to 40/80 MB/s, a Standard SKU may be suitable. Higher volumes may warrant consideration of Premium, depending on workload requirements.
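As a rough sizing illustration, assuming hypothetical traffic figures (roughly 8,000 events per second at about 2 KB per converted JSON event), the TU requirement can be estimated as follows:

```python
import math

# Back-of-the-envelope TU sizing; the traffic figures are assumptions.
# Substitute the event rate and size you observe from Amazon Security Lake.
avg_event_size_kb = 2          # average JSON event size after Parquet-to-JSON conversion
peak_events_per_second = 8000  # expected peak across all log sources

ingress_mb_per_s = avg_event_size_kb * peak_events_per_second / 1024
tus_needed = math.ceil(ingress_mb_per_s)  # 1 TU = 1 MB/s ingress, 2 MB/s egress

print(f"~{ingress_mb_per_s:.1f} MB/s ingress -> {tus_needed} TUs (Standard tier caps at 40)")
```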
Azure Event Hubs Concepts:
Event Hub Namespace: A logical container that provides the endpoint, networking boundary, and shared throughput capacity (TUs/PUs) for all event hubs within it.
Event Hub: An individual event stream (topic) within the namespace where events are ingested, stored, and read in a partitioned, ordered manner for parallel processing.
Reference: Event Hubs features and terminology - Azure Event Hubs
Creating Event Hub namespace, Event Hub entities, and considerations:
- Create the Azure Event Hubs namespace. In this example, we create a Standard namespace with a minimum of 1 TU and Auto-Inflate enabled so it can scale up to the maximum of 40 TUs. Set the maximum throughput units based on the volume of logs expected per second from Amazon Security Lake. Since the Event Hub will be used to ingest data into Azure Monitor (the Sentinel workspace), please check the Supported Regions. (A scripted sketch of these creation steps follows this list.)
Ensure region alignment with Sentinel. Configure TU based on expected ingestion rate.
- Once the namespace is created, ensure local authentication is enabled, since the AWS Lambda code we are using (discussed in the next section) connects to Azure Event Hubs with a shared access signature. See Shared Access Signatures.
- Create individual Event Hubs within the namespace created in Step 1. One Event Hub is required per log type—for example, if Amazon Security Lake includes EKS, Route 53, and WAF logs, then three Event Hub entities should be created.
Why this matters:
- Easier DCR mapping
- Avoids schema conflicts
- Create a consumer group in each Event Hub created in the previous step. A consumer group is an independent view of an event stream that allows multiple applications to read the same events separately, each maintaining its own position (offset) in the stream. Event Hub Features
Avoid using the $Default consumer group.
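For readers who prefer to script these steps, the following is a minimal sketch using the azure-mgmt-eventhub Python SDK. The subscription, resource group, namespace, hub names, and partition/retention values are placeholders, and parameter names can vary slightly between SDK versions, so treat this as an outline of the portal steps above rather than a drop-in script:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient
from azure.mgmt.eventhub.models import ConsumerGroup, EHNamespace, Eventhub, Sku

SUBSCRIPTION_ID = "<subscription-id>"     # placeholder
RESOURCE_GROUP = "rg-sentinel-ingestion"  # placeholder
NAMESPACE = "ehns-securitylake"           # placeholder
LOCATION = "eastus"                       # align with the Sentinel workspace region

client = EventHubManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Standard namespace: 1 TU minimum with Auto-Inflate up to the 40 TU ceiling.
client.namespaces.begin_create_or_update(
    RESOURCE_GROUP,
    NAMESPACE,
    EHNamespace(
        location=LOCATION,
        sku=Sku(name="Standard", tier="Standard", capacity=1),
        is_auto_inflate_enabled=True,
        maximum_throughput_units=40,
    ),
).result()

# One Event Hub entity per Security Lake log type, plus a dedicated consumer
# group in each (avoiding $Default). Partition count and retention are illustrative.
for hub in ["eks-logs", "route53-logs", "waf-logs", "lambda-logs", "s3-access-logs"]:
    client.event_hubs.create_or_update(
        RESOURCE_GROUP, NAMESPACE, hub,
        Eventhub(partition_count=4, message_retention_in_days=1),
    )
    client.consumer_groups.create_or_update(
        RESOURCE_GROUP, NAMESPACE, hub, "sentinel-ingestion", ConsumerGroup()
    )
```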
With the Event Hub namespace, entities, and consumer groups in place, the receiving end of the pipeline is ready. The next step is to configure the AWS Lambda function that will translate Security Lake's Parquet files into the JSON events that Event Hub expects.
The AWS Lambda Function (SQS-driven Parquet → JSON)
In this architecture, AWS Lambda is the translation layer, not an ingestion source. Instead of being invoked directly by S3, Security Lake emits S3 object‑creation notifications into Amazon SQS, and SQS becomes the Lambda trigger. This decoupling is intentional: SQS absorbs bursts of newly written Parquet objects which can help increase resilience of the ingestion pipeline under bursty or variable workloads.
Once triggered, the Lambda function processes each queued S3 notification end‑to‑end: it derives the log type from the S3 object key, downloads the Parquet file, converts each row into a discrete JSON event, and forwards the resulting events to Azure Event Hub in batches for downstream ingestion via DCR/DCE.
A practical consideration is event size management. Some sources—especially EKS audit telemetry—can carry large, nested fields that are not always useful for Sentinel analytics. For those log types, the Lambda function drops non‑essential fields during transformation to keep each event within Azure ingestion constraints; oversized fields can exceed the 64 KB Azure Monitor field size limit and disrupt ingestion (Fields more than 64 KB will be truncated in Log Analytics Workspace).
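The exact trimming logic lives in the repository; the sketch below only illustrates the idea, with assumed field names and a simple drop-then-truncate strategy rather than the function's actual rules:

```python
import json

MAX_FIELD_BYTES = 64 * 1024  # Azure Monitor per-field limit referenced above

# Assumed, illustrative list of heavy fields per log type (not the shipped configuration).
DROP_FIELDS_BY_LOG_TYPE = {
    "EKS_AUDIT": ["requestObject", "responseObject"],
}

def trim_event(event: dict, log_type: str) -> dict:
    """Drop known-noisy fields, then truncate anything still over the field limit."""
    for field in DROP_FIELDS_BY_LOG_TYPE.get(log_type, []):
        event.pop(field, None)
    for key, value in list(event.items()):
        serialized = json.dumps(value, default=str)
        if len(serialized.encode("utf-8")) > MAX_FIELD_BYTES:
            event[key] = serialized[:MAX_FIELD_BYTES]  # keep within the limit instead of failing
    return event
```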
Solution Architecture
Data flows from AWS → converted (Lambda) → sent (Event Hub) → stored in Sentinel. The end-to-end flow operates as follows:
- Amazon Security Lake writes Parquet files to a centralized S3 bucket as logs arrive from source services (CloudTrail, EKS, VPC Flow, WAF, Route 53, and others).
- Amazon SQS receives S3 event notifications from Security Lake and queues them as Lambda triggers.
- AWS Lambda picks up each SQS message, identifies the log source from the S3 object key, downloads the Parquet file, converts each row to JSON, and forwards the events to Azure Event Hub.
- Azure Event Hub receives the JSON events and makes them available for ingestion into Microsoft Sentinel via a Data Collection Rule (DCR).
- Microsoft Sentinel ingests the data into a custom log table, where it is available for detection rules, hunting queries, and dashboards.
The full Lambda function code is available in the GithubRepository-LambdaFunction. Refer to the readme for more details.
How the Lambda Function works
At a high level, the Lambda function performs four things in sequence for every file it processes:
- Identify the log source: Security Lake organises files under a structured S3 key path that includes the log type (for example, CLOUD_TRAIL_MGMT, EKS_AUDIT, VPC_FLOW). The function reads this key path to determine which log source the file belongs to, and routes it to the corresponding Azure Event Hub entity. Files that cannot be matched to a known log type are skipped and logged as warnings.
- Download and decompress the Parquet file: The function streams the Parquet file from S3 directly to local Lambda storage rather than loading it entirely into memory. This keeps memory consumption bounded regardless of file size. Where Security Lake uses gzip compression, the function decompresses automatically before processing.
- Convert Parquet rows to JSON: Each row in the Parquet file is read in batches using PyArrow and converted to a JSON object. Parquet columns can carry data types — such as NumPy scalars, nested arrays, and high-precision timestamps — that are not natively serialisable to JSON. The function handles these type conversions before serialisation, ensuring clean output that Sentinel's ingestion pipeline can accept without errors.
- Forward events to Azure Event Hub: Converted JSON events are sent to the respective Azure Event Hub entity in batches, with each row becoming a discrete event. The function respects Event Hub's payload size ceiling, handles throttling responses gracefully using retry logic with exponential backoff, and marks each processed file in S3 metadata to prevent duplicate ingestion on retry.
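Putting the four steps together, a heavily condensed sketch of the handler looks roughly like the following. It assumes standard S3 event notification payloads in the SQS message body and a per-log-type connection-string mapping (resolved from Secrets Manager in the real function), and it omits the gzip handling, throttling retries, and duplicate tracking that the repository code implements:

```python
import json
import os
import urllib.parse

import boto3
import pyarrow.parquet as pq
from azure.eventhub import EventData, EventHubProducerClient

s3 = boto3.client("s3")

# Placeholder: the real function builds this mapping from AWS Secrets Manager.
CONNECTION_STRINGS = json.loads(os.environ.get("EVENTHUB_CONNECTIONS", "{}"))
LOG_TYPE_MARKERS = ["CLOUD_TRAIL_MGMT", "EKS_AUDIT", "ROUTE53", "WAF", "LAMBDA_EXECUTION", "S3_DATA"]

def detect_log_type(key):
    return next((marker for marker in LOG_TYPE_MARKERS if marker in key), None)

def handler(event, context):
    for record in event["Records"]:                         # one SQS message per S3 notification
        for s3_record in json.loads(record["body"]).get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = urllib.parse.unquote_plus(s3_record["s3"]["object"]["key"])

            log_type = detect_log_type(key)                  # 1. identify the source from the key
            if not log_type:
                print(f"WARN: unknown log type for {key}, skipping")
                continue

            local_path = f"/tmp/{os.path.basename(key)}"     # 2. download to local storage
            s3.download_file(bucket, key, local_path)

            producer = EventHubProducerClient.from_connection_string(CONNECTION_STRINGS[log_type])
            with producer:
                for batch in pq.ParquetFile(local_path).iter_batches(batch_size=500):
                    event_batch = producer.create_batch()
                    for row in batch.to_pylist():            # 3. Parquet rows -> JSON events
                        event_batch.add(EventData(json.dumps(row, default=str)))
                    producer.send_batch(event_batch)         # 4. forward to Event Hub
```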
Tuning the Lambda
Message volumes at cloud scale are unforgiving. Memory, timeout, batch size, and retry behavior: each decision determines whether the function keeps up or falls behind.
Several configuration changes are required beyond the default Lambda settings to make this function production-ready.
Runtime and Dependencies — Lambda Layer
The function depends on three libraries not available in Lambda's default Python runtime: pyarrow (for Parquet reading), pandas (for type handling), and azure-eventhub (for Event Hub connectivity). These are packaged as an AWS Lambda Layer and attached to the function, keeping the deployment package clean and the layer reusable across function versions.
Step-by-step instructions for packaging the dependencies, creating the S3 bucket, publishing the Layer, and deploying the function are available in the GithubRepository-LambdaLayer.
Secrets Management — AWS Secrets Manager
The Azure Event Hub connection strings — one per log type — are sensitive credentials that must not be stored in environment variables in plaintext. The function retrieves them at cold start from AWS Secrets Manager, using a single secret ARN passed as a Lambda environment variable (SECRET_ARN).
The secret is stored as a JSON object with each log type as a key. The full secret structure and configuration steps are available in the GithubRepository-SecretsManager.
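As an illustration of that lookup, a minimal sketch of the cold-start retrieval might look like this; the log-type key shown is an example, and the authoritative secret structure is the one documented in the repository:

```python
import json
import os

import boto3

_secrets = boto3.client("secretsmanager")

def load_connection_strings() -> dict:
    """Fetch the per-log-type Event Hub connection strings from Secrets Manager."""
    response = _secrets.get_secret_value(SecretId=os.environ["SECRET_ARN"])
    return json.loads(response["SecretString"])

# Module scope = executed once per cold start; warm invocations reuse the cached dict.
CONNECTION_STRINGS = load_connection_strings()
# e.g. CONNECTION_STRINGS["ROUTE53"] -> "Endpoint=sb://...;EntityPath=route53-logs"
```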
IAM Permissions
The Lambda execution IAM role requires scoped permissions across S3, Secrets Manager, SQS, and CloudWatch Logs. Full IAM policy JSON files following the principle of least privilege are available in the GithubRepository-IAMPolicies.
Deployment instructions for the IAM Policies are available in the IAM Policies README.
Memory Configuration
A starting configuration of 1,792 MB is recommended — this is the threshold at which Lambda may allocate a full vCPU. For environments with high log volumes or large Parquet files, increasing to 2,048 MB provides headroom for concurrent batch processing. Tune further based on observed execution durations in CloudWatch Metrics.
Timeout Configuration
The default Lambda timeout of 3 seconds is insufficient for Parquet processing at scale. The function must download a file from S3, process it in batches, and flush all events to Event Hub — a sequence that can take tens of seconds for larger Security Lake files.
A timeout of 5 minutes (300 seconds) is recommended as a starting point, with adjustment based on observed execution durations in CloudWatch Metrics.
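If you manage the function with scripts rather than the console, the memory and timeout recommendations above can be applied with a call such as the following; the function name is a placeholder:

```python
import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="securitylake-to-eventhub",  # placeholder function name
    MemorySize=1792,                          # starting point; raise to 2048 for heavy volumes
    Timeout=300,                              # 5 minutes; tune from CloudWatch durations
)
```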
SQS Trigger Configuration
The SQS queue connected to Security Lake S3 event notifications is configured as the trigger for the Lambda function. This invokes the Lambda function automatically whenever Security Lake writes a new Parquet file to S3.
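A sketch of that wiring with boto3 is shown below; the queue and function identifiers are placeholders, and the batch size is an illustrative starting value:

```python
import boto3

lambda_client = boto3.client("lambda")
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:111122223333:security-lake-notifications",  # placeholder
    FunctionName="securitylake-to-eventhub",                                          # placeholder
    BatchSize=5,                                        # SQS messages handed to each invocation
    FunctionResponseTypes=["ReportBatchItemFailures"],  # retry only the messages that failed
)
```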
Validate Event Hub Ingestion
At this point, events should be streaming into each Event Hub. To validate, open a specific Event Hub from the Azure Portal and navigate to the Overview page. You should see active metrics across Requests, Messages, and Throughput, confirming that the Lambda function is successfully forwarding Security Lake events.
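If you want to confirm the payloads themselves rather than just the metrics, a quick spot check with the azure-eventhub consumer client works; the connection string, hub, and consumer group names below are placeholders:

```python
from azure.eventhub import EventHubConsumerClient

client = EventHubConsumerClient.from_connection_string(
    conn_str="Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...",
    consumer_group="sentinel-ingestion",   # the consumer group created earlier
    eventhub_name="route53-logs",          # placeholder hub name
)

def on_event(partition_context, event):
    # Print a preview of each event; stop the script with Ctrl+C once satisfied.
    print(partition_context.partition_id, event.body_as_str()[:200])

with client:
    client.receive(on_event=on_event, starting_position="-1")  # "-1" = read from the beginning
```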
A spike in throughput indicates that events are flowing correctly. In the upcoming sections, we will follow the documentation to ingest events from Event Hub into Azure Monitor.
Before we begin, please collect the required information as stated here so that resource IDs and other details are ready for configuring the DCR and Data Collection Endpoint.
Also create a user-assigned managed identity (UAMI): the DCRs in this setup use a UAMI, which must be granted the required permissions on the Event Hubs namespace to receive events.
To grant the Azure Event Hubs Data Receiver role to the user-assigned managed identity, follow the instructions here.
Creating Tables in Log Analytics Workspace
As mentioned in the Overview, this blog covers the following data sources:
- Amazon EKS audit and runtime events
- Route 53 DNS query logs
- AWS WAF access logs
- AWS Lambda execution activity
- Amazon S3 access events
The PowerShell scripts to create these tables are provided here: GithubRepo CreateLAWTables.
For assistance with executing these scripts, refer to: Readme.md.
Note: In the Microsoft documentation for ingesting logs from Event Hubs into Azure Monitor, only three fields are created in the tables (TimeGenerated, RawData, Properties), and the entire JSON event from the Event Hub is written to the RawData field. The scripts here create additional fields because we will parse the RawData field and extract the information from the complete event into individual columns. This makes logs easier to search and analyze, and the richer schema improves detection rules and KQL efficiency.
Create a Data Collection Endpoint
To collect data with a data collection rule, you need a data collection endpoint:
- Create a data collection endpoint. Note: create the data collection endpoint in the same region as your Log Analytics workspace.
- From the data collection endpoint's Overview screen, select JSON View.
- Copy the Resource ID of the data collection endpoint. You will use it in the next step when creating the Data Collection Rules.
Create Data Collection Rules
Since we have five sources in scope for this example, we need to create five DCRs. Each DCR uses the information collected earlier, the user-assigned managed identity resource ID, and the Data Collection Endpoint resource ID from the previous step.
DCR Deployment via ARM Templates
The Data Collection Rules ARM templates can be found in the GithubRepo-DataCollectionRules.
The instructions to create via ARM templates can be found in the readme.md (Microsoft article reference with manual steps here).
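If you prefer deploying the templates from code instead of the portal, a minimal sketch with the azure-mgmt-resource SDK is shown below; the subscription, resource group, and parameter values are placeholders, and the template file name matches the mapping table that follows:

```python
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

with open("DCR-Route53.json") as template_file:
    template = json.load(template_file)

client.deployments.begin_create_or_update(
    "rg-sentinel-ingestion",          # placeholder resource group
    "dcr-route53-deployment",
    {
        "properties": {
            "mode": "Incremental",
            "template": template,
            "parameters": {},          # supply workspace, DCE, and identity values per the readme
        }
    },
).result()
```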
DCR Mapping (AWS Sources → DCR Templates)
ARM Templates in GitHub Repo to AWS Source mapping
| Data Source | DCR Template Name |
|---|---|
| Amazon EKS Logs | DCR-awseks.json |
| AWS S3 Access Logs | DCR-awsS3access.json |
| AWS WAF Logs | DCR-awswaf.json |
| AWS Lambda Execution | DCR-lambdaexecution.json |
| Amazon Route 53 Logs | DCR-Route53.json |
Final Step: Associating the Event Hub with the Data Collection Rule
At this stage, the core building blocks of the ingestion pipeline are already in place. We have successfully configured streaming to Event Hubs, created dedicated Event Hubs for each Amazon Security Lake source, provisioned destination tables in the Microsoft Sentinel workspace, and defined Data Collection Rules (DCRs) — leveraging the Azure Monitor pipeline and a user-assigned managed identity to securely read incoming events.
The final step is to stitch this entire architecture together by establishing associations between the Event Hubs and their corresponding Data Collection Rules.
This association is the link between Event Hub and Sentinel: it enables Azure Monitor to pull data from the Event Hubs and route it into the defined destination tables. Without it, the pipeline remains incomplete; data may continue to flow into Event Hubs, but it will not be picked up or ingested into Sentinel.
Each Event Hub must be explicitly mapped to its respective Data Collection Rule, ensuring:
- The correct stream is processed by the intended transformation logic
- Events are routed to the appropriate custom tables
- The ingestion pipeline routes data to the intended destination in a consistent, predictable, and scalable manner
Once these associations are configured, the end-to-end pipeline operates as designed, subject to configuration accuracy and ongoing operational monitoring, and ingests Amazon Security Lake data into Microsoft Sentinel with minimal manual intervention.
Steps to be followed: Associate the data collection rule with the event hub
To complete the setup, we now associate each Event Hub with its corresponding Data Collection Rule (DCR). This creates the link that allows Azure Monitor to read data from the Event Hub and send it to Microsoft Sentinel.
Important: You must create one association per Data Collection Rule.
Example: If an Event Hub is receiving AWS Route 53 logs, it must be associated with the DCR created for AWS Route 53.
Copy the template from the link above and create the Data Collection Rule associations, one association per Data Collection Rule.
What You Need
- Event Hub Resource ID
Follow these steps to get the Event Hub Resource ID:
- Open the Event Hub Namespace in the Azure Portal
- Go to Entities → Event Hubs
- Select the Event Hub that is receiving the logs (e.g., Route 53 logs)
- In the Overview page, click on JSON View
- Copy the Resource ID
- Resource Group, Region, and Association Name
- The Resource Group must be the same as the one where the Event Hub is deployed
- Provide the Azure region where the resources are deployed
- Define a unique name for the association
- Data Collection Rule (DCR) Resource ID
- Open the corresponding Data Collection Rule
- Go to the Overview page
- Click on JSON View
- Copy the Resource ID
Key Note: Make sure that
- Each Event Hub is mapped to the correct DCR
- The data source and DCR template are aligned (e.g., Route 53 → Route 53 DCR)
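For completeness, here is a hedged sketch of creating one such association with the azure-mgmt-monitor SDK instead of an ARM template; the resource IDs are placeholders copied from the JSON View steps above, and operation and property names may differ slightly between SDK versions:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

event_hub_id = (
    "/subscriptions/<sub>/resourceGroups/rg-sentinel-ingestion/providers/"
    "Microsoft.EventHub/namespaces/<namespace>/eventhubs/route53-logs"   # placeholder
)
dcr_id = (
    "/subscriptions/<sub>/resourceGroups/rg-sentinel-ingestion/providers/"
    "Microsoft.Insights/dataCollectionRules/DCR-Route53"                 # placeholder
)

# The association is created on the Event Hub resource and points at the DCR.
client.data_collection_rule_associations.create(
    resource_uri=event_hub_id,
    association_name="route53-to-sentinel",
    body={"properties": {"dataCollectionRuleId": dcr_id}},
)
```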
Validate End-to-End Ingestion
Once this configuration is complete, we can validate the logs in the destination table, fields, and parsing accuracy.
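One convenient way to check is a scripted query with the azure-monitor-query client; the workspace GUID and custom table name below are placeholders, so substitute the table your DCR writes to:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-guid>",
    query="AWSRoute53_CL | take 10",      # placeholder custom table name
    timespan=timedelta(hours=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```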
Seeing logs here indicates that ingestion is successfully flowing through the pipeline. If you are building on a lake-first security architecture and running into ingestion challenges with Sentinel, feel free to share your experience in the comments or raise an issue in the GitHub repository.