As organizations scale their security monitoring, a key challenge is maintaining visibility while controlling costs. High-volume logs, such as firewall, proxy, and endpoint data, must be ingested to achieve full visibility into the threat environment, yet ingesting everything at analytics prices quickly becomes expensive.
With Microsoft Sentinel data lake, you can ingest high‑volume logs directly into the data lake tier—significantly reducing storage costs while maintaining full visibility. After ingestion, you can extract, enrich, summarize, or normalize events to highlight what matters most for security. Only the enriched, high-value events are then promoted to the Analytics tier for correlation, detection, and investigation.
This approach offers several advantages:
- Higher ROI: Storing infrequently queried or low-detection-value logs in the lower-cost data lake tier lets organizations get more value from their security data. Because raw log storage is dramatically cheaper in the data lake tier, teams can retain more data, uncover deeper insights, and optimize spend without compromising visibility.
- Performance optimization: Send only high-value security data to the Analytics tier to keep query performance fast and efficient.
- Targeted insights: Post-ingestion processing allows analysts to classify and enrich logs, removing noise and enhancing relevance.
- Flexible retention: The data lake enables long-term retention with a 6x compression rate, supporting historical analysis and deeper insights.
In this post, we will demonstrate how to leverage the Sentinel data lake by ingesting data into the CommonSecurityLog table. We will classify source and destination IPs as public or private, enrich the logs with that classification, and then group and summarize repeated SourceIP-DestinationIP-RequestURL pairs to highlight outbound communication patterns. Along the way, we will show how post-ingestion processing can reduce costs while improving the analytical value of your data. You can achieve this with KQL jobs in Microsoft Sentinel data lake.
What are KQL jobs?
KQL jobs in Sentinel data lake are automated one-time or scheduled jobs that run Kusto Query Language (KQL) queries directly on the data lake. These jobs help security teams investigate and hunt for threats more easily by automating processes such as checking logs against known threats, and enriching and grouping network logs before they are promoted to the Analytics tier. This automation reduces storage costs while producing high-value, investigation-ready datasets.
Scenario
Firewall and network logs typically contain a wide range of fields, such as SourceIP, DestinationIP, RequestURL, and DeviceAction. While these logs provide extensive information, much of it may not be required for analytics detections, nor does every log entry need to be available in the Analytics tier. Additionally, these logs often lack contextual details, such as whether traffic is internal or external, and do not inherently highlight repeated communication patterns or spikes that may warrant investigation.
For high-volume logs, organizations can leverage post-ingestion enrichment and summarization within the data lake tier. This process transforms raw logs into structured, context-rich datasets, complete with IP classification and grouped connection patterns, making them ready for targeted investigation or selective promotion to the Analytics tier. This approach ensures that only the most relevant and actionable data is surfaced for analytics, optimizing both cost and operational efficiency.
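Before building the enrichment logic, it can help to preview the relevant fields directly in the lake explorer. A minimal sketch (the projected columns are standard CommonSecurityLog fields; the take 10 row limit simply samples a handful of records):
CommonSecurityLog
| project TimeGenerated, SourceIP, DestinationIP, RequestURL, DeviceAction
| take 10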
Step 1 - Filter and enrich logs
Test and refine your query in the lake explorer before automating it. The following KQL query filters out entries missing both IPs, classifies each source and destination IP as Public, Private, or Unknown, and groups repeated SourceIP-DestinationIP-RequestURL combinations to reveal meaningful traffic patterns:
CommonSecurityLog
| where isnotempty(SourceIP) or isnotempty(DestinationIP)
| extend
    // Check for empty values first so blanks are labeled "Unknown" rather than misclassified
    SrcIPType = iff(isempty(SourceIP), "Unknown", iff(ipv4_is_private(SourceIP), "Private", "Public")),
    DstIPType = iff(isempty(DestinationIP), "Unknown", iff(ipv4_is_private(DestinationIP), "Private", "Public"))
| summarize Count = count()
    by SourceIP, SrcIPType,
       DestinationIP, DstIPType,
       RequestURL
This classification adds context to high-volume network logs, making it easier to identify traffic between internal and external networks, while significantly reducing the volume of data promoted to the Analytics tier.
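Since the scenario focuses on outbound communication patterns, the same building blocks can be narrowed to internal-to-external traffic only. A minimal sketch, with take 20 as an arbitrary row limit for review rather than part of the workflow above:
CommonSecurityLog
| where isnotempty(SourceIP) and isnotempty(DestinationIP)
// Keep only outbound connections: a private source talking to a public destination
| where ipv4_is_private(SourceIP) and not(ipv4_is_private(DestinationIP))
| summarize Count = count() by SourceIP, DestinationIP, RequestURL
| order by Count desc
| take 20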
Step 2 - Automate post-ingestion processing using KQL jobs
Once the query logic is validated in the lake explorer, you can automate it to run continuously as a KQL job:
- Scheduled KQL jobs can process new logs and store the results in a custom table in the Analytics tier.
- Using custom detection rules on the promoted data, you can set up alerts for any anomalies (see the example after this list).
- Using Microsoft Sentinel Workbooks, you can visualize the enriched data for monitoring and analysis.
Automating this workflow ensures that every log batch arriving in the data lake is enriched and included in the summarized table in the Analytics tier, while maintaining the balance between cost and visibility.
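As an illustration of the detection-rule option, the sketch below assumes the KQL job writes its output to a custom table named NetworkSummary_CL; that table name and the threshold of 500 connections are placeholders, so substitute your own table name and a baseline appropriate for your environment:
// NetworkSummary_CL is a hypothetical table name; use the table your KQL job targets
NetworkSummary_CL
| where SrcIPType == "Private" and DstIPType == "Public"
| where Count > 500 // arbitrary example threshold; tune to your traffic baseline
| project TimeGenerated, SourceIP, DestinationIP, RequestURL, Count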
How to schedule this process using a KQL job
To run the post-ingestion processing query and retain its results periodically, we will schedule the job to run every hour, summarizing the network logs and storing the results in a custom table in the Analytics tier.
To avoid missing any logs, we recommend adding a 15-minute delay in the query so that late-arriving logs are available in the lake and included in the job runs.
let dt_lookBack = 1h; // Look back window duration
let delay = 15m; // Delay to allow late-arriving data
let endTime = now() - delay;
let startTime = endTime - dt_lookBack;
CommonSecurityLog
| where TimeGenerated >= startTime and TimeGenerated < endTime
| where isnotempty(SourceIP) or isnotempty(DestinationIP)
| extend
    // Same classification as Step 1: check isempty first so blanks become "Unknown"
    SrcIPType = iff(isempty(SourceIP), "Unknown", iff(ipv4_is_private(SourceIP), "Private", "Public")),
    DstIPType = iff(isempty(DestinationIP), "Unknown", iff(ipv4_is_private(DestinationIP), "Private", "Public"))
| summarize Count = count()
by SourceIP, SrcIPType, DestinationIP, DstIPType, RequestURL
| order by Count desc
KQL jobs can run ad hoc or on a schedule (by minutes, hourly, daily, weekly, or monthly). To produce the enriched log continuously in the Analytics tier, we will schedule this job to run hourly.
Results are automatically available in the Analytics tier and can be used to set up a new custom detection rule.
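For monitoring, the same custom table can drive a simple trend visualization in a workbook. A sketch, again using the hypothetical NetworkSummary_CL table name from above:
// Daily outbound connection volume by destination IP type over the past week
NetworkSummary_CL
| where TimeGenerated > ago(7d)
| summarize Connections = sum(Count) by DstIPType, bin(TimeGenerated, 1d)
| render columnchart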
Cost of this KQL job in Sentinel data lake
The cost of running KQL jobs in Sentinel data lake depends on the volume of data scanned and how frequently the jobs run. Data lake KQL queries and jobs are billed per GB of data analyzed. Assuming 1 TB/month arrives evenly, each hourly job scans roughly 1024 GB ÷ (30 × 24) ≈ 1.4 GB. The table below estimates the monthly cost of this data lake approach:
| Item | Calculation / Notes | Cost ($) |
|---|---|---|
| Data lake ingestion and processing (1 TB/month) | 1 TB × ($0.05 + $0.10) per GB | $153.60 |
| Hourly KQL jobs for 1-hour lookback (1.4 GB/hour) | 1.4 GB × $0.005 × 24 × 30 | $5.04 |
| Enriched data sent to Analytics (10% of 1 TB) | 102.4 GB × $4.30 per GB | $440.32 |
| Total data lake approach | $153.60 + $5.04 + $440.32 | $598.96 |
Note: This sample pricing model applies to the East US region.
This pricing model allows organizations to perform large-scale threat hunting and intelligence matching without the high expenses typically associated with traditional SIEMs.
Summary of monthly costs
| Approach | Estimated Monthly Cost ($) |
|---|---|
| Raw data ingestion to Analytics tier (1 TB) | $4,386 |
| Raw data ingestion to data lake tier + enriched summary to Analytics | $598.96 |
| Savings | $3,787.04 |
For more details about Microsoft Sentinel data lake costs for KQL queries and jobs, see https://azure.microsoft.com/en-us/pricing/calculator.
Summary and next steps
High-volume logs are critical for security visibility, but ingesting all raw data directly into the Analytics tier can be expensive and unwieldy. By storing logs in Sentinel data lake and performing post-ingestion enrichment, organizations can classify and contextualize events, reduce costs, and maintain a lean, high-performing analytics environment.
This approach demonstrates that smarter processing is the key to maximizing both efficiency and security insight in Microsoft Sentinel.
Get started with Microsoft Sentinel data lake today.
Microsoft Sentinel is a cloud-native SIEM, enriched with AI and automation to provide expansive visibility across your digital environment.