Microsoft Sentinel Blog

Leave no data behind: Using summary rules to store data cost effectively in Microsoft Sentinel

AryaG
Nov 19, 2024

Introduction

A special thank you to MariaSousaValadas and Yael_Bergman for contributing to the content of this blog.

Security Operations teams all over the world use SIEMs and security tools such as Microsoft Sentinel and Microsoft Defender XDR to defend their IT and OT estate against attackers. Larger organizations tend to amass huge amounts of data, ranging from hundreds of gigabytes to terabytes. Not all of this data has security value at first sight, but by considering common patterns and trends, it can still allow defenders to detect incidents that might otherwise have gone undetected.

This leads to a conundrum on two levels:

  1. How do we effectively detect threats in these data sets that may look unimportant?
  2. How can we do this in a cost-effective way?

Summary rules, in combination with proper log tiering, are the ideal solution for these kinds of situations. They allow us to look for anomalies and trends in large amounts of data, and they work across all tiers of data storage in the unified SOC platform.

In this blog, we will walk you through the setup and a couple of use cases that you can use in your own environment as well. We’ll show you how to derive detection value from noisy or high-volume data by using auxiliary logs with summary rules. This approach helps you manage large datasets efficiently, extract valuable insights, and detect threats without overwhelming your system with noise.

Data tiering strategy

Before we dive deeper into the use cases, it's important that you understand and consider a good data tiering strategy. As much as security practitioners would love to ingest all data types and keep them for as long as possible, that is neither feasible from a cost perspective nor practical. It is for this reason that we need to think about how we can balance cost and practicality without jeopardizing our security.

To optimize your Sentinel setup, we must consider that not all data is created equal. In fact, some systems are either very verbose, such as firewalls, or do not have security value at first glance, such as webserver request logs. This is also why Microsoft recommends that you review the data types you are ingesting to decide on the right tier for each.

Microsoft recommends distinguishing primary security data from secondary security data. One way of doing so is by considering whether you need that data to create alerts and incidents, and how critical that information is. If you need to monitor this data and create incidents based on it on a daily or hourly basis, then chances are that it is primary security data.

Once you have decided what kind of data is secondary, Sentinel allows you to save that data in your workspace in a cost-efficient way. Your primary security data should always live in the analytics tier of your workspace. Your secondary data, on the other hand, can be saved in basic or auxiliary logs. Which of these two you need depends on your specific use case for that data. The main differences come down to whether you need to be able to retrieve or export the stored data. A good overview is given on this page.

If you are struggling to understand which kind of data falls under which category, you can find more information and examples on this page. Another good option for those of you who have already started ingesting data into Microsoft Sentinel and the unified SOC portal is to use the SOC optimization feature, which is beyond the scope of today's blog.

For both data categories (primary and secondary), preservation happens in two stages:

  1. The interactive retention state is the initial state into which the data is ingested. This state allows different levels of access to the data, and its costs vary widely, both depending on the plan.
  2. The long-term retention state preserves older data in its original tables for up to 12 years, at an extremely low cost, regardless of the plan.

To learn more about retention states, see Manage data retention in a Log Analytics workspace.

So, you have had a look at your data, decided what is secondary, and ingested that data into your unified SOC platform (if you are looking for ways to do this, you can find more information here). Then comes the crux of the matter: how can we use our secondary data to enrich our primary security data, or to start investigations based on anomalies or trends within that data?

Summary Rules

What are they?

Summary rules aggregate large data sets in the background and provide you with the results in separate analytics tables. This allows you to scan large data sets in cheaper tiers and still query the results quickly and visualize them using workbooks and so on. However, summary rules are not tied to these tiers only; it is perfectly possible to run them on analytics tables as well. Summary data is then stored in a custom log table in the analytics tier, which provides fast query performance. In Microsoft Sentinel and the unified SOC platform you can easily create summary rules that run at a frequency between 20 minutes and 24 hours.
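
To make that concrete, here is a minimal sketch of what a summary rule's query could look like. The table and column names below are placeholders rather than a real connector schema, and the destination table is chosen when you create the rule:

// Hypothetical summary rule query; the destination table (for example, MySummary_CL)
// is chosen when you create the rule, and the rule's schedule determines the
// window of data this aggregation runs over.
MyVerboseLogs_CL
| summarize EventCount = count() by ClientIP, RequestPath, bin(TimeGenerated, 1h)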

Summary rules have the following advantages:

  1. You can easily create aggregated, summarized data, which in turn allows for easy analysis and reporting towards internal stakeholders
  2. You can save cost as you can store larger amounts of data in non-analytics tiers as discussed above
  3. You can summarize data you would like to keep for longer periods of time, saving costs in retention

So, revisiting our data tiering strategy from above, we can map our summary rules onto the tiers: they can read from both the analytics and auxiliary tiers, and their results always land in a summary table in the analytics tier.

How it works

Summary rules work directly on your Log Analytics workspace. They can be used on both auxiliary and analytics logs, and at their core they are KQL queries that run at a frequency you define during setup of the rule. The rule frequency implicitly defines how much data is aggregated (the bin size): for example, if your rule is set to run every 3 hours, the rule will evaluate all the data that was ingested during the last 3 hours.

It is also important to take into consideration that there are certain limits to summary rule queries, which are described here. If you are hitting these limits, or getting close to them, you can adjust your rule frequency, which in effect changes your bin size and allows you to stay within the limits.

The results of your summary rules are then re-ingested into your workspace in a custom analytics table, which in turn can be used to create incidents or run automations based on the results.

For a full rundown on how to create a rule, have a look at our documentation pages.

One last thing to consider is that although there are no costs associated with summary rules themselves, there are costs associated with querying and re-ingesting the data back into your Log Analytics workspace:

  • Scan query cost: this does not apply if your summary rule runs on the analytics tier, as querying data in this tier is free. However, if you are querying an auxiliary log table, there is a query cost, which is region bound (please see the Query section on the pricing page). For example, for a workspace hosted in East US, the query cost is 0.005 USD per GB scanned; if you were to create a summary rule on an auxiliary log table into which you ingest 1 TB of logs per day, you would pay $5 per day.
    Note: At the time this blog post was published, auxiliary log ingestion and querying were still free.
  • Ingestion of the summary dataset into the analytics tier: as the results of your summary rule run are ingested into a summary table, you will pay accordingly. If you only ingest 5% of the 1 TB/day mentioned in the example above into your summarized data set, this means you will pay for 50 GB/day. We recommend using summarize in your queries to prevent large chunks of data from being re-ingested.

In the next section, we will delve a bit more into some use cases where you could use summary rules.

Use cases

In this section, we will cover a few use cases that can help you get started with summary rules. They are by no means exhaustive, but rather meant to exemplify the uses of summary rules.

  • Detecting threats
    • TI IOCs found in our environment
  • Anomalies
  • Trends
  • Retention

Before you get started in your own environment, make sure you have checked the documentation and pricing implications (described in the How it works section above).

Use case: Threat Detection - Lookups to detect Threat Intelligence indicators of compromise (TI IOC)

Firewall Data

Firewall data is a great candidate for auxiliary logs. Firewalls generate a lot of valuable security data, but they also generate a lot of noise. By splitting the data and storing it in separate tables, you can still detect threats in the secondary security data.

The following KQL checks your firewall data (in CEF format) and matches the source IP addresses against active TI IoCs; if, and only if, there is a match, the result is added to the summary results, which are re-ingested into the workspace.

CommonSecurityLog_CL
| extend sourceAddress = tostring(parse_json(Message).sourceAddress),
         destinationAddress = tostring(parse_json(Message).destinationAddress),
         destinationPort = tostring(parse_json(Message).destinationPort)
| lookup kind=inner (ThreatIntelligenceIndicator
    | where Active == true) on $left.sourceAddress == $right.NetworkIP
| project TimeGenerated, Activity, Message, DeviceVendor, DeviceProduct,
          sourceMaliciousIP = sourceAddress, destinationAddress, destinationPort
| summarize count() by sourceMaliciousIP, destinationAddress, destinationPort

As you can see from the last line, we summarize by the source IP address, destination address and destination port. If you have a watchlist in Sentinel that keeps track of, for example, your honeypots or critical infrastructure, you could use it to create an incident whenever there is a hit on your watchlist; alternatively, you could always create an incident.
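
As a hedged sketch of that analytic rule, assuming the summary results land in a table named MaliciousIPHits_CL and you maintain a watchlist named CriticalAssets with an IPAddress column (all three names are hypothetical):

// Hypothetical analytic rule query: raise a result only when the matched
// destination appears in the CriticalAssets watchlist.
MaliciousIPHits_CL
| lookup kind=inner (
    _GetWatchlist('CriticalAssets')
    | project WatchlistIP = tostring(IPAddress)
  ) on $left.destinationAddress == $right.WatchlistIP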

Azure Logs

Another use case could be to scan for TI matches in your Azure logs. While not all of the standard tables support the auxiliary tier, most of them do support the basic logs tier, which allows you to save cost and run summary rules on them. In the following KQL queries we look at storage blob logs and Graph activity logs to do the same correlation against TI indicators.

Storage Blobs

StorageBlobLogs
| lookup kind=inner (ThreatIntelligenceIndicator
    | where Active == true) on $left.CallerIpAddress == $right.NetworkSourceIP
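
Following the recommendation above to summarize before re-ingesting, a possible variant that keeps the re-ingested result set compact (OperationName is a standard StorageBlobLogs column):

// Append an aggregation so only compact results are re-ingested.
StorageBlobLogs
| lookup kind=inner (ThreatIntelligenceIndicator
    | where Active == true) on $left.CallerIpAddress == $right.NetworkSourceIP
| summarize MatchCount = count() by CallerIpAddress, OperationName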

Microsoft Graph Activity

MicrosoftGraphActivityLogs
| lookup kind=inner (ThreatIntelligenceIndicator
    | where Active == true) on $left.IPAddress == $right.NetworkSourceIP

Use case: Trends

Another use case that can be achieved easily is looking for trends in your secondary data. In the following query, we use the Microsoft Graph activity logs to get a list of apps and service principals that call the users endpoint. We can then use this in a workbook to visualize trends in applications and principals that suddenly start calling certain endpoints more often.

MicrosoftGraphActivityLogs
| where RequestUri has "users"
| summarize NumRequests = count() by AppId, ServicePrincipalId, UserId
| sort by NumRequests desc
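
As a sketch of how you could then visualize the results in a workbook, assuming the summary rule writes to a hypothetical table named GraphUsersEndpointCalls_CL:

// Plot daily request volume per application to spot sudden increases.
GraphUsersEndpointCalls_CL
| summarize TotalRequests = sum(NumRequests) by AppId, bin(TimeGenerated, 1d)
| render timechart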

Use case: Anomalies

While Microsoft Sentinel has built-in anomaly detection, you can expand this to your own niche data sources. Summary rules can be used to create a baseline and then trigger an alert when you have outliers in your data. In the following example, which we adapted from Ashwin Patil's blog post, we analyze Palo Alto logs to detect data exfiltration.

let PrivateIPregex = @'^127\.|^10\.|^172\.1[6-9]\.|^172\.2[0-9]\.|^172\.3[0-1]\.|^192\.168\.';
CommonSecurityLog_CL
| extend sourceAddress = tostring(parse_json(Message).sourceAddress),
         destinationAddress = tostring(parse_json(Message).destinationAddress),
         SentBytes = tolong(parse_json(Message).bytesOut)
| where Activity == "TRAFFIC"
| where isnotempty(destinationAddress) and isnotempty(sourceAddress)
| extend DestinationIpType = iff(destinationAddress matches regex PrivateIPregex, "private", "public")
| where DestinationIpType == "public"
| summarize TotalBytesOut = sum(SentBytes)

Here, we are summarizing the total bytes sent to public IP addresses. We can then create an analytic rule that triggers an alert if we see a significant increase in this number, as this might indicate that data exfiltration is happening.

Once you have your summary dataset (e.g. SummarizedBytesOut_CL), you can create an analytic rule that runs on this data, as it is an analytics table. Your analytic rule can look back 14 days to create a baseline, and trigger an alert if your total over the last 24 hours was 50% or more above your baseline, as in the example below:

let lookbackPeriod = 14d;
let detectionWindow = 1d;
// Calculate the baseline as the average TotalBytesOut over the last 14 days (excluding the last 24 hours)
let baseline = toscalar(
    SummarizedBytesOut_CL
    | where TimeGenerated between (ago(lookbackPeriod) .. ago(detectionWindow))
    | summarize avg(TotalBytesOut));
// Retrieve data for the last 24 hours and check against the baseline
SummarizedBytesOut_CL
| where TimeGenerated >= ago(detectionWindow)
| extend Baseline = baseline
| extend Threshold = Baseline * 1.5 // 50% above baseline
| where TotalBytesOut > Threshold
| project TimeGenerated, TotalBytesOut, Baseline, Threshold
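
As an alternative sketch, instead of a fixed 50% threshold you could lean on KQL's built-in series functions over the same summary table (again assuming a SummarizedBytesOut_CL table with a TotalBytesOut column):

// Build an hourly series over two weeks and flag statistical outliers.
SummarizedBytesOut_CL
| make-series BytesOut = sum(TotalBytesOut) default=0
    on TimeGenerated from ago(14d) to now() step 1h
| extend (Anomalies, Score, Baseline) = series_decompose_anomalies(BytesOut, 1.5)
| mv-expand TimeGenerated to typeof(datetime), BytesOut to typeof(long), Anomalies to typeof(int)
| where Anomalies != 0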

In this example run, we would get an alert, as the increase compared to the baseline was 77.5%.

Use case: Retention

Depending on local regulations, you may need to retain the entirety of your data for a specific period. Additionally, you might choose to keep summarized data for extended periods to support future investigations or trend analysis.
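
As a sketch of what a retention-oriented summary rule could look like, keeping compact daily roll-ups of the firewall data used earlier (the field choices are illustrative):

// Keep one compact row per device per day instead of every raw event.
CommonSecurityLog_CL
| summarize EventCount = count(),
            TotalBytesOut = sum(tolong(parse_json(Message).bytesOut))
    by DeviceVendor, DeviceProduct, bin(TimeGenerated, 1d)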

When not to use summary rules?

Summary rules are excellent when you want to save cost but do not want to jeopardize security. They are efficient in a vast number of use cases, but there are also cases when you should not use them because better solutions are available, for example:

  • For granular retention, you can fork the data you want to retain for a longer period to a _CL table, without using summarization.
  • For data masking or anonymization, you can use a DCR transformation as well; have a look at this transformation created by Javier Soriano
  • If you want to remove data that you do not need, you can also use a DCR transformation, for example only storing StorageBlobLogs entries where anonymous authentication was used:
    StorageBlobLogs
    | where TimeGenerated > ago(7d) and AuthenticationType == "Anonymous"
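
Note that when you implement this as an actual DCR transformation, the query is written against the virtual input table called source rather than the destination table name; a minimal sketch of such a transformKql body:

// Hypothetical transformKql body for a DCR: keep only anonymous-auth events.
source
| where AuthenticationType == "Anonymous"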

Conclusion

In this blog post, we explained how you can store as much security data as possible in Microsoft Sentinel in a cost-efficient way using auxiliary logs. We also explained what summary rules are and how they can be used to still get value out of your auxiliary logs, given the limitations of that data tier. Lastly, we provided a few use cases and queries to get you started with summary rules, and covered when not to use them.
