Using Azure Data Explorer for long term retention of Microsoft Sentinel logs
Published Nov 13 2020 02:59 AM

 

NOTE: In February 2022, Azure Monitor Logs announced the new Archive tier, which is the new preferred way to store logs for long-term retention in Microsoft Sentinel. Visit this link for more details about the new archive tier.

 

In this blog post, we will explain how you can use Azure Data Explorer (referred to as ADX from here on) as a secondary log store and when this might be appropriate for your organization.

 

One of the common questions that we get from customers and partners is how to save money on their Microsoft Sentinel bill, retention costs being one of the areas that can be optimized. As you may know, data retention in Sentinel is free for the first 90 days; after that, it is charged.

 

Customers normally need to keep data accessible for longer than three months. In some cases, this is just due to regulatory requirements, but in others they need to be able to run investigations on older data. ADX can be a great service to leverage in these cases, where the need to access older data exists, but at the same time customers want to save on data retention costs.

 

What is Azure Data Explorer (ADX)?

ADX is a big data analytics platform that is highly optimized for all types of logs and telemetry data analytics. It provides low-latency, high-throughput ingestion with lightning-fast queries over extremely large volumes of data. It is feature rich in time series analytics, log analytics, full-text search, advanced analytics (e.g., pattern recognition, forecasting, anomaly detection), visualization, scheduling, orchestration, and automation, among many other native capabilities.

 

Under the covers, it is a cluster composed of engine nodes that serve queries and data management service nodes that perform and orchestrate a variety of data-related activities, including ingestion and shard management. Some important features include:

 

  • It offers configurable hot and cold caches backed by memory and local disk, with data persistence on Azure Storage.
  • Data persisted in ADX is durably backed by Azure Storage, which offers replication out of the box: locally within an Azure data center or zonally within an Azure region.
  • ADX uses Kusto Query Language (KQL), which is also the query language used in Microsoft Sentinel. This is a great benefit, as we can use the same queries in both. Also, as we will explain later in this article, we can perform cross-platform queries that aggregate and correlate data sitting across ADX and Sentinel/Log Analytics.
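
As a quick illustration of that portability, the following query runs unchanged in Sentinel/Log Analytics and in an ADX database, assuming the SecurityEvent table has been replicated to ADX with the same schema (the filter itself is just an example):

// Count failed logon attempts (EventID 4625) per computer over the last day.
SecurityEvent
| where TimeGenerated > ago(1d)
| where EventID == 4625
| summarize FailedLogons = count() by Computer
| order by FailedLogons desc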

From a cost perspective, it offers reserved instance pricing for the compute nodes and autoscaling capabilities that adjust the cluster size based on the running workload. Azure Advisor also integrates with ADX to offer cost optimization recommendations.

 

You can learn much more about ADX in the official documentation pages and the ADX blog.

 

When to use ADX vs Microsoft Sentinel for long-term data

Microsoft Sentinel is a SaaS service with full SIEM+SOAR capabilities that offers very fast deployment and configuration times, plus many advanced out-of-the-box security features needed in a SOC; to name a few: incident management, visual investigation, threat hunting, UEBA, and a detection rules engine powered by ML.

 

That being said, security data stored in Sentinel might lose some of its value after a few months, and SOC users might not need to access it as frequently as newer data. Still, they might need to access it for sporadic investigations or audit purposes while remaining mindful of the retention cost in Sentinel. How do we achieve this balance, then? ADX is the answer.

 

With ADX we can store the data at a lower price while still being able to explore it using the same KQL queries that we execute in Sentinel. We can also use the ADX proxy feature, which enables us to perform cross-platform queries that aggregate and correlate data spread across ADX, Application Insights, and Sentinel/Log Analytics. We can even build Workbooks that visualize data spread across these data stores. ADX also opens new ways to store data that provide us with better control and granularity (see the Management considerations section below).
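
For example, with the proxy feature you can query a Log Analytics workspace directly from the ADX query pane. This is a minimal sketch; the subscription ID, resource group, and workspace name are placeholders for your own values:

// Run from the ADX query pane against a proxy-enabled cluster.
cluster('https://ade.loganalytics.io/subscriptions/<subscription-id>/resourcegroups/<resource-group>/providers/microsoft.operationalinsights/workspaces/<workspace-name>').database('<workspace-name>').SigninLogs
| where TimeGenerated > ago(1h)
| take 10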

 

In summary, in the context of a SOC, ADX provides the right balance of cost and usability for aged data that might no longer benefit from all the security intelligence built on top of Microsoft Sentinel.

 

Log Analytics Data Export feature

Let’s take a look at the new Data Export feature in Log Analytics that we will use in some of the architectures below (full documentation on this feature here).

 

[Image: data-export-overview.png]

 

This feature lets you export data from selected tables (see supported tables here) in your Log Analytics workspace as it reaches the ingestion endpoint, and continuously send it to an Azure Storage account and/or Event Hub.

 

Once data export is configured in your workspace, any new data arriving at the Log Analytics ingestion endpoint and targeted to your workspace for the selected tables is exported to your Storage account hourly, or to Event Hub in near real time. NOTE: There isn't a way to filter data and limit the export to certain events. For example, when configuring a data export rule for the SecurityEvent table, all the data sent to the SecurityEvent table is exported starting at configuration time.

 

Take the following into account:

  • Both the workspace and the destination (Storage Account or Event Hub) must be located in the same region.
  • Not all log types are supported as of now. See supported tables here.
  • Custom log types are not supported as of now. They will be supported in the future.

Currently there’s no cost for this feature during preview, but in the future it will be charged based on the number of GB transferred. Pricing details can be seen here.

 

Architecture options

There are a couple of options to integrate Microsoft Sentinel and ADX.

  • Send the data to Sentinel and ADX in parallel
  • Sentinel data sent to ADX via Event Hub

Send the data to Sentinel and ADX in parallel

 

[Image: parallel.png]

 

This architecture is also explained here. In this case, only data that has security value is sent to Microsoft Sentinel, where it will be used in detections, incident investigations, threat hunting, UEBA, etc. The retention in Microsoft Sentinel will be limited to serve the purpose of the SOC users; typically 3-12 months of retention is enough. All data (regardless of its security value) will be sent to ADX and retained there for the longer term, as this is cheaper storage than Sentinel/Log Analytics.

 

An additional benefit of this architecture is that you can correlate data spread across both data stores. This can be especially helpful when you want to enrich security data (hosted in Sentinel), with operational data stored in ADX (see details here). 
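
As a hedged sketch of what such an enrichment could look like from the Sentinel side, the query below joins security events against an inventory table kept only in ADX. The cluster URI and the VMInventory table are hypothetical names used purely for illustration, and cross-service joins may be subject to product limitations:

// Run from the Microsoft Sentinel / Log Analytics query window.
SecurityEvent
| where TimeGenerated > ago(1d)
| join kind=leftouter (
    adx('https://mycluster.westeurope.kusto.windows.net/opsdata').VMInventory
    | project Computer, Owner, CostCenter
) on Computer
| summarize Events = count() by Computer, Owner, CostCenter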

 

Adopting this architecture means that there will be some data that is stored in both Log Analytics and ADX, because what we send to Log Analytics is a subset of what is sent to ADX. Even with this data duplication, the cost savings are significant as we reduce the retention costs in Sentinel.

 

Microsoft Sentinel data sent to ADX via Event Hub

Combining the Data Export feature and ADX, we can choose to stream our logs to Event Hub and then ingest them into ADX. The high-level architecture would look like this:

 

 

[Image: EventHub.png]

 

With this architecture we have the full Sentinel SIEM experience (incident management, visual investigation, threat hunting, advanced visualizations, UEBA, ...) for data that needs to be accessed frequently (for example, the last 6 months), plus the ability to query long-term data directly in ADX. These queries against long-term data can be ported without changes from Sentinel to ADX.

 

Similar to the previous case, this architecture involves some data duplication, as the data is streamed to ADX as soon as it arrives in Log Analytics.

 

How do you set this up, you might ask? Here is a high-level list of steps:

 

1. Configure Log Analytics Data Export to Event Hub. See detailed instructions here.

 

Steps 2 through 6 are documented in detail in this article: Ingest and query monitoring data in Azure Data Explorer.

 

2. Create the ADX cluster and database. A database in ADX is roughly the equivalent of a workspace in Log Analytics terminology. Detailed steps can be found here. For guidance around ADX sizing, you can visit this link.

 

3. Create target tables. The raw data is first ingested to an intermediate table, where it is stored as received. Using an update policy (think of this as a function that is applied to all new data), the data is then expanded and ingested into the final table, which has the same schema as the original one in Log Analytics/Sentinel. We will set the retention on the raw table to 0 days, because we want the data to be stored only in the properly formatted table and deleted from the raw table as soon as it’s transformed. Detailed steps for this step can be found here.
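
To make this concrete, here is a minimal sketch of the commands involved, assuming we are exporting the SecurityEvent table. The target schema is heavily abbreviated for illustration; the real table has many more columns:

// Intermediate table that receives the raw Event Hub payloads.
.create table SecurityEventRawRecords (Records: dynamic)

// Target table with (an abbreviated version of) the Log Analytics schema.
.create table SecurityEvent (TimeGenerated: datetime, Computer: string, EventID: int, Account: string, Activity: string)

// Keep nothing in the raw table once the update policy has expanded the records.
.alter-merge table SecurityEventRawRecords policy retention softdelete = 0d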

 

4. Create table mapping. Because the data format is JSON, a data mapping is required. This defines how records land in the raw events table as they come from Event Hub. Details for this step can be found here.
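
Continuing the hypothetical SecurityEventRawRecords example above, a minimal mapping could look like this (data export wraps the exported rows in a top-level "records" array):

// Map the whole "records" array of each Event Hub message into the Records column.
.create table SecurityEventRawRecords ingestion json mapping 'SecurityEventRawRecordsMapping' '[{"column":"Records","Properties":{"path":"$.records"}}]'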

 

5. Create an update policy and attach it to the target table. In this step we create a function (referenced by the update policy) and attach the policy to the target table, so the data is transformed at ingestion time. See details here. This step is only needed if you want the tables to have the same schema and format as in Log Analytics.
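
Continuing the same hypothetical example, the sketch below expands each exported record into the abbreviated SecurityEvent schema defined earlier:

// Function that expands raw Event Hub records into typed columns.
.create function SecurityEventExpand() {
    SecurityEventRawRecords
    | mv-expand events = Records
    | project
        TimeGenerated = todatetime(events.TimeGenerated),
        Computer = tostring(events.Computer),
        EventID = toint(events.EventID),
        Account = tostring(events.Account),
        Activity = tostring(events.Activity)
}

// Run the function on every new batch landing in the raw table.
.alter table SecurityEvent policy update @'[{"Source": "SecurityEventRawRecords", "Query": "SecurityEventExpand()", "IsEnabled": "True", "IsTransactional": true}]'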

 

6. Create a data connection between Event Hub and the raw data table in ADX. In this step, we tell ADX where and how to get the data from: in our case, from Event Hub, specifying the target raw table, the data format (JSON), and the mapping to be applied (created in step 4). Details on how to perform this here.

 

7. Modify retention for the target table. The default retention policy is 100 years, which might be more than you need. With the following command we modify the retention policy to be 1 year:

.alter-merge table <tableName> policy retention softdelete = 365d recoverability = disabled

 

The good news is that all these steps can be easily automated with this script by @sreedharande. Visit the script page for more details on how to use it. 

 

Additional cost components of this architecture are:

  • Log Analytics Data Export – charged per exported GB
  • Event Hub – charged by Throughput Unit (1 TU ~ 1 MB/s)

 

Management considerations in ADX

 

These are some of the areas where ADX offers additional controls:

 

  1. Cluster size and SKU. You must carefully plan for the number of nodes and the VM SKU in your cluster. These factors determine the amount of processing power and the size of your hot cache (SSD and memory). The bigger the cache, the more data you will be able to query with higher performance. We encourage you to visit the ADX sizing calculator, where you can play with different configurations and see the resulting cost. ADX also has an auto-scale capability that makes intelligent decisions to add/remove nodes as needed based on cluster load (see more details here).

 

  2. Hot/cold cache. In ADX you have greater control over which data tables are kept in hot cache and therefore return results faster. If you have large amounts of data in your ADX cluster, it might be advisable to break down tables by month, so you have greater granularity over which data is present in hot cache. See here for more details, and the sketch after this list for the relevant command.

 

  3. Retention. In ADX we have a setting that defines when data will be removed from a database or from an individual table (also shown in the sketch after this list). This is obviously an important setting to limit our storage costs. See here for more details.

 

  4. Security. There are several security settings in ADX that help you protect your data, ranging from identity management to encryption (see here for details). Specifically around RBAC, there are ways in ADX to restrict access to databases, tables, or even rows within a table; the sketch after this list shows a row-level example. Here you can see details about row level security.

 

  5. Data sharing. ADX allows you to make pieces of data available to other parties (partner, vendor), or even buy data from other parties (more details here).
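
To make items 2-4 concrete, here is a minimal sketch of the corresponding control commands. Table, database, and function names are placeholders, and the row-level filter is illustrative only:

// 2. Hot cache: keep only the last 30 days of this table in hot (SSD/memory) cache.
.alter table SecurityEvent policy caching hot = 30d

// 3. Retention: remove data from the whole database after 2 years,
//    unless a table-level policy overrides it.
.alter-merge database <DatabaseName> policy retention softdelete = 730d

// 4. Row level security: expose only a subset of rows through a filtering function.
.create-or-alter function SecurityEventRLS() {
    SecurityEvent
    | where Computer startswith "FINANCE-"
}

.alter table SecurityEvent policy row_level_security enable "SecurityEventRLS()"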

 

Summary

As you have seen throughout this article, you can stream your telemetry data to ADX to be used as a long-term storage option with lower cost than Sentinel/Log Analytics, and still have the data available for exploration using the same KQL queries that you use in Sentinel.

 

Please post below if you have any questions or comments.

 

Thanks to all the reviewers @Sarah_Young , @Jeremy Tan , @Tiander Turpijn , @Inwafula , @Matt_Lowe and special thanks to @Uri Barash  and @minniw from the ADX team for the collaboration!


24 Comments
Bronze Contributor

Do you know if Jupyter notebooks can access data in a Data Explorer instance?

Brass Contributor

@Javier Soriano Behind the scenes, is Log Analytics formed of clusters like a managed, small-scale ADX? And if so, can the retention properties of ADX not be exposed to users to manage them directly?

Just checking if we can manage all the data in one place and not have replicated clusters.

Microsoft

Hi @Joseph-Abraham , you will be able to access all the data from one place very soon, when we add support for running queries from Log Analytics targeting ADX tables. So you will be able to explore all your data estate from Log Analytics using the same KQL queries regardless of data location.

 

Regards

Copper Contributor

Thank you for this post @Javier Soriano - Out of interest, are you able to provide some indicative pricing data for this? No matter how I re-arrange my pricing calculator/ADX calc, I can't really see how the ADX/Sentinel combo works out cheaper over a two-year period. Does this only really apply to longer retention frames (24 months+)? Of course, the discrepancy may just be my calculator usage.

Cheers,

JW

Microsoft

Hi @jameswestallll , sure. Here are some calculations for a 200 GB/day volume with 1 year retention:

 

Assumptions:

- ADX Hot data days: 15

- ADX compression: 7

- Region: US East 2

 

ADX cost:

- $2,838/month (PAYG)

- $2,191/month (1YRI)

- $1,745/month (3YRI)

 

Sentinel retention cost: $6,570/month

 

Over a two year period, it would be even cheaper.

 

Regards

Brass Contributor

@Javier Soriano Just to be sure that I get this right: for 1 TB per day with retention for 2 years, the ADX annual cost would be ~$47K?

Assumptions:

- ADX Hot data days: 0 (events from the immediate past should be in Log Analytics)

- ADX compression: 7

- Region: US East 2

 

 

Microsoft

Yes, it would be around that number just for the ADX part of the bill. And that's if using PAYG; if you use reserved instance pricing, it would be less.

 

You can make your own calculations in the official ADX calculator: https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html 

Brass Contributor

@Javier Soriano Isn’t the first architecture different from the other two, in the sense that in the first one all source data is sent to ADX, and in the other two only the Sentinel data?

I.e. for instance Defender for Endpoint connected to Sentinel means only the DfE alerts are sent to Sentinel, and not all DfE raw data (i.e. all events).

In the first architecture, all raw data is sent to ADX and the (in this case DfE-) data connector only sends the alerts to Sentinel. In the latter two architectures, only the data already ingested into Sentinel (i.e. only DfE alerts in this example) will be sent to ADX.

Also, somewhere it is stated that for retention periods of over two years ADX would be feasible, but to me it seems that from day 91 this is beneficial/cost effective, given that 90 days of data in Sentinel is sufficient for the SOC.

Brass Contributor

By “Somewhere it is stated”, I mean Sentinel Ninja training Part 2, Section 5, then under Retention.

Microsoft

Hi @AndrePKI , when we draw data sources in the diagrams, we are talking about any data source; it doesn't matter if it's a VM on prem, in the cloud, a PaaS or a SaaS service (like DfE). So Defender for Endpoint could still be split at the source and sent to both Sentinel and ADX. You can also use a hybrid approach, where some sources are split at the source and some others are only sent to Sentinel, with data export then used to forward into ADX. The architectures in this article are just conceptual, and you can build different solutions that use multiple approaches for different data sources. Sometimes you're even forced to do so due to technical limitations.

 

Whether 90 days is sufficient for a SOC or not is a bigger discussion. There are some things that you lose when you send data over 90 days old to ADX, like entity history and trends, or easier navigation of logs through our visual investigation. In any case, every customer can choose whatever they think is best for their SOC in terms of retention.

 

Regards

Silver Contributor

@Javier Soriano I am seeing reports in the news that are highly critical about the log retention policies for O365. How does the licensing of O365 get factored into the architectural decisions? Is there a similar article for using O365 logs with ADX?

Microsoft

Hi @Dean Gross , not that I'm aware of. In any case, O365 activity is already free to ingest into Sentinel, and you could then use data export/Event Hub to send it to ADX.

 

Regards

Copper Contributor

Regarding the last process, which uses Azure Data Factory: the template provided by Microsoft for lastmodifieddate properties allows copying data to another storage account, but not to Azure Data Explorer. Is there any other template which can be used?

Copper Contributor

Greetings @Javier Soriano !

 

We are using Sentinel and most of our business-critical logs are maintained in Custom Tables. What would be the best way to implement ADX for long term retention? Would we need to duplicate our Data Connectors to send them over to ADX via Event Hub? So far, from what we've seen, the most interesting scenario would be to send Sentinel data (including Custom Tables) over Event Hub to ADX.

 

Thank you for your attention,

 

Microsoft

Hi @paulolana , as of now, you could send the data from custom tables to Event Hub (or directly to ADX) using Logic Apps. The method to use would be similar to what is explained here: Archive data from Log Analytics workspace to Azure storage using Logic App - Azure Monitor | Microsoft Docs

 

Data export support for custom logs is on the roadmap.

 

Regards

Iron Contributor

Hi @Javier Soriano are there any updates to using ADX with Sentinel?

Specifically what's the most cost effective way to do this, the logic app method you mention above?

Is it necessary to run the ADX instance if you're just storing data?

Can you power up the ADX service to read data only when needed and thus get storage cheaper?

What would it cost to just store the data and only use the ADX service to read when needed?

Or if there's something in preview for this I'm interested.

Excuse me if I'm misunderstanding this.

 

Microsoft

Hi @SocInABox , did you watch this webinar? Using Azure Data Explorer as Your Long Term Retention Platform of AS Logs - Azure Sentinel webinar -... We try to explain all available options there.

 

In any case, if you're in the private preview community, there's a new feature there that will add new options for long term retention. If you're not part of the private preview ring, stay tuned as it will be public in a few weeks.

 

Regards

Iron Contributor

Hey @Javier Soriano that video is exactly why I'm asking the question :).

I just signed up for preview but I don't see an obvious update that's related to this, can you point out the name of the preview feature?

I'm simply looking for the cheapest way to back up Sentinel to ADX, where I can use ADX very rarely to pull back data when needed for forensics, governance, etc.

Thank you.

 

Microsoft

@SocInABox for that use case, I would recommend the approach where you send the data to storage and just use ADX to query it through an external table. Described here: Query exported data from Azure Monitor using Azure Data Explorer - Azure Monitor | Microsoft Docs

 

The feature in the private preview channel is called "Search, Log Restoration and Archive Logs"

 

Regards

Brass Contributor

And a new exciting private preview is under way: "SOC Accelerator", leveraging, among other things, ADX and making deployment and use much easier.

Brass Contributor

@Javier Soriano This article is about Sentinel data, which has the same structure and format for all customers. Is there any resource or repository which contains the target table definitions and corresponding mappings?
We are struggling with the creation of the tables and appropriate table mappings for the tables we have in Sentinel (about 100) and can't believe this hasn't been done before. Or maybe there is some automated way to create the correct ADX control statements?

Brass Contributor

Although not complete (e.g not all data type covered), this was an excellent starting point for us. We elaborated on that script for our own purposes and now have an easy way to create the whole chain from log analytics continuous data export via event hub to ADX data connectors.

If anyone is interested in our extended scripts, DM me.

Thank you for the pointer to this script, @Javier Soriano !
