
Microsoft Sentinel Blog

Automation: Integrate Azure Data Explorer as Long-Term Log Retention for Microsoft Sentinel

Sreedhar_Ande
Oct 26, 2021

Out of the box, Microsoft Sentinel provides 90 days of data retention for free. In some parts of the world and within certain industries, regulations require organizations to retain data for 7 years or longer. The challenge is that the maximum retention for Log Analytics workspaces is 2 years, so there has been a need for a solution that enables longer, more cost-effective long-term retention.

 

This blog details how logs from a Log Analytics workspace can easily be moved into long-term storage in Azure Data Explorer (ADX) with the help of a PowerShell script, to comply with retention standards as well as reduce costs.

 

Thanks to Javier-Soriano for his input on this blog post.

 

Azure Data Explorer (ADX)

Azure Data Explorer is a big data analytics platform that is highly optimized for log and data analytics. Since Azure Data Explorer uses Kusto Query Language (KQL) as its query language, it's a good alternative for Microsoft Sentinel data storage. Using Azure Data Explorer for your data storage enables you to run cross-platform queries and visualize data across both Azure Data Explorer and Microsoft Sentinel.

For more information, see the Azure Data Explorer documentation and blog.

Log Analytics Data Export Architecture

The following image shows a sample flow of exported data through the Azure Monitor ingestion pipeline. Your data is directed to Log Analytics by default, but you can also configure it to export to an Azure Storage Account or Event Hub.

 

 

When configuring data for export, note the following considerations:


  • Scope of data exported: Once export is configured for a specific table, all data sent to that table is exported, with no exception. Exporting a filtered subset of your data, or limiting the export to specific events, is not supported.

  • Location requirements: Both the Azure Monitor / Microsoft Sentinel workspace and the destination (an Azure Storage Account or Event Hub) must be located in the same Azure region.

  • Supported tables: Not all tables are supported for export; for example, custom log tables are not supported.

For more information, see Log Analytics workspace data export in Azure Monitor and the list of supported tables.

Combining the Data Export feature and ADX, we can choose to stream our logs to Event Hub and then ingest them into ADX. The high-level architecture would look like this:

With this architecture, we have the full Microsoft Sentinel SIEM experience (incident management, visual investigation, threat hunting, advanced visualizations, UEBA, etc.) for data that needs to be accessed frequently (for example, the last 6 months), plus the ability to query long-term data directly in ADX. On top of this, we can also access ADX data from within the Microsoft Sentinel Logs screen, using the adx() operator in KQL, so we don't have to leave the Azure portal; see details about this feature here.
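For example, a query like the following can be run from the Microsoft Sentinel Logs blade to reach data that has already been moved to ADX. This is a minimal sketch; the cluster URL, database, and table name are placeholders to replace with your own:

// Cross-resource query from Microsoft Sentinel / Log Analytics into ADX
adx('https://<adx-cluster>.<region>.kusto.windows.net/<database>').SecurityEvent
| where TimeGenerated between (ago(365d) .. ago(180d))
| summarize EventCount = count() by bin(TimeGenerated, 1d)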

Challenges:

  1. Use the Azure Data Explorer Web UI to create the target tables in the Azure Data Explorer database. For each table, you need to retrieve the schema and run the following commands, which consumes a lot of time for production workloads (a sketch of these KQL commands is shown after this list):
    • Create target table: a table that will have the same schema as the original one in Log Analytics/Microsoft Sentinel.
    • Create raw table: the data coming from Event Hub is first ingested into an intermediate table where the raw data is stored. There, the data is manipulated and expanded. Using an update policy (think of this as a function that is applied to all new data), the expanded data is then ingested into the final table, which has the same schema as the original one in Log Analytics/Microsoft Sentinel. We set the retention on the raw table to 0 days, because we want the data to be stored only in the properly formatted table and deleted from the raw table as soon as it’s transformed. Detailed steps can be found here.
    • Create table mapping: because the data format is JSON, a data mapping is required. This defines how records land in the raw events table as they come from Event Hub. Details for this step can be found here.
    • Create update policy and attach it to raw records table. In this step, we create a function (update policy) and we attach it to the destination table so the data is transformed at ingestion time. See details here. This step is only needed if you want to have the tables with the same schema and format as in Log Analytics.
    • Modify retention for target table. The default retention policy is 100 years, which might be too much in most cases. With the following command we will modify the retention policy to be 1 year: 
      • .alter-merge table <tableName> policy retention softdelete = 365d recoverability = disabled
    • Thanks to Javier-Soriano for automating Step #1.
  2. To stream Log Analytics logs to Event Hub and then ingest them into ADX, you need to create EventHub Namespaces. For small to medium deployments, you would normally use the Event Hub Standard SKU. This SKU has a limit of 10 event hub topics per namespace, so you would need to create more namespaces if you need to export more than 10 tables.
    • A Log Analytics Data Export rule also supports up to 10 tables per rule.
    • You can create 10 Data Export rules targeting 10 different EventHub Namespaces.
    • Note: for medium to large deployments, a dedicated EventHub Cluster might be cheaper.
  3. Once the Raw & Mappings tables and EventHub Namespaces are in place, you need to create the “Data Export” rule manually, using the Azure CLI or the REST API.
    Note:
    • The Azure portal and PowerShell are not supported yet.
    • Data Export rules create “EventHubTopics” in the EventHub Namespaces, which takes ~20 minutes.
    • You will see Event Hub topics only for tables with an active stream of logs; for tables that don’t have logs, the Event Hub topic is not created until fresh data arrives.
  4. Once the EventHub Topics (am-<<tablename>>) are available, you need to create “Data Ingestion / Data Connection” rules in the ADX cluster for each table by selecting the appropriate EventHub Topic, TableRaw, and TableRawMappings.
  5. Once the Data Connection succeeds, you will see data flowing from Log Analytics to ADX in ~15 minutes.

Note: If a Data Export rule is created today, only data ingested from the time of the rule's creation onward will be moved to ADX. Data ingested before the Data Export rule was created will not be moved.
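To make Challenge #1 concrete, here is a minimal sketch of the KQL commands involved for a single table. The SecurityEvent table and its short column list are illustrative only; the automation script reads each table's actual schema from Log Analytics and generates the full column list and expand function for you:

// Raw intermediate table that receives the records arriving from Event Hub
.create table SecurityEventRaw (Records: dynamic)

// JSON ingestion mapping: each Event Hub message lands in the Records column
.create table SecurityEventRaw ingestion json mapping 'SecurityEventRawMapping' '[{"column":"Records","Properties":{"path":"$.records"}}]'

// Retention of 0 days on the raw table - data is kept only in the expanded target table
.alter-merge table SecurityEventRaw policy retention softdelete = 0d

// Target table with the same schema as in Log Analytics (columns shortened here for brevity)
.create table SecurityEvent (TimeGenerated: datetime, Computer: string, EventID: int, Activity: string)

// Function used by the update policy to expand raw records into the target schema
.create function SecurityEventExpand() {
    SecurityEventRaw
    | mv-expand events = Records
    | project
        TimeGenerated = todatetime(events['TimeGenerated']),
        Computer = tostring(events['Computer']),
        EventID = toint(events['EventID']),
        Activity = tostring(events['Activity'])
}

// Update policy: transform raw records into SecurityEvent at ingestion time
.alter table SecurityEvent policy update @'[{"Source": "SecurityEventRaw", "Query": "SecurityEventExpand()", "IsEnabled": "True", "IsTransactional": true}]'

// Reduce the default 100-year retention on the target table to 1 year
.alter-merge table SecurityEvent policy retention softdelete = 365d recoverability = disabled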

 

Automation: Integrate Azure Data Explorer (ADX)

  1. The script prompts the user to input values for the following parameters:
    • Log Analytics Workspace Name
    • Log Analytics Resource Group
    • ADX Cluster URL
    • ADX Resource Group
    • ADX DB Name
  2. Once the user is authorized, the script provides two options:
    • Export all the tables from Log Analytics.
    • Enter Log Analytics Table Names (Case-Sensitive) separated by commas (,)
  3. The script verifies whether the tables (from Log Analytics or from user input) are supported by the “Data Export” feature; all unsupported tables are skipped in the subsequent steps.
  4. Create target table, raw table & mappings. The script will create the target tables along with the raw tables and mappings, applying the retention and update policies described in Challenge #1.
  5. Create EventHub Namespaces. In this step, the script will create EventHub Namespaces by dividing the total number of tables by 10.
    • Note: the Event Hub Standard tier is limited to 10 EventHub Topics per namespace.

  6. Create Data Export Rule. In this step, the script will create a Data Export rule for each EventHub Namespace, with up to 10 tables each.

    • Note:

      • Based on the output from Step #4, the script will create a “Data Export” rule for every 10 tables.

      • Log Analytics supports 10 Data Export rules targeting 10 different EventHub Namespaces, i.e., you can export up to 100 tables using 10 Data Export rules.

  7. Create data connections between Event Hub and the raw data tables in ADX. In this step, the script iterates over all the EventHub Namespaces, retrieves the EventHub Topics, and creates ADX data connection rules specifying the target raw table, mapping, and EventHub Topic. Once data is flowing, you can verify ingestion with a query like the one shown below.
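As a quick check after the script completes, the following can be run against the ADX database. This is a hedged example; SecurityEvent is a placeholder for whichever table you exported:

// Confirm that records are arriving in the expanded target table
SecurityEvent
| summarize RowCount = count(), LatestRecord = max(TimeGenerated)

// If nothing shows up after ~15 minutes, check for ingestion errors on the cluster
.show ingestion failures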

Download

To download this integration tool, navigate here and click the Download button.

 

Summary

We just walked through how to automate the integration of Azure Data Explorer, using the Azure Log Analytics Data Export feature and Azure Event Hubs, to build long-term storage retention for your security data that complies with retention standards while reducing costs.

Updated Nov 09, 2023
Version 4.0
  • Clive_Watson (Bronze Contributor)

    You need to read the "Archive and Restore" section, here Pricing - Azure Monitor | Microsoft Azure 

    If you are archiving for compliance and are very unlikely or infrequently going to restore the minimum of 2 TB of data at a time, then archive may be right for you. If you need frequent access to the data, you need to calculate how often and how much data you are likely to restore (use the Azure Pricing Calculator) vs. holding it in ADX.

     

     

  • Consultant1520 (Copper Contributor)

    With Log Analytics providing the "Archival" functionality, do we really need to store the data in ADX? Or can the Log Analytics "Archival" functionality be used for long-term retention?

    Can we store the data in the analytics tables for 3-6 months and, for compliance purposes, store telemetry in "Archival" tables?

    Would that be a more cost-effective solution than exporting the logs to ADX?

  • SocInABox (Iron Contributor)

    Hi there, thanks for the information!

    What if I want to forward all of my syslog data directly to ADX and not to Sentinel?

    i.e. feed syslog to ADX and just use kql to query it occasionally from Sentinel?
    Can this be done simply with a DCR pointing to the ADX cluster? (assuming AMA is set up on a syslog server already).

    What about CEF?

    Thanks very much.