**UPDATE: Microsoft Sentinel now offers an easier and more efficient method to achieve long term storage. This solution is now out of date. Please refer to the recommended solution: What's New: Search, Basic Ingestion, Archive, and Data Restoration are Now in Public Preview - Micro...**
*Attention: There is now an official feature that achieves the goal of this solution. Please refer to these documents on the current capabilities that can be used:*
Logic App: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/logs-export-logic-app
Continuous export: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/logs-data-export?tabs=portal
Out of the box, Microsoft Sentinel provides 90 days of data retention for free. In some parts of the world and within certain industries, regulations require organizations to retain data for 7 years or longer. The challenge is that the maximum retention for Log Analytics workspaces is 2 years, so there has been a need for a solution that extends retention while saving costs by moving logs to cold storage. This blog details how logs from a Log Analytics workspace can easily be moved into long-term cold storage in order to comply with retention standards, as well as reduce costs, using this Playbook.
This post is in-depth, as it breaks down how the Playbook operates.
TL;DR: The end result of running the Playbook is a folder structure of Blobs within a storage account and container. The folders are named for, and contain, the data types that were chosen for backup. Each Blob in a folder contains a backup of that data type's logs in hourly blocks. Each Blob can be queried from a Log Analytics workspace using the externaldata operator and a SAS token URL generated for the Blob.
Link for the Playbook if needed: https://github.com/Azure/Azure-Sentinel/tree/master/Playbooks/Move-LogAnalytics-to-Storage
*Note: If a Blob is 2 bytes in size, there was no data for that time block and data type.
For auditing and investigative purposes, raw data and logs may need to be stored long term for regulatory compliance. This can be achieved through a Playbook that queries the logs in the workspace that are about to expire and moves them to a storage account of your choosing. The Playbook utilizes the Log Analytics API when performing the query. An important limitation to note is that the API returns at most 500,000 rows of data per request. This means that if any data type contains more than 500,000 rows, the Playbook would need to make more than one API pull to get the data. To avoid this issue, the Playbook breaks each day's data into hourly blocks so that all desired data is backed up efficiently and completely. Each time the Playbook runs, it reviews each data type in your workspace that should be backed up, goes through the logs for those data types, and moves the logs to storage, labeling each backup with the data type, the date of the log, and the hour of the time block.
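To illustrate the hourly chunking, a query for a single one-hour block might look like the following. This is a sketch in KQL; the table name and timestamps are illustrative examples, not the Playbook's exact query:

```kusto
// Example: pull one hourly block of AzureActivity records.
// The Playbook issues one such time-bounded query per data type per hour,
// keeping each result set well under the API's 500,000-row limit.
AzureActivity
| where TimeGenerated >= datetime(2021-03-01T13:00:00Z)
    and TimeGenerated <  datetime(2021-03-01T14:00:00Z)
```

The half-open time window (`>=` start, `<` end) ensures consecutive hourly blocks never overlap or drop records on the boundary.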
The app comprises many steps and variables in order to work. The pieces are the following:
Depending on how much data each table contains, the Playbook can take 2 to 10 minutes per table.
Deployment of the Template:
There are two options for deploying the template:
If using the manual option, please refer to the README for the Playbook.
Using an Existing Storage Account:
In the event that an existing storage account is preferred, it can be set up within the Playbook. The following must be done:
The Playbook needs proper permissions to run. It requires at least the Storage Blob Data Contributor role in order to create new Blobs within the data container. In order to provide the permissions:
Query the data:
Once the data is in storage, it is still possible to query it, though in a more limited capacity. Querying the data now uses an operator called ‘externaldata’, which requires a SAS token URL generated for the Blob in order to pull the data from it. The process also requires that each column be defined so that the data can be properly mapped to the correct column. An example of what the query would look like:
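Below is a hypothetical sketch of such a query, assuming the Blobs were written as JSON. The storage account, container, Blob name, SAS token, and the exact column list are placeholders; base the columns on the schema of the table that was backed up:

```kusto
// Hypothetical example: querying a backed-up AzureActivity Blob.
// Replace the placeholders in the URL with your own storage account,
// container, Blob path, and a SAS token generated for that Blob.
externaldata (
    TimeGenerated: datetime,
    OperationName: string,
    ActivityStatus: string,
    ResourceGroup: string,
    Caller: string,
    SubscriptionId: string
)
[
    h"https://<storageaccount>.blob.core.windows.net/<container>/AzureActivity/<blobname>?<SAS-token>"
]
with (format="json")
```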
This query pulls the AzureActivity information from the Blob and maps the data to the associated columns. It is recommended to base your schema on the existing tables that were backed up in order to avoid issues when parsing the data. In this case, AzureActivity was used as the reference for the associated logs in storage.
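One way to recover the column definitions from the original table is the getschema operator, sketched here against AzureActivity as an example:

```kusto
// List the column names and Kusto types of the AzureActivity table,
// which can then be copied into the externaldata column definitions.
AzureActivity
| getschema
| project ColumnName, ColumnType
```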
More information about external data: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/externaldata-operator
With this Playbook, you now have control over moving logs into long-term storage. This is handy for existing data in the workspace that Sentinel is using when the retention period for that data runs out. Combined with the ability to query data from cold storage, it allows for regulatory compliance and reduced costs while maintaining Sentinel and Log Analytics usage for business operations.
This solution would not have been possible without the effort and great help from @Matt Egen, @Chris Boehm , and Rin Ure.