Incident triage is a core component of security monitoring operations, and ensuring triage processes are efficient and effective is key to detecting security threats. Recent high-profile security incidents have shown that detecting threats is not enough unless those detections are also effectively triaged and investigated. In this blog we detail how to deploy and use a solution that automatically executes Jupyter Notebooks to provide enrichment to incidents within Azure Sentinel. This process allows security analysts to triage incidents more quickly and effectively, and helps ensure a consistent, high-quality approach is taken.
The objective of this solution is to reduce the time and effort required for a Security Operations Center (SOC) analyst to triage an incident within Azure Sentinel, and to help ensure a consistent approach is taken to each incident. This is done by automatically executing Jupyter notebooks that perform a set of pre-defined actions on the incident, similar to those an analyst would conduct when triaging it manually. It provides three main benefits:
We refer to this approach of using notebook patterns to define and execute these processes as Software Defined Monitoring. To learn more about this approach, please watch this recent webinar we presented on the subject.
This document covers the end-to-end process of deploying this solution within an Azure subscription, including all the requisite components for the automated notebook elements. Along with this document are two separate Jupyter Notebooks and an ARM template for deploying the required VM. The first notebook is ‘AutomatedNotebooks-Manager.ipynb’ and the other is ‘AutomatedNotebooks-IncidentTriage.ipynb’. These notebooks, along with the ARM template and a Python requirements file, can be downloaded from GitHub:
In addition to these resources, the following are pre-requisites:
The core of the solution is an Azure VM that runs several Jupyter notebooks. The Manager notebook programmatically gets details of incidents from Azure Sentinel; if an incident matches a set of criteria, it then runs another notebook that performs triage and enrichment based on the entities attached to that incident. The completed triage notebook is then written to an Azure ML workspace, with a link to the notebook added as a comment to the incident in Azure Sentinel. From there the SOC analyst can follow the link to view and interact with the completed triage notebook. In addition, depending on the findings in the notebook, the severity of the incident is updated in Azure Sentinel.
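The Manager's core loop can be sketched as follows. This is an illustrative outline only, not the actual notebook code: the selection criteria, field names, and helper function are assumptions for the purposes of the example. Each selected incident is then handed to Papermill, e.g. `papermill.execute_notebook("AutomatedNotebooks-IncidentTriage.ipynb", f"{incident_id}.ipynb", parameters={"incident_id": incident_id, ...})`.

```python
# Illustrative sketch of the Manager notebook's incident selection logic.
# The criteria shown (new, with entities attached) are an example - the
# actual notebook defines its own set of criteria.
def select_incidents(incidents):
    """Return incidents that match the triage criteria.

    Each incident is assumed to be a dict with "status" and "entities"
    keys, as returned by the Azure Sentinel incidents API.
    """
    return [
        inc for inc in incidents
        if inc.get("status") == "New" and inc.get("entities")
    ]

# Example data: only the first incident is new *and* has entities.
incidents = [
    {"id": "guid-1", "status": "New", "entities": ["host1"]},
    {"id": "guid-2", "status": "Closed", "entities": ["acct1"]},
    {"id": "guid-3", "status": "New", "entities": []},
]
print([i["id"] for i in select_incidents(incidents)])  # ['guid-1']
```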
To deploy the solution, some configurable variables are first required:
1. The Azure Tenant ID where the resources being used are located. More details.
2. The Subscription ID where your Azure Sentinel Workspace is deployed.
3. The Resource Group name where your Azure Sentinel Workspace is deployed.
4. The Workspace Name of your Azure Sentinel Workspace.
5. The Workspace ID of your Azure Sentinel Workspace.
6. The Subscription ID where your Azure ML Workspace is deployed.
7. The Resource Group name where your Azure ML workspace is deployed.
8. The Azure ML Workspace name.
Note: in this example the workspace name is “AzureMLWorkspace”
9. The name of the Key Vault being used (see pre-requisites).
10. Another required variable is an Access Key for the Storage Account used by your Azure ML Workspace. Due to the sensitivity of this key we will store it in Key Vault in order to keep it secure. To find the storage account associated with your Azure ML workspace, find the Azure ML resource in the Azure Portal and browse to the Overview tab. Listed here will be a storage resource:
Clicking on that resource will open it in the Azure Portal. From there select Access Keys and you will be presented with two access keys. Select the Key value from either one (it doesn’t matter which you use), and then add it as a Secret in your Key Vault. When adding the Access Key, make a note of the name you give the secret, as it will be needed later in the setup. (Do not paste the storage key into the notebook.)
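If you prefer the command line, the secret can also be added with the Azure CLI. The vault and secret names below are illustrative examples; substitute your own:

```shell
# Store the Azure ML storage account access key as a Key Vault secret.
# "MyKeyVault" and "amlstorage-key" are example names - use your own,
# and note the secret name for later in the setup.
az keyvault secret set \
    --vault-name "MyKeyVault" \
    --name "amlstorage-key" \
    --value "<storage account access key>"
```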
Once the above elements have been collected the “AutomatedNotebooks-Manager.ipynb” notebook needs to be updated to include these values. They are all set in a single cell near the top of the notebook, simply open the notebook[i] and replace the placeholder values with those collected above. The cell includes comments that detail where each value should go.
You will see sections in this cell for details of an Azure Storage Queue; these are optional settings which are covered later in this document (see the Queue Management section).
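The configuration cell looks broadly like the following. The variable names here are illustrative placeholders, not the notebook's exact identifiers; the comments in the notebook itself indicate where each value collected above should go:

```python
# Illustrative configuration cell - replace each placeholder with the
# value collected in the numbered steps above. Variable names are
# examples; follow the comments in the actual notebook cell.
tenant_id = "<Azure Tenant ID>"                          # step 1
sentinel_sub_id = "<Sentinel subscription ID>"           # step 2
sentinel_rg = "<Sentinel resource group>"                # step 3
sentinel_workspace_name = "<Sentinel workspace name>"    # step 4
sentinel_workspace_id = "<Sentinel workspace ID>"        # step 5
aml_sub_id = "<Azure ML subscription ID>"                # step 6
aml_rg = "<Azure ML resource group>"                     # step 7
aml_workspace_name = "AzureMLWorkspace"                  # step 8 (example name from above)
keyvault_name = "<Key Vault name>"                       # step 9
storage_key_secret_name = "<name of the access key secret>"  # step 10
```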
The technology used to run the automated notebooks is Papermill. A dedicated Azure IaaS VM will be deployed to run the Papermill tasks. For this documentation we will document how to deploy an Ubuntu Linux host; however, the solution could also be deployed on a Windows host.
To make deployment easier we have created an ARM template that deploys the VM and configures some of the required identity elements. If you want to deploy using this method the ARM template can be downloaded from GitHub, and you can find instructions on how to deploy it here.
During deployment you will be asked to provide a number of parameters; these include:
When deploying the ARM template you will be asked to provide these variables on the following page:
If you deploy the VM via ARM you can skip ahead to the ‘Deploying KeyVault Access’ section.
When deploying Azure VMs we recommend that you follow best practice and secure access to the VM using Just In Time Access and Azure Defender.
Detailed instructions on deploying a Linux VM in Azure can be found here. We recommend that a SKU with at least 2 vCPUs and 4 GiB memory is used.
Once the VM is deployed it needs to be assigned a Managed Identity. This identity will be used by the Papermill process to access Azure Sentinel, as well as secrets stored in Azure Key Vault. This can be configured by browsing to the Azure VM created previously in the Azure Portal and selecting the Identity tab. From here select System Assigned and set the Status to On. Once enabled, you need to grant some required permissions to the VM managed identity. To do this, select Azure role assignments.
The first permission is to access Azure Sentinel. The automated notebooks need to access incident details, query logs to gather context, and update incidents based on the output of their analysis. As such Azure Sentinel Responder role permissions are required. More details about this role can be found here. Currently, this role cannot be set directly on the Azure Sentinel workspace, so the role must be scoped at the Subscription or Resource Group level. We recommend that the Resource Group is used as it’s the lowest level of access available. Ensure that you select the Resource Group that contains the Azure Sentinel workspace you want to use.
Finally, the Papermill process needs to be able to enumerate resources associated with the Azure ML Workspace being used. This is needed to locate the file store used by the Azure ML workspace so that executed notebooks can be written there. Therefore, the VM Managed Identity needs the Reader role assigned for the Resource Group that contains your Azure ML workspace.
Regardless of how you deployed the VM, you will need to manually grant the managed identity an additional role so that it can retrieve the secrets stored in the Key Vault we are using. To do this, open the VM you deployed in the Azure Portal and select the `Identity` section. From there select `Azure Role Assignments` and `Add Role Assignment`. As Key Vault is a specific resource available for Managed Identity role provision, you can select the specific Key Vault you are using for this solution[i] once you have selected `Key Vault` as the scope. The `Key Vault Secrets User` role is required; more details on this role can be found here.
More details about Managed Identities can be found here.
Note: your KeyVault needs to be configured for Role Based Access Control – more details can be found here: https://docs.microsoft.com/en-us/azure/key-vault/general/rbac-guide?tabs=azure-cli
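For reference, the three role assignments described above can also be made with the Azure CLI. The commands below are illustrative sketches; `<principal-id>` is the object ID of the VM's system-assigned identity, and the subscription, resource group, and vault names are placeholders for your own:

```shell
# Azure Sentinel Responder, scoped to the Sentinel resource group
az role assignment create --assignee "<principal-id>" \
    --role "Azure Sentinel Responder" \
    --scope "/subscriptions/<sub-id>/resourceGroups/<sentinel-rg>"

# Reader on the resource group containing the Azure ML workspace
az role assignment create --assignee "<principal-id>" \
    --role "Reader" \
    --scope "/subscriptions/<sub-id>/resourceGroups/<aml-rg>"

# Key Vault Secrets User, scoped to the specific Key Vault
az role assignment create --assignee "<principal-id>" \
    --role "Key Vault Secrets User" \
    --scope "/subscriptions/<sub-id>/resourceGroups/<kv-rg>/providers/Microsoft.KeyVault/vaults/<vault-name>"
```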
With the VM deployed and the correct permissions assigned via a Managed Identity, the next step is to install Papermill and the other required packages on our VM host. These require Python 3 to be installed first; if you deployed a Linux Ubuntu VM this will be installed by default and packages can be installed immediately. If another OS was deployed, you may need to first install Python 3.
The installation of packages will be done using pip; this first needs to be installed if not present. On an Ubuntu VM this can be done by running the command:
sudo apt install python3-pip
Note: you may need to first run `sudo apt update`
Once pip is installed, the required packages can be installed using the requirements file downloaded from GitHub:
python3 -m pip install -r autonb-requirements.txt
Once installed, you need to ensure you add the papermill package to your $PATH and restart the terminal to ensure it’s available via the CLI.
Once the packages are installed you will also need to configure a kernel to execute the notebooks with. This is done with ipykernel, which you should have just installed with the above commands. You can create a new kernel called ‘papermill’ (this is the default kernel used by the notebooks) with the following command:
python3 -m ipykernel install --user --name papermill
Once Papermill and the required packages are installed and the kernel created, copy the two notebooks (‘AutomatedNotebooks-Manager.ipynb’ and ‘AutomatedNotebooks-IncidentTriage.ipynb’) to the host. These should be stored in the same folder.
In order to use threat intelligence providers as part of the incident triage notebook, a msticpyconfig.yaml file containing details of those threat intelligence providers is required on the deployed VM. This should be placed in the same folder as the ‘AutomatedNotebooks-Manager.ipynb’ notebook and only needs to contain keys for TI providers; the incident triage notebook will use all primary providers configured. If you are using Key Vault to store these secrets, you will also need to ensure that you assign the VM `Key Vault Secrets User` access to the Key Vault they are stored in.
More details on the msticpyconfig.yaml file and how to set it up can be found in the MSTICPy documentation.
Once the Papermill configuration is complete and the notebooks set up, you can schedule ‘AutomatedNotebooks-Manager.ipynb’ to run on a regular basis. This notebook will check for new incidents and run the triage notebook against them. Scheduling is done simply by using the OS’s built-in scheduling service, in this case cron. The precise schedule can be tuned depending on your requirements; however, for the most immediate response to new incidents being created we suggest that this be set to run every 10 minutes.
The scheduled command needs to a) navigate to the folder containing the notebooks and b) execute the ‘AutomatedNotebooks-Manager.ipynb’ notebook. An example cron entry to run the notebook every 10 minutes would be:
*/10 * * * * cd <path to notebooks folder> && papermill "AutomatedNotebooks-Manager.ipynb" SchedulerOut.ipynb
Once the schedule is set up, you will start to see incidents with comments that provide a link to a notebook in Azure ML. Only notebooks that include a significant finding are attached to incidents, otherwise the notebook is simply discarded.
By default, the automated notebook process will run against all incidents raised in your Azure Sentinel Workspace. However, if you wish to only run the process against a subset of incidents, you can use a method that leverages Azure Storage Queues. Rather than pull all incidents from Azure Sentinel, the “AutomatedNotebooks-Manager.ipynb” notebook can be configured to pull selected incident IDs from the queue and run the automated notebooks against only those incidents. The following are entirely optional steps that are needed only if you want to use the queue method.
For ease of management, we suggest that you create a Storage Queue in the Storage Account used by your Azure ML Workspace (see details earlier in the document to find this). If you choose to use another storage account, ensure that the VM’s Managed Identity is granted access to the Storage Account.
To create a queue, navigate to the Storage Account in the Azure Portal and select Queues. From here you can add a new Queue.
Once the Queue is created you will need to add the Queue name and the Storage Account name to the “AutomatedNotebooks-Manager.ipynb” notebook in the same cell that other variables were added to. In addition, you will need to update the Managed Identity assigned to the Azure VM to add a role of Storage Queue Data Reader scoped to the storage account where the Queue is deployed. More details on this role can be found here.
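Both steps can also be performed with the Azure CLI. The queue name below is an example, and `<principal-id>` is the object ID of the VM's managed identity:

```shell
# Create a queue in the storage account (example queue name)
az storage queue create --name "incident-queue" \
    --account-name "<storage account name>" --auth-mode login

# Grant the VM's managed identity read access to queue data,
# scoped to the storage account holding the queue
az role assignment create --assignee "<principal-id>" \
    --role "Storage Queue Data Reader" \
    --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage account name>"
```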
The “AutomatedNotebooks-Manager.ipynb” notebook contains the code to enable the collection of Incidents from the Queue but it is commented out by default. To use the queue method, uncomment that code and comment out or delete the cell above it that gets all incidents via the Azure Sentinel API.
Once the queue is created you can filter which incidents from Azure Sentinel you wish to be added to the queue, and thus processed by the automated notebooks. This is done with Azure Sentinel Automation Playbooks. For details on creating Playbooks, please refer to this documentation.
The required playbook needs only two steps: a trigger that fires when an Azure Sentinel incident is created, and an action that puts a message containing the incident ID onto the Storage Queue created above.
Once the playbook is created, configure which analytics rules you want auto triaged and configure these to trigger this playbook when an incident is created. Details on how to configure this can be found here. This will then write the incident ID to the queue so that it will be picked up by the “AutomatedNotebooks-Manager.ipynb” notebook.
Once this step is complete and the playbook is attached to one or more analytics rules, the solution is configured. The next time a specified incident is triggered, the automated notebook solution will run.
Once the automated notebooks are configured you will start to see triaged incidents appearing in your Azure Sentinel instance. You can identify triaged incidents by the presence of a comment in the incident with a link to a notebook:
To access the completed triage notebook simply click the link in the comment and you will be directed to Azure ML.
Note: the analysts needing to access the triage notebooks will need access to the Azure ML workspace configured in this process.
Azure ML will open the triage notebook automatically; the analyst can then browse the contents of the notebook without needing to interact with it, simply scrolling down to see the output:
Notebooks from other incidents are also accessible via the navigation pane on the left-hand side of the Azure ML interface. Each notebook is stored under the GUID of the incident it relates to.
The AutomatedNotebooks-IncidentTriage.ipynb and AutomatedNotebooks-Manager.ipynb notebooks can also be modified to include additional triage steps or update actions as required. By default the process enriches entities attached to the incident and only updates the incident severity; however, it is possible to perform triage on other elements of the incident and update additional elements automatically. See MSTICPy for details of functions and features that could easily be added to these notebooks.
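As an example of the kind of update action you might add, the sketch below escalates incident severity based on how many threat intelligence hits the enrichment found. The function name, thresholds, and severity values are purely illustrative assumptions, not part of the shipped notebooks:

```python
# Hypothetical severity-escalation rule: the more TI hits the
# enrichment produced, the higher the resulting incident severity.
# Thresholds here are examples - tune them to your environment.
def new_severity(current, ti_hit_count):
    """Return the severity an incident should be updated to."""
    if ti_hit_count >= 3:
        return "High"
    if ti_hit_count >= 1 and current == "Low":
        return "Medium"
    return current  # no change if nothing was found

print(new_severity("Low", 2))  # Medium
```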
During execution of the “AutomatedNotebooks-Manager.ipynb” notebook a log of activity is written to the file “notebook_execution.log” in the same folder as the AutomatedNotebooks-Manager.ipynb. This provides details of execution flow and which incidents were processed, as such it should be the first thing you check when troubleshooting.
If you are not using the Queue method for incident triggers, the most likely cause of issues is with the notebooks themselves. The easiest way to troubleshoot these is to run them individually and inspect the output. To trigger the “AutomatedNotebooks-Manager.ipynb” notebook, you can manually invoke it using Papermill via the CLI with the following command:
papermill "AutomatedNotebooks-Manager.ipynb" -
You will see the notebook output in the terminal to allow you to debug it; alternatively, replace the `-` in the command with a file name to write the output to that file.
If this notebook executes as expected, then you can also check the incident triage notebook itself. To do this select an incident in your Azure Sentinel workspace that has some entities attached and get the incident ID. This can be found in the incident view of the Sentinel portal as part of the Incident link. (The ID required is the GUID at the end of the full link text).
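Since the GUID sits at the end of the incident link, it is easy to pull out programmatically if you are scripting this. A minimal sketch, using the example ARM-style incident path from later in this document (the subscription, resource group, and workspace segments are placeholders):

```python
# Extract the incident GUID from the end of a full incident link.
# The path below is an illustrative example of the link format.
link = ("/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
        "Microsoft.OperationalInsights/workspaces/<workspace>/providers/"
        "Microsoft.SecurityInsights/Incidents/"
        "f9e57a1f-8d1a-4efa-a165-4a48c2b2c46e")
incident_id = link.rstrip("/").split("/")[-1]
print(incident_id)  # f9e57a1f-8d1a-4efa-a165-4a48c2b2c46e
```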
From there you can trigger that notebook from the CLI with:
papermill "AutomatedNotebooks-IncidentTriage.ipynb" debug.ipynb -p incident_id "<INCIDENT ID>" -p ws_id "<YOUR WORKSPACE ID>" -p ten_id "<YOUR TENANT ID>"
This will run the triage notebook with the Incident ID provided and will write the resulting notebook to a file called `debug.ipynb`. This file is a complete copy of the original notebook but with all of the execution results (including any errors). You can open the debug file in Azure ML or another Jupyter notebook environment to check for any execution issues. You can view the file as raw text, but the native JSON format makes it difficult to read. You can also convert the notebook to an HTML document using the nbconvert tool, e.g. `jupyter nbconvert --to html debug.ipynb`.
If you are using the Queue method, you will also want to ensure items are being properly passed to the Queue. This is easily done by browsing to the Queue resource in the Azure Portal. If functioning correctly and the attached incidents have occurred, then the queue should contain full incident links in the same format as: /subscriptions/796fca0e-7703-476a-9d66-a65d3a7825dd/resourceGroups/Sentinel/providers/Microsoft.OperationalInsights/workspaces/sentinelworkspace/providers/Microsoft.SecurityInsights/Incidents/f9e57a1f-8d1a-4efa-a165-4a48c2b2c46e.
Should you encounter issues with this solution please raise an Issue on the Azure-Sentinel-Notebooks GitHub repo.
In this blog we have seen how it is possible to use open source software and Azure services to easily automate the process of executing Jupyter Notebooks linked to Azure Sentinel. This approach can vastly improve the efficiency and effectiveness of SOC operations, as well as forming the core of a software defined monitoring approach. Whilst in this blog we have shown how this process can be used for triaging incidents and supporting first line SOC operations the same pattern could be applied to virtually any security monitoring scenario that involves the repetition of a set of analytical steps, whether it be enrichment of datasets, custom analytics using Python specific features, or threat intelligence processing.
[i] This should be the same KeyVault you set up as a pre-requisite to this solution.
[i] VSCode is recommended for this