Azure Sentinel is a SIEM with Machine Learning (ML) built in: its built-in ML analytics cover the prevalent threats and data types connected to the SIEM. The same capabilities are now available to the data scientists in your organization, so Azure Sentinel customers can build their own ML models and extend detection to threats unique to their environment.
Built-in analysis using machine learning, like ‘Fusion’ ML detections and entity enrichment, is already available in Azure Sentinel, identifying advanced threats on well-known data feeds, while maintaining a low level of alert fatigue. See this blog to learn more about Fusion.
Many organizations need to extend this advanced analysis to the many threats specific to their organization or industry vertical. Azure Sentinel makes it easier for data scientists in these organizations to unlock these insights with a bring-your-own-ML (BYO-ML) framework.
Azure Sentinel provides a threat detection framework for custom ML, including platform, tools, and templates, to accelerate development of models for custom business analysis, leveraging Microsoft’s shared algorithms and best practices.
Azure Sentinel integrates its bring-your-own-ML feature with familiar development environments, tools, and programming languages, like Azure Databricks, Spark, Jupyter Notebooks and Python, to minimize the learning curve and development time. The framework comes with cloud-scale data pipelines that integrate the ML development and runtime environment with the Azure Sentinel service and user experience, significantly reducing the time to value from custom ML analysis.
Using the BYO-ML framework, you can:
Train an ML algorithm shared by Microsoft or the community on your own data, to customize the ML model to your environment
Modify an ML algorithm and/or its features to customize the model, adapting it to your organization
Create a new model from scratch, or import existing models, leveraging Azure Sentinel’s BYO-ML framework and tools
Operationalize the model in a Databricks/Spark environment, leveraging the framework to integrate your ML environment with the Azure Sentinel user experience and operational flow (detections, investigation, hunting and response)
Share the ML algorithm with the community
This article provides an overview of the BYO-ML framework and the recommended usage of its components. For detailed documentation, please refer to the Azure Sentinel feature documentation.
Azure Sentinel features of the BYO-ML framework
The BYO-ML framework includes the following features:
Anomaly detection model template: an end-to-end example template of a generic algorithm for detecting anomalous access to a network share. The example ships with a model pre-trained on Windows file share access logs, which you are encouraged to retrain on your own data.
Template for Model Training: an example notebook that trains the algorithm, then builds and saves the model
Template for Model Scoring: an example notebook that runs the model on a schedule to score new data
Data pipelines between Sentinel and ML environments: used for exporting Azure Sentinel data to the Spark execution environment for model training and scoring, and for importing the scored results back into Azure Sentinel tables
Utilities for integration with Sentinel: support for enabling the integration between the environments
Enrichments: making entity enrichment information like peer metadata and blast radius available to the custom ML models for refinement
The BYO-ML package, shared on GitHub, includes utilities and an example model that showcase Microsoft's ML best practices and research for security.
BYO-ML environment for Azure Sentinel
To build custom ML models on your data, you have two options.
For smaller amounts of data, like alerts and anomalies, you can use Azure ML to run models hosted in the Azure Sentinel Notebooks (new menu option currently in Preview).
The Azure ML experience is launched from this Azure Sentinel experience, with the selected Notebook loaded in the ML environment and operating on the data in the Azure Sentinel tables.
You can use this option on a small sample of large data as well, to ease the initial steps of development.
For development and operationalization of models built on larger data, like analyzing feeds of raw data, you will need to make this data accessible to the ML model in Azure Databricks.
Apache Spark™ provides a unified environment for building big data pipelines. Azure Databricks builds on this environment, providing a zero-management cloud platform on which data analysts can develop their custom ML-based security analysis.
You can either bring your raw data directly to the Azure Databricks ML environment, via Azure Event Hubs or Azure Blob storage, or use the capabilities provided with Azure Sentinel to export the data from Azure Sentinel Log Analytics tables. Regardless of the export method used for the raw data, you can use the libraries provided by the BYO-ML framework to import the model's scored results back into Azure Sentinel Log Analytics tables for further processing and for creating incidents.
On the Azure Sentinel roadmap, we plan to support Azure Synapse in addition to Azure Databricks as the BYO-ML development environment.
You can either set up a new Azure Databricks environment or use one already set up for other uses. To set up a new Databricks environment, please refer to the quickstart documentation (note that MMLSpark, used by our algorithm, requires Spark 2.4.5).
Once the environment is set up, export your data from Azure Sentinel into the BYO-ML environment using the Azure Command Line Interface (CLI). See the Azure Sentinel BYO-ML feature documentation for details.
BYO-ML Example Walkthrough: Anomalous File Share Access Detection
The BYO-ML package includes an example of an ML model that uses the framework to deliver a customized ML detection.
The example is an ML algorithm template for “anomalous resource access” detection. It is based on a collaborative filtering algorithm, trained and tested on Windows file share access logs, which contain security events recording users and the resources they accessed (Event ID 5140).
The example includes a notebook (AnomalousRATraining.ipynb) used to train the model on your organization's data.
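To make the idea of collaborative filtering on access logs concrete, here is a minimal sketch in plain Python. It is not the algorithm shipped in the BYO-ML package (which relies on MMLSpark); it swaps in a much simpler neighborhood-based variant that uses item-to-item cosine similarity. The function names and the sample users and shares are hypothetical. A user's access to a share scores as less anomalous when that share is used by the same people as the shares the user already accesses.

```python
import math
from collections import defaultdict


def fit(events):
    """Index training events, given as (user, resource) pairs."""
    users_of = defaultdict(set)      # resource -> users who accessed it
    resources_of = defaultdict(set)  # user -> resources they accessed
    for user, resource in events:
        users_of[resource].add(user)
        resources_of[user].add(resource)
    return users_of, resources_of


def similarity(users_of, a, b):
    """Cosine similarity between two resources over their binary user sets."""
    ua, ub = users_of.get(a, set()), users_of.get(b, set())
    if not ua or not ub:
        return 0.0
    return len(ua & ub) / math.sqrt(len(ua) * len(ub))


def anomaly_score(model, user, resource):
    """Return 1 minus the best similarity between `resource` and the user's history."""
    users_of, resources_of = model
    history = resources_of.get(user, set())
    affinity = max(
        (similarity(users_of, resource, seen) for seen in history),
        default=0.0,  # unknown user or empty history: maximally anomalous
    )
    return 1.0 - affinity


# Hypothetical training data: two finance users, two engineering users.
events = [
    ("alice", "finance-ledger"), ("alice", "finance-payroll"),
    ("bob", "finance-ledger"),
    ("carol", "eng-builds"), ("carol", "eng-source"),
    ("dave", "eng-builds"), ("dave", "eng-source"),
]
model = fit(events)
known = anomaly_score(model, "bob", "finance-ledger")    # already in bob's history
peer = anomaly_score(model, "bob", "finance-payroll")    # new, but used by a peer
foreign = anomaly_score(model, "bob", "eng-source")      # no overlap with bob's peers
```

In this toy data, bob has never opened the payroll share, yet it scores lower than the engineering share because alice, who shares the ledger with him, uses it. That transfer of evidence between similar users and resources is the property collaborative filtering contributes.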
An additional notebook, AnomalousRAScoring.ipynb, allows you to adjust the model features to the anomalous behavior and accepted level of noise in your organization. You can use it to evaluate the scoring results, adjusting model thresholds, filtering out known benign users and ranking the anomalous ones.
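The tuning loop described above (pick a threshold, drop known benign users, rank what remains) can be sketched in a few lines of plain Python. This is an illustration only, not code from AnomalousRAScoring.ipynb; the row layout (`user`, `resource`, `score`), the function names, and the sample values are all assumptions.

```python
def triage(scored_rows, threshold, benign_users=()):
    """Keep rows at or above the threshold, drop benign users, rank by score."""
    benign = set(benign_users)
    kept = [
        row for row in scored_rows
        if row["score"] >= threshold and row["user"] not in benign
    ]
    return sorted(kept, key=lambda row: row["score"], reverse=True)


def alerts_per_threshold(scored_rows, thresholds, benign_users=()):
    """Count surviving rows per candidate threshold, to compare noise levels."""
    return {t: len(triage(scored_rows, t, benign_users)) for t in thresholds}


# Hypothetical scoring output.
scored = [
    {"user": "bob", "resource": "eng-source", "score": 0.97},
    {"user": "svc-backup", "resource": "finance-payroll", "score": 0.99},
    {"user": "alice", "resource": "eng-builds", "score": 0.81},
    {"user": "carol", "resource": "eng-source", "score": 0.12},
]
noise = alerts_per_threshold(scored, [0.5, 0.9], benign_users={"svc-backup"})
ranked = triage(scored, 0.5, benign_users={"svc-backup"})
```

Comparing the counts in `noise` across thresholds is one simple way to decide how many results per scoring run your analysts can absorb before writing anything back to Azure Sentinel.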
Optionally, use the visualization capabilities in the Notebook to check the results of each iteration and adjust to an acceptable noise level.
Once you are satisfied with the scoring results, use the same Notebook to write the results back to the Log Analytics workspace associated with your Azure Sentinel instance.
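The notebook performs this write-back with the utilities that ship in the BYO-ML package. For readers curious about what such a write involves, the sketch below builds, but does not send, a request for the Azure Monitor HTTP Data Collector API, one public way to post JSON records to a Log Analytics workspace. Whether the package uses this exact API is an assumption, and the function name, the log type, and the sample record are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from email.utils import formatdate


def build_log_request(workspace_id, shared_key, log_type, records, date=None):
    """Build (url, headers, body) for one HTTP Data Collector API POST."""
    body = json.dumps(records).encode("utf-8")
    date = date or formatdate(usegmt=True)  # RFC 1123 date in GMT
    resource = "/api/logs"
    content_type = "application/json"
    # The API signs method, body length, content type, date header and resource.
    string_to_sign = "\n".join(
        ["POST", str(len(body)), content_type, "x-ms-date:" + date, resource]
    )
    digest = hmac.new(
        base64.b64decode(shared_key),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    url = (
        "https://" + workspace_id + ".ods.opinsights.azure.com"
        + resource + "?api-version=2016-04-01"
    )
    headers = {
        "Content-Type": content_type,
        "Authorization": "SharedKey " + workspace_id + ":" + signature,
        "Log-Type": log_type,  # name of the custom log table to write to
        "x-ms-date": date,
    }
    return url, headers, body


# Placeholder credentials: substitute your own workspace ID and key.
fake_key = base64.b64encode(b"not-a-real-key").decode("ascii")
url, headers, body = build_log_request(
    "00000000-0000-0000-0000-000000000000",
    fake_key,
    "ByoMlAnomalyScores",  # hypothetical log type
    [{"user": "bob", "resource": "eng-source", "score": 0.97}],
    date="Mon, 01 Jan 2024 00:00:00 GMT",
)
```

The returned triple can be sent with any HTTP client. Keeping request construction separate from sending makes the signing step easy to inspect without touching a real workspace.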
With the scored data in the Azure Sentinel tables, you can use the standard analytics development experience to create your custom rule(s), as you would do for the rest of your custom KQL queries.
BYO-ML is currently in Public Preview. If you have any questions or feedback, please reach out to firstname.lastname@example.org.