Security big data analytics with Azure Synapse and Microsoft Sentinel Notebooks!
Published Nov 02 2021 05:33 PM 7,400 Views
Microsoft

Until now, Jupyter notebooks in Microsoft Sentinel have been integrated with Azure Machine Learning. This functionality supports users who want to incorporate notebooks, popular open-source machine learning toolkits and libraries such as TensorFlow, as well as their own custom models, into security workflows.

 

We are delighted to announce that Microsoft Sentinel Notebooks now integrates with Azure Synapse Analytics for large-scale security analytics!

 

The new Azure Synapse integration provides additional analytic horsepower, enabling:

For example, you may want to use notebooks with Azure Synapse to:

  • Hunt for anomalous behaviors from large network firewall logs to detect potential network beaconing, or to
  • Train and build machine learning models on top of data collected from a Log Analytics workspace.

 

Synapse diagram.PNG

 

Why should you care?

In the era of Big Data, Artificial Intelligence, and the Internet of Things, data collection volumes are ever-growing. It is common for organizations to collect billions of security events and terabytes of logs per day for their daily security monitoring. These numbers are projected to keep growing over time.

 

To stay ahead of cyber threats, leveraging big data analytics in cybersecurity is now more essential than ever. Big data analytics enables the fast processing of enormous amounts of log data. The combination of both scale and speed in data processing is critical for timely detection of anomalies and attack patterns. As a result, this reduces vulnerabilities and improves cyber resilience.

 

Building big data analytics can be challenging, or even impossible, without proper tools and infrastructure.

One requirement is data storage that is cost-effective enough to retain vast quantities of logs for the long term. It should integrate easily with multiple data stores and support different data formats, whether structured, semi-structured, or unstructured. For this, we leverage Azure Data Lake Storage Gen2, which provides file system semantics, file-level security, and scale.

 

Another critical component is a parallel processing framework that supports in-memory processing to boost the performance of batch-based analytics. For this, we leverage Apache Spark in Azure Synapse Analytics, one of Microsoft's implementations of Apache Spark in the cloud. Azure Synapse makes it easy to create and configure a serverless Apache Spark pool in Azure. Spark pools in Azure Synapse are compatible with Azure Storage and Azure Data Lake Storage Gen2, so you can use Spark pools to process your security data stored in Azure. The Spark pools can be configured and enabled for data preparation and data processing directly from Microsoft Sentinel notebooks that run in the Azure Machine Learning environment.
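As a small illustration of how a Spark pool addresses data in Azure Data Lake Storage Gen2, the sketch below builds an abfss:// URI of the kind Spark uses to reach a container in a storage account. The helper name adls_gen2_uri and the account, container, and folder names are hypothetical and chosen only for illustration. The read call in the trailing comment assumes a Synapse Spark session and JSON-formatted exported logs, which may not match your environment.

```python
def adls_gen2_uri(account: str, container: str, path: str = "") -> str:
    """Build an abfss:// URI for a path in an ADLS Gen2 container.

    All argument values used below are hypothetical examples.
    """
    base = f"abfss://{container}@{account}.dfs.core.windows.net"
    clean_path = path.strip("/")  # tolerate leading/trailing slashes
    return f"{base}/{clean_path}" if clean_path else f"{base}/"


# Hypothetical storage account, container, and folder names.
uri = adls_gen2_uri("contosologs", "security-data", "/firewall/2021/")
print(uri)

# In a Synapse Spark session (where `spark` is already defined), the URI could
# then be passed to a reader, for example (assuming JSON-formatted logs):
#   df = spark.read.json(uri)
```

Keeping the URI construction in one helper makes it easier to point the same notebook at a different storage account or container.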

 

With deep integration with Spark technologies for Big Data and Azure Machine Learning, Azure Synapse enables you to build batch-based analytics on Microsoft Sentinel and external security data via Microsoft Sentinel notebooks. This capability customizes and complements the core Microsoft Sentinel hunting and investigation experience.

 

How does the integration work?

The Microsoft Sentinel Notebooks integration with Azure Synapse provides these capabilities:

 

CreateSynapse.png

 

  • A sample notebook that provides step-by-step, one-time configuration for the Azure Synapse environment. The notebook guides you to
    • set up a continuous data export pipeline from your Log Analytics workspace to Azure Data Lake
    • create a linked service between your Azure Synapse and Azure Machine Learning workspaces
    • create a Spark pool and register it with the linked service, which provides the compute needed for the data preparation and data processing steps.

ConfigureNotebook.PNG

 

This is just one example of how you can use the Synapse integration with notebooks to hunt for anomalies across large historical datasets.
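To make the network beaconing scenario mentioned earlier more concrete, here is a minimal sketch of one common heuristic. It is not the sample notebook's actual code. It groups connections by source and destination, measures the gaps between consecutive connections, and flags pairs whose gaps are unusually regular (a low coefficient of variation). The function names, thresholds, and sample events are all hypothetical. On a Spark pool the same grouping would be expressed with DataFrame operations over far larger data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def beaconing_scores(events, min_connections=4):
    """Return {(src, dst): coefficient of variation of inter-arrival times}.

    `events` is an iterable of (timestamp_seconds, source_ip, destination_ip).
    A score near 0 means very regular timing, which can indicate beaconing.
    """
    by_pair = defaultdict(list)
    for ts, src, dst in events:
        by_pair[(src, dst)].append(ts)

    scores = {}
    for pair, stamps in by_pair.items():
        if len(stamps) < min_connections:
            continue  # too few connections to judge regularity
        stamps.sort()
        deltas = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        avg = mean(deltas)
        if avg == 0:
            continue  # all connections share one timestamp; no interval to score
        scores[pair] = pstdev(deltas) / avg
    return scores


def likely_beacons(events, max_cv=0.1, min_connections=4):
    """Return sorted (src, dst) pairs whose timing variation is at most max_cv."""
    scores = beaconing_scores(events, min_connections)
    return sorted(pair for pair, cv in scores.items() if cv <= max_cv)


# Hypothetical sample data: one host calling out every 60 seconds,
# and one host with irregular traffic.
sample_events = (
    [(t, "10.0.0.5", "203.0.113.7") for t in (0, 60, 120, 180, 240)]
    + [(t, "10.0.0.9", "198.51.100.2") for t in (0, 5, 300, 310, 2000)]
)
print(likely_beacons(sample_events))
```

The thresholds here are arbitrary starting points. Real beaconing traffic often includes jitter, so in practice they would need tuning against your own firewall logs.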

 

BeaconingNotebook.PNG

 

Summary

I hope you find some inspiration in this sample security scenario and start building your own analytics using this integration.

Try out the new capability and let us know what you think!

 

Further reading resources:

Special thanks to @JulianGonzalez and @batamig for reviewing the blog.

 

 

4 Comments
Iron Contributor

Have there been any updates to the integration of ML in Sentinel?

I'd love to use ML to identify outliers, but I'm not too keen on getting deep into Jupyter notebooks at the Python level.

Other platforms nicely bundle the ML part with auto-analysis and just leave it up to you to feed it data.

For the most part in security, I would need a short list of pre-designed ML tools to identify and visualize outliers based on common entities (user, domain, signature, IP, etc.).
(but I wish I was as smart as you @Chi_Nguyen !)

Microsoft

@SocInABox There are already several areas in Sentinel where ML integration takes place. UEBA, Entity pages, and Fusion are a few built-in ML areas where you can just feed it with your own data and get the insights generated from the ML algorithms. SOC-ML is semi-customizable ML analytics that provides anomaly detection.

The integration with Azure Synapse and Sentinel Notebooks I detailed in the blog allows you to build even more custom analytics for use cases that aren't covered by the out-of-the-box analytics and may require data engineering and data science :). I hope that helps!

Iron Contributor

Thanks very much @Chi_Nguyen 

 

Question: What is the SOC-ML you're referring to? Or is that just a reference to the ML features in Sentinel you talked about?

 

Suggestion: It would be a great knowledge-accelerator for us non-data scientists to actually see someone walk through the above examples from beginning to end. If such videos exist or get created in the future I'd love to know about it.

 

Request: Also, if you have any more examples of ML notebooks, I'd like to know if there's a repo or something. I looked in the Sentinel repo, and although there are a lot of notebooks, I couldn't find any with good ML examples. Likely I'm not looking correctly.

Thanks!

Microsoft

Thanks for the feedback, @SocInABox!

 

  1. SOC-ML refers to the customizable anomalies that are part of Microsoft Sentinel: here's an introductory blog and docs.
  2. That's a great suggestion and something we're working on! We already have video walkthroughs for some of our other Sentinel notebooks in some of our guided hunting blog posts (here's an example), and the process is much the same for the ML notebooks (it's just the code that is different, but the notebooks contain detailed instructions and contextual information).
  3. Another good suggestion - we've created an ML notebook folder in the Sentinel notebook repository: Azure-Sentinel-Notebooks/machine-learning-notebooks at master · Azure/Azure-Sentinel-Notebooks (gith... We're currently migrating existing ML notebooks into that folder, so more will appear soon!
    Alternatively, in the Sentinel UI, navigate to the template notebooks (under the Notebooks blade), then filter the notebooks by type to view the machine learning notebooks.
Last update: Nov 12 2021 11:21 AM