Microsoft Sentinel Blog

Unlocking the power of Notebooks with Microsoft Sentinel data lake

Zeinab Mokhtarian Koorabbasloo
Feb 18, 2026
Co-authors: Vandana Mahtani, Ashwin Patil

Security operations are rapidly evolving, driven by AI and the need for scalable, cost-effective analytics.  A key differentiator of the Sentinel data lake is its native integration with Jupyter Notebooks, which brings powerful data science and machine learning capabilities directly into security operations. Analysts can move beyond static queries to run interactive investigations, correlate large and diverse datasets, and prototype advanced analytics using familiar tools and languages.

By combining notebooks with Sentinel’s security context, teams can build custom detection logic, enrich investigations with ML models, and automate complex workflows. The result is faster insights, deeper analysis, and more efficient security operations, enabling SOC teams to innovate and respond at the speed required by today’s threat landscape.

Hunt with Sentinel notebooks

Notebooks in Sentinel data lake give security teams a powerful, interactive way to investigate and hunt across their security data at scale:

  • Query and analyze massive datasets: Run Spark queries across months or years of security telemetry (with higher limits than KQL queries), uncovering slow-moving threats and persistent attack patterns (see the sketch after this list).
  • Automate threat hunting: Schedule recurring jobs to scan for matches against newly ingested indicators of compromise (IOCs), enabling continuous detection and investigation.
  • Build and operationalize ML models: Use Python, Spark, and built-in libraries to create custom anomaly detection, alert enrichment, and predictive analytics workflows.
  • Enrich alerts and investigations: Correlate alerts with firewall, Netflow, and other logs—often stored only in the data lake—to reduce false positives and accelerate triage.
  • Collaborate and share insights: Notebooks provide a transparent, reproducible environment for sharing queries, visualizations built with Python libraries such as Plotly (not natively available in Sentinel), and findings across teams.
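
As a rough sketch of the first bullet above, the PySpark snippet below scans several months of sign-in telemetry for source IPs that keep failing against many accounts. The table and column names (SigninLogs, TimeGenerated, IPAddress, ResultType, UserPrincipalName) follow the common Entra sign-in schema but are assumptions here, as is the spark.read.table load; in practice, load the data with the data lake's Python data provider described in the documentation linked later in this post.

```python
from pyspark.sql import functions as F

# Assumptions: a SparkSession named `spark` is pre-created in the notebook,
# and SigninLogs is reachable as a table; load it with the data lake's
# Python data provider in your environment.
signin_df = spark.read.table("SigninLogs")

failed_by_ip = (
    signin_df
    .where(F.col("TimeGenerated") >= F.date_sub(F.current_date(), 180))  # ~6 months back
    .where(F.col("ResultType") != "0")  # non-zero ResultType = failed sign-in
    .groupBy("IPAddress")
    .agg(
        F.countDistinct("UserPrincipalName").alias("distinct_users"),
        F.count("*").alias("failed_attempts"),
        F.min("TimeGenerated").alias("first_seen"),
        F.max("TimeGenerated").alias("last_seen"),
    )
    .orderBy(F.desc("distinct_users"))
)

failed_by_ip.show(20, truncate=False)
```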

Cost-Efficient, Scalable Analytics

Sentinel data lake’s tiered storage and flexible retention mean you can ingest and store all your raw telemetry—network logs, firewall data, and more—at a fraction of the cost of traditional solutions. Notebooks help you unlock the value of this data, transforming raw logs into actionable insights with minimal manual intervention.

Notebooks and KQL Jobs in Microsoft Sentinel data lake

Both notebooks and KQL jobs enable teams to query and analyze data within Microsoft Sentinel data lake, but they serve very different purposes.

| Dimension | Notebooks (Spark Runtime) | KQL Jobs (Data lake KQL Engine) |
| --- | --- | --- |
| Execution Model | Distributed compute using Apache Spark; ideal for heavy ETL, transformation, or ML workloads; supports programmatic querying. | Query execution using the KQL engine, optimized for analytical queries over structured data lake tier data. |
| Language & Flexibility | Full Python ecosystem (Pandas, PySpark, MLlib, etc.) available out of the box on the cluster; ideal for data wrangling, ML, and automation pipelines. | Familiar KQL syntax, purpose-built for log analytics, filtering, and aggregation; well suited to moving expensive recurring pipeline queries into scheduled jobs. |
| Data Access | Direct access to raw and curated tables stored in data lake tiers; can join across multiple workspaces or schemas. | Access to data lake tier tables, including tables mirrored from the analytics tier and curated tables produced by other jobs. |
| Performance & Scale | Highly scalable distributed compute for transformation-heavy workloads. | Optimized for low-latency query response and cost-efficient read operations; ideal for investigative queries. |
| Use Case Fit | Advanced analytics, feature engineering, baselining, anomaly detection, enrichment pipelines. | Operational queries, scheduled detections, and validation jobs. |
| Automation | Can be orchestrated as scheduled Spark jobs from the VS Code extension; supports end-to-end ETL + ML automation via Python and Spark notebooks. | Can be scheduled and parameterized for recurring jobs (e.g., daily data quality checks or detection lookups). |
| Collaboration & Reproducibility | Shared notebooks combine code, outputs, and markdown for team review and reproducibility. | Shared job definitions and saved query templates can be maintained with version control; less narrative, more operational. |
| Visualization | Rich visuals via advanced libraries (Plotly, Seaborn, Matplotlib), all available on the Spark compute cluster. | Jobs output to tables, which can then be rendered with KQL (timechart, barchart) for validation or quick insights. |
| Extensibility | Library set is currently limited (Azure Synapse libraries 3.4); bring-your-own libraries and Python dependencies are planned post-Fabric integration. | Limited to native KQL functions; extensibility via job scheduling and data connections. |
| Skill Profile | Data scientist / advanced security analyst / data engineer. | Detection engineer / SOC analyst / operational analytics. |
| Cost Model | Advanced data insights meter, based on vCore compute consumed (see Microsoft Sentinel Pricing). | Data lake query meter, based on GB processed (see Microsoft Sentinel Pricing). |

In practice, modern SOC teams increasingly treat notebooks and KQL jobs as complementary, not competing.

  1. KQL for signal discovery → Notebook for pattern analysis

Use Defender out-of-the-box rules or custom detections to surface an interesting low- to medium-fidelity signal (e.g., a spike in hourly failed logons across tenants compared to baseline). Then pivot to a notebook for historical trend analysis across six months of data.

  2. Notebook for enrichment → KQL for operationalization

A notebook creates a behavioral baseline table stored in the data lake (see the sketch below). A KQL rule consumes that dataset daily to trigger alerts when deviations occur.

  3. Notebook pipelines → data lake → analytics tier → dashboard

A scheduled notebook curates and filters raw logs into efficient partitioned data lake tables. These tables are then used via the lake explorer for ad-hoc hunting campaigns. Once the workflow consistently yields true positives, promote it to the analytics tier and set up a custom detection to operationalize it for near-real-time detection.

Together, these workflows close the loop between research, detection, and response.
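
As a minimal sketch of the second pattern, the snippet below builds a per-user hourly sign-in baseline and persists it for a scheduled KQL job or analytics rule to consume. Column names, the output table name, and the saveAsTable call are assumptions; use the write path your data lake documentation describes for notebook output.

```python
from pyspark.sql import functions as F

# Assumptions: `spark` is the notebook's SparkSession and SigninLogs is
# reachable as a table (load it with your data lake's Python provider).
signin_df = spark.read.table("SigninLogs")

# Count events per user per calendar hour, then summarize each user's
# typical hourly volume.
hourly = (
    signin_df
    .withColumn("hour_bucket", F.date_trunc("hour", F.col("TimeGenerated")))
    .groupBy("UserPrincipalName", "hour_bucket")
    .agg(F.count("*").alias("events"))
)

baseline = (
    hourly
    .groupBy("UserPrincipalName")
    .agg(
        F.avg("events").alias("avg_hourly_events"),
        F.stddev("events").alias("stddev_hourly_events"),
    )
)

# Persist as a curated table that a daily KQL rule can query for deviations;
# the table name is illustrative and saveAsTable is a stand-in for the
# provider's supported write method.
baseline.write.mode("overwrite").saveAsTable("UserSigninBaseline")
```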

Getting Started: From Query to Automation

The enhanced notebook experience in Sentinel data lake makes it easy to get started:

  1. Author Queries with IntelliSense: Benefit from syntax and table name suggestions for faster, error-free query writing.
  2. Schedule Notebooks as Jobs: Automate recurring analytics, such as hourly threat intelligence matching or daily alert summarization (a sketch of a schedule-friendly notebook cell follows this list).
  3. Monitor Job Health: Use the Jobs dashboard to track job status, completions, failures, and historical trends.
  4. Leverage GitHub Copilot: Get intelligent assistance for code authoring and troubleshooting. Use GitHub Copilot plan mode to generate boilerplate code for complex workflow notebooks, then collaborate with Copilot to refine it for your use case.
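
For step 2, notebooks that run on a schedule are typically written around an explicit lookback window so each run only scans newly ingested data. A minimal sketch, assuming hypothetical NetworkConnections and ThreatIntelIndicators tables with RemoteIP, TimeGenerated, and Indicator columns (all names illustrative):

```python
from pyspark.sql import functions as F

LOOKBACK_HOURS = 1  # align with an hourly job schedule

# Assumptions: `spark` is pre-created; table and column names are illustrative
# placeholders for whatever your environment ingests into the data lake.
network_df = spark.read.table("NetworkConnections")
ioc_df = spark.read.table("ThreatIntelIndicators")

recent = network_df.where(
    F.col("TimeGenerated")
    >= F.current_timestamp() - F.expr(f"INTERVAL {LOOKBACK_HOURS} HOURS")
)

# Inner join against the indicator list: only rows whose RemoteIP matches
# a known-bad indicator survive.
matches = recent.join(
    ioc_df.select(F.col("Indicator").alias("RemoteIP")).distinct(),
    on="RemoteIP",
    how="inner",
)

# Append to a results table so successive runs accumulate a reviewable
# history (table name illustrative; use your supported write method).
matches.write.mode("append").saveAsTable("IocMatches")
```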

For a step-by-step guide, see the public documentation Exploring and interacting with lake data using Jupyter Notebooks - Microsoft Security | Microsoft Learn.

Watch how to quickly start using Notebooks on Sentinel data lake - Getting Started with Apache Spark for Microsoft Sentinel data lake

Real-world scenarios

Here are some impactful ways customers are using notebooks in Sentinel data lake:

  • Extended Threat Investigations: Query data older than 90 days to uncover slow-moving attacks like brute-force campaigns that span accounts and geographies.
  • Behavioral Baselining: Build time-series models to establish normal behavior and identify unusual patterns, such as credential abuse or lateral movement (a brief sketch follows this list).
  • Retrospective Threat Hunting: React to emerging IOCs by running historical queries across the data lake, enabling rapid and informed response.
  • ML-Powered Insights: Operationalize custom machine learning models for anomaly detection and predictive analytics, directly within the notebook environment.
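
To make the baselining and ML bullets concrete, here is a deliberately simple z-score variant: compute each user's daily sign-in count, compare every day to that user's own mean and standard deviation, and flag large deviations. Real models (and the solution linked below) go much further; all table and column names are assumptions based on the standard sign-in schema.

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Assumptions: `spark` is pre-created and SigninLogs is reachable as a table.
signin_df = spark.read.table("SigninLogs")

daily_counts = (
    signin_df
    .withColumn("day", F.to_date("TimeGenerated"))
    .groupBy("UserPrincipalName", "day")
    .agg(F.count("*").alias("signins"))
)

per_user = Window.partitionBy("UserPrincipalName")

anomalies = (
    daily_counts
    .withColumn("mean_signins", F.avg("signins").over(per_user))
    .withColumn("std_signins", F.stddev("signins").over(per_user))
    .withColumn(
        "zscore",
        (F.col("signins") - F.col("mean_signins")) / F.col("std_signins"),
    )
    .where(F.col("zscore") > 3)  # far above the user's own normal volume
    .orderBy(F.desc("zscore"))
)

anomalies.show(20, truncate=False)
```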

Real-World Example — Extending a Detection with a Notebook

Challenge

A security team wants to identify password spray activity occurring gradually over time — the same IPs attempting one or two logons per day across different users — without overwhelming the system with false positives or exceeding query limits.


Using a notebook (Spark runtime), analysts extend the investigation across months of raw SigninLogs stored in the data lake (a condensed sketch follows these steps):

  1. Load 6–12 months of sign-in data directly from lake storage.
  2. Aggregate failed logons by IP, ASN, and user over time.
  3. Apply logic to find IPs with repeated low-frequency failures spread across many accounts.
  4. Build a statistics-based scoring model to surface IP ranges conducting potential slow password sprays.
  5. Visualize long-term trends with Plotly or Matplotlib to expose consistent patterns.
  6. Write the results back as a curated dataset for downstream detection in the analytics tier, combined with aggregated threat intelligence from your own organization.
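
A condensed, illustrative sketch of steps 2 and 3 (thresholds and column names are placeholders; the full pipeline in the GitHub solution linked below adds ASN enrichment, statistical scoring, and write-back):

```python
from pyspark.sql import functions as F

# Assumptions: `spark` is pre-created and SigninLogs is reachable as a table.
signin_df = spark.read.table("SigninLogs")

failed = signin_df.where(F.col("ResultType") != "0")  # failed sign-ins only

# Daily failure profile per source IP.
daily = (
    failed
    .withColumn("day", F.to_date("TimeGenerated"))
    .groupBy("IPAddress", "day")
    .agg(
        F.count("*").alias("daily_failures"),
        F.countDistinct("UserPrincipalName").alias("daily_users"),
    )
)

# Slow-spray heuristic: IPs that stay quiet each day (few failures) but keep
# returning over many days and touch many distinct accounts in total.
candidates = (
    daily
    .where(F.col("daily_failures") <= 3)
    .groupBy("IPAddress")
    .agg(
        F.countDistinct("day").alias("active_days"),
        F.sum("daily_users").alias("targeted_user_days"),
    )
    .where((F.col("active_days") >= 10) & (F.col("targeted_user_days") >= 20))
    .orderBy(F.desc("active_days"))
)

candidates.show(50, truncate=False)
```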

For a complete walkthrough of the Sentinel data lake password spray solution, check out our GitHub repository: Password Spray Detection – End-to-End Pipeline. The notebook uncovers subtle, recurring IP ranges performing slow password sprays - activity invisible to short-window rules.


The enriched dataset, with statistical risk scores and a high/medium/low categorization, can then be used by a custom detection to generate proactive alerts when those IPs reappear.

Call to Action

Ready to transform your security operations? Get started with Microsoft Sentinel data lake today.

Explore the possibilities with notebooks in Sentinel data lake.


1 Comment

  • geouseo (Copper Contributor)

    Solid breakdown of how Notebooks + Sentinel data lake actually work in practice. The part about using Spark for historical baselining and then feeding results back into KQL for operational alerts is a smart pattern—bridges the gap between data science and SOC workflows without overcomplicating things. Would love to see more real-world examples of teams balancing cost vs. compute when scheduling recurring notebook jobs.