Microsoft Security Copilot Blog

Empowering Security Copilot with NL2KQL: Transforming Natural Language into Insightful KQL Queries

xinye-tang
Microsoft
Mar 04, 2025

Imagine being at the forefront of a dynamic Security Operations Center (SOC), managing and analyzing millions of daily event logs. You're hunting for clues—perhaps unauthorized remote logins from a suspicious IP range—but translating that intent into a corresponding Kusto Query Language (KQL) query isn't always straightforward.

KQL is the powerhouse behind services like Microsoft Sentinel and Microsoft Defender, enabling security professionals to query and analyze vast amounts of data effectively. However, KQL comes with challenges: it requires precise syntax, detailed knowledge of table schemas, and an understanding of functions and grouping. For many security analysts, this learning curve can slow down threat investigation and response.

With Microsoft Security Copilot, an enterprise-grade AI assistant powered by GPT, analysts can now describe what they are looking for in plain language instead of hand-writing every query. By leveraging NL2KQL, a framework that translates natural language into corresponding KQL queries, Security Copilot makes querying in KQL as intuitive as a conversation. In this article, we’ll explore the story behind NL2KQL, its potential to transform security operations, and why it matters for the future of cybersecurity.

The Problem: A Communication Barrier Between Analysts and Data

In the high-stakes environment of a SOC, time is critical. Yet, analysts often face a dual challenge: understanding the nuances of evolving threats while mastering the tools and languages required to analyze them. KQL, while powerful, is unforgiving of errors—a misplaced parenthesis or an incorrect function can lead to failed queries or inaccurate results. Additionally, KQL lacks features common in relational databases, such as primary or foreign keys, making it even more challenging for analysts to navigate unfamiliar data schemas.

This creates a communication barrier. Analysts think in terms of security scenarios (“I want to identify failed login attempts from this IP range”), but translating those scenarios into syntactically correct KQL commands requires time, effort, and expertise.
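
To make the gap concrete, here is a minimal sketch of what the failed-login scenario might look like when written by hand. It assumes the advanced hunting table DeviceLogonEvents with its ActionType and RemoteIP columns; the helper name failed_logons_query, the one-day lookback, and the example range 203.0.113.0/24 (a documentation-only range) are illustrative choices, not part of NL2KQL.

```python
import ipaddress
import re


def failed_logons_query(cidr: str, lookback: str = "1d") -> str:
    """Build a KQL query for failed logons from an IPv4 range (illustrative helper)."""
    # ip_network raises ValueError for malformed input or stray host bits.
    network = ipaddress.ip_network(cidr)
    if network.version != 4:
        raise ValueError("ipv4_is_in_range expects an IPv4 range")
    # Accept only simple timespans such as 1d, 12h, or 30m.
    if not re.fullmatch(r"\d+[dhm]", lookback):
        raise ValueError("lookback must look like 1d, 12h, or 30m")
    return "\n".join([
        "DeviceLogonEvents",
        f"| where Timestamp > ago({lookback})",
        '| where ActionType == "LogonFailed"',
        f'| where ipv4_is_in_range(RemoteIP, "{network}")',
        "| summarize FailedAttempts = count() by RemoteIP, AccountName",
        "| order by FailedAttempts desc",
    ])


# Example: the query an analyst would otherwise have to type out.
query = failed_logons_query("203.0.113.0/24")
```

Even this short query requires knowing the right table, the exact ActionType value, and the ipv4_is_in_range function, which is the kind of detail the one-sentence request leaves unstated.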

The NL2KQL Solution Within Microsoft Security Copilot

NL2KQL was designed to let security analysts query their data in natural language while the system handles the translation into KQL. Leveraging the capabilities of large language models (LLMs), NL2KQL interprets an analyst’s natural language query (NLQ), infers its intent, and generates the corresponding KQL commands. The pipeline includes the following key components:

  1. Semantic Data Catalog: A core component of NL2KQL is the semantic data catalog, encapsulating information about the structure, semantics, and contextual attributes of the database. This catalog includes annotations for tables, columns, type constraints, and enumerated values where applicable. Embeddings of table elements, column elements, and value elements are calculated using a similarity embedding model and stored in a vector database for quick retrieval during inference.
  2. Schema Refiner: Generating semantically valid KQL requires a relevant schema representation. The Schema Refiner dynamically selects the most relevant tables, columns, and potential values from the semantic data catalog to include in the model’s context. This approach addresses constraints such as limited context window sizes and varying user permissions.
  3. Few-Shot Selector: A synthetic few-shot database guides the LLM. The Few-Shot Selector dynamically identifies relevant examples based on the user’s NLQ and schema context. By leveraging precomputed embeddings stored in a vector database, the selector retrieves and ranks few shots using cosine similarity.
  4. Prompt Builder: Crafting an effective prompt is key to LLM performance. NL2KQL’s prompt integrates instructions, relevant schema details from the Schema Refiner, selected few-shots, essential Kusto syntax elements, and best practices for query optimization. This comprehensive context ensures efficient query generation.
  5. Query Refiner: The Query Refiner validates and repairs the generated KQL to ensure syntactic and semantic correctness. By leveraging the official KQL parser library, it identifies errors, handles undefined variables, fixes joins, and adds missing operators. This recursive process ensures reliable outputs even for complex queries.
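
The components above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the NL2KQL implementation: the three-dimensional vectors are toy stand-ins for real embeddings, a plain dictionary replaces the vector database, and the names top_k, build_prompt, refine, SCHEMA_INDEX, and FEW_SHOT_INDEX are hypothetical. The validate and repair callables are placeholders for the parser-backed checks the Query Refiner performs.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Vector, index: dict[str, Vector], k: int) -> list[str]:
    """Rank stored items by similarity to the query and keep the best k."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query_vec, index[name]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(nlq: str, schema: list[str], shots: list[tuple[str, str]]) -> str:
    """Assemble instructions, refined schema, few-shots, and the question."""
    lines = ["Translate the question into a KQL query.", "", "Schema:"]
    lines += [f"- {entry}" for entry in schema]
    lines += ["", "Examples:"]
    for question, kql in shots:
        lines += [f"Q: {question}", f"KQL: {kql}"]
    lines += ["", f"Q: {nlq}", "KQL:"]
    return "\n".join(lines)


def refine(
    query: str,
    validate: Callable[[str], Optional[str]],
    repair: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Validate, repair, and re-validate until clean or out of rounds."""
    for _ in range(max_rounds):
        error = validate(query)  # None means the query passed validation
        if error is None:
            return query
        query = repair(query, error)
    return query


# Toy embeddings (assumed values); a real system would compute these with a model.
SCHEMA_INDEX = {
    "DeviceLogonEvents(Timestamp, ActionType, RemoteIP, AccountName)": [0.9, 0.1, 0.0],
    "EmailEvents(Timestamp, SenderFromAddress, Subject)": [0.0, 0.2, 0.9],
}
FEW_SHOT_INDEX = {"Show failed logons from the last day": [0.8, 0.2, 0.1]}
FEW_SHOT_KQL = {
    "Show failed logons from the last day":
        'DeviceLogonEvents | where Timestamp > ago(1d) | where ActionType == "LogonFailed"',
}

nlq = "Find failed logons from 203.0.113.0/24"
nlq_vec = [1.0, 0.0, 0.0]  # pretend embedding of the question

schema = top_k(nlq_vec, SCHEMA_INDEX, k=1)  # Schema Refiner step
shots = [(q, FEW_SHOT_KQL[q]) for q in top_k(nlq_vec, FEW_SHOT_INDEX, k=1)]  # Few-Shot Selector step
prompt = build_prompt(nlq, schema, shots)  # Prompt Builder step
```

The prompt would then be sent to the LLM, and the returned query passed through refine with a real validator. Keeping retrieval to the top few schema entries is what keeps the prompt within the context window, which is the constraint the Schema Refiner is described as addressing.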

Real-World Scenarios: NL2KQL in Security Copilot

In Microsoft Defender XDR

Microsoft Security Copilot in Microsoft Defender XDR comes with a query assistant capability in advanced hunting. By clicking the "generate query" button, threat hunters or security analysts can ask a question in natural language, and NL2KQL then generates a KQL query that matches the request using the advanced hunting data schema. This query can be copied for further refinement or executed directly in the advanced hunting query pane, which accelerates investigations and reduces manual effort. For more details on this feature, see "Microsoft Security Copilot in advanced hunting - Microsoft Defender XDR".

In the Standalone Security Copilot Experience

In the standalone Security Copilot interface, after you pose a natural language query, NL2KQL generates a corresponding KQL query and presents it for your review. You can optionally let Security Copilot execute the query and display the results. Although the system is designed to produce queries that match the request, the output may not always be perfect, so review the generated query to confirm it meets your requirements.

Note: You can prompt Security Copilot to generate advanced hunting queries for both Defender XDR and Microsoft Sentinel tables. Not all Microsoft Sentinel tables are currently supported, but support for these tables can be expected in the future.

Why This Matters in the Security Domain
  • Empowering Analysts

NL2KQL shifts the focus from learning and troubleshooting query syntax to investigating and mitigating threats. By reducing the cognitive load, analysts can spend more time on high-value tasks, such as identifying attack patterns and responding to incidents.

  • Improving Efficiency

In scenarios where speed is essential, NL2KQL accelerates query creation, enabling analysts to retrieve insights faster. Whether investigating malware propagation or monitoring insider threats, NL2KQL reduces time-to-insight.

  • Democratizing Security Analysis

With NL2KQL, less experienced analysts can contribute meaningfully to investigations without needing extensive training in KQL. This democratization is particularly valuable as the demand for skilled cybersecurity professionals continues to outpace supply.

  • Future-Ready Security

The security landscape is constantly evolving, with new threats and data sources emerging daily. NL2KQL’s adaptability ensures it can scale to support diverse use cases and data schemas, making it a future-ready tool for SOCs.

Where to Learn More

If you use the NL2KQL code, data, results, or any related resources in your research, please cite:

@article{tang2025nl2kqlnaturallanguagekusto,
  title={NL2KQL: From Natural Language to Kusto Query},
  author={Xinye Tang and Amir H. Abdi and Jeremias Eichelbaum and Mahan Das and Alex Klein and Nihal Irmak Pakis and William Blum and Daniel L Mace and Tanvi Raja and Namrata Padmanabhan and Ye Xing},
  journal={arXiv preprint arXiv:2404.02933},
  year={2025}
}

Updated Mar 07, 2025
Version 2.0