Microsoft Security Copilot is revolutionizing cybersecurity operations by leveraging generative AI to streamline security investigations, threat response, and data analysis. As part of this innovation, NL2KQL enables security professionals to translate natural language into corresponding KQL queries, making threat hunting and security analytics more accessible.
In this blog, we explore how MAGIC —our novel Multi-AGent system that automates the creatIon of the self-Correction guideline—can enhance AI-generated query accuracy within NL2KQL. Recently accepted at the Association for the Advancement of Artificial Intelligence (AAAI) 2025, MAGIC introduces a breakthrough in automated self-correction for natural language-to-code systems.
Why Self-Correction Matters
AI-generated queries can sometimes be inaccurate or non-executable due to complex syntax requirements, ambiguous language inputs, or missing context. Traditionally, correcting these errors relies on human-engineered self-correction guidelines. Experts analyze training datasets, identify common patterns of mistakes, and craft prompts to help LLMs avoid these errors. While effective, this process is labor-intensive, time-consuming, and inherently limited by human ability to anticipate all error patterns.
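To make the manual approach concrete, a human-engineered self-correction guideline is typically a list of known error patterns embedded directly in the correction prompt. The sketch below is purely illustrative; the rules and the `correction_prompt` helper are hypothetical, not taken from MAGIC or NL2KQL:

```python
# Hypothetical example of a hand-written self-correction guideline
# embedded in a prompt. The rules are illustrative, not MAGIC output.
GUIDELINE = """When correcting a KQL query, check for these common mistakes:
1. Using count() where dcount(<field>) is needed for unique values.
2. Missing time-range filters when the question specifies a period.
3. Referencing columns that do not exist in the target table."""

def correction_prompt(question: str, bad_query: str) -> str:
    """Assemble a correction prompt that pairs the guideline with the
    failing query (a made-up helper for illustration)."""
    return (f"{GUIDELINE}\n\n"
            f"Question: {question}\n"
            f"Incorrect query:\n{bad_query}\n\n"
            f"Provide a corrected query.")
```

Every new error pattern an expert discovers must be added to this list by hand, which is exactly the bottleneck MAGIC removes.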
MAGIC revolutionizes this approach. Instead of relying on manual intervention, MAGIC leverages a multi-agent system to automatically generate self-correction guidelines, iteratively improving model performance without requiring extensive human oversight.
How MAGIC Works
At the heart of MAGIC is an iterative feedback-correction cycle, driven by three key agents:
- Manager Agent: Oversees the process, orchestrating interactions between the feedback and correction agents.
- Feedback Agent: Analyzes mistakes in the generated code by comparing it to the ground truth.
- Correction Agent: Revises the code based on feedback, producing a new output for evaluation.
This process unfolds as follows:
- Identify Failures: The manager identifies KQL queries that are incorrect—either because they are non-executable or produce incorrect results compared to the ground truth.
- Feedback-Correction Cycle: The manager initiates a cycle where the feedback agent explains the mistakes, and the correction agent revises the output. This loop continues until the query is corrected or a maximum number of iterations is reached.
- Adaptive Prompting: If the initial prompts for the agents do not yield successful corrections, the manager dynamically revises these instructions based on past interactions and outcomes, optimizing the agents’ performance.
- Guideline Generation: Successful feedback from the cycles is stored and aggregated into a self-correction guideline. This guideline evolves batch by batch, capturing a comprehensive set of error patterns and correction strategies.
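The cycle described above can be sketched in a few lines of Python. This is a simplified, hypothetical rendering of MAGIC's control flow, not the actual implementation: `give_feedback` and `revise` stand in for the LLM-backed feedback and correction agents, and `is_correct` stands in for executing the query and comparing its result against the ground truth.

```python
def run_correction_cycle(question, query, ground_truth,
                         give_feedback, revise, is_correct,
                         max_iterations=3):
    """One feedback-correction cycle for a single failed query.

    Returns (final_query, feedback), where feedback is the message that
    led to a correct query, or None if the cycle hit max_iterations.
    """
    for _ in range(max_iterations):
        # Feedback agent explains the mistake relative to the ground truth.
        feedback = give_feedback(question, query, ground_truth)
        # Correction agent revises the query using that feedback.
        query = revise(question, query, feedback)
        if is_correct(query, ground_truth):
            return query, feedback  # successful feedback feeds the guideline
    return query, None

def build_guideline(failures, give_feedback, revise, is_correct):
    """Manager role: run cycles over failed queries and aggregate the
    feedback that produced successful corrections into a guideline."""
    guideline = []
    for question, query, ground_truth in failures:
        _, feedback = run_correction_cycle(
            question, query, ground_truth,
            give_feedback, revise, is_correct)
        if feedback is not None:
            guideline.append(feedback)
    return guideline
```

Note that the adaptive-prompting step (the manager rewriting the agents' instructions when cycles keep failing) is omitted here for brevity.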
Example: How MAGIC Enhances Query Accuracy
To illustrate how MAGIC guidelines improve query accuracy, consider the following simplified example:
User Query:
"How many unique closed incidents have we had in Microsoft Sentinel in the last month?"
Initial AI-Generated Query:
SecurityIncident
| where Status == "Closed"
| where ClosedTime >= startofmonth(now(), -1)
| where ClosedTime < startofmonth(now())
| summarize ClosedIncidents = count()
Here, the query attempts to count unique closed incidents but uses count(), which counts rows. Because the SecurityIncident table can contain multiple rows per incident (one per status update), this inflates the result rather than returning the number of distinct incidents.
MAGIC's Self-Correction Guideline:
MAGIC generates structured self-correction guidelines to refine the query. One such guideline is:
Use the dcount aggregation function to count distinct values when summarizing by a specific field.
Revised Query After Applying MAGIC’s Guidance:
SecurityIncident
| where Status == "Closed"
| where ClosedTime >= startofmonth(now(), -1)
| where ClosedTime < startofmonth(now())
| summarize UniqueClosedIncidents = dcount(IncidentNumber)
By using dcount(IncidentNumber), the revised query counts distinct incident numbers, so repeated rows for the same incident no longer inflate the result.
Note: this is a simplified example illustrating MAGIC’s self-correction approach. The exact corrections produced by NL2KQL may vary depending on query context and other factors.
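The count-versus-distinct-count distinction above can be reproduced outside KQL. In the Python analogue below, the rows are made up for illustration; several rows share an IncidentNumber, mirroring how one incident accumulates rows as its status is updated:

```python
# Toy stand-in for a SecurityIncident table: several rows can share an
# IncidentNumber because each status update produces a new row.
rows = [
    {"IncidentNumber": 101, "Status": "Closed"},
    {"IncidentNumber": 101, "Status": "Closed"},  # same incident, later update
    {"IncidentNumber": 102, "Status": "Closed"},
    {"IncidentNumber": 103, "Status": "New"},
]

closed = [r for r in rows if r["Status"] == "Closed"]

# Analogue of `summarize count()`: counts rows, not incidents.
closed_rows = len(closed)  # 3 rows, but only 2 incidents

# Analogue of `summarize dcount(IncidentNumber)`: counts distinct incidents.
unique_closed = len({r["IncidentNumber"] for r in closed})  # 2
```

Two closed incidents produce three rows, so count() overstates the answer by one; the distinct count gives the number the analyst actually asked for.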
Generalizing Beyond NL2KQL
While MAGIC is being explored for enhancing NL2KQL within Security Copilot, its methodology extends to various natural language-to-code generation tasks. By applying MAGIC’s self-correction capabilities, Security Copilot can improve AI-generated code in multiple security domains, including NL to SQL for structured data queries, NL to GraphQL for API interactions, and NL to scripting languages for workflow automation.
This adaptability highlights MAGIC’s potential to become a foundational framework for improving accuracy and efficiency across diverse applications in LLM-powered systems.
Why This Matters for Security
In security, the stakes are higher than ever. Analysts depend on accurate and efficient tools to detect, analyze, and respond to threats. MAGIC aligns seamlessly with these needs by enabling:
- Higher Accuracy: Reducing errors in automated query generation ensures reliable insights and faster decision-making.
- Efficiency at Scale: Automating self-correction eliminates bottlenecks caused by manual intervention.
- Adaptability: The ability to generalize MAGIC’s approach to diverse tasks empowers security teams to stay ahead in an evolving threat landscape.
MAGIC represents a paradigm shift in how we approach LLM limitations. By automating self-correction, it not only enhances the reliability of AI systems but also empowers researchers and practitioners to focus on innovation rather than troubleshooting.
Where to Learn More
- Code: https://github.com/microsoft/SynQo
- Data: https://huggingface.co/datasets/microsoft/MAGIC
- Paper: https://arxiv.org/abs/2406.12692
- For more details on NL2KQL, please visit Empowering Security Copilot with NL2KQL
If you use the MAGIC code, data, results or any related resources in your research, please cite the following reference:
@article{askari2024magic,
  title={MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL},
  author={Askari, Arian and Poelitz, Christian and Tang, Xinye},
  journal={arXiv preprint arXiv:2406.12692},
  year={2024}
}
Updated Mar 10, 2025 · Version 1.0 · xinye-tang
Microsoft Security Copilot Blog