Author(s): Vinay Yadav, Animesh Jain (Windows Servicing)
Security analysis shouldn’t be an afterthought—it should be a continuous, auditable, and intelligence-driven process built directly into the engineering workflow. This work introduces an agentic security analysis pipeline that uses reasoning models and tool-based agents to detect variant vulnerabilities across large, fast-changing codebases. By combining automation with explainability, it transforms security validation from a manual, point-in-time task into a repeatable and trustworthy part of every build.
Why are variants the hard part?
Security flaws rarely occur in isolation. Once a vulnerability is fixed, its logical or structural pattern often reappears elsewhere in the codebase—hidden behind different variables, layers, or call paths. These recurring patterns are variants—the quiet echoes of known issues that can persist across millions of lines of code.
Finding them manually is slow, repetitive, and incomplete. As engineering velocity increases, so does the likelihood of variant drift: the same vulnerability class re-emerging in a slightly altered form. Each missed variant carries a downstream cost: regression, re-servicing, or, in the worst cases, re-exploitation. Modern systems like Windows are too large, too interconnected, and ship too frequently for manual vulnerability discovery to keep pace.
Traditional static analyzers and deterministic class-based scanners either struggle to generalize these patterns or produce too much noise, while targeted fuzzing campaigns often fail to trigger the nuanced runtime conditions that expose them. To stay ahead, automation must evolve. We need systems that reason, not just scan: systems capable of understanding relationships between code regions and applying logical analogies instead of brute-force enumeration.
Reasoning Models: A Turning Point in Security Research
Recent advances in AI reasoning have demonstrated that large language models can uncover vulnerabilities previously missed by deterministic tools.
For example, Google’s Big Sleep agent surfaced an exploitable SQLite flaw (CVE-2025-6965) that bypassed traditional fuzzers due to configuration-sensitive logic. Similarly, OpenAI’s o3 reasoning model helped identify a critical Linux SMB logoff use-after-free (CVE-2025-37899), demonstrating that reasoning-driven automation can detect complex, context-dependent flaws in mature kernel code.
These breakthroughs show what’s possible when systems can form, test, and refine hypotheses about software behavior. The challenge now is scaling that intelligence into repeatable, auditable, enterprise-grade workflows—where every result is traceable, reviewable, and integrated into the developer’s daily workflow.
A Framework for Agentic Security Analysis
To address this challenge, we’ve developed an agentic security analysis framework that applies reasoning models within a structured, enterprise-grade workflow. It combines large language model agents, specialized analysis tools, and structured artifact generation to make vulnerability discovery continuous, explainable, and auditable. It is exposed as a first-class Azure DevOps (ADO) pipeline and integrates natively into enterprise CI/CD processes. For security analysis, it continuously reasons over large, evolving codebases to identify and validate variant vulnerabilities earlier in the release cycle.
Together, these components form a repeatable workflow that helps surface variant patterns with greater consistency and clarity.
Core Technical Pillars
- Scale – Autonomous Code Reasoning
- Long-context models extend analysis across massive, evolving codebases.
- They infer analogies, relationships, and behavioral patterns between code regions, enabling scalable reasoning that adapts as systems grow.
- Tool–Agent Collaboration
- Specialized agents coordinate to perform semantic search, graph traversal, and both static and dynamic interpretation.
- This distributed reasoning approach ensures resilience and precision across diverse enterprise environments.
- Structured Artifact Generation
- Every step produces versioned, auditable artifacts that document the reasoning process.
- These artifacts help provide reproducibility, compliance, and transparency—critical for enterprise governance and regulated industries.
Together, these pillars enable scalable, explainable, and repeatable vulnerability discovery across large software ecosystems such as Windows.
Every stage—from reasoning to validation—is logged and traceable, designed to make each discovery reproducible and reviewable.
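The graph-traversal work described in the pillars can be illustrated with a small reachability query. This is a minimal sketch, not the framework's implementation: it assumes the call graph is already available as a plain Python dictionary, and every function name in it is a hypothetical placeholder. It returns one shortest caller chain, the kind of evidence a reachability check could attach to a finding.

```python
from __future__ import annotations

from collections import deque


def find_call_path(call_graph: dict[str, list[str]], entry: str, target: str) -> list[str] | None:
    """Breadth-first search: shortest caller chain from entry to target, or None if unreachable."""
    parents: dict[str, str | None] = {entry: None}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        if fn == target:
            # Walk parent links back to the entry point, then reverse.
            path: list[str] = []
            node: str | None = fn
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for callee in call_graph.get(fn, []):
            if callee not in parents:  # visit each function once, so cycles terminate
                parents[callee] = fn
                queue.append(callee)
    return None


# Hypothetical call graph: each key calls the functions in its list.
graph = {
    "IoctlDispatch": ["HandleQuery", "HandleSetDefaults"],
    "HandleQuery": [],
    "HandleSetDefaults": ["ParseDefaults"],
    "ParseDefaults": ["CopyPayload", "HandleSetDefaults"],  # includes a cycle
}
print(find_call_path(graph, "IoctlDispatch", "CopyPayload"))
# ['IoctlDispatch', 'HandleSetDefaults', 'ParseDefaults', 'CopyPayload']
```

Breadth-first search is used so the reported chain is the shortest one, which tends to be the easiest for a reviewer to confirm.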
Inside the framework
Agent-Led, Human-Reviewed
The system is agent-led from start to finish and human-reviewed only at decision boundaries. Agents form hypotheses from recent fixes or vulnerability classes, test them against context, perform validation passes, and generate evidence-backed reports for reviewer confirmation. The workflow mirrors how seasoned security engineers operate—only faster and continuously.
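A minimal sketch of that agent-led, human-reviewed flow follows. It assumes the model-backed steps can be treated as plain callables (`gather_evidence`, `validate`); the names `run_variant_hunt`, `Finding`, the seed identifier, and the candidate functions are all hypothetical. Only findings that survive the validation pass reach the returned review queue, which stands in for the human decision boundary.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    location: str
    hypothesis: str
    evidence: list[str] = field(default_factory=list)
    validated: bool = False


def run_variant_hunt(
    seed: str,
    candidates: list[str],
    gather_evidence: Callable[[str, str], list[str]],
    validate: Callable[[Finding], bool],
) -> list[Finding]:
    """Agent-led stages; the returned list is what a human reviews at the decision boundary."""
    review_queue: list[Finding] = []
    for location in candidates:
        # 1. Form a hypothesis from the seed fix.
        finding = Finding(location, f"{location} may repeat the pattern fixed in {seed}")
        # 2. Test it against context (a model-backed agent in practice; a stub here).
        finding.evidence = gather_evidence(seed, location)
        if not finding.evidence:
            continue
        # 3. Independent validation pass before anything reaches a human.
        finding.validated = validate(finding)
        if finding.validated:
            review_queue.append(finding)
    return review_queue


# Stub callables stand in for the model-backed agents (hypothetical data).
evidence_db = {
    "ParseFooDefaults": ["copy length taken from caller-supplied Size"],
    "ParseBarDefaults": [],
}
queue = run_variant_hunt(
    seed="hypothetical-fix-0001",
    candidates=["ParseFooDefaults", "ParseBarDefaults"],
    gather_evidence=lambda seed, loc: evidence_db.get(loc, []),
    validate=lambda f: any("Size" in e for e in f.evidence),
)
print([f.location for f in queue])  # ['ParseFooDefaults']
```

Keeping each model call behind a callable makes every stage replaceable and easy to log; the source does not say whether the framework is structured exactly this way.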
Figure 1: The Base Agent, which represents a single worker unit in the pipeline. The agent takes on tasks defined by templated prompts.
Tool Specialists as Agents
Each analytical tool functions as a domain-specific agent—performing semantic search, file inspection, or function-graph traversal. These agents collaborate through structured orchestration, maintaining specialization without sacrificing coherence.
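One way to read the tool-specialist pattern, including its role in managing context length, is as a wrapper that runs a tool and hands only a bounded result back to the Base Agent. The sketch below assumes that design; `ToolSpecialist`, `BaseAgent`, and the `read_file` tool are hypothetical, and simple truncation stands in for whatever summarization a real specialist agent would perform.

```python
from __future__ import annotations

from typing import Callable


class ToolSpecialist:
    """Runs one tool and returns a bounded result so the caller's context stays small."""

    def __init__(self, name: str, run: Callable[[str], str], max_chars: int = 200) -> None:
        self.name = name
        self.run = run
        self.max_chars = max_chars

    def __call__(self, query: str) -> str:
        raw = self.run(query)
        if len(raw) <= self.max_chars:
            return raw
        # Truncation stands in for an LLM summarization step.
        omitted = len(raw) - self.max_chars
        return raw[: self.max_chars] + f"... [{omitted} chars omitted]"


class BaseAgent:
    """Holds the working context and delegates to tool specialists by name."""

    def __init__(self, tools: list[ToolSpecialist]) -> None:
        self.tools = {tool.name: tool for tool in tools}
        self.context: list[str] = []

    def call_tool(self, name: str, query: str) -> str:
        result = self.tools[name](query)
        # Only the bounded result, never the raw tool output, enters the context.
        self.context.append(f"{name}({query!r}) -> {result}")
        return result


# Hypothetical file store and tool.
files = {"pool.c": "x" * 1000}
agent = BaseAgent([ToolSpecialist("read_file", lambda path: files.get(path, ""), max_chars=50)])
out = agent.call_tool("read_file", "pool.c")
print(out[50:])  # ... [950 chars omitted]
```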
Figure 2: Tool specialist agents invoked as tool calls from the Base Agent. This helps manage context length and keeps the Base Agent focused.
Agentic Patterns and Orchestration
The framework employs reusable reasoning patterns—reflective reasoning, actor–validator loops, and parallel tool dialogues—for accuracy and scale. A central conductor agent governs task coordination, context flow, and artifact persistence across runs.
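The actor–validator loop can be sketched as a bounded exchange in which a validator either accepts a proposal or returns a critique that the actor must address. This is an illustrative assumption, not the framework's code: the actor and validator are stub functions rather than model-backed personas, and the `transcript` list plays the role of the logged conversation a conductor would persist.

```python
from __future__ import annotations

from typing import Callable, Optional

Actor = Callable[[str, Optional[str]], str]   # (task, last critique) -> proposal
Validator = Callable[[str], Optional[str]]    # proposal -> critique, or None to accept


def actor_validator_loop(
    task: str, actor: Actor, validator: Validator, max_rounds: int = 3
) -> tuple[Optional[str], list[tuple[str, str]]]:
    """Alternate actor and validator until acceptance or max_rounds; return result and transcript."""
    transcript: list[tuple[str, str]] = []
    critique: Optional[str] = None
    for _ in range(max_rounds):
        proposal = actor(task, critique)
        transcript.append(("actor", proposal))
        critique = validator(proposal)
        transcript.append(("validator", critique if critique is not None else "ACCEPT"))
        if critique is None:
            return proposal, transcript
    return None, transcript  # no accepted proposal within the budget


# Stub personas (hypothetical): the validator demands reachability evidence.
def stub_actor(task: str, critique: Optional[str]) -> str:
    if critique is None:
        return f"{task}: unchecked copy"
    return f"{task}: unchecked copy; caller path attached"


def stub_validator(proposal: str) -> Optional[str]:
    return None if "caller path" in proposal else "Show a reachable caller path."


result, log = actor_validator_loop("ParseFooDefaults", stub_actor, stub_validator)
print(result)  # ParseFooDefaults: unchecked copy; caller path attached
```

Capping the number of rounds keeps a disagreeing pair from looping indefinitely; an unaccepted result returns `None` with the full transcript for review.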
Figure 3: The reflective reasoning loop for Variant Identification with Base Agents and the personas. Each arrow is a logged agentic conversation.
Auditability Through Artifacts
Every investigation yields a transparent chain of artifacts:
- Analysis Notes – summarize candidate issues
- Critique Notes – document reasoning and counter-evidence
- Synthesis Reports – provide developer-ready summaries, diffs, call graphs, and exploitability insights
- Agentic Conversation Logs – record agent conversations so developers can trace the reasoning and get more context
This structure makes each discovery fully traceable and auditable.
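A traceable artifact chain can be made tamper-evident by having each artifact record the hash of its predecessor. The source does not say the framework uses hashing; this is a minimal sketch under that assumption, with hypothetical `Artifact`, `append_artifact`, and `verify_chain` names and artifact kinds borrowed from the list above.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Artifact:
    kind: str          # e.g. "analysis_note", "critique_note", "synthesis_report"
    run_id: str        # pipeline run that produced the artifact
    body: str          # artifact content (note text, report, or log excerpt)
    parent_hash: str   # digest of the previous artifact in the chain; "" for the first

    def digest(self) -> str:
        # Canonical JSON so identical content always hashes identically.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_artifact(chain: list[Artifact], kind: str, run_id: str, body: str) -> list[Artifact]:
    """Return a new chain with one more artifact linked to the current tail."""
    parent = chain[-1].digest() if chain else ""
    return chain + [Artifact(kind, run_id, body, parent)]


def verify_chain(chain: list[Artifact]) -> bool:
    """True if no artifact has been edited since its successor was recorded."""
    if chain and chain[0].parent_hash != "":
        return False
    return all(cur.parent_hash == prev.digest() for prev, cur in zip(chain, chain[1:]))


chain: list[Artifact] = []
chain = append_artifact(chain, "analysis_note", "run-0001", "Candidate: ParseFooDefaults trusts Size")
chain = append_artifact(chain, "critique_note", "run-0001", "No runtime bound found on any caller path")
chain = append_artifact(chain, "synthesis_report", "run-0001", "Recommend a runtime Size check")
print(verify_chain(chain))  # True

# Rewriting the critique after the fact breaks the link from the synthesis report.
chain[1] = Artifact("critique_note", "run-0001", "edited later", chain[1].parent_hash)
print(verify_chain(chain))  # False
```

Because each link only protects its predecessor, the digest of the final artifact would need to be stored somewhere independent, such as alongside the published pipeline artifacts, for an edit to the last entry to be detectable.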
CI/CD-Native Integration
The interface operates as a first-class Azure DevOps pipeline, attachable to pull requests, nightly builds, or release triggers. Each run publishes versioned artifacts and validation notes directly into the developer workflow—making reasoning-driven security a seamless part of software delivery.
What It Can Do Today
- Seeded Variant Hunts: Start from a recent fix or known pattern to enumerate analogous cases, analyze helper functions, and test reachability.
- Evidence-First Reporting: Every finding includes reproducible evidence—code snippets, diffs, and caller graphs—delivered within the PR or work item.
- Scalable Coverage: Runs across servicing branches, producing consistent and auditable validation artifacts.
- Improved Precision: A reasoning-based validation pass has significantly reduced false positives in internal testing.
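For a seeded variant hunt, the very first step (enumerating analogous cases before any reasoning is applied) can be approximated with a crude lexical filter. This sketch swaps in regular expressions purely for illustration; the framework itself relies on reasoning models, not regexes. The function names, the `Size` field, and both patterns are hypothetical, and the check heuristic will miss or mis-detect many real guards, which is exactly the gap a reasoning-based pass is meant to close.

```python
from __future__ import annotations

import re

# Hypothetical seed pattern: a copy whose length comes from a field named Size ...
COPY_RE = re.compile(r"\b(?:memcpy|RtlCopyMemory)\s*\([^;]*\bSize\b")
# ... and a crude signature of the runtime check the fix added: an `if` comparing Size.
CHECK_RE = re.compile(r"\bif\s*\([^)]*\bSize\b[^)]*[<>]")


def enumerate_candidates(functions: dict[str, str], name_pattern: str) -> list[str]:
    """Names matching name_pattern that copy by Size with no visible Size comparison."""
    name_re = re.compile(name_pattern)
    hits: list[str] = []
    for name, body in sorted(functions.items()):
        if not name_re.fullmatch(name):
            continue
        if COPY_RE.search(body) and not CHECK_RE.search(body):
            hits.append(name)
    return hits


# Hypothetical function bodies standing in for indexed source.
functions = {
    "GetFooDefaults": "Out->Size = In->Size; memcpy(Out->Data, In->Data, In->Size);",
    "GetBarDefaults": "if (In->Size > BytesAvailable) return STATUS_INVALID_PARAMETER; "
                      "memcpy(Out->Data, In->Data, In->Size);",
    "GetBazSettings": "memcpy(Out, In, In->Size);",
}
print(enumerate_candidates(functions, r"Get\w+Defaults"))  # ['GetFooDefaults']
```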
Case Study: CVE-2025-55325
During a sweep of “*_DEFAULTS” deserializers, the agentic pipeline independently identified GetPoolDefaults trusting a user-controlled size field and copying that many bytes from a caller buffer. The missing runtime bounds check—guarded only by an assertion in debug builds—enabled a potential read access violation and information disclosure. The mitigation mirrored a hardened sibling helper: enforcing runtime bounds on Size versus BytesAvailable/Version before allocation and copy.
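The hardened pattern described in the case study can be illustrated outside C. The sketch below parses a hypothetical buffer layout (a 32-bit `Version` and a 32-bit `Size` followed by the payload); the layout, `parse_defaults`, and `SUPPORTED_VERSIONS` are assumptions, not the actual Windows structures. The key point is that `Size` is checked against the bytes actually available at runtime, before anything is copied.

```python
from __future__ import annotations

import struct

# Hypothetical wire layout: little-endian Version (u32) and Size (u32), then payload.
HEADER = struct.Struct("<II")
SUPPORTED_VERSIONS = {1}  # hypothetical


def parse_defaults(buf: bytes) -> bytes:
    """Return the payload only after validating the caller-supplied Size at runtime."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than header")
    version, size = HEADER.unpack_from(buf, 0)
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version {version}")
    bytes_available = len(buf) - HEADER.size
    # Runtime bound on Size versus BytesAvailable, enforced before the copy.
    if size > bytes_available:
        raise ValueError(f"Size {size} exceeds {bytes_available} available bytes")
    return bytes(buf[HEADER.size : HEADER.size + size])


print(parse_defaults(HEADER.pack(1, 3) + b"abcXYZ"))  # b'abc'
```

In C, the unchecked copy reads past the caller's buffer; a Python slice would instead silently truncate, which is why the explicit check, rather than the slice, carries the logic here. Python's own `assert` statements are also removed when running under `python -O`, a loose analogue of an assertion that exists only in debug builds.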
The finding was later validated by the servicing teams, confirming it matched an issue already under active investigation—illustrating how the automated reasoning process can independently surface real-world vulnerabilities that align with expert analysis.
Beyond Variant Analysis
The underlying architecture of this framework extends naturally beyond variant detection:
- Net-new vulnerability discovery through cross-binary pattern matching
- Model-assisted fuzzing & static analysis orchestrated through CI/CD integration
- Regression detection via historical code comparisons
- Security Development Lifecycle (SDL) enforcement and reproducibility checks
Because these capabilities reuse the same agentic patterns and tooling, they open the door to applying reasoning-driven workflows across a broader range of security and validation tasks.
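Regression detection via historical code comparison could, in its simplest form, diff two versions of a file and flag bounds checks that disappeared. This is a minimal sketch using Python's standard `difflib`; the guard pattern and source snippets are hypothetical, and a reasoning-driven pass would still be needed to decide whether a removed check was relocated or made redundant elsewhere.

```python
from __future__ import annotations

import difflib
import re

# Hypothetical signature of a bounds-check guard: an `if` comparing something named Size.
GUARD_RE = re.compile(r"\bif\s*\(.*\bSize\b.*[<>]")


def removed_guards(old_src: str, new_src: str) -> list[str]:
    """Return guard lines present in old_src that no longer appear anywhere in new_src."""
    still_present = {line.strip() for line in new_src.splitlines()}
    removed: list[str] = []
    for line in difflib.ndiff(old_src.splitlines(), new_src.splitlines()):
        # ndiff prefixes deleted lines with "- "; "+ ", "  " and "? " lines are ignored.
        if not line.startswith("- "):
            continue
        code = line[2:].strip()
        if GUARD_RE.search(code) and code not in still_present:
            removed.append(code)
    return removed


old = """status = Validate(In);
if (In->Size > BytesAvailable) {
    return STATUS_INVALID_PARAMETER;
}
memcpy(Out, In->Data, In->Size);"""

new = """status = Validate(In);
memcpy(Out, In->Data, In->Size);"""

print(removed_guards(old, new))  # ['if (In->Size > BytesAvailable) {']
```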
The Road Ahead
Looking ahead, this trajectory naturally leads toward autonomous cybersecurity pipelines powered by reasoning agents that apply reflective analysis, validation loops, and structured tool interactions to complex codebases. By structuring each step as an auditable artifact, the approach supports security & validation analysis that is both explainable and repeatable. These agents could help validate security posture, analyze historical and real-time signals, and detect anomalous patterns early in the lifecycle.
References
- Google Cloud Blog – Big Sleep and AI-Assisted Vulnerability Discovery
“A summer of security: empowering cyber defenders with AI.”
https://blog.google/technology/safety-security/cybersecurity-updates-summer-2025
- The Hacker News – Google AI ‘Big Sleep’ Stops Exploitation of Critical SQLite Flaw
https://thehackernews.com/2025/07/google-ai-big-sleep-stops-exploitation.html
- NIST National Vulnerability Database – CVE-2025-6965 (SQLite)
https://nvd.nist.gov/vuln/detail/CVE-2025-6965
- Sean Heelan – “Reasoning Models and the ksmbd Use-After-Free”
https://simonwillison.net/2025/May/24/sean-heelan
- The Cyber Express – AI Finds CVE-2025-37899 Zero-Day in Linux SMB Kernel
https://thecyberexpress.com/cve-2025-37899-zero-day-in-linux-smb-kernel
- NIST National Vulnerability Database – CVE-2025-37899 (Linux SMB Use-After-Free)
https://nvd.nist.gov/vuln/detail/CVE-2025-37899
- NIST National Vulnerability Database – CVE-2025-55325 (Windows Storage Management Provider Buffer Over-read)
https://nvd.nist.gov/vuln/detail/CVE-2025-55325
- Microsoft Security Response Center – Vulnerability Details for CVE-2025-55325
https://msrc.microsoft.com/update-guide/vulnerability/CVE-2025-55325