security development lifecycle (sdl)
3 TopicsTransforming Security Analysis into a Repeatable, Auditable, and Agentic Workflow
Author(s): Animesh Jain, Vinay Yadav Shaped by investigations into the strategic question of what it takes for Windows to achieve world-leading security—and the practical engineering needed to explore agentic workflows at scale and their interfaces. Our work in Windows Servicing & Delivery (WSD) is shaped by two guiding prompts from leadership: "what does it take for Windows to achieve world-leading security", and "how do we responsibly integrate AI into systems as large and high-churn as Windows?". Reasoning models open new possibilities on both fronts. As we continue experimenting, one issue repeatedly surfaces as the bottleneck for scalable security assurance: variant vulnerabilities. They are subtle, recurring, and easy to miss—making them an ideal proving ground for the enterprise-grade workflow we present here. Security Analysis at Windows Scale Security analysis shouldn’t be an afterthought—it should be a continuous, auditable, and intelligence-driven process built directly into the engineering workflow. This work introduces an agentic security analysis pipeline that uses reasoning models and tool-based agents to detect variant vulnerabilities across large, fast-changing codebases. By combining automation with explainability, it transforms security validation from a manual, point-in-time task into a repeatable and trustworthy part of every build. Why are variants the hard part? Security flaws rarely occur in isolation. Once a vulnerability is fixed, its logical or structural pattern often reappears elsewhere in the codebase—hidden behind different variables, layers, or call paths. These recurring patterns are variants—the quiet echoes of known issues that can persist across millions of lines of code. Finding them manually is slow, repetitive, and incomplete. As engineering velocity increases, so does the likelihood of variant drift—the same vulnerability class re-emerging in a slightly altered form. Each missed variant carries a downstream cost: regression, re-servicing, or, in the worst cases, re-exploitation. Modern large systems like Windows are too large, too interconnected, and ship too frequently for manual vulnerability discovery to keep pace. Traditional static analyzers and deterministic class-based scanners struggle to generalize these patterns or create too much noise, while targeted fuzzing campaigns often fail to trigger the nuanced runtime conditions that expose them. To stay ahead, automation must evolve. We need systems that reason—not just scan—systems capable of understanding relationships between code regions and applying logical analogies instead of brute-force enumeration. Reasoning Models: A Turning Point in Security Research Recent advances in AI reasoning have demonstrated that large language models can uncover vulnerabilities previously missed by deterministic tools. For example, Google’s Big Sleep agent surfaced an exploitable SQLite flaw (CVE-2025-6965) that bypassed traditional fuzzers due to configuration-sensitive logic. Similarly, an o-series reasoning model helped identify a critical Linux SMB logoff use-after-free (CVE-2025-37899), proving that reasoning-driven automation can detect complex, context-dependent flaws in mature kernel code. These breakthroughs show what’s possible when systems can form, test, and refine hypotheses about software behavior. The challenge now is scaling that intelligence into repeatable, auditable, enterprise-grade workflows—where every result is traceable, reviewable, and integrated into the developer’s daily workflow. A Framework for Agentic Security Analysis To address this challenge, we’ve developed an agentic security analysis framework that applies reasoning models within structured, enterprise grade workflow pattern. It combines large language model agents, specialized analysis tools, and structured artifact generation to make vulnerability discovery continuous, explainable, and auditable. It is interfaced as a first-class Azure DevOps (ADO) pipeline and can be integrated natively into enterprise CI/CD processes. For security analysis, it continuously reasons over large, evolving codebases to identify and validate variant vulnerabilities earlier in the release cycle. Together, these components form a repeatable workflow that helps surface variant patterns with greater consistency and clarity. Core Technical Pillars Scale – Autonomous Code Reasoning Long-context models extend analysis across massive, evolving codebases. They infer analogies, relationships, and behavioral patterns between code regions, enabling scalable reasoning that adapts as systems grow. Tool–Agent Collaboration Specialized agents coordinate to perform semantic search, graph traversal, and both static and dynamic interpretation. This distributed reasoning approach ensures resilience and precision across diverse enterprise environments. Structured Artifact Generation Every step produces versioned, auditable artifacts that document the reasoning process. These artifacts help provide reproducibility, compliance, and transparency—critical for enterprise governance and regulated industries. Together, these pillars enable scalable, explainable, and repeatable vulnerability discovery across large software ecosystems such as Windows. Every stage—from reasoning to validation—is logged and traceable, designed to make each discovery reproducible and reviewable. Inside the framework Agent-Led, Human-Reviewed The system is agent-led from start to finish and human-reviewed only at decision boundaries. Agents form hypotheses from recent fixes or vulnerability classes, test them against context, perform validation passes, and generate evidence-backed reports for reviewer confirmation. The workflow mirrors how seasoned security engineers operate—only faster and continuously. n tasks based on templatized prompts. Tool Specialists as Agents Each analytical tool functions as a domain-specific agent—performing semantic search, file inspection, or function-graph traversal. These agents collaborate through structured orchestration, maintaining specialization without sacrificing coherence. Agentic Patterns and Orchestration The framework employs reusable reasoning patterns—reflective reasoning, actor–validator loops, and parallel tool dialogues—for accuracy and scale. A central conductor agent governs task coordination, context flow, and artifact persistence across runs. Auditability Through Artifacts Every investigation yields a transparent chain of artifacts: Analysis Notes – summarize candidate issues Critique Notes – document reasoning and counter-evidence Synthesis Reports – provide developer-ready summaries, diffs, call graphs, and exploitability insights Agentic Conversation Logs - provides conversation logs so developers can backtrack on reasoning and get more context This structure makes each discovery fully traceable and auditable. CI/CD-Native Integration The interface operates as a first-class Azure DevOps pipeline, attachable to pull requests, nightly builds, or release triggers. Each run publishes versioned artifacts and validation notes directly into the developer workflow—making reasoning-driven security a seamless part of software delivery. What It Can Do Today Seeded Variant Hunts: Start from a recent fix or known pattern to enumerate analogous cases, analyze helper functions, and test reachability. Evidence-First Reporting: Every finding includes reproducible evidence—code snippets, diffs, and caller graphs—delivered within the PR or work item. Scalable Coverage: Runs across servicing branches, producing consistent and auditable validation artifacts. Improved Precision: A reasoning-based validation pass has significantly reduced false positives in internal testing. Case Study: CVE-2025-55325 During a sweep of “*_DEFAULTS” deserializers, the agentic pipeline independently identified GetPoolDefaults trusting a user-controlled size field and copying that many bytes from a caller buffer. The missing runtime bounds check—guarded only by an assertion in debug builds—enabled a potential read access violation and information disclosure. The mitigation mirrored a hardened sibling helper: enforcing runtime bounds on Size versus BytesAvailable/Version before allocation and copy. The finding was later validated by the servicing teams, confirming it matched an issue already under active investigation—illustrating how the automated reasoning process can independently surface real-world vulnerabilities that align with expert analysis. Beyond Variant Analysis The underlying architecture of this framework extends naturally beyond variant detection: Net-new vulnerability discovery through cross-binary pattern matching Model-assisted fuzzing & static analysis orchestrated through CI/CD integration Regression detection via historical code comparisons Security Development Lifecycle (SDL) enforcement and reproducibility checks The agentic patterns and tooling can support net-new vulnerability discovery through cross-binary pattern matching, regression detection using historical code comparisons, reproducibility checks aligned with SDL requirements, and model-assisted fuzzing orchestrated through CI/CD processes. These capabilities open the door to applying reasoning-driven workflows across a broader range of security & validation tasks. The Road Ahead Looking ahead, this trajectory naturally leads toward autonomous cybersecurity pipelines powered by reasoning agents that apply reflective analysis, validation loops, and structured tool interactions to complex codebases. By structuring each step as an auditable artifact, the approach supports security & validation analysis that is both explainable and repeatable. These agents could help validate security posture, analyze historical and real-time signals, and detect anomalous patterns early in the lifecycle. References Google Cloud Blog – Big Sleep and AI-Assisted Vulnerability Discovery “A summer of security: empowering cyber defenders with AI.” https://blog.google/technology/safety-security/cybersecurity-updates-summer-2025 The Hacker News – Google AI ‘Big Sleep’ Stops Exploitation of Critical SQLite Flaw https://thehackernews.com/2025/07/google-ai-big-sleep-stops-exploitation.html NIST National Vulnerability Database – CVE-2025-6965 (SQLite) https://nvd.nist.gov/vuln/detail/CVE-2025-6965 Sean Heelan – “Reasoning Models and the ksmbd Use-After-Free” https://simonwillison.net/2025/May/24/sean-heelan The Cyber Express – AI Finds CVE-2025-37899 Zero-Day in Linux SMB Kernel https://thecyberexpress.com/cve-2025-37899-zero-day-in-linux-smb-kernel NIST National Vulnerability Database – CVE-2025-37899 (Linux SMB Use-After-Free) https://nvd.nist.gov/vuln/detail/CVE-2025-37899 NIST National Vulnerability Database – CVE-2025-55325 (Windows Storage Management Provider Buffer Over-read) https://nvd.nist.gov/vuln/detail/CVE-2025-55325 NVD Microsoft Security Response Center – Vulnerability Details for CVE-2025-55325 https://msrc.microsoft.com/update-guide/vulnerability/CVE-2025-55325Hacking Made Easy, Patching Made Optional: A Modern Cyber Tragedy
In today’s cyber threat landscape, the tools and techniques required to compromise enterprise environments are no longer confined to highly skilled adversaries or state-sponsored actors. While artificial intelligence is increasingly being used to enhance the sophistication of attacks, the majority of breaches still rely on simple, publicly accessible tools and well-established social engineering tactics. Another major issue is the persistent failure of enterprises to patch common vulnerabilities in a timely manner—despite the availability of fixes and public warnings. This negligence continues to be a key enabler of large-scale breaches, as demonstrated in several recent incidents. The Rise of AI-Enhanced Attacks Attackers are now leveraging AI to increase the credibility and effectiveness of their campaigns. One notable example is the use of deepfake technology—synthetic media generated using AI—to impersonate individuals in video or voice calls. North Korean threat actors, for instance, have been observed using deepfake videos and AI-generated personas to conduct fraudulent job interviews with HR departments at Western technology companies. These scams are designed to gain insider access to corporate systems or to exfiltrate sensitive intellectual property under the guise of legitimate employment. Social Engineering: Still the Most Effective Entry Point And yet, many recent breaches have begun with classic social engineering techniques. In the cases of Coinbase and Marks & Spencer, attackers impersonated employees through phishing or fraudulent communications. Once they had gathered sufficient personal information, they contacted support desks or mobile carriers, convincingly posing as the victims to request password resets or SIM swaps. This impersonation enabled attackers to bypass authentication controls and gain initial access to sensitive systems, which they then leveraged to escalate privileges and move laterally within the network. Threat groups such as Scattered Spider have demonstrated mastery of these techniques, often combining phishing with SIM swap attacks and MFA bypass to infiltrate telecom and cloud infrastructure. Similarly, Solt Thypoon (formerly DEV-0343), linked to North Korean operations, has used AI-generated personas and deepfake content to conduct fraudulent job interviews—gaining insider access under the guise of legitimate employment. These examples underscore the evolving sophistication of social engineering and the need for robust identity verification protocols. Built for Defense, Used for Breach Despite the emergence of AI-driven threats, many of the most successful attacks continue to rely on simple, freely available tools that require minimal technical expertise. These tools are widely used by security professionals for legitimate purposes such as penetration testing, red teaming, and vulnerability assessments. However, they are also routinely abused by attackers to compromise systems Case studies for tools like Nmap, Metasploit, Mimikatz, BloodHound, Cobalt Strike, etc. The dual-use nature of these tools underscores the importance of not only detecting their presence but also understanding the context in which they are being used. From CVE to Compromise While social engineering remains a common entry point, many breaches are ultimately enabled by known vulnerabilities that remain unpatched for extended periods. For example, the MOVEit Transfer vulnerability (CVE-2023-34362) was exploited by the Cl0p ransomware group to compromise hundreds of organizations, despite a patch being available. Similarly, the OpenMetadata vulnerability (CVE-2024-28255, CVE-2024-28847) allowed attackers to gain access to Kubernetes workloads and leverage them for cryptomining activity days after a fix had been issued. Advanced persistent threat groups such as APT29 (also known as Cozy Bear) have historically exploited unpatched systems to maintain long-term access and conduct stealthy operations. Their use of credential harvesting tools like Mimikatz and lateral movement frameworks such as Cobalt Strike highlights the critical importance of timely patch management—not just for ransomware defense, but also for countering nation-state actors. Recommendations To reduce the risk of enterprise breaches stemming from tool misuse, social engineering, and unpatched vulnerabilities, organizations should adopt the following practices: 1. Patch Promptly and Systematically Ensure that software updates and security patches are applied in a timely and consistent manner. This involves automating patch management processes to reduce human error and delay, while prioritizing vulnerabilities based on their exploitability and exposure. Microsoft Intune can be used to enforce update policies across devices, while Windows Autopatch simplifies the deployment of updates for Windows and Microsoft 365 applications. To identify and rank vulnerabilities, Microsoft Defender Vulnerability Management offers risk-based insights that help focus remediation efforts where they matter most. 2. Implement Multi-Factor Authentication (MFA) To mitigate credential-based attacks, MFA should be enforced across all user accounts. Conditional access policies should be configured to adapt authentication requirements based on contextual risk factors such as user behavior, device health, and location. Microsoft Entra Conditional Access allows for dynamic policy enforcement, while Microsoft Entra ID Protection identifies and responds to risky sign-ins. Organizations should also adopt phishing-resistant MFA methods, including FIDO2 security keys and certificate-based authentication, to further reduce exposure. 3. Identity Protection Access Reviews and Least Privilege Enforcement Conducting regular access reviews ensures that users retain only the permissions necessary for their roles. Applying least privilege principles and adopting Microsoft Zero Trust Architecture limits the potential for lateral movement in the event of a compromise. Microsoft Entra Access Reviews automates these processes, while Privileged Identity Management (PIM) provides just-in-time access and approval workflows for elevated roles. Just-in-Time Access and Risk-Based Controls Standing privileges should be minimized to reduce the attack surface. Risk-based conditional access policies can block high-risk sign-ins and enforce additional verification steps. Microsoft Entra ID Protection identifies risky behaviors and applies automated controls, while Conditional Access ensures access decisions are based on real-time risk assessments to block or challenge high-risk authentication attempts. Password Hygiene and Secure Authentication Promoting strong password practices and transitioning to passwordless authentication enhances security and user experience. Microsoft Authenticator supports multi-factor and passwordless sign-ins, while Windows Hello for Business enables biometric authentication using secure hardware-backed credentials. 4. Deploy SIEM and XDR for Detection and Response A robust detection and response capability is vital for identifying and mitigating threats across endpoints, identities, and cloud environments. Microsoft Sentinel serves as a cloud-native SIEM that aggregates and analyses security data, while Microsoft Defender XDR integrates signals from multiple sources to provide a unified view of threats and automate response actions. 5. Map and Harden Attack Paths Organizations should regularly assess their environments for attack paths such as privilege escalation and lateral movement. Tools like Microsoft Defender for Identity help uncover Lateral Movement Paths, while Microsoft Identity Threat Detection and Response (ITDR) integrates identity signals with threat intelligence to automate response. These capabilities are accessible via the Microsoft Defender portal, which includes an attack path analysis feature for prioritizing multicloud risks. 6. Stay Current with Threat Actor TTPs Monitor the evolving tactics, techniques, and procedures (TTPs) employed by sophisticated threat actors. Understanding these behaviours enables organizations to anticipate attacks and strengthen defenses proactively. Microsoft Defender Threat Intelligence provides detailed profiles of threat actors and maps their activities to the MITRE ATT&CK framework. Complementing this, Microsoft Sentinel allows security teams to hunt for these TTPs across enterprise telemetry and correlate signals to detect emerging threats. 7. Build Organizational Awareness Organizations should train staff to identify phishing, impersonation, and deepfake threats. Simulated attacks help improve response readiness and reduce human error. Use Attack Simulation Training, in Microsoft Defender for Office 365 to run realistic phishing scenarios and assess user vulnerability. Additionally, educate users about consent phishing, where attackers trick individuals into granting access to malicious apps. Conclusion The democratization of offensive security tooling, combined with the persistent failure to patch known vulnerabilities, has significantly lowered the barrier to entry for cyber attackers. Organizations must recognize that the tools used against them are often the same ones available to their own security teams. The key to resilience lies not in avoiding these tools, but in mastering them—using them to simulate attacks, identify weaknesses, and build a proactive defense. Cybersecurity is no longer a matter of if, but when. The question is: will you detect the attacker before they achieve their objective? Will you be able to stop them before reaching your most sensitive data? Additional read: Gartner Predicts 30% of Enterprises Will Consider Identity Verification and Authentication Solutions Unreliable in Isolation Due to AI-Generated Deepfakes by 2026 Cyber security breaches survey 2025 - GOV.UK Jasper Sleet: North Korean remote IT workers’ evolving tactics to infiltrate organizations | Microsoft Security Blog MOVEit Transfer vulnerability Solt Thypoon Scattered Spider SIM swaps Attackers exploiting new critical OpenMetadata vulnerabilities on Kubernetes clusters | Microsoft Security Blog Microsoft Defender Vulnerability Management - Microsoft Defender Vulnerability Management | Microsoft Learn Zero Trust Architecture | NIST tactics, techniques, and procedures (TTP) - Glossary | CSRC https://learn.microsoft.com/en-us/security/zero-trust/deploy/overview