Microsoft Defender for Cloud Blog

16 MIN READ

Secure AI by Design Series: Embedding Security and Governance Across the AI Lifecycle

Hesham_Saad

Microsoft

Sep 30, 2025

Problem Statement

Securing AI in the Age of Generative Intelligence

Executive Summary

The rapid adoption of Generative AI (GenAI) is transforming industries—unlocking new efficiencies, accelerating innovation, and reshaping how enterprises operate. However, this transformation introduces significant security risks, novel attack surfaces, and regulatory uncertainty. This white paper outlines the key challenges, supported by Microsoft’s public research and guidance, and presents actionable strategies to mitigate risks and build trust in AI systems.

The Dual Edge of GenAI

While GenAI enhances productivity and decision-making, it also expands the threat landscape. Microsoft identifies key enterprise concerns including data exfiltration, adversarial attacks, and ethical risks associated with AI deployment.

Security Risks in GenAI Adoption

2.1 Data Leakage

According to Microsoft’s security insights, 80% of business leaders cite data leakage as their top concern when adopting AI. Additionally, 84% of organisations want greater confidence in managing data input into AI applications (https://www.microsoft.com/security/blog/2024/06/18/mitigating-insider-risks-in-the-age-of-ai-with-microsoft-purview/).

Microsoft’s white paper on secure AI adoption recommends a four-step strategy: Know your data, Govern your data, Protect your data, and Prevent data loss (Data Security Foundation for Secure AI).

2.2 Prompt Injection & Jailbreaks

Microsoft reports that 88% of organizations are concerned about prompt injection attacks—where malicious inputs manipulate AI behavior. These attacks are particularly dangerous in Retrieval-Augmented Generation (RAG) systems.

2.3 Hallucinations & Model Trust

Hallucinations—AI-generated false or misleading outputs—pose reputational and operational risks. Microsoft’s Cloud Security Alliance blog highlights the need for robust GenAI models to reduce epistemic uncertainty and maintain trust.

2.4 Regulatory Uncertainty

52% of leaders express uncertainty about how AI is regulated. Microsoft recommends aligning AI security controls with frameworks such as ISO 42001 and the NIST AI Risk Management Framework.

Trustworthiness & Governance Imperatives

Trust in AI systems is paramount. Microsoft advocates for layered governance and secure orchestration, including real-time monitoring, agent governance, and red teaming (Microsoft Learn: Preventing Data Leakage to Shadow AI).

Enterprise Recommendations

Secure by Design: Integrate security controls across the AI stack—from model selection to deployment. Use Microsoft Defender for AI, Purview DSPM, and Azure AI Content Safety for threat detection and data protection.
Monitor & Mitigate: Employ red teaming and continuous evaluation to simulate adversarial attacks and validate defenses.
Align with Regulatory Frameworks: Map AI security controls to ISO 42001, NIST AI RMF, and leverage Microsoft Purview for compliance.

Security and risk leaders at companies using GenAI said their top concerns are data security issues, including leakage of sensitive data (~63%), sensitive data being overshared, with users gaining access to data they’re not authorized to view or edit (~60%), and inappropriate use or exposure of personal data (~55%). Other concerns include insight inaccuracy (~43%) and harmful or biased outputs (~41%).

In companies that are developing or customizing GenAI apps, security leaders’ concerns were similar but slightly varied. Data leakage along with exfiltration (~60%) and the inappropriate use of personal data (~50%) were again top concerns. But other concerns emerged, including the violation of regulations (~42%), lack of visibility into AI components and vulnerabilities (~42%), and over permissioned access granted to AI apps (~36%).

Overall, these concerns can be divided into two categories: Amplified and emerging security risks.

Secure AI Guidelines

Securing AI by Design is a comprehensive approach that integrates security at every stage of AI system development and deployment. Given the evolving threat landscape of generative AI, organizations must implement robust frameworks, follow best practices, and utilize advanced tools to protect AI models, data, and applications. This blog provides structured guidelines for secure AI, covering emerging risks, defense strategies, and practical implementation scenarios.

Introduction: The Need for Secure AI

The rapid adoption of AI, especially Generative AI (GenAI), brings transformative benefits but also introduces new security risks and attack surfaces. In recent surveys, 80% of business leaders cited data leakage as a primary AI concern, 55% expressed uncertainty about AI regulations, and 88% worried about AI-specific threats like hallucinations and prompt injection. These statistics underscore that trustworthiness in AI systems is paramount.

Microsoft’s approach to AI safety and security is guided by core principles of responsible AI and Zero Trust, ensuring that security, privacy, and compliance are built-in from the ground up. We recognize AI systems can be abused in novel ways, so organizations must be vigilant in embedding security by design, by default, and in operations. This involves both organizational practices (frameworks, policies, training) and technical measures (secure model development lifecycle, threat modeling for AI, continuous monitoring).

Key Objectives of Secure AI Guidelines:

Understand the AI Threat Landscape: Identify how attackers might target AI workloads (e.g. prompt injections, model theft) and the potential impacts
Adopt an AI Security Framework: Implement structured governance aligning with existing standards (e.g. NIST AI RMF, MCSB, Zero Trust) to systematically address identity, data, model, platform, and monitoring aspects
Strengthen Defenses (Blue Team): Leverage advanced threat protection and posture management tools (Microsoft Defender for Cloud with AI workload protection, Purview data governance, Entra ID Conditional Access, etc.) to detect and mitigate attacks in real time
Anticipate Attacks (Red Team): Conduct adversarial testing of AI (prompt red teaming, adversarial ML simulation) to uncover vulnerabilities before attackers do
Integrate AI-Specific Measures: Use AI Shielding (content filters), AI model monitoring for misuse, and continuous risk assessments specialized for AI contexts

Contextual Example: Microsoft’s own journey reflects these priorities. From establishing Trustworthy Computing (2002) and publishing the Security Development Lifecycle (2004), to forming a dedicated AI Red Team (2018) and defining AI Failure Mode taxonomies (2019), to developing open-source AI security tools (Counterfit in 2021, PyRIT in 2024), Microsoft has consistently evolved its security practices to address AI risks. This historical commitment – “thinking in decades and executing in quarters” – serves as a model for organizations securing AI systems for the long run.

AI Security Threat Landscape and Challenges

Generative AI systems introduce unique vulnerabilities beyond traditional IT threats. It’s critical to map out these new risk areas:

2.1 Emerging AI Threats

Prompt Injection Attacks (Direct & Indirect): Adversaries can manipulate an AI model’s input prompts to execute unauthorized actions or leak confidential data. A direct prompt injection (UPIA) is when a user intentionally crafts input to override the system’s instructions (akin to a “jailbreak” of the model). Indirect prompt injection (XPIA) involves embedding malicious instructions in content the AI will process unknowingly – for example, hiding an attack in a document that an AI assistant summarizes. Both can lead to harmful outputs or unintended commands, bypassing content filters. These attacks exploit the lack of separation between instructions and data in LLMs

Data Leakage & Privacy Risks: AI systems often consume sensitive data. Data oversharing can occur if models inadvertently reveal proprietary information (e.g. including training data in responses). 80% of leaders worry about sensitive data leakage via AI. Additionally, insufficient visibility into AI usage can cause compliance failures if sensitive info flows to unauthorized channels. Ensuring strict data governance and monitoring is essential.

Model Theft and Tampering: Trained AI models themselves become targets. Attackers may attempt model extraction (stealing model parameters or behavior by repeated querying) or model evasion, where adversarial inputs cause models to fail at classification or detection tasks. There’s also risk of data poisoning: injecting bad data during model training or fine-tuning to subtly skew the model’s outputs or introduce backdoors. This could degrade reliability or embed hidden triggers in the model.

Resource Abuse (Wallet Attacks): Generative AI requires significant compute. Attackers might exploit AI services to run heavy workloads (cryptomining with GPU abuse, a.k.a wallet abuse). This not only incurs cost but can serve as a DoS vector. AI orchestration components (like agent plugins or tools) could also be abused if not securely designed – e.g., a malicious plugin performing unauthorized operations.

Hallucinations and Misinformation: While not a malicious attack per se, AI models can produce convincing false outputs (“hallucinations”). Attackers may weaponize this by feeding disinformation and using AI to propagate it. Also, model errors can lead to incorrect business decisions. 55% of leaders lack clarity on AI regulation and safety, highlighting the need for caution around AI-generated content.

2.2 Attack Surfaces in Generative AI

GenAI applications incorporate multiple components that expand the traditional attack surface:

Natural Language Interface: LLMs process user prompts and any embedded instructions as one sequence, creating opportunities for prompt injections since there’s no explicit separation of code vs data in prompts.
High Dependency on Data: Data is the fuel of AI. GenAI apps rely on vast datasets: model training data, fine-tuning data, grounding data for retrieval-augmented generation, etc. Each of these is a potential entry point. Poisoned or corrupted data can compromise model integrity. Also, the outputs (newly generated content) may themselves need protection and classification.
Plugins and External Tools: Modern AI assistants often use plugins, APIs, or “skills” to extend capabilities (e.g., web browsing plugin, database query tool). These are additional code modules which, if vulnerable, provide a path for exploitation. Insecure plugin design can allow unauthorized operations or serve as a vector for supply chain attacks.

Orchestration & Agents: GenAI solutions often rely on agent orchestrators to determine how to fulfill user requests—this may involve chaining multiple steps such as web searches, API calls, and LLM interactions. However, these orchestrators and agents themselves can be vulnerable to corruption or manipulation. If compromised, they may execute unintended or harmful actions, even when the individual components are secure. A key risk is agents “going rogue,” such as misinterpreting ambiguous instructions or acting on unvalidated external content. This was evident in the Contoso XPIA scenario, where hidden instructions embedded in an email triggered a data leak—highlighting how flawed orchestration logic can be exploited to bypass safeguards.

AI Infrastructure: The cloud VMs, containers, or on-prem servers running AI services (like Azure OpenAI endpoints, or ML model hosting) become direct targets. Misconfigurations (like permissive network access, disabled authentication on endpoints) can lead to model hijacking or unauthorized use. We must treat the AI infrastructure with the same rigor as any critical cloud workload, aligning with the Microsoft Cloud Security Benchmark (MCSB) controls.

In summary, generative AI’s combination of natural language flexibility, extensive data touchpoints, and complex multi-component workflows means the defensive scope must broaden. Traditional security concerns (like identity, network, OS security) still apply and are joined by AI-specific concerns (prompt misuse, data ethics, model behavior).

Microsoft outlines three broad AI Threat Impact Areas to focus defenses:

AI Application Security – protecting the app code and logic (e.g., preventing data exfiltration via the UI, securing AI plugin integration).
AI Usage Safety & Security – ensuring the outputs and usage of AI meet compliance and ethical standards (mitigating bias, disinformation, harmful content).
AI Platform Security – securing the underlying AI models and compute platform (preventing model theft, safeguarding training pipelines, locking down environment).

By understanding these threats and surfaces, one can implement targeted controls which we discuss next.

Approaches to Secure AI Systems

Mitigating AI risks requires a multi-layered approach combining frameworks and governance, secure engineering practices, and modern security tools. Microsoft recommends the following key strategies:

3.1 Security Development Lifecycle (SDL) for AI and Continuous Practices

Leverage established secure development best practices, augmented for AI context:

Threat Modeling for AI: Extend existing threat modeling (STRIDE, etc.) to consider AI failure modes (e.g., misuse of model output, poisoning scenarios). Microsoft’s AI Threat Modeling guidance (2022) offers templates for identifying risks like fairness and security harms during design.

Always ask: How could this AI feature be abused or exploited? Include red team experts early for high-risk features.

Secure Engineering Tenets: Microsoft’s 10 Security Practices (part of SDL) remain crucial

Establish Security Standards & Metrics Set clear & explicit security rules and ways to measure them for AI systems.
This means deciding exactly what you expect your AI to do explicitly (and not do) to keep things safe.

Example :

Suppose you have an AI chatbot. One security rule could be:

“The chatbot must refuse to answer if someone tries to get private information, like passwords or confidential company data.”

So, if a user asks, “What is the admin password?” the AI should always say, “Sorry, I can’t share that,” and never reveal any secrets.

Threat Modeling – as above, including AI-specific adversaries.
Use Proven Security Tooling – static analysis on AI code, vulnerability scans on ML pipelines.
Cryptographic Standards – encrypt AI training data at rest, secure enclave for model secrets.
Secure Supply Chain – verify datasets origins and ensure that datasets cannot be unintentionally tampered with; scan open-source ML libraries for tampering.
Secure Engineering Environment – enforce access controls where models are trained (prevent insider threats).
Security Testing – pentest the AI application; include attempts at prompt injection and data exfiltration.
Operational Security – robust logging around AI usage, enable anomaly detection.
Monitoring & Response – treat AI incidents (e.g., attempted model extraction) like security incidents; have playbooks.
Security Training – upskill developers and data scientists on secure AI coding & responsible AI.

Suppose you have an AI chatbot. One security rule could be: “The chatbot must refuse to answer if someone tries to get private information, like passwords or confidential company data.” So, if a user asks, “What is the admin password?” the AI should always say, “Sorry, I can’t share that,” and never reveal any secrets. Threat Modeling – as above, including AI-specific adversaries. Use Proven Security Tooling – static analysis on AI code, vulnerability scans on ML pipelines. Cryptographic Standards – encrypt AI training data at rest, secure enclave for model secrets. Secure Supply Chain – verify datasets origins and ensure that datasets cannot be unintentionally tampered with; scan open-source ML libraries for tampering. Secure Engineering Environment – enforce access controls where models are trained (prevent insider threats). Security Testing – pentest the AI application; include attempts at prompt injection and data exfiltration. Operational Security – robust logging around AI usage, enable anomaly detection. Monitoring & Response – treat AI incidents (e.g., attempted model extraction) like security incidents; have playbooks. Security Training – upskill developers and data scientists on secure AI coding & responsible AI.

Adopting the above in an “AI Secure Development Lifecycle” ensures each AI feature goes through rigorous checks. For instance, before deploying a new LLM feature, run it through internal red team exercises to see if guardrails hold. This aligns with Microsoft’s stance: all high-risk AI must be independently red teamed and approved by a safety board prior to release

Align with Responsible AI from the Start:
Security for AI is inseparable from an organization’s Responsible AI commitments. These principles must be embedded from the outset—not retrofitted after development. For example, the same mitigation that prevents prompt injection can also reduce the risk of harmful content generation. Microsoft’s Responsible AI principles—Fairness, Reliability & Safety, Privacy & Security, Inclusiveness, Transparency, and Accountability—should be treated as non-negotiable design constraints. Privacy & Security means minimizing personal data in training sets and outputs; Reliability & Safety means implementing robust content filters to avoid unsafe responses. These principles are not just ethical imperatives—they are foundational to building secure, trustworthy AI systems. For a full overview, refer to Microsoft’s official Responsible AI Standard.
Secure AI Landing Zone: Treat your AI environment like any cloud infra. Microsoft recommends aligning with the Cloud Security Benchmark (MCSB) and Zero Trust model for AI deployments. This means use network isolation (VNETs/private links) for model endpoints, enforce stringent identity for accessing AI resources (Managed Identities, Conditional Access), and apply data protection (Purview sensitivity labels on training data) from day one.

3.2 AI Red Teaming (‘Attacker’ Perspective Testing)

AI Red Teaming is crucial to staying ahead of adversaries. It involves systematically attacking your AI systems to find weaknesses. Historically, red teams did double-blind security exercises on production systems. Now, AI red teaming encompasses a broader range of harms, including bias and safety issues, often in shorter, targeted engagements.

Key recommendations:

Conduct Regular Red Team Exercises on AI Models: Simulate prompt injection attacks, attempt to extract hidden model prompts or secrets, try known jailbreak tactics (e.g., ASCII art encoding attacks), and test model responses to adversarial inputs. Do this in a controlled environment. Microsoft’s AI Red Team discovered scenarios where models revealed sensitive info under social engineering – such testing is invaluable
Leverage External Experts if Needed: The field is evolving; consider engaging specialized AI security researchers or using crowdsourced red teams (with proper safeguards) to test your AI applications under NDA. Also utilize community knowledge like the OWASP Top 10 for LLMs and MITRE ATLAS to guide the red team on likely threat vectors
Tooling: Use tools like Counterfit (an automated AI security testing toolkit by Microsoft) to perform attacks such as model evasion and reconnaissance. Microsoft also released PyRIT to help find generative model risks. These ease simulation of attacker techniques (like feeding perturbed inputs to cause misclassification). Additionally, integrate AI-focused fuzzing – automatically generate variations of prompts to see if any slip past filters.
Penetration Testing AI-integrated Apps: If your application uses AI outputs in critical workflows (e.g., an AI that summarizes customer emails which then feed into decisions), pen-test the end-to-end flow. For example, test if an attacker’s specially crafted email could trick the AI and consequently the system (the cross-prompt injection scenario). Also test the infrastructure – ensure no route for someone to directly hit the model’s REST endpoint without auth, etc.

The goal is to identify and fix issues like: model answering questions it should refuse; model failing to sanitize outputs (potential XSS if output is shown on web); or policies in the AI pipeline not triggering correctly. Findings from red team ops must feed back into training and engineering – e.g., adjust the model with reinforcement learning from human feedback (RLHF) for problematic prompts, strengthen prompt parsing logic, or institute new content filters.

3.3 AI Blue Teaming (Defensive Operations and Tools)

On the defense side, organizations should transform their Security Operations Center (SOC) to handle AI-related signals and use AI to their advantage:

Monitoring and Threat Detection for AI:

Deploy solutions that continuously monitor AI services for malicious patterns. Microsoft Defender for Cloud’s AI workload protection surfaces alerts for issues like “Prompt injection attack detected on Azure OpenAI Service” or “Sensitive data exposure via AI model”. These are generated by analyzing model inputs/outputs and cloud telemetry. For example, Azure AI’s Content Safety system (Prompt Shield) will flag and block some malicious prompts, and those events feed security alerts. Ensure you enable Defender for Cloud threat protection for AI services ~~CSPM for AI workloads~~ to get these signals.
Use log analytics to capture AI events: track who is calling your models, what prompts are being sent (with appropriate privacy), and model responses (like error codes for rate limiting or denied content). Unusually high request rates 1q`or many blocked prompts could indicate an ongoing attack attempt.
Integrate AI events into your SIEM/XDR. Microsoft Sentinel now includes connectors for Azure OpenAI audit logs and relevant alerts. You can set up Sentinel analytics rules such as: “Multiple failed AI authentications from same IP” or “Sequence: user downloads large training dataset then model queried extensively” – indicating possible data theft or model extraction attempt.
Unified Incident View: Use a platform that correlates related alerts from identity, endpoint, Office 365, and cloud – since AI attacks often span domains (e.g., attacker phishes an admin to get access to the AI model keys, then uses those keys to abuse the service). The Microsoft 365 Defender portal does incident correlation: for instance, it can group an Entra ID risky sign-in, a suspicious VM behavior, and a content filter trigger into one incident. This helps focus on the full story of an AI breach attempt.

Access Control and Cloud Security Posture:

Follow least privilege for all AI resources. Only designate specific Entra ID groups to have access to manage or use the AI services. Use roles appropriately (e.g., training team can submit training jobs but not alter security settings).
Implement Conditional Access for AI portals/APIs: e.g., require MFA or trusted device for the developers accessing the model configuration. For unattended access (services calling AI), use managed identities with scoped permissions.
Regularly review the attack paths in your cloud environment related to AI services. Microsoft Defender for Cloud’s Attack Path Analysis can reveal if, for example, a compromised VM could lead to an AI key leak (via a path of misconfigurations). It will identify mis-set permissions or exposed secrets that create a chain. Remediate those high-risk paths first, as they represent “immediate value” for an attacker (this aligns with Scenario #2 – demonstrating quick wins by closing glaring attack paths).
Network segmentation: If possible, isolate AI training environments from internet access and from production. Use private networking so that only legitimate front-end apps can call the AI inferencing endpoints. This reduces drive-by attacks.
Continuous Posture Management: AI systems evolve, so continuously assess compliance. Azure’s AI security posture (in Defender CSPM) will highlight misconfigurations like a storage with training data not having encryption or a model endpoint without diagnostics. Treat those recommendations with priority, as they often prevent incidents.

Response and Recovery:

Develop incident response plans specifically for AI incidents. For example, Prompt Injection Incident: Steps might include capturing the malicious prompt, identifying which conversations or data it tried to access, assessing if any improper output was given, and adjusting filters or the model’s prompt instructions to prevent recurrence. Or Data Poisoning Incident: If discovered that training data was compromised, have a plan to retrain from backups and tighten contributor vetting.
Use Microsoft Sentinel or Defender XDR to automate common responses. Microsoft’s Security Copilot (an AI assistant for SOC) can help investigate multi-stage attacks faster. For instance, given an alert that an admin’s token was leaked and an AI service was accessed, Copilot could summarize all related activities and suggest remedial actions (disable admin, purge model API keys, etc.). Embrace these AI-driven security tools – appropriately governed – as force multipliers in defense.
In cloud environments, you can contain compromised AI resources quickly. Example: If a particular model endpoint is being abused, use Defender for Cloud’s workflow automation or Sentinel playbook to automatically isolate that resource (maybe tag it to remove from load balancer, or rotate its credentials) when an alert triggers
Backup and recovery: Keep secure backups of critical AI assets – training datasets (with versioning), model binaries, and configuration. If ransomware or sabotage occurs, you can restore the AI’s state. Also ensure the backup process itself is secure (backups encrypted, access logged).

AI for Security: As a positive angle, use AI analytics to enhance security. Train anomaly detection on user behavior around AI apps, use machine learning to classify which model queries might be insider threats vs normal usage patterns. Microsoft is integrating AI in Defender – for instance, using OpenAI GPT to analyze threat intelligence or generate remediation steps

📌Part 2 of Secure AI by design series, we will detail and cover the following:

Governance: Frameworks and Organizational Measures

Secure AI Implementation Best Practices

Practical Secure AI Scenarios (Use Cases)

✅Conclusion

AI technologies introduce powerful capabilities alongside new security challenges. By proactively embedding security into the design (“secure AI by design”), continuously monitoring and adapting defenses, and aligning with robust frameworks, organizations can harness AI's benefits without compromising on safety or compliance.

Key takeaways:

Prepare and Prevent: Use structured frameworks and threat models to anticipate attacks. Harden systems by default and reduce the attack surface (e.g., disable unused AI features, enforce least privilege everywhere).
Detect and Respond: Invest in AI-aware security tools (Defender for Cloud, Sentinel, Content Safety) and integrate their signals into your SOC workflows. Practice incident response for AI-specific scenarios as diligently as you do for network intrusions.
Govern and Assure: Maintain oversight through principles, policies, and external checks. Regular reviews, audits, and updates to controls will keep the AI security posture strong even as AI evolves.
Educate and Empower: Security is everyone’s responsibility – train developers, data scientists, and end-users on securely working with AI. Encourage a culture where potential AI risks are flagged and addressed, not ignored.

By following the Secure AI Guidelines – balancing innovation with rigorous security – organizations can build trust in their AI systems, protect sensitive data and operations, and meet regulatory obligations. In doing so, they pave the way for AI to be an enabler of business value rather than a source of new vulnerabilities.

Microsoft’s comprehensive set of tools and best practices, as outlined in this document, serve as a blueprint to achieve this balance. Adopting these will help ensure that your AI initiatives are not only intelligent and impactful but also secure, resilient, and worthy of stakeholder trust.

🙌 Acknowledgments

A special thank you to the following colleagues for their invaluable contributions to this blog post and the solution design:

Hiten_Sharma & JadK – EMEA Secure AI Global Black Belt, for co-authoring and providing deep insights, learning and content that shaped the design guidelines and practice.
Yuri Diogenes, Dick Lake, Shay Amar, Safeena Begum Lepakshi – Product Group and Engineering PMs from Microsoft Defender for Cloud and Microsoft Purview, for the guidance & review.

Your collaboration and expertise made this guidance possible and impactful for our security community.

Updated Sep 26, 2025

Version 1.0

Microsoft

Joined April 06, 2017

View Profile