Generative AI is reshaping how enterprises operate, introducing new efficiencies—and new risks. Imagine launching a helpful chatbot, only to learn a cleverly crafted prompt can bypass safety controls and exfiltrate sensitive data. This is today’s reality: every system prompt, plugin/tool, dataset, fine‑tune, or orchestration step can change the attack surface. This is due to the non deterministic nature in how LLM models craft responses in output. The slightest change in the input of a prompt in verbiage or tone can change the outcome of a output in subtle but non predictable ways especially when your data is involved.
Generative AI is reshaping how enterprises operate, introducing new efficiencies—and new risks. Imagine launching a helpful chatbot, only to learn a cleverly crafted prompt can bypass safety controls and exfiltrate sensitive data. This is today’s reality: every system prompt, plugin/tool, dataset, fine‑tune, or orchestration step can change the attack surface. This is due to the non deterministic nature in how LLM models craft responses in output. The slightest change in the input of a prompt in verbiage or tone can change the outcome of a output in subtle but non predictable ways especially when your data is involved.
This post shows how to operationalize AI red teaming with Microsoft Defender for AI services so security teams gain evidence‑backed visibility into adversarial behavior and turn that visibility into daily defense. By aligning with Microsoft’s Responsible AI principles of transparency, accountability, and continuous improvement, we demonstrate a pragmatic, repeatable loop that makes AI safer week after week. Crucially, security needs to have a seat at the table across the AI app lifecycle from model selection and pilot to production and ongoing updates.
Who Should Read This (and What You’ll See)
- SOC analysts & incident responders - See how AI signals materialize as high‑fidelity alerts (prompt evidence, URL intel, identity context) in Defender for Cloud and Defender XDR for fast triage and correlation.
- AI/ML engineers - Validate model safety with controlled simulations (PyRIT‑informed strategies) and understand which filters/guardrails move the needle.
- Security architects - Integrate Microsoft Defender for AI services into your cloud security program; codify improvements as policy, IaC, and identity hygiene.
- Red teamers/researchers - Run structured, repeatable adversarial tests that produce measurable outcomes the org can act on.
Why now? Data leakage, prompt injection, jailbreaks, and endpoint abuse are among the fastest‑growing threats to AI systems. With AI red teaming and Microsoft Defender for AI services, you catch intent before impact and translate insight into durable controls.
What’s Different About the AI Attack Surface
New risks sit alongside the traditional ones:
- Prompts & responses — Susceptible to prompt injection and jailbreak attempts (rule change, role‑play, encoding/obfuscation).
- User & application context — Missing context slows investigations and blurs accountability.
- Model endpoints & identities — Static keys and weak identity practices increase credential theft and scripted probing risk.
- Attached data (RAG/fine‑tuning) — Indirect prompt injection via documents or data sources.
- Orchestration layers/agents — Tool invocation abuse, unintended actions, or “over‑permissive” chains.
- Content & safety filters — Configuration drift or silent loosening erodes protection.
A key theme across these risks is context propagation. The way user identity, application parameters, and environmental signals travel with each prompt and response. When context is preserved and surfaced in security alerts, SOC teams can quickly correlate incidents, trace attack paths, and remediate threats with precision. Effective context propagation transforms raw signals into actionable intelligence, making investigations faster and more accurate.
Microsoft Defender for AI services adds a real‑time protection layer across this surface by combining Prompt Shields, activity monitoring, and Microsoft threat intelligence to produce high‑fidelity alerts you can operationalize.
The Improvement Loop (Responsible AI in Practice)
Responsible AI comes to life when teams Observe → Correlate → Remediate → Retest → Codify:
- Observe controlled jailbreak/phishing/automation patterns and collect prompt evidence.
- Correlate with identity, network, and prior incidents in Defender XDR.
- Remediate with the smallest effective control (filters, identities, rate limits, data scoping).
- Retest the same scenario to verify risk reduction.
- Codify as baseline (policy, IaC template, guardrail profile, rotation notes).
Repeat this rhythm on a schedule and you’ll build durable posture faster than a one‑time “big‑bang” control set.
Prerequisites:
To take advantage of this workshop you’ll need:
- Sandbox subscription (ideally inside a Sandbox Mgmt Group with lighter policies), you may also leverage a Free trial of Azure Subscription as well.
- Microsoft Defender for AI services plan enabled (see Participant Guide )
- Contributor access (you can deploy + view alerts)
- Region capacity confirmed (Azure AI Foundry in East US 2)
Workshop flow and testing:
Prep: Enable the Microsoft Defender for AI services plan with prompt evidence, deploy the Azure Template (one hub + single endpoint), and open the AIRT-Eval.ipynb notebook; you now have a controlled space to generate signals(see Participant Guide).
Controlled Signals: Trigger against a jailbreak attempt, a phishing URL simulation, and a suspicious user agent simulation to produce three distinct alert types.
Triage & Correlate: For each alert, review anatomy (evidence, severity, IDs) and capture prompt/URL evidence.
Harden & Retest: Apply improvements or security controls, then validate fixes.
After you harden controls and retest, the next step is validating that your defenses trigger the right alerts on demand. There is a list of Microsoft Defender for AI services alerts here. To evaluate alerts, open DfAI‑Eval.ipynb - a streamlined notebook that safely simulates adversarial activity (current alerts: jailbreak, phishing URL, suspicious user agent) to exercise Microsoft Defender for AI services detections. Think of it as the EICAR test for AI workloads: consistent, repeatable, and safe.
Next, we will review and break down each of the alerts you’ll generate in the workshop and how to read them effectively.
Anatomy of Jailbreak from AI Red team Agent:
A jailbreak is a user prompt designed to sidestep system or safety instructions—rule‑change (“ignore previous rules”), fake embedded conversation, role‑play as an unrestricted persona, or encoding tricks. Microsoft Defender for AI services (via Prompt Shields + threat intelligence) flags it before unsafe output (“left‑of‑boom”) and publishes a correlated high‑fidelity alert into Defender XDR for cross‑signal investigation.
Anatomy of a Phishing involved in an attack:
Phishing prompt URL alerts fire when a prompt or draft response embeds domains linked to impersonation, homoglyph tricks, newly registered infrastructure, encoded redirects, or reputation‑flagged hosting. Microsoft Defender for AI services enriches the URL (normalization, age, reputation, brand similarity) and—if prompt evidence is enabled—includes the exact snippet, then streams the alert into Defender XDR where end‑user/application context fields (e.g. `EndUserId`, `SourceIP`) let analysts correlate repeated lure attempts and pivot to related credential or jailbreak activity.
Anatomy of a User Agent involved in an attack:
Suspicious user agent alerts highlight enumeration or automation patterns (generic library signatures, headless runners, scanner strings, cadence anomalies) tied to AI endpoint usage and identity context. Microsoft Defender for AI services scores the anomaly and forwards it to Defender XDR enriched with optional `UserSecurityContext` (IP, user ID, application name) so analysts can correlate rapid probing with concurrent jailbreak or phishing alerts and enforce mitigations like managed identity, rate limits, or user agent filtering.
Conclusion
The goal of this Red teaming and AI Threat workshop amongst the different attendees is to catch intent before impact, prompt manipulation before unsafe output, phishing infrastructure before credential loss, and scripted probing before exfiltration. Microsoft Defender for AI services feeding Defender XDR enables a compact improvement loop that converts red team findings into operational guardrails.
Within weeks, this cadence transforms AI from experimental liability into a governed, monitored asset aligned with your cloud security program. Incrementally closing gaps within context propagation, identity hygiene, Prompt Shields & filter tuning—builds durable posture. Small, focused cycles win ship one improvement, measure its impact, promote to baseline, and repeat.