Blog Post

Apps on Azure Blog
6 MIN READ

Securing Your AI Agents Before They Ship: Red Teaming with Microsoft PyRIT

vsriramdas's avatar
vsriramdas
Icon for Microsoft rankMicrosoft
Apr 28, 2026

As AI engineers, we're shipping agents that reason, call tools, and act on behalf of users β€” but most of us haven't security-tested them. Microsoft's PyRIT framework gives you 53+ adversarial datasets, 70+ prompt converters, and 6 attack strategies out of the box. But PyRIT is a toolkit β€” it gives you the building blocks, not the pipeline. In this post, I walk through how to wrap PyRIT into a config-driven scanner with OWASP mapping, release gating, and CI/CD integration β€” so your team can start red-teaming AI agents in an afternoon, not weeks.

Securing Your AI Agents Before They Ship: Red Teaming with Microsoft PyRIT

 

You wouldn't ship a web app without running OWASP ZAP or Snyk. So why are AI agents going to production without a single security scan? Prompt injection, data leakage, system prompt theft β€” the OWASP Top 10 for LLM Applications reads like a checklist of things most teams haven't tested for.

PyRIT is Microsoft's open-source answer: an automation framework battle-tested on 100+ products including Copilot. But here's the catch β€” PyRIT is a research library. To make it work in a real engineering workflow, you need to wrap it. This post shows you how.

In this post:

  1. Why AI red teaming is fundamentally different from traditional security testing
  2. What PyRIT gives you out of the box
  3. How to build a thin wrapper that turns PyRIT into a config-driven, pipeline-ready scanner
  4. When and how to plug it into your CI/CD workflow
  5. Customizing every step for your threat model

πŸ›‘οΈ Why AI Red Teaming Is Different

If you're building agentic AI β€” systems that reason, call tools, and take actions β€” you already know that traditional security testing doesn't cut it. Microsoft's AI Red Team learned this the hard way after red-teaming 100+ generative AI products.

Three things make AI red teaming unique:

  1. You're testing two risk surfaces at once β€” security vulnerabilities (prompt injection, data exfiltration) *and* responsible AI harms (bias, toxicity, manipulation). Traditional pen testers focus on one.
  2. Outputs are probabilistic β€” the same prompt can produce different responses across runs. You can't just assert on a fixed output. You need automated scoring at scale.
  3. Every architecture is different β€” standalone chatbots, RAG pipelines, multi-agent workflows, tool-calling agents. A single test harness has to flex across all of them.

 

The OWASP LLM Top 10 (2025) gives us the taxonomy β€” prompt injection, sensitive information disclosure, excessive agency, system prompt leakage, data poisoning, supply chain risks, improper output handling, embedding weaknesses, misinformation, and unbounded consumption. Every AI agent you deploy is exposed to all ten. The question is whether *you* discover the gaps or your users do.

πŸ”§ What PyRIT Gives You

PyRIT (Python Risk Identification Tool) started as internal scripts at Microsoft in 2022. Today it's a 3,800-star, MIT-licensed framework with 129 contributors and a published paper.

 "We were able to pick a harm category, generate several thousand malicious prompts, and use PyRIT's scoring engine to evaluate the output from the Copilot system β€” all in the matter of hours instead of weeks." β€” Microsoft Security Blog

The building blocks:

  1. 53+ datasets β€” AIRT, HarmBench, AdvBench, XSTest, and more. Curated adversarial prompts covering content harms, jailbreaks, data exfiltration, and social bias.
  2. 70+ prompt converters β€” Base64, ROT13, Leetspeak, Unicode confusables, LLM-powered rephrasing, translation, multimodal injection. They stack β€” a prompt can be translated, then Base64-encoded, then embedded in an image.
  3. 6 attack strategies β€” from simple `PromptSendingAttack` (single-turn) to `CrescendoAttack` (gradual escalation), `TreeOfAttacksWithPruning` (TAP), and multi-turn dialogue attacks.
  4. 20+ scorers β€” LLM-as-judge, Azure AI Content Safety, true/false classifiers, Likert scales.
  5. 10+ targets β€” OpenAI, Azure, HuggingFace, HTTP endpoints, Playwright, WebSockets.

 

 

This is powerful β€”  PyRIT gives you the components β€” datasets, converters, attack strategies, scorers β€” but not the glue. You still need something that loads a config, wires the right components together, runs attacks, scores the results, and tells your pipeline pass or fail. That's what a wrapper does.

πŸ—οΈ Building an Enterprise Wrapper

The idea is simple: take PyRIT's primitives and compose them into an opinionated, config-driven pipeline that any developer can run with a single command. Below is given the idea on how we can create the wrapper around PyRIT to make it useful for agentic ai security testing, but this is not limited.

The Flow

Everything starts with a YAML config and ends with a pass/fail exit code:

The key insight: every step in this pipeline is configurable through YAML, not code. Switching attack strategies, adding converters, adjusting thresholds β€” it's all one config edit away.

Project Structure

At POC level, the wrapper is lean β€” an orchestrator that stitches PyRIT's components together:

 

The heart of it is `runner.py` β€” a single orchestrator that:

  1. Reads the YAML config
  2. Dynamically imports your agent and wraps it as a PyRIT target
  3. Loads datasets (built-in + your custom prompts)
  4. Runs attacks in escalating phases β€” plain β†’ encoded β†’ semantic β†’ multi-turn
  5. Scores every response using LLM-as-judge
  6. Maps findings to OWASP LLM Top 10 categories
  7. Generates reports (JSON, Markdown)
  8. Evaluates the release gate β€” pass or fail

That's it. One file doing the orchestration, a handful of helpers, and a YAML config. You can start here and grow the structure as your needs evolve.

The Configuration

A single YAML file drives everything:

scan:
  name: "my-agent-security-scan"

  # Point to your agent β€” any Python callable
  target:
    module: "my_agent.main"
    function: "handle_prompt"

  # Attack strategies to run
  strategies:
    - name: "prompt_sending"         # Single-turn baseline
    - name: "crescendo"              # Gradual escalation
      max_turns: 5

  # Prompt transformations β€” they stack automatically
  converters:
    - name: "Base64Converter"
    - name: "LeetspeakConverter"
    - name: "TranslationConverter"

  # Adversarial prompt sets
  datasets:
    - "airt_illegal"                 # Built-in: Microsoft AI Red Team
    - "harmbench"                    # Built-in: harm evaluation benchmark
    - path: "datasets/custom_prompts.yaml"

# Release gate thresholds
gate:
  fail_on:
    critical_findings: 0             # Zero tolerance for critical
    high_findings: 3
    max_score: 0.3                   # Defense rate must exceed 70%

 

Point the target at your agent, pick your strategies and datasets, set your thresholds β€” run. Teams can start scanning their agents in an afternoon, not weeks.

πŸ”„ Plugging Into Your Pipeline

Since the wrapper is a pip-installable package(we can use setup tools or poetry to build and make it pip installable), integrating it into any CI/CD system is straightforward β€” `pip install`, then call the CLI. No custom actions or marketplace extensions needed.

 

The key decision is when to run scans. Not every merge needs a full red team pass. Here's what works in practice:

 

 

The idea is that developers can optionally run quick scans locally as a fast feedback loop, while full scans are manually triggered or approval-gated β€” the tech lead or architect decides when it's worth running a comprehensive assessment based on the nature of the changes.

 

Since it's just a CLI, integration is the same everywhere β€” GitHub Actions, Azure DevOps, Jenkins, or a shell script. Install the package, call `pyrit-scan run`, check the exit code.

βš™οΈ Customization Without Forking

The whole point of a wrapper is that teams customize behavior through configuration β€” not by modifying framework code.

What to CustomizeHowExample
Which agent to testPoint target.module + target.function in YAML to any Python callableYour chatbot, RAG pipeline, or multi-agent workflow
Attack strategiesAdd/remove entries under strategies in YAMLStart with prompt_sending, add crescendo when ready
Prompt transformationsList converters in YAML β€” they stack automaticallyBase64 β†’ Leetspeak β†’ Translation = multi-phase evasion
DatasetsUse built-in (53+) or add custom YAML prompt filesHIPAA prompts, financial compliance scenarios
Scoring thresholdsSet per-OWASP-category thresholds in gate.fail_onZero tolerance for data leakage (LLM02), relaxed for misinformation (LLM09)
Report formatsList formats in reporting.formatsJSON for automation, PDF for compliance, JUnit for dashboards
New attack classesRegister via custom_attacks in YAML β€” module + class nameNo framework code change, no PR needed

🎯 Start Red Teaming Today

AI red teaming isn't a nice-to-have anymore. If you're shipping agentic AI β€” systems that call tools, access data, and take actions on behalf of users β€” you need automated security testing in your pipeline.

PyRIT gives you the primitives. A thin wrapper gives you the automation. Together, they turn AI security from a one-off exercise into a continuous, measurable practice.

The pattern: YAML config β†’ wrap your agent β†’ run attacks β†’ score β†’ map to OWASP β†’ gate the release.

Build it once. Run it on every release. Sleep better.

Resources

Published Apr 28, 2026
Version 1.0
No CommentsBe the first to comment