Marketplace blog

Designing AI guardrails for apps and agents in Marketplace

Julio_Colon
Microsoft
Apr 06, 2026

For software companies building, publishing, and selling AI apps and agents in Microsoft Marketplace, this article provides practical guidance on designing enforceable guardrails. Learn how to pass certification, streamline onboarding, and earn enterprise trust while enabling safe AI autonomy at scale. 

Why guardrails are essential for AI apps and agents

AI apps and agents introduce capabilities that go beyond traditional software. They reason over natural language, interact with data across boundaries, and—in the case of agents—can take autonomous actions using tools and APIs. Without clearly defined guardrails, these capabilities can unintentionally compromise confidentiality, integrity, and availability, the foundational pillars of information security. 

From a confidentiality perspective, AI systems often process sensitive prompts, contextual data, and outputs that may span customer tenants, subscriptions, or external systems. Guardrails ensure that data access is explicit, scoped, and enforced—rather than inferred through prompts or emergent model behavior. 

From an integrity perspective, guardrails constrain the actions an AI app or agent can take, so that writes, approvals, and other state-changing operations follow authorized paths rather than whatever the model happens to generate. 

From an availability perspective, AI apps and agents can fail in ways traditional software does not — such as runaway executions, uncontrolled chains of tool calls, or usage spikes that drive up cost and degrade service. Guardrails address this by setting limits on how the system executes, how often it calls tools, and how it behaves when something goes wrong. 

For Marketplace-ready AI apps and agents, guardrails are foundational design elements that balance innovation with security, reliability, and responsible AI practices. By making behavioral boundaries explicit and enforceable, guardrails enable AI systems to operate safely at scale—meeting enterprise customer expectations and Marketplace requirements from day one. 

This post is part of a series on building and publishing well-architected AI apps and agents on Microsoft Marketplace. 

Using Open Worldwide Application Security Project (OWASP) GenAI Top 10 as a guardrail design lens

The OWASP GenAI Top 10 provides a practical framework for reasoning about AI-specific risks that are not fully addressed by traditional application security models. It helps teams identify where assumptions about trust, input handling, autonomy, and data access are most likely to break down in AI-driven systems. 

However, not all OWASP risks apply equally to every AI app or agent. Their relevance depends on factors such as: 

  • Agent autonomy, including whether the system can take actions without human approval
  • Data access patterns, especially cross-tenant, cross-subscription, or external data retrieval
  • Integration surface area, meaning the number and type of tools, APIs, and external systems the agent connects to 

Because of this variability, OWASP should not be treated as a checklist to implement wholesale. Doing so can lead teams to over-engineer controls in low-risk areas while leaving critical gaps in places where autonomy, data movement, or tool execution create real exposure. Instead, OWASP is most effective when used as a design lens — to inform where guardrails are needed and what behaviors require explicit boundaries. 

Understanding risks and enforcing boundaries are two different things. OWASP tells you where to look; guardrails are what you actually build. The goal is not to eliminate all risk, but to use OWASP insights to design selective, intentional guardrails that align with the system's architecture, autonomy, and operating context. 

Translating AI risks into architectural guardrails

OWASP GenAI Top 10 helps identify where AI systems are vulnerable, but guardrails are what make those risks enforceable in practice. Guardrails are most effective when they are implemented as architectural constraints—designed into the system—rather than as runtime patches added after risky behavior appears. 

In AI apps and agents, many risks emerge not from a single component, but from how prompts, tools, data, and actions interact. Architectural guardrails establish clear boundaries around these interactions, ensuring that risky behavior is prevented by design rather than detected too late. 

Common guardrail categories map naturally to the types of risks highlighted in OWASP: 

  • Input and prompt constraints 
    Address risks such as prompt injection, system prompt leakage, and unintended instruction override by controlling how inputs are structured, validated, and combined with system context. 
  • Action and tool-use boundaries 
    Mitigate risks related to excessive agency and unintended actions by explicitly defining which tools an AI app or agent can invoke, under what conditions, and with what scope. 
  • Data access restrictions 
    Reduce exposure to sensitive information disclosure and cross-boundary leakage by enforcing identity-aware, context-aware access to data sources rather than relying on prompts to imply intent. 
  • Output validation and moderation 
    Help contain risks such as misinformation, improper output handling, or policy violations by treating AI output as untrusted and subject to validation before it is acted on or returned to users. 
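
The four categories above can be made concrete. Below is a minimal Python sketch in which each category is an explicit check rather than a convention; the names (`ALLOWED_TOOLS`, `TENANT_SCOPES`, the regex patterns) are illustrative stand-ins for whatever policy store a real app would use, not any particular API:

```python
import re

# Illustrative policy data; a real app would load this from configuration.
ALLOWED_TOOLS = {"search_docs", "read_record"}      # action/tool-use boundary
TENANT_SCOPES = {"tenant-a": {"crm", "docs"}}       # data access restriction
INJECTION_PATTERNS = [r"ignore (all )?previous instructions", r"system prompt"]

def check_input(prompt: str) -> None:
    """Input/prompt constraint: reject known injection markers
    before the prompt is combined with system context."""
    for pat in INJECTION_PATTERNS:
        if re.search(pat, prompt, re.IGNORECASE):
            raise PermissionError(f"prompt rejected: matched {pat!r}")

def check_tool_call(tool: str, tenant: str, source: str) -> None:
    """Tool-use boundary plus data restriction: only allowlisted tools,
    only data sources scoped to the calling tenant."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool!r} is not on the allowlist")
    if source not in TENANT_SCOPES.get(tenant, set()):
        raise PermissionError(f"tenant {tenant!r} may not read {source!r}")

def check_output(text: str) -> str:
    """Output validation: treat model output as untrusted and
    redact anything that looks like a leaked credential."""
    return re.sub(r"(?i)(api[_-]?key|password)\s*[:=]\s*\S+", "[REDACTED]", text)
```

The point of the sketch is that each boundary is enforced in code at a trust boundary, so it applies no matter what the model generates.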

What matters most is where these guardrails live in the architecture. Effective guardrails sit at trust boundaries—between users and models, models and tools, agents and data sources, and control planes and data planes. When guardrails are embedded at these boundaries, they can be applied consistently across environments, updates, and evolving AI capabilities. 

By translating identified risks into architectural guardrails, teams move from risk awareness to behavioral enforcement. This shift is foundational for building AI apps and agents that can operate safely, predictably, and at scale in Marketplace environments. 

Design‑time guardrails: shaping allowed behavior before deployment

Design-time guardrails are the boundaries committed to before the system ever runs: decisions about which tools can be invoked, which data sources are reachable, how prompts are structured and combined with system context, and which actions always require human approval. Declared as part of the architecture, these constraints hold regardless of what the model generates at runtime: a tool that was never registered cannot be called, and a data source that was never scoped cannot be retrieved. 

Design-time guardrails typically define: 

  • The allowlist of tools and operations the app or agent may invoke, and the parameter scopes for each 
  • The data sources that are reachable, the identities used to reach them, and the tenant or subscription boundaries that apply
  • The actions that always require human approval, and the actions that are never permitted regardless of context 

These decisions form the contract that runtime guardrails then enforce while the system operates. 

Runtime guardrails: enforcing boundaries as systems operate

For Marketplace publishers, the key distinction between monitoring and runtime guardrails is simple: 

  • Monitoring tells you what happened after the fact.
  • Runtime guardrails are inline controls that can block, pause, throttle, or require approval before an action completes. 

If you want prevention, the control has to sit in the execution path. At runtime, guardrails should constrain three areas: 

  1. Agent decision paths (prevent runaway autonomy)
    • Cap planning and execution. Limit the agent to a maximum number of steps per request, enforce a maximum wall-clock time, and stop repeated loops.
    • Apply circuit breakers. Terminate execution after a specified number of tool failures or when downstream services return repeated throttling errors.
    • Require explicit escalation. When the agent’s plan shifts from “read” to “write,” pause and require approval before continuing.
  2. Tool invocation patterns (control what gets called, how, and with what inputs)
    • Enforce allowlists. Allow only approved tools and operations, and block any attempt to call unregistered endpoints.
    • Validate parameters. Reject tool calls that include unexpected tenant identifiers, subscription scopes, or resource paths.
    • Throttle and quota. Rate-limit tool calls per tenant and per user, and cap token/tool usage to prevent cost spikes and degraded service.
  3. Cross-system actions (constrain outbound impact at the boundary you control)
    Runtime guardrails cannot “reach into” external systems and stop independent agents operating elsewhere. What publishers can do is enforce policy at your solution’s outbound boundary: the tool adapter, connector, API gateway, or orchestration layer that your app or agent controls. Concrete examples include:
    • Block high-risk operations by default (delete, approve, transfer, send) unless a human approves.
    • Restrict write operations to specific resources (only this resource group, only this SharePoint site, only these CRM entities).
    • Require idempotency keys and safe retries so repeated calls do not duplicate side effects.
    • Log every attempted cross-system write with identity, scope, and outcome, and fail closed when policy checks cannot run. 
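
As one way to picture the decision-path controls in item 1, here is a hedged Python sketch of an agent loop with a step cap, a wall-clock budget, a circuit breaker on tool failures, and an escalation gate before writes. The callables `plan_next_step`, `execute_tool`, and `approve_write` are hypothetical stand-ins, not any particular framework's API:

```python
import time

class GuardrailTripped(Exception):
    """Raised when a runtime limit blocks further execution."""

def run_agent(plan_next_step, execute_tool, approve_write,
              max_steps=8, max_seconds=30.0, max_tool_failures=3):
    deadline = time.monotonic() + max_seconds
    failures = 0
    for step in range(max_steps):
        if time.monotonic() > deadline:
            raise GuardrailTripped("wall-clock budget exceeded")
        action = plan_next_step(step)        # e.g. {"op": "read", "tool": "crm"}
        if action is None:
            return "done"                    # agent finished within limits
        if action["op"] == "write" and not approve_write(action):
            # Explicit escalation: the plan shifted from read to write.
            raise GuardrailTripped("write action requires approval")
        try:
            execute_tool(action)
            failures = 0                     # reset breaker on success
        except Exception:
            failures += 1
            if failures >= max_tool_failures:
                # Circuit breaker: stop after repeated tool failures.
                raise GuardrailTripped("too many consecutive tool failures")
    raise GuardrailTripped("step cap reached")  # prevent runaway loops
```

Because the checks sit inside the execution loop rather than in a separate monitor, they prevent the behavior instead of reporting it after the fact.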

Done well, runtime guardrails produce evidence, not just intent. They show reviewers that your AI app or agent enforces least privilege, prevents runaway execution, and limits blast radius—even when the model output is unpredictable. 
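
The outbound-boundary pattern in item 3 might look like the following sketch: a gateway that blocks high-risk operations by default, deduplicates retries with idempotency keys, logs every attempted write, and fails closed when its policy check cannot run. All class, parameter, and resource names are illustrative:

```python
HIGH_RISK_OPS = {"delete", "approve", "transfer", "send"}

class OutboundGateway:
    """Hypothetical policy enforcement point at the solution's outbound boundary."""

    def __init__(self, policy_check, audit_log):
        self._policy_check = policy_check   # callable(op, resource) -> bool
        self._audit = audit_log             # list collecting attempted writes
        self._seen_keys = set()             # idempotency: dedupe retried calls

    def write(self, identity, op, resource, idempotency_key, approved=False):
        if idempotency_key in self._seen_keys:
            return "duplicate-ignored"      # safe retry: no second side effect
        try:
            allowed = self._policy_check(op, resource)
        except Exception:
            allowed = False                 # fail closed if the policy check is down
        if op in HIGH_RISK_OPS and not approved:
            allowed = False                 # high-risk ops need human approval
        # Every attempted write is logged with identity, scope, and outcome.
        self._audit.append((identity, op, resource, allowed))
        if not allowed:
            return "blocked"
        self._seen_keys.add(idempotency_key)
        return "executed"
```

The audit trail this produces is exactly the kind of evidence reviewers look for: every attempted cross-system write, with identity, scope, and outcome.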

Guardrails across data, identity, and autonomy boundaries

Guardrails don't work in silos. They are only effective when they align across the three core boundaries that shape how an AI app or agent operates — identity, data, and autonomy.  

Guardrails must align across: 

  • Identity boundaries (who the agent acts for) — represent the credentials the agent uses, the roles it assumes, and the permissions that flow from those identities. Without clear identity boundaries, agent actions can appear legitimate while quietly exceeding the authority that was actually intended.
  • Data boundaries (what the agent can see or retrieve) — ensuring access is governed by explicit authorization and context, not by what the model infers or assumes. A poorly scoped data boundary doesn't just create exposure — it creates exposure that is hard to detect until something goes wrong.
  • Autonomy boundaries (what the agent can decide or execute) — defining which actions require human approval, which can proceed automatically, and which are never permitted regardless of context. Autonomy without defined limits is one of the fastest ways for behavior to drift beyond what was ever intended. 

When these boundaries are misaligned, the consequences are subtle but serious. An agent may act under the authority of one identity, access data scoped to another, and execute with broader autonomy than was ever granted — not because a single control failed, but because the boundaries were never reconciled with each other. This is how unintended privilege escalation happens in well-intentioned systems.  
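
One way to catch that misalignment early is to reconcile the three boundaries explicitly before execution: check that the data a task touches fits within its identity's grants, and that the requested autonomy never exceeds what that identity was granted. The grants structure and autonomy ranking below are hypothetical simplifications:

```python
# Illustrative ordering of autonomy levels, from least to most permissive.
AUTONOMY_RANK = {"read-only": 0, "write-with-approval": 1, "autonomous-write": 2}

def check_alignment(identity_grants, data_scopes_needed, autonomy_requested):
    """Return a list of boundary misalignments instead of silently proceeding."""
    problems = []
    # Data boundary must sit inside the identity boundary.
    missing = set(data_scopes_needed) - set(identity_grants["data_scopes"])
    if missing:
        problems.append(f"data scopes outside identity grant: {sorted(missing)}")
    # Autonomy boundary must not exceed what the identity was granted.
    if AUTONOMY_RANK[autonomy_requested] > AUTONOMY_RANK[identity_grants["max_autonomy"]]:
        problems.append(f"autonomy {autonomy_requested!r} exceeds granted "
                        f"{identity_grants['max_autonomy']!r}")
    return problems
```

Running a check like this at task start turns an implicit assumption (the boundaries agree) into an enforced precondition.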

Balancing safety, usefulness, and customer trust

Getting guardrails right is less about adding controls and more about placing them well. Too restrictive, and legitimate workflows break down, safe autonomy shrinks, and the system becomes more burden than benefit. Too permissive, and the risks accumulate quietly — surfacing later as incidents, audit findings, or eroded customer trust. 

Effective guardrails share three characteristics that help strike that balance: 

  • Transparent — customers and operators understand what the system can and cannot do, and why those limits exist 
  • Context-aware — boundaries tighten or relax based on identity, environment, and risk, without blocking safe use
  • Adjustable — guardrails evolve as models and integrations change, without compromising the protections that matter most 

When these characteristics are present, guardrails naturally reinforce the foundational principles of information security — protecting confidentiality through scoped data access, preserving integrity by constraining actions to authorized paths, and supporting availability by preventing runaway execution and cascading failures. 

How guardrails support Marketplace readiness

For AI apps and agents in Microsoft Marketplace, guardrails are a practical enabler — not just of security, but of the entire Marketplace journey. They make complex AI systems easier to evaluate, certify, and operate at scale. 

Guardrails simplify three critical aspects of that journey: 

  • Security and compliance review — explicit, architectural guardrails give reviewers something concrete to assess. Rather than relying on documentation or promises, behavior is observable and boundaries are enforceable from day one.
  • Customer onboarding and trust — when customers can see what an AI system can and cannot do, and how those limits are enforced, adoption decisions become easier and time to value shortens. Clarity is a competitive advantage.
  • Long-term operation and scale — as AI apps evolve and integrate with more systems, guardrails keep the blast radius contained and prevent hidden privilege escalation paths from forming. They are what makes growth manageable. 

Marketplace-ready AI systems don't describe their guardrails — they demonstrate them. That shift, from assurance to evidence, is what accelerates approvals, builds lasting customer trust, and positions an AI app or agent to scale with confidence. 

What’s next in the journey

Guardrails establish the foundation for safe, predictable AI behavior — but they are only the beginning. The next phase extends these boundaries into governance, compliance, and day-to-day operations through policy definition, auditing, and lifecycle controls. Together, these mechanisms ensure that guardrails remain effective as AI apps and agents evolve, scale, and operate within enterprise environments. 

 

Key resources 

See curated, step-by-step guidance in App Advisor to help you build, publish, or sell your app or agent, no matter where you start. 

The Quick-Start Development Toolkit connects you with code templates for AI solution patterns. 

Microsoft AI Envisioning Day Events  

How to build and publish AI apps and agents for Microsoft Marketplace 

Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success 

 

Updated Apr 06, 2026
Version 1.0
