marketplace ai apps & agents
24 TopicsDesigning AI guardrails for apps and agents in Marketplace
Why guardrails are essential for AI apps and agents AI apps and agents introduce capabilities that go beyond traditional software. They reason over natural language, interact with data across boundaries, and—in the case of agents—can take autonomous actions using tools and APIs. Without clearly defined guardrails, these capabilities can unintentionally compromise confidentiality, integrity, and availability, the foundational pillars of information security. From a confidentiality perspective, AI systems often process sensitive prompts, contextual data, and outputs that may span customer tenants, subscriptions, or external systems. Guardrails ensure that data access is explicit, scoped, and enforced—rather than inferred through prompts or emergent model behavior. From an availability perspective, AI apps and agents can fail in ways traditional software does not — such as runaway executions, uncontrolled chains of tool calls, or usage spikes that drive up cost and degrade service. Guardrails address this by setting limits on how the system executes, how often it calls tools, and how it behaves when something goes wrong. For Marketplace-ready AI apps and agents, guardrails are foundational design elements that balance innovation with security, reliability, and responsible AI practices. By making behavioral boundaries explicit and enforceable, guardrails enable AI systems to operate safely at scale—meeting enterprise customer expectations and Marketplace requirements from day one. You can always get a curated step-by-step guidance through building, publishing and selling apps for Marketplace through App Advisor. This post is part of a series on building and publishing well-architected AI apps and agents in Microsoft Marketplace. The series focuses on AI apps and agents that are architected, hosted, and operated on Azure, with guidance aligned to building and selling solutions through Microsoft Marketplace. Using Open Worldwide Application Security Project (OWASP) GenAI Top 10 as a guardrail design lens The OWASP GenAI Top 10 provides a practical framework for reasoning about AI‑specific risks that are not fully addressed by traditional application security models. It helps teams identify where assumptions about trust, input handling, autonomy, and data access are most likely to break down in AI‑driven systems. However, not all OWASP risks apply equally to every AI app or agent. Their relevance depends on factors such as: Agent autonomy, including whether the system can take actions without human approval Data access patterns, especially cross‑tenant, cross‑subscription, or external data retrieval Integration surface area, meaning the number and type of tools, APIs, and external systems the agent connects to Because of this variability, OWASP should not be treated as a checklist to implement wholesale. Doing so can lead teams to over‑engineer controls in low‑risk areas while leaving critical gaps in places where autonomy, data movement, or tool execution create real exposure. Instead, OWASP is most effective when used as a design lens — to inform where guardrails are needed and what behaviors require explicit boundaries. Understanding risks and enforcing boundaries are two different things. OWASP tells you where to look; guardrails are what you actually build. The goal is not to eliminate all risk, but to use OWASP insights to design selective, intentional guardrails that align with the system's architecture, autonomy, and operating context. Translating AI risks into architectural guardrails OWASP GenAI Top 10 helps identify where AI systems are vulnerable, but guardrails are what make those risks enforceable in practice. Guardrails are most effective when they are implemented as architectural constraints—designed into the system—rather than as runtime patches added after risky behavior appears. In AI apps and agents, many risks emerge not from a single component, but from how prompts, tools, data, and actions interact. Architectural guardrails establish clear boundaries around these interactions, ensuring that risky behavior is prevented by design rather than detected too late. Common guardrail categories map naturally to the types of risks highlighted in OWASP: Input and prompt constraints Address risks such as prompt injection, system prompt leakage, and unintended instruction override by controlling how inputs are structured, validated, and combined with system context. Action and tool‑use boundaries Mitigate risks related to excessive agency and unintended actions by explicitly defining which tools an AI app or agent can invoke, under what conditions, and with what scope. Data access restrictions Reduce exposure to sensitive information disclosure and cross‑boundary leakage by enforcing identity‑aware, context‑aware access to data sources rather than relying on prompts to imply intent. AI Output validation and moderation Help contain risks such as misinformation, improper output handling, or policy violations by treating AI output as untrusted and subject to validation before it is acted on or returned to users. What matters most is where these guardrails live in the architecture. Effective guardrails sit at trust boundaries—between users and models, models and tools, agents and data sources, and control planes and data planes. When guardrails are embedded at these boundaries, they can be applied consistently across environments, updates, and evolving AI capabilities. By translating identified risks into architectural guardrails, teams move from risk awareness to behavioral enforcement that supports safe AI agent operation. This shift is foundational for building AI apps and agents that can operate safely, predictably, and at scale in Marketplace environments. Design‑time guardrails: shaping allowed behavior before deployment The OWASP GenAI Top 10 provides a practical framework for reasoning about AI specific risks that are not fully addressed by traditional application security models. It helps teams identify where assumptions about trust, input handling, autonomy, and data access are most likely to break down in AI driven systems. However, not all OWASP risks apply equally to every AI app or agent. Their relevance depends on factors such as: Agent autonomy, including whether the system can take actions without human approval Data access patterns, especially cross-tenant, cross subscription, or external data retrieval Integration surface area, meaning the number and type of tools, APIs, and external systems the agent connects to Because of this variability, OWASP should not be treated as a checklist to implement wholesale. Doing so can lead teams to over engineer controls in low risk areas while leaving critical gaps in places where autonomy, data movement, or tool execution create real exposure. Instead, OWASP is most effective when used as a design lens — to inform where guardrails are needed and what behaviors require explicit boundaries. Understanding risks and enforcing boundaries are two different things. OWASP tells you where to look; guardrails are what you actually build. The goal is not to eliminate all risk, but to use OWASP insights to design selective, intentional guardrails that align with the system's architecture, autonomy, and operating context. Runtime guardrails: enforcing boundaries as systems operate For Marketplace publishers, the key distinction between monitoring and runtime guardrails is simple: Monitoring tells you what happened after the fact. Runtime guardrails are inline controls—part of runtime AI safety controls—that can block, pause, throttle, or require approval before an action completes. If you want prevention, the control has to sit in the execution path. At runtime, guardrails should constrain three areas: Agent decision paths (prevent runaway autonomy) Cap planning and execution. Limit the agent to a maximum number of steps per request, enforce a maximum wall‑clock time, and stop repeated loops. Apply circuit breakers. Terminate execution after a specified number of tool failures or when downstream services return repeated throttling errors, reinforcing autonomous agent limits. Require explicit escalation. When the agent’s plan shifts from “read” to “write,” pause and require approval before continuing. Tool invocation patterns (control what gets called, how, and with what inputs) Enforce allowlists. Allow only approved tools and operations, and block any attempt to call unregistered endpoints. Validate parameters. Reject tool calls that include unexpected tenant identifiers, subscription scopes, or resource paths. Throttle and quota. Rate‑limit tool calls per tenant and per user, and cap token/tool usage to prevent cost spikes and degraded service. Cross‑system actions (constrain outbound impact at the boundary you control) Runtime guardrails cannot “reach into” external systems and stop independent agents operating elsewhere. What publishers can do is enforce policy at your solution’s outbound boundary: the tool adapter, connector, API gateway, or orchestration layer that your app or agent controls. Concrete examples include: Block high‑risk operations by default (delete, approve, transfer, send) unless a human approves. Restrict write operations to specific resources (only this resource group, only this SharePoint site, only these CRM entities). Require idempotency keys and safe retries so repeated calls do not duplicate side effects. Log every attempted cross‑system write with identity, scope, and outcome, and fail closed when policy checks cannot run. Done well, runtime guardrails produce evidence, not just intent. They show reviewers that your AI app or agent enforces least privilege, prevents runaway execution, and limits blast radius—even when the model output is unpredictable. Guardrails across data, identity, and autonomy boundaries Guardrails don't work in silos. They are only effective when they align across the three core boundaries that shape how an AI app or agent operates — identity, data, and autonomy. Guardrails must align across: Identity boundaries (who the agent acts for) — represent the credentials the agent uses, the roles it assumes, and the permissions that flow from those identities. Without clear identity boundaries, agent actions can appear legitimate while quietly exceeding the authority that was actually intended. Data boundaries (what the agent can see or retrieve) — ensuring access is governed by explicit authorization and context, not by what the model infers or assumes. A poorly scoped data boundary doesn't just create exposure — it creates exposure that is hard to detect until something goes wrong. Autonomy boundaries (what the agent can decide or execute) — defining which actions require human approval, which can proceed automatically, and which are never permitted regardless of context. Autonomy without defined limits is one of the fastest ways for behavior to drift beyond what was ever intended. When these boundaries are misaligned, the consequences are subtle but serious. An agent may act under the authority of one identity, access data scoped to another, and execute with broader autonomy than was ever granted — not because a single control failed, but because the boundaries were never reconciled with each other. This is how unintended privilege escalation happens in well-intentioned systems. Balancing safety, usefulness, and customer trust Getting guardrails right is less about adding controls and more about placing them well. Too restrictive, and legitimate workflows break down, safe autonomy shrinks, and the system becomes more burden than benefit. Too permissive, and the risks accumulate quietly — surfacing later as incidents, audit findings, or eroded customer trust. Effective guardrails share three characteristics that help strike that balance: Transparent — customers and operators understand what the system can and cannot do, and why those limits exist Context-aware — boundaries tighten or relax based on identity, environment, and risk, without blocking safe use Adjustable — guardrails evolve as models and integrations change, without compromising the protections that matter most When these characteristics are present, guardrails naturally reinforce the foundational principles of information security — protecting confidentiality through scoped data access, preserving integrity by constraining actions to authorized paths, and supporting availability by preventing runaway execution and cascading failures. How guardrails support Marketplace readiness For AI apps and agents in Microsoft Marketplace, guardrails are a practical enabler — not just of security, but of the entire Marketplace journey. They make complex AI systems easier to evaluate, certify, and operate at scale. Guardrails simplify three critical aspects of that journey: Security and compliance review — explicit, architectural guardrails give reviewers something concrete to assess. Rather than relying on documentation or promises, behavior is observable and boundaries are enforceable from day one. Customer onboarding and trust — when customers can see what an AI system can and cannot do, and how those limits are enforced, adoption decisions become easier and time to value shortens. Clarity is a competitive advantage. Long-term operation and scale — as AI apps evolve and integrate with more systems, guardrails keep the blast radius contained and prevent hidden privilege escalation paths from forming. They are what makes growth manageable. Marketplace-ready AI systems don't describe their guardrails — they demonstrate them. That shift, from assurance to evidence, is what accelerates approvals, builds lasting customer trust, and positions an AI app or agent to scale with confidence. What’s next in the journey Guardrails establish the foundation for safe, predictable AI behavior — but they are only the beginning. The next phase extends these boundaries into governance, compliance, and day‑to‑day operations through policy definition, auditing, and lifecycle controls. Together, these mechanisms ensure that guardrails remain effective as AI apps and agents evolve, scale, and operate within enterprise environments. See the next post in the series: Governing AI apps and agents for Marketplace | Microsoft Community Hub. Key resources See curated, step-by-step guidance to help you build, publish, or sell your app or agent (no matter where you start) in App Advisor, Quick-Start Development Toolkit can connect you with code templates for AI solution patterns Microsoft AI Envisioning Day Events How to build and publish AI apps and agents for Microsoft Marketplace Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success423Views1like1CommentProduction ready architectures for AI apps and agents on Marketplace
Why “production‑ready” architecture matters for Marketplace AI apps and agents A working AI prototype is not the same as a production‑ready AI app in Microsoft Marketplace. Marketplace solutions are expected to operate reliably in real customer environments, alongside mission‑critical workloads and under enterprise constraints. As a result, AI apps published through Marketplace must meet a higher bar than “it works in a demo.” You can always get a curated step-by-step guidance through building, publishing and selling apps for Marketplace through App Advisor. Production‑ready Marketplace AI apps must assume: Alignment with enterprise expectations and Azure well‑architected AI principles, including cost optimization, security, reliability, operational excellence, and performance efficiency Architectural decisions made early are difficult to reverse, especially once customers, tenants, and billing relationships are in place A higher trust bar from customers, who expect Marketplace solutions to be Microsoft‑vetted, certified, and safe to run in production Customers come to Marketplace expecting solutions that are ready to run, ready to scale, and ready to be supported—not experiments. This post focuses on the architectural principles and patterns required to meet those expectations. Specific services and implementation details are covered later in the series. This post is part of a series on building and publishing well-architected AI apps and agents in Microsoft Marketplace. The series focuses on AI apps and agents that are architected, hosted, and operated on Azure, with guidance aligned to building and selling solutions through Microsoft Marketplace. Aligning offer type and architecture early sets you up for success A strong indicator of a smooth Marketplace journey is early alignment between offer type and solution architecture. Offer type defines more than how an AI app is listed—it establishes clear roles and responsibilities between publishers and customers, which in turn shape architectural boundaries. Across all other offer types, architecture must clearly answer three questions: Who owns the runtime? Where does the AI execute? Who controls updates and ongoing operations? These decisions will vary depending on whether the solution resides in the customer’s or publisher’s tenant based on the attributes associated with the following transactable marketplace offer types: SaaS offers, where the AI runtime lives in the publisher’s environment and architecture must support multitenant AI app design, strong isolation, and centralized operations Container offers, where workloads run in the customer’s Kubernetes environment and architecture emphasizes portability and clear operational assumptions Virtual Machine offers, where preconfigured environments run in the customer’s subscription and architecture is more tightly coupled to the OS and infrastructure footprint Azure Managed Applications, where the solution is deployed into the customer's subscription and architecture must balance customer control with defined lifecycle boundaries. What makes this model distinctive is its flexibility: an Azure Managed Application can package containers, virtual machines, or a combination of both — making it a natural fit for solutions that require customer-controlled infrastructure without sacrificing publisher-managed operations. The packaging choice shapes the underlying architecture, but the managed application wrapper is what defines how the solution is deployed, updated, and governed within the customer's environment. Architecture decisions naturally reinforce Marketplace requirements and reduce certification and operational friction later. Key factors that benefit from early alignment include: Roles and responsibilities, such as who operates the AI runtime and who is responsible for uptime, patching, scaling, and ongoing operations Proximity to data, particularly for AI solutions that rely on customer‑specific or proprietary data, where placement affects performance, data movement, and compliance Core architectural building blocks of AI apps Designing a production‑ready AI app starts with treating the solution as a system, not a single service. AI apps—especially agent‑based solutions—are composed of multiple cooperating layers that together enable reasoning, action, and safe operation at scale. At a high level, most production‑ready AI apps include the following building blocks: Interaction layer, which serves as the entry point for users or systems and is responsible for authentication, request shaping, and consistent responses Orchestration layer, which coordinates reasoning, tool selection, workflow execution, and retrieval‑augmented generation (RAG) flows across multi‑step interactions Model endpoints, which provide inference and generation capabilities and introduce distinct latency, cost, and dependency characteristics Data sources, including vector stores, operational data, documents, and logs that the AI system reasons over Control planes, such as identity, configuration, policy enforcement, feature flags, and secrets management, which govern behavior without redeploying core logic Observability, which enables tracing, monitoring, and diagnosis of agent decisions, actions, and outcomes Networking, which connects components using a zero‑trust posture where every call is authenticated and outbound access is explicitly controlled Together, these components form the foundation of most Marketplace‑ready AI architectures. How they are composed—and where boundaries are drawn—varies by offer type, tenancy model, and customer requirements. Specific services, patterns, and implementation guidance for each layer are explored later in the series. Tenancy design choices as an early architectural decision One of the earliest and most consequential architectural decisions is where the AI solution is hosted. Does it run in the publisher’s tenant, or is it deployed into the customer’s tenant? This choice establishes foundational boundaries and is difficult to change later without significant redesign. If the solution runs in the publisher’s tenant, it is inherently multi‑tenant and must be designed with strong logical isolation across customers. If it runs in the customer’s tenant, deployments are typically single‑tenant by default, with isolation provided through infrastructure boundaries. Many Marketplace AI apps fall between these extremes, making it essential to define the AI tenancy model early. Common tenancy approaches include: Publisher‑hosted, multi‑tenant solutions, where a shared AI runtime serves multiple customers and requires strict isolation of customer data, inference requests, identity, and cost attribution Customer‑hosted, single‑tenant deployments, where each customer operates an isolated instance within their own Azure subscription, often preferred for regulated or tightly controlled environments Hybrid models, which combine centralized AI services with customer‑hosted data or execution layers and require carefully defined trust and access boundaries Tenancy decisions influence several core architectural dimensions, including: Identity and access boundaries, which define how users and agents authenticate and act across tenants Data isolation, including how customer data is stored, processed, and protected Model usage patterns, such as shared models versus tenant‑specific models Cost allocation and scale, including how usage is tracked and attributed per customer These considerations are not implementation details—they shape how the AI system behaves, scales, and is governed in production. Reference architecture guidance for multi‑tenant AI and machine learning solutions in the Azure Architecture Center explores these tradeoffs in more detail. Understanding your customer’s needs Designing a production‑ready AI architecture starts with understanding the environment your customers expect your solution to operate in. Marketplace customers vary widely in their security posture, compliance obligations, operational practices, and tolerance for change. Architectures that reflect those realities reduce friction during onboarding, certification, and long‑term operation. Key customer considerations that shape architecture include: Security and compliance expectations, such as industry regulations, internal governance policies, or regional data requirements Target environments, including whether customers expect solutions to run in their own Azure subscription or are comfortable consuming centrally hosted services Change and outage windows, where operational constraints or seasonal restrictions require predictable and controlled updates Architectural alignment with customer needs is not about designing for every edge case. It is about making intentional tradeoffs that reflect how customers will deploy, operate, and depend on your AI solution in production. Specific security controls, compliance enforcement mechanisms, and operational policies are explored later in the series. This section establishes the architectural mindset required to support them. Separating environments for safe iteration Production AI systems must evolve continuously while remaining stable for customers. Separating environments is how publishers enable safe iteration without destabilizing live usage—and how customers maintain confidence when adopting and operating AI solutions in their own environments. From the publisher’s perspective, environment separation enables: Iteration on prompts, models, and orchestration logic without impacting production customers Validation of behavior changes before rollout, especially for AI‑driven systems where small changes can produce materially different outcomes Controlled release strategies that reduce operational risk From the customer’s perspective, environment separation shapes how the solution fits into their own development and operational practices: Where the solution is deployed across development, staging, and production environments How deployments are repeated or promoted, particularly when the solution runs in the customer’s tenant Whether environments can be recreated predictably, or whether customers are forced to manually reconfigure deployments with each iteration When AI solutions are deployed into the customer’s tenant, environment design becomes especially important. Customers should not be required to reverse‑engineer deployment logic, recreate environments from scratch, or re‑establish trust boundaries every time the solution evolves. These concerns should be addressed architecturally, not deferred to operational workarounds. Environment separation is therefore not just a DevOps choice—it is an architectural decision. It influences identity boundaries, deployment topology, validation strategies, and the shared operational contract between publisher and customer. Designing for AI‑specific scalability patterns AI workloads do not scale like traditional web or CRUD‑based applications. While front‑end and API layers may follow familiar scaling patterns, AI systems introduce behaviors that require different architectural assumptions. Production‑ready AI architectures must account for: Bursty inference demand, where usage can spike unpredictably based on user behavior or downstream automation Long‑running or multi‑step agent workflows, which may span tools, data sources, and time Model‑driven latency and cost characteristics, which influence throughput and responsiveness independently of application logic As a result, scalability decisions often vary by layer. Horizontal scaling is typically most effective in interaction, orchestration, and retrieval components, while model endpoints may require separate capacity planning, isolation, or throttling strategies. Treating identity as an architectural boundary Identity is foundational to Marketplace AI apps, but architecture must plan for it explicitly. Identity decisions define trust boundaries across users, agents, and services, and shape how the solution scales, secures access, and meets compliance requirements. Key architectural considerations include: Microsoft Entra ID as a foundation, where identity is treated as a core control plane rather than a late‑stage integration How users sign in, including: Their own corporate Microsoft Entra ID tenant B2B scenarios where one Entra ID tenant trusts another B2C identity providers for customer‑facing experiences How tenants authenticate, particularly in multi‑tenant or cross‑organization scenarios How AI agents act on behalf of users, including delegated access, authorization scope, and auditability How services communicate securely, using a zero‑trust posture where every call is authenticated and authorized Treating identity as an architectural boundary helps ensure that trust relationships remain explicit, enforceable, and consistent across tenants and environments. This foundation is critical for supporting secure operation, compliance enforcement, and future tenant‑linking scenarios. Designing for observability and auditability Production‑ready AI apps must be observable and auditable by design. Marketplace customers expect visibility into how systems behave in production, and publishers need clear insight to diagnose issues, operate reliably, and meet enterprise trust and compliance expectations. Key architectural considerations include: End‑to‑end observability, covering user interactions, agent reasoning steps, tool invocations, and downstream service calls Clear audit trails, capturing who initiated an action, what the AI system did, and how decisions were executed—especially when agents act on behalf of users Tenant‑aware visibility, ensuring logs, metrics, and traces are correctly attributed without exposing data across tenants Operational transparency, enabling effective troubleshooting, incident response, and continuous improvement without ad‑hoc instrumentation AI app observability design goes beyond infrastructure health. It must also account for AI‑specific behavior, such as prompt execution, model selection, retrieval outcomes, and tool usage. Without this visibility, diagnosing failures, validating changes, or explaining outcomes becomes difficult in real customer environments. Auditability is equally critical. Identity, access, and action histories must be traceable to support security reviews, regulatory obligations, and customer trust—particularly in regulated or enterprise settings. Common architectural pitfalls in Marketplace AI apps Even experienced teams run into similar challenges when moving from an AI prototype to a production‑ready Marketplace solution. The following pitfalls often surface when architectural decisions are deferred or made implicitly. Common pitfalls include: Treating AI as a single service instead of a system, where model inference is implemented without considering orchestration, data access, identity, observability, and operational boundaries Hard‑coding tenant assumptions, such as assuming a single tenant, identity model, or deployment topology, which becomes difficult to unwind as customer requirements diversify Not planning for a resilient model strategy, leaving the architecture fragile when model versions change, capabilities evolve, or providers introduce breaking behavior Assuming data lives within the same boundary as the solution, when in practice it may reside in a different tenant, subscription, or control plane Tightly coupling prompt logic to application code, making it harder to iterate on AI behavior, validate changes, or manage risk without full redeployments Assuming issues can be fixed after go‑live, which underestimates the cost and complexity of changing architecture once customers, subscriptions, and trust relationships are in place While these pitfalls may be caused by a lack of technical skill on the customer’s side, they could typically emerge when architectural decisions are postponed in favor of speed, or when AI behavior is treated as an isolated concern rather than part of a production system. What’s next in the journey The architectural decisions made early—around offer type, tenancy, identity, environments, and observability—establish the foundation on which everything else is built. When these choices are intentional, they reduce friction as the solution evolves, scales, and adapts to real customer needs. The next set of posts builds on this foundation, exploring different dimensions of operating, securing, and evolving Marketplace AI apps in production. See the next post in the series: Securing AI apps and agents on Microsoft Marketplace | Microsoft Community Hub. Key resources See curated, step-by-step guidance to help you build, publish, or sell your app or agent (no matter where you start) in App Advisor Quick-Start Development Toolkit can connect you with code templates for AI solution patterns Microsoft AI Envisioning Day Events How to build and publish AI apps and agents for Microsoft Marketplace Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success400Views7likes1CommentSuccess with AI apps and agents on Marketplace: the end-to-end
Preparing an AI app or agent for Microsoft Marketplace requires solving a broader set of problems—ones that extend beyond the model and into architecture, security, compliance, operations, and commerce. These requirements often surface late, when teams are already moving toward launch. Teams often reach the same milestone: the AI works, the demo is compelling, and early customers are interested. But when it’s time to publish, transact, and operate that solution through Marketplace, gaps emerge—around security, compliance, reliability, operations, or commerce integration. Whether you are demo ready or starting with a great AI idea, this series is designed to address those challenges across the AI app publishing lifecycle through a connected, end‑to‑end journey. It brings together the decisions and requirements needed to build AI apps and agents that are not only functional, but Marketplace‑ready from day one. You can always get a curated step-by-step guidance through building, publishing and selling apps for Marketplace through App Advisor. This post is part of a series on building and publishing well-architected AI apps and agents on Microsoft Marketplace. The series focuses on AI apps and agents that are architected, hosted, and operated on Azure, with guidance aligned to building and selling solutions through Microsoft Marketplace. Why an end‑to‑end journey matters A working AI app does not automatically mean a Marketplace‑ready AI app. Marketplace readiness spans far more than model selection or prompt engineering. It requires a holistic approach across: Architecture and hosting design Security and AI guardrails Compliance and governance Operational maturity Commerce, billing, and lifecycle integration While guidance exists across each of these areas, it is often fragmented. This series connects those pieces into a single, reusable mental model that software companies can use to design, build, publish, and operate AI apps and agents—supporting end‑to‑end AI readiness with confidence. This first post frames the journey. Each subsequent post goes deep into one stage. The marketplace‑ready AI app and agent lifecycle At a high level, Marketplace‑ready AI apps and agents follow this lifecycle: Define how the AI app and agent will be delivered Identify industry compliance and regulatory requirements Design a production‑ready AI architecture Embed security and AI guardrails into the design Validate compliance and governance Build and test an MVP with potential customers Build for quality, reliability, and scale Integrate with Marketplace commerce Prepare for publishing and go‑live Operate, monitor, and evolve safely Promoting your AI app and agent to close initial sales This lifecycle is intentionally introduced once, at a high level and serves as an AI app launch checklist, where decisions made earlywill shape everything that follows. Throughout the series, this lifecycle serves as a shared reference point. Step 1: Decide how your AI app and agent will be packaged and delivered The first decision is how the AI app and agent will be delivered through Marketplace. Offer types—such as SaaS, Azure Managed Applications, Containers, and Virtual Machines—are not just listing formats. They are delivery models that directly impact: Identity and authentication Billing and metering Deployment responsibilities Operational ownership Customer onboarding experience Supported sales models Choosing an offer type early helps avoid costly redesigns later. Step 2: Design a production‑ready AI architecture Marketplace AI apps and agents are expected to meet enterprise customer expectations for performance, reliability, and security. Architecture decisions must account for: Customer business, compliance, and security needs Offer‑specific hosting best practices For example, SaaS offers typically require: Tenant isolation Environment separation Strong identity boundaries Architecture must also support both AI behavior and Marketplace lifecycle events, such as provisioning, subscription changes, and entitlement checks. Step 3: Secure the AI app and agent and define guardrails Security cannot be treated as a certification checklist at the end of the process. AI introduces new risks beyond traditional applications, including expanded attack surfaces through prompts and inputs. Frameworks such as the OWASP GenAI Top 10 provide a useful lens for identifying these risks. Guardrails must be enforced: At design time through architecture and policy decisions At runtime through monitoring, enforcement, and response AI‑specific incident response must also factor in privacy regulations and customer trust. Step 4: Treat AI agents as compliance‑governed systems AI agents and their data are first‑class compliance subjects. This includes: Prompts and responses Contextual and training data Actions taken by the agent These artifacts must be auditable and governed inline, not retroactively. At the same time, publishers must balance compliance with intellectual property protection by enabling explainability and transparency without exposing proprietary logic. Step 5: Build for quality, reliability, and scale Marketplace buyers expect predictable behavior. AI apps and agents should formalize: Quality and evaluation frameworks Reliability and performance targets Scaling and cost optimization strategies Quality, reliability, and performance directly influence customer trust and satisfaction. Step 6: Integrate with Marketplace commerce and lifecycle APIs Marketplace is not “just a listing,” but a runtime contract that depends on AI commerce integration. For transactable offers that help you sell globally direct to customers or through channel and allow customers to count sales of your app against their cloud commitments, Marketplace becomes an operational contract. Subscription state, entitlements, billing, and metering are runtime responsibilities of the application. For SaaS offers, SaaS Fulfillment APIs define the source of truth for subscription lifecycle events. Integrate Marketplace lead flows with your CRM using the Marketplace lead connector for CRM Step 7: Prepare for publishing, certification, and go‑live Publishing requires more than code completion. Marketplace certification validates: Security posture Customer experience Operational readiness Using templates, checklists, and tooling such as Quick Start Development Toolkit, Marketplace Rewards resources, and App Advisor reduces friction and rework. Step 8: Operate and evolve safely after go‑live Launch is not the end of the journey. AI apps and agents evolve continuously, making: Safe deployment strategies CI/CD discipline Rollback and monitoring practices This is essential for protecting both customers and publishers. Operational maturity also includes maintaining Marketplace offer assets (store images) as the product evolves. Use this framework to help you build a production ready AI app and agent, well architected, secured, reliable, scalable and integrated with Microsoft Marketplace global commerce engine. Step 9: Promote your AI app and agent Becoming Marketplace‑ready does not end at publication. AI app and agent success also depends on how effectively the solution is discovered, evaluated, and trusted by customers within Microsoft Marketplace and the broader Microsoft ecosystem. Promotion in Microsoft Marketplace is tightly integrated with how customers discover and purchase solutions. AI apps and agents are surfaced through Marketplace search, categories, and in‑product experiences, and once your AI app or agent becomes Azure IP co-sell eligible - the purchase of your offer can count towards your customers' Microsoft Azure Consumption Commitments (MACC) motivating customers to buy your offer. This reduces buying friction and accelerates evaluation‑to‑purchase transitions. Top activities to grow your sales: Optimize your listing once you publish your app, by getting an agentic review of your published listing in seconds, based on Marketplace listing best practices and expert Microsoft editorial guidance. Promote your Marketplace offer and track your engagement following best practices. Manage and nurture leads from trials to purchase, and from purchase to higher level SKUs. Private offers, which allow publishers to create customer-specific or negotiated offers directly through Marketplace, including multiparty private offers involving Microsoft channel partners Sell through channel, use resale enabled offers to enable resellers and channel partners to sell your app to their customers, Co-sell motions, where eligible AI apps and agents are sold jointly with Microsoft sellers and count toward customer cloud consumption commitments Effective customer engagement depends on alignment between how the AI app and agent is positioned and how it is delivered. Clear descriptions, accurate architectural boundaries, and transparent operational expectations help customers move confidently from discovery to production adoption. For publishers, programs such as ISV Success provide guidance and tooling to help align technical readiness, Marketplace requirements, and go‑to‑market execution as AI apps and agents scale through Microsoft Marketplace. Sales don't happen by accident, it's essential you engage in promoting your marketing. When promotion is treated as a first‑class step in the lifecycle, it reinforces trust, accelerates evaluation, and increases the likelihood that an AI app and agent transitions from initial interest to sustained use. How to use this series This series is designed to be used in two ways: Read sequentially to understand the full Marketplace‑ready journey Use individual posts alongside Microsoft Learn content, App Advisor, and Quick Start resources for deeper implementation guidance This series provides a structured, end‑to‑end view of what it takes to move from a working AI app and agent to a solution that customers can trust, deploy, and buy through Marketplace. It is designed to complement hands‑on implementation guidance, including Microsoft Learn resources such as Publishing AI agents to the Microsoft marketplace, and to help software companies navigate Marketplace readiness with fewer surprises and less rework. The upcoming post is about choosing your marketplace offer type which defines the operating model of your AI app or agent on Marketplace and influences key architectural decisions for your app or agent. Key resources See curated, step-by-step guidance to help you build, publish, or sell your app or agent (no matter where you start) in App Advisor, Quick-Start Development Toolkit Microsoft AI Envisioning Day Events How to build and publish AI apps and agents for Microsoft Marketplace Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success333Views2likes0CommentsDesign predictable usage-based billing for AI apps and agents selling in Microsoft Marketplace
Design predictable usage‑based billing for AI apps and agents selling on Microsoft Marketplace Compared to traditional software, pricing and billing feel harder because of the range of AI functionality. They reason, they infer, call tools, process data, all, to complete tasks on the customer’s behalf. If you’re building an AI app or agent to sell in Microsoft Marketplace, usage‑based billing needs to be designed with care, instrumented with intention, and explained in a way customers can trust. This post, along with App Advisor’s curated step-by-step guidance through building, publishing and selling apps for Marketplace, walks through how to do exactly that—without over‑engineering or surprising your customers later. This post is part of a series on building and publishing well-architected AI apps and agents in Microsoft Marketplace. The series focuses on AI apps and agents that are architected, hosted, and operated on Azure, with guidance aligned to building and selling solutions through Microsoft Marketplace. Why billing for AI systems is different Traditional software pricing is usually tied to static entitlements, such as licenses, seats, fixed feature sets and/or a predictable runtime footprint. AI apps and agents don’t work that way. Their cost and value are driven by runtime behavior, such as: How often a model is invoked How many tokens are processed per request How deep reasoning chains go How frequently tools or APIs are called How much data is accessed, transformed, or embedded AI behaviors are subject to change based on the interpretation of prompts and subsequent outputs processed by agents and models. That variability is why pricing AI like traditional software often creates friction—margins erode and customers may lose trust. Pricing decisions should start with business value in mind, not the meter level. Start with plan design before you define meters Plans explain pricing. Meters enforce pricing. Your Marketplace plan is where customers learn what they are buying and how it works. Before you design a single metered dimension, your plan should clearly answer: What AI behaviors are allowed What usage is included What usage becomes billable What limits apply How customers upgrade as they grow An effective plan design typically considers several key factors, such as the distinction between public and private plans, the allocation of included usage versus charges for overages, the balance of base fees against variable consumption, and the provision of clear upgrade paths across different tiers. For instance, if you’re creating an AI support agent, a well-structured plan might offer up to 1,000 resolved conversations each month for a set monthly fee, with additional charges for any conversations beyond that limit and a higher tier that grants access to increased usage allowances. When customers can easily understand what is included, what triggers extra costs, and how they can upgrade as their needs grow, metering feels straightforward and fair. Conversely, when plan details are ambiguous, even accurately measured charges can seem arbitrary, leading to uncomfortable billing discussions. Choose a billing model that matches how your AI behaves When structuring your AI solution’s pricing, begin by evaluating the expected usage patterns and the business value your AI delivers. Actively consider the nature of your agent’s workloads, the variability of customer interactions, and the predictability of operating costs. Flat Fee: Weigh the benefits of flat rate or subscription pricing. Opt for fixed monthly or annual fees when your AI solution operates within defined limits and usage remains consistent. This approach simplifies billing for customers and provides them with clear expectations. Subscription pricing works best for AI agents whose engagement is steady and whose costs don’t fluctuate dramatically. Usage-based (metered): If your AI’s usage varies widely or scales rapidly, usage-based (metered) pricing is often preferable. This model aligns charges with actual consumption, ensuring customers pay only for what they use. To implement it, leverage Marketplace metering APIs to track and bill usage accurately. Consider usage-based pricing when customer demand is unpredictable or your AI’s operational costs increase with higher workloads. Hybrid: For AI solutions that deliver ongoing baseline value but occasionally handle intensive tasks, hybrid models combine the strengths of both approaches. Offer a base subscription for predictable service, then layer in usage charges for overages. This structure is common for agents serving regular needs with intermittent spikes, enabling you to manage cost recovery while giving customers cost certainty. Metering looks different depending on your offer type As you move forward with your plan design and billing model, it’s important to recognize that metering varies significantly based on how your solution is delivered. SaaS offers: Usage tracking is accomplished through Marketplace Metering APIs, allowing you to capture AI-driven activities such as agent task executions, workflow runs, document analysis, or token processing. Your metering should align closely with the customer’s subscription lifecycle, plan tiers, and the included usage, ensuring transparency and consistency as customers progress through different service levels. Container-based offers: You might meter resources like nodes, cores, pods, or clusters—or even application-specific AI dimensions. Accurate attribution across tenants and deployments is crucial, so customers are billed reliably according to their actual consumption. Virtual machine offers: Metering is generally linked to VM runtime or license usage. Although the granularity is often lower than SaaS solutions, billing remains contractually enforced, and publishers must ensure that measurements are dependable and align with customer agreements. Azure Managed Applications: Metering should reflect solution management exclusively, while the underlying infrastructure costs are handled separately through Azure’s billing system. For more about offer types, visit Marketplace Offer Types for AI Apps and agents: SaaS vs Managed App vs Containers. Design metered dimensions customers can actually explain As you refine your billing model for Marketplace offers, it’s vital to consider how your metered dimensions will be perceived and understood by your customers. The most effective dimensions reflect clear, customer-visible value rather than abstract internal system mechanics. For AI-driven solutions, this often means tracking tangible outcomes such as agent tasks executed, successful workflows completed, data objects processed, or AI-assisted actions performed. Choosing these straightforward metrics not only makes invoices easier for customers to interpret but also strengthens your position during billing reviews by tying charges directly to business outcomes. For example, “documents analyzed” is a much clearer and more defensible metric than “token batches processed,” and “resolved workflows” resonates more with customers than “model invocations.” Ultimately, a strong metered dimension is one that a customer can easily explain to their finance or procurement teams. If the charge isn’t readily understandable, it’s a signal to revisit and refine your measurement approach. Track and plan metrics using the Microsoft Marketplace metering service APIs Under‑reporting impacts revenue. Marketplace enforces billing based on what you report. Once you've determined how your solution will be delivered and understood how metering varies by offer type, the next step is to ensure your billing model is both transparent and robust. This is accomplished by tracking your plan and meter metrics through the Microsoft Marketplace Metering Service APIs —a process that not only supports accurate billing but also builds customer trust. Instrumenting usage at runtime is essential: you must reliably capture and report consumption, making sure each event is precisely recorded and associated with the correct subscription and plan. Aggregating this usage and sending it to the marketplace—whether hourly or daily, covering the previous 24 hours—ensures billing remains consistent and defensible. Add metering guardrails to avoid cost surprises As you implement usage-based metering for your Marketplace offers, it’s essential to build guardrails that protect both your business and your customers from unexpected costs. Metering is a critical component of your service reliability, directly influencing customer trust and the overall transparency of your billing model. Ensuring your metering remains both dependable and customer-focused is crucial for maintaining trust and transparency. As you instrument your solution, take care to attribute usage precisely across multiple tenants, so every charge is accurately mapped to the correct customer and subscription. Additionally, aggregating usage on a consistent schedule—such as hourly or daily—not only supports predictable reporting but also helps customers better understand their consumption patterns. These practices lay a solid foundation for metering that supports both your business objectives and your customers’ needs, creating a seamless experience that aligns with the overall goals of your Marketplace offering. Marketplace-ready offerings typically feature: Usage caps that set clear maximums, limiting exposure to unforeseen charges. Soft limits with proactive alerts as customers approach their thresholds. Hard limits to enforce plan boundaries and prevent overages beyond agreed levels. Transparent usage dashboards, giving customers real-time visibility into their consumption. For example, when a customer reaches 80% of their allotted usage, they receive an alert and can decide whether to upgrade their plan, pause usage, or proceed into overage with full awareness—eliminating surprise invoices at month’s end. What’s Next in the Journey After establishing robust billing and metering, the next step is to enhance your AI solution’s performance, optimize API workloads, and improve production observability—laying the groundwork for scalable, efficient, and reliable operations. These capabilities help keep AI systems cost‑effective and reliable as usage grows. Key Resources See curated, step-by-step guidance to help you build, publish, or sell your app or agent (no matter where you start) in App Advisor Quick-Start Development Toolkit can connect you with code templates for AI solution patterns Microsoft AI Envisioning Day Events How to build and publish AI apps and agents for Microsoft Marketplace Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success300Views3likes1CommentAI apps and agents: choosing your Marketplace offer type
Choosing your Marketplace offer type is one of the earliest—and most consequential—decisions you’ll make when preparing an AI app for Microsoft Marketplace. It’s also one of the hardest to change later. This post is the second in our Marketplace‑ready AI app series. Its goal is not to push you toward a specific option, but to help you understand how Marketplace offer types map to different AI delivery models—so you can make an informed decision before architecture, security, and publishing work begins. You can always get a curated step-by-step guidance through building, publishing and selling apps for Marketplace through App Advisor. This post is part of a series on building and publishing well-architected AI apps and agents in Microsoft Marketplace. The series focuses on AI apps and agents that are architected, hosted, and operated on Azure, with guidance aligned to building and selling solutions through Microsoft Marketplace. Why offer type is an important Marketplace decision Offer type is more than a packaging choice. It defines the operating model of your AI app on Marketplace: How customers acquire your solution Where the AI runtime executes Determining the right security and business boundaries for the AI solution and associated contextual data Who operates and updates the system How transactions and billing are handled Once an offer type is selected, it cannot be changed without creating a new offer. Teams that choose too quickly often discover later that the decision creates friction across architecture, security boundaries, or publishing requirements. Microsoft’s Publishing guide by offer type explains the structural differences between offer types and why this decision must be made up front. How Marketplace offer types map to AI delivery models AI apps differ from traditional software in a few critical ways: Contextual data may need to remain in a specific tenant or geography Agents may operate autonomously and continuously Control over infrastructure often determines trust and compliance How the solution is charged and monetized, including whether pricing is usage‑based, metered, or subscription‑driven (for example, billing per inference, per workflow execution, or as a flat monthly fee) The buyer’s technical capability, including the level of engineering expertise required to deploy and operate the solution (for example, SaaS is generally easier to consume, while container‑based and managed application offers often require stronger cloud engineering and DevOps skills) Marketplace offer types don’t describe features. They define responsibility boundaries—who controls the AI runtime, who owns the infrastructure, and where customer data is processed. At a high level, Marketplace supports four primary delivery models for AI solutions: SaaS Azure Managed Application Azure Container Virtual Machine Each represents a different balance between publisher control and customer control. The sections below explain what each model means in practice. Check out the interactive offer selection wizard in App Advisor for decision support. Below, we unpack each of the offer types. SaaS offers for AI apps SaaS is the most common model for AI apps and agents on Marketplace—and often the default starting point. With a SaaS offer, the AI service runs in the publisher’s Azure environment and is accessed by customers through a centralized endpoint. This model works well for: Multi‑tenant AI platforms and agents Continuous model and prompt updates Rapid experimentation and iteration Usage‑based or subscription billing Because the service is centrally hosted, publishers retain full control over deployment, updates, and operational behavior. For multi-tenant AI apps, this also means making early decisions about Microsoft Entra ID configuration—such as how customers are onboarded, whether access is granted through tenant-level consent or external identities, and how user identities, roles, and data are isolated across tenants to prevent cross-tenant access or data leakage. For official guidance, see the SaaS section of the Marketplace publishing guide and the AI agent overview, which describes SaaS‑based agent deployments. Plan a SaaS offer for Microsoft Marketplace. Azure Managed Applications for AI solutions In this model, the solution is deployed into the customer’s Azure subscription, not the publisher’s. There are two variants: Managed applications, where the publisher retains permissions to operate and update the deployed resources Solution templates, where the customer fully manages the deployment after installation This model is a strong fit when AI workloads must run inside customer‑controlled environments, such as: Regulated or sensitive data scenarios Customer‑owned networking and identity boundaries Infrastructure‑heavy AI solutions that can’t be centralized Willingness or need on part of the customer or IT team to tailor the app to the needs of the end customer specific environment Managed Applications sit between SaaS and fully customer‑run deployments. They offer more customer control than SaaS, while still allowing publishers to manage lifecycle aspects when appropriate. Marketplace guidance for Azure Applications is covered in the publishing guide. For more information, see the following links: Plan an Azure managed application for an Azure application offer. Azure Container offers for AI workloads Container offer AI workloads—typically on AKS—using container images supplied by the publisher. This model is best suited for scenarios that require: Strict data residency Air‑gapped or tightly controlled environments Customer‑managed Kubernetes infrastructure The publisher delivers the container artifacts, but deployment, scaling, and runtime operations occur in the customer’s environment. This shifts operational responsibility, risk and compute costs away from the publisher and toward the customer. Container offer requirements are covered in the Marketplace publishing guide. Plan a Microsoft Marketplace Container offer. Virtual Machine offers for AI solutions Virtual Machine offers still play a role, particularly for specialized or legacy AI solutions. VM offers package a pre‑configured AI environment that customers deploy into their Azure subscription. Compared to other models, they offer: Updates and scaling are more tightly scoped Iteration cycles tend to be longer The solution is more closely aligned with specific OS or hardware requirements They are most commonly used for: Legacy AI stacks Fixed‑function AI appliances Solutions with specialized hardware or driver dependencies VM publishing requirements are also documented in the Marketplace publishing guide. Plan a virtual machine offer for Microsoft Marketplace. Comparing offer types across AI‑specific decision dimensions Rather than asking “which offer type is best,” it’s more useful to ask what trade‑offs you’re making in an AI app delivery model comparison. Key lenses to consider include: Who operates the AI runtime day‑to‑day Where customer data and AI prompts inputs and outputs are processed and stored Example: When evaluating Saas vs managed apps for AI, check industry specific compliance requirements to evaluate whether the data has to remain in the customer’s tenant or it can be sent to the publisher’s tenant. How quickly models, prompts, and logic can evolve The balance between publisher control and customer control How Marketplace transactions and billing align with runtime behavior SaaS Container (AKS / ACI) Virtual Machine (VM) Azure Managed Application What it is Fully managed, externally hosted app integrated with Marketplace for billing and entitlement Containerized app deployed into customer-managed Azure container environments VM image deployed directly into the customer’s Azure subscription Azure native solution deployed into the customer’s subscription, managed by the publisher Control plane Publisher‑owned Customer owned Customer owned Customer owned (with publisher access) Operational model Centralized operations, updates, and scaling Customer operates infra; publisher provides containers Customer operates infra; publisher provides VM image Per customer deployment and lifecycle Good fit scenarios • Multi‑tenant AI apps serving many customers • Fast onboarding and trials • Frequent model or feature updates • Publisher has full runtime control • AI apps or agents built as microservices • Legacy or lift-and-shift AI workloads • Enterprise AI solutions requiring customer owned infrastructure Avoid when • Customers require deployment into their own subscription • Strict data residency mandates customer control • Offline or air‑gapped environments are required • Customers standardize on Kubernetes • Custom OS or driver dependencies • Tight integration with customer Azure resources Typical AI usage pattern Centralized inference and orchestration across tenants • Portability across environments is important • Specialized runtime requirements • Strong compliance and governance needs Different AI solutions land in different places across these dimensions. The right choice is the one that matches your operational reality—not just your product vision. Note: If your solution primarily delivers virtual machines or containerized workloads, use a Virtual Machine or Container offer instead of an Azure Managed Application. Supported sales models and pricing options by Marketplace offer type Marketplace offer types don’t just define how an AI app and agent is deployed — they also determine how it can be sold, transacted, and expanded through Microsoft Marketplace. Understanding the supported sales models early helps avoid misalignment between architecture, pricing, and go‑to‑market strategy. Supported sales models Offer type Transactable listing Public listing Private offers Resale enabled Multiparty private offers Azure IP Co‑sell eligible SaaS Yes Yes Yes Yes Yes Yes Container Yes Yes Yes No Yes Yes Virtual Machine Yes Yes Yes Yes Yes Yes Azure Managed Application Yes Yes Yes No Yes Yes What these sales models mean Transactable listing A Marketplace listing that allows customers to purchase the solution directly through Microsoft Marketplace, with billing handled through Microsoft. Public listing A listing that is discoverable by any customer browsing Microsoft Marketplace and available for self‑service acquisition. Private offers Customer‑specific offers created by the publisher with negotiated pricing, terms, or configurations, purchased through Marketplace. Resale enabled Using resale enabled offers, software companies can authorize their channel partners to sell their existing Marketplace offers on their behalf. After authorization, channel partners can independently create and sell private offers without direct involvement from the software company. Multiparty private offers Private offers that involve one or more Microsoft partners (such as resellers or system integrators) as part of the transaction. Azure IP Co‑sell eligible When all requirements are met this allows your offers to contribute toward customers' Microsoft Azure Consumption Commitments (MACC). Pricing section Marketplace offer types determine the AI pricing models available. Make sure you build towards a marketplace offer type that aligns with how you want to deploy and price your solution. SaaS – Subscription or flat‑rate pricing, per‑user pricing, and usage‑based (metered) pricing. Container – Kubernetes‑based offers support multiple Marketplace‑transactable pricing models aligned to how the workload runs in the customer’s environment, including per core, per core in cluster, per node, per node in cluster, per pod, or per cluster pricing, all billed on a usage basis. Container offers can also support custom metered dimensions for application‑specific usage. Alternatively, publishers may offer Bring Your Own License (BYOL) plans, where customers deploy through Marketplace but bring an existing software license. Virtual Machine – Usage-based hourly pricing (flat rate, per vCPU, or per vCPU size), with optional 1-year or 3-year reservation discounts. Publishers may also offer Bring Your Own License (BYOL) plans, where customers bring an existing software license and are billed only for Azure infrastructure. Azure Managed Application – A monthly management or service fee charged by the publisher; Azure infrastructure consumption is billed separately to the customer. Note: Azure Managed Applications are designed to charge for management and operational services, not for SaaS‑style application usage or underlying infrastructure consumption. Buyer‑side assumptions to be aware of For customers to purchase AI apps and agents through these sales models: The customer must be able to purchase through Microsoft Marketplace using their existing Microsoft procurement setup Marketplace purchases align with enterprise buying and governance controls, rather than ad‑hoc vendor contracts For private and multiparty private offers, the customer must be willing to engage in a negotiated Marketplace transaction, rather than pure self‑service checkout Important clarification Supported sales models are consistent across Marketplace offer types. What varies by offer type is how the solution is provisioned, billed, operated, and updated. Sales flexibility alone should not drive offer‑type selection — it must align with the architecture and operating model of the AI app and agent. How this decision impacts everything that follows Offer type selection for AI apps and agents ripples through the rest of the Marketplace journey. They directly shape: Architecture design choices Security and compliance boundaries Fulfillment APIs and billing integration Publishing and certification requirements Cost, scalability, and operational responsibility Follow the series for updates on new posts. What’s next in the journey With the offer type decision in place, the focus shifts to turning that choice into a production‑ready solution. This includes designing an architecture that aligns with your delivery model, establishing clear security and compliance boundaries, and preparing the operational foundations required to run, update, and scale your AI app or agent confidently in customer environments. Getting these elements right early reduces rework and sets the stage for a smoother Marketplace journey. See the next post in the series: Designing Production‑Ready AI App and Agent Architectures for Microsoft Marketplace. Key resources See curated, step-by-step guidance to help you build, publish, or sell your app or agent (no matter where you start) in App Advisor Quick-Start Development Toolkit can connect you with code templates for AI solution patterns Microsoft AI Envisioning Day Events How to build and publish AI apps and agents for Microsoft Marketplace Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success300Views4likes0CommentsFirm AI for professional services: governed, agentic workflows built on Microsoft Azure
In this installment of our Partner Spotlight series, we’re highlighting partners building industry-focused AI solutions and bringing them to customers through Microsoft Marketplace. I connected with Richard Baskerville from Intapp to learn how the company is delivering Firm AI—governed, agentic capabilities designed specifically for professional and financial services firms—while aligning with Microsoft Azure for security, scale, and enterprise procurement. Intapp’s approach shows what it looks like to pair deep vertical workflow expertise with a trusted cloud platform so customers can adopt AI in a way that is both practical and accountable. About Richard Baskerville Richard Baskerville is a Senior Director at Intapp, where he helps shape strategic AI alliances across the Microsoft ecosystem and beyond. _______________________________________________________________________________________________________________________________________________________________ [JR] Tell us about Intapp. What inspired the founding of the company, and what problems do your solutions help customers solve? [RB] Intapp was founded on a durable observation: professional firms—law firms, accounting practices, private capital firms, and investment banks—operate differently than general enterprises. Their workflows are shaped by client relationships, professional obligations, and regulatory requirements that generic software was never designed to handle. Our founders saw that these firms were either building bespoke systems at enormous cost or forcing themselves into enterprise tools that didn’t fit. Today, Intapp delivers Firm AI—governed AI purpose-built for professional services. Our solutions span the full firm lifecycle: business development through Intapp DealCloud, time capture and billing through Intapp Time, and risk and compliance management across Intapp Conflicts, Intake, Terms, Walls, and Employee Compliance. Underpinning it all is Intapp Celeste, our agentic AI platform, which puts AI to work on the specific workflows that drive firm performance—while keeping humans accountable and in control. For firms where every engagement carries professional liability, that governance layer isn’t a feature; it’s the foundation. [JR] What industries or types of organizations do you primarily serve today? [RB] Intapp serves professional and financial services firms exclusively. That focus is intentional—and it’s what makes us different. Our customers are law firms, accounting and consulting practices, investment banks and advisory firms, private capital managers, and real assets firms. Many are among the largest in their categories globally. These firms share characteristics that set them apart from general enterprises: partnership structures, billable-hour economics, client conflict management, regulatory oversight, and relationship networks spanning decades. Generic CRM or ERP tools aren’t built for these dynamics. Intapp is. That vertical depth—built over more than 20 years—is what Firm AI means in practice: AI that understands the context of a law firm partner’s client obligations or a dealmaker’s fund-level requirements, not just the general shape of business software. [JR] What were your initial expectations for Microsoft Marketplace when you first started your journey? [RB] Our initial expectation was straightforward: Marketplace would give us a cleaner path to transact with customers already operating inside the Microsoft ecosystem. Many of our customers—large law firms and financial services firms—had already committed significant Azure spend through enterprise agreements. Marketplace offered a way to meet them commercially where they already were. What we underestimated was how much Marketplace would shape the broader partnership. We went in expecting a distribution channel. What we found was a framework that connected co-sell, Azure consumption alignment, and joint go-to-market in ways that changed how our teams engaged. The commercial mechanics—particularly MACC drawdown eligibility—became a real conversation-opener with procurement and finance stakeholders, not just a contract path. [JR] What applications do you have available in Microsoft Marketplace, and how do they help customers? [RB] Intapp has 12 SaaS solutions available in Microsoft Marketplace today, transacted via private offers. They span the full professional services firm lifecycle—from relationship intelligence and deal management with Intapp DealCloud, to time capture with Intapp Time, to risk and compliance management across Intapp Conflicts, Intake, Terms, Walls, and Employee Compliance. Because we focus exclusively on professional and financial services, customers aren’t buying horizontal software adapted for their industry; they’re buying solutions designed for their workflows, compliance obligations, and client structures. Looking ahead, Intapp Celeste—our agentic AI platform—will be available in Marketplace via a consumption-based, metered model. That structure matches how agentic AI gets used: variably, tied to real firm activity, and governed end-to-end. [JR] What were the biggest lessons you learned early on when selling through Marketplace? [RB] Three lessons stood out early: Listing is the beginning, not the end. Initial traction required deliberate investment in co-sell enablement—ensuring Microsoft field teams could position Intapp clearly, not just point to a catalog entry. Specificity wins. Our customers are sophisticated buyers. What worked was leading with vertical relevance—speaking directly to the compliance requirements of a law firm or the relationship-data challenges of a private capital manager. Private offers require commercial fluency. Helping customers understand how Intapp maps to Azure commitments—and helping Microsoft sellers tell that story—made a material difference in deal velocity. [JR] How has your business changed with a transactable offer on the Marketplace? [RB] Transactable offers have changed how deals close. Customers with existing Azure commitments can apply Intapp spend against their MACC, removing a procurement obstacle that previously added months to cycles. Finance and procurement can work within familiar Microsoft frameworks rather than running separate vendor onboarding. Marketplace has also expanded our reach. Microsoft’s field organization has relationships we can’t replicate at scale, and co-sell has helped translate that reach into qualified pipeline—especially in segments where we previously had limited coverage. And the signal matters: being transactable in Marketplace reinforces that Intapp is an enterprise-grade partner, not a niche point solution. [JR] How has collaborating with Microsoft sellers impacted your Marketplace growth? [RB] Microsoft sellers are the activation mechanism for our Marketplace offers. Without field alignment, a listing is a catalog entry. With it, it becomes a joint pipeline motion. We’ve invested in enablement—giving sellers the vertical context to position Intapp credibly in front of legal and financial services CIOs, and making it easy to bring us into deals where Azure capabilities are already in the conversation. That alignment shows up in specific segments. In private capital and investment banking, Microsoft enterprise relationships often predate ours—co-sell provides warm introductions backed by a trusted infrastructure partner. In legal, where Microsoft 365 is near-universal, that adjacency and deep interoperability creates natural entry points. Co-sell turns those adjacencies into active pipeline rather than theoretical opportunity. [JR] What has made the co-sell relationship with Microsoft particularly valuable for Intapp? [RB] Co-sell works because it’s structurally aligned, not just commercially convenient. Microsoft’s investment in these verticals—through industry clouds, compliance frameworks, and dedicated field teams—maps directly onto Intapp’s customer base. We’re selling into the same firms, with the same platform expectations underpinning both offerings. What makes it particularly valuable is the mutual trust transfer. Firms hold Microsoft to a high standard for security, data governance, and regulatory compliance. When Microsoft sellers bring Intapp into the conversation, that credibility extends to us and compresses the trust-building phase of an enterprise cycle—especially in regulated industries. [JR] How does Microsoft Marketplace fit into Intapp’s long-term growth strategy? [RB] Marketplace is central to how we scale Firm AI globally. Our ambition is to be the governed AI platform of record for professional firms across legal, private capital, accounting, and advisory. Helping customers decrement MACC through Marketplace purchases is a clear win-win because it aligns platform investment with workflow outcomes. The upcoming Marketplace availability of Intapp Celeste which will offer different commercial models for customers (e.g. consumption based) marks the next phase. As Celeste deepens integration with Microsoft 365, Teams, and Azure AI services, the commercial and technical stories converge—customers can both buy and operate within an architecture where Firm AI and Microsoft’s platform reinforce each other. [JR] What advice would you give other SDCs who are just starting their Microsoft Marketplace journey? [RB] Three things matter most early on: Earn the co-sell relationship before you need it. Invest in enablement early so Microsoft sellers have enough vertical context to represent your value clearly. Get your commercial model right for the channel. Understand how private offers interact with Azure commitments, and plan for consumption-based pricing where it fits AI usage. Lead with a sharp point of view. The partners who gain traction fastest are the obvious choice for a specific industry workflow or problem—know what that is and communicate it consistently. _______________________________________________________________________________________________________________________________________________________________ Closing reflection Intapp’s Marketplace journey shows that industry-specific, governed AI wins when it’s paired with an enterprise platform customers already trust. By making solutions transactable—especially through private offers that align to customer’s existing Azure commitments—Intapp reduces procurement friction and accelerates adoption. And like many successful partners, their growth ultimately comes down to enablement: clear vertical messaging and tight co-sell alignment that turns Marketplace presence into a real, qualified pipeline.300Views0likes0CommentsQuality and evaluation framework for successful AI apps and agents in Microsoft Marketplace
Why quality in AI is different — and why it matters for Marketplace Traditional software quality spans many dimensions — from performance and reliability to correctness and fault tolerance — but once those characteristics are specified and validated, system behavior is generally stable and repeatable. Quality is assessed through correctness, reliability, performance, and adherence to specifications. AI apps and agents change this equation. Their behavior is inherently non-deterministic and context‑dependent. The same prompt can produce different responses depending on model version, retrieval context, prior interactions, or environmental conditions. For agentic systems, quality also depends on reasoning paths, tool selection, and how decisions unfold across multiple steps — not just on the final output. This means an AI app can appear functional while still falling short on quality: producing responses that are inconsistent, misleading, misaligned with intent, or unsafe in edge cases. Without a structured evaluation framework, these gaps often surface only in production — in customer environments, after trust has already been extended. For Microsoft Marketplace, this distinction matters. Buyers expect AI apps and agents to behave predictably, operate within clear boundaries, and remain fit for purpose as they scale. Quality measurement is what turns those expectations into something observable — and that visibility is what determines Marketplace readiness. You can always get a curated step-by-step guidance through building, publishing and selling apps for Marketplace through App Advisor. This post is part of a series on building and publishing well-architected AI apps and agents in Microsoft Marketplace. The series focuses on AI apps and agents that are architected, hosted, and operated on Azure, with guidance aligned to building and selling solutions through Microsoft Marketplace. How quality measurement shapes Marketplace readiness AI apps and agents that can demonstrate quality — with documented evaluation frameworks, defined release criteria, and evidence of ongoing measurement — are easier to evaluate, trust, and adopt. Quality evidence reduces friction during Marketplace review, clarifies expectations during customer onboarding, and supports long-term confidence in production. When quality is visible and traceable, the conversation shifts from "does this work?" to "how do we scale it?" — which is exactly where publishers want to be. Publishers who treat quality as a first-class discipline build the foundation for safe iteration, customer retention, and sustainable growth through Microsoft Marketplace. That foundation is built through the decisions, frameworks, and evaluation practices established long before a solution reaches review. What "quality" means for AI apps and agents Quality for AI apps and agents is not a single metric — it spans interconnected dimensions that together define whether a system is doing what it was built to do, for the people it was built to serve. The HAX Design Library — Microsoft's collection of human-AI interaction design patterns — offers practical guidance for each one. These dimensions must be defined before evaluation begins. You can only measure what you have first described. Accuracy and relevance — does the output reflect the right answer, grounded in the right context? HAX patterns Make clear what the system can do (G1) and notify users when the AI is uncertain (G10) help publishers design systems where accuracy is visible and outputs are understood in the right context — not treated as universally authoritative. Safety and alignment — does the output stay within intended use, without harmful, biased, or policy-violating content? HAX patterns Mitigate social biases (G6) and Support efficient correction (G9) help ensure outputs stay within acceptable boundaries — and that users can identify and address issues before they cause downstream harm. Consistency and reliability — does the system behave predictably across users, sessions, and environments? HAX patterns Remember recent interactions (G12) and notify users about changes (G18) keep behavior coherent within sessions and ensure updates to the model or prompts are never silently introduced. Fitness for purpose — does the system do what it was designed to do, for the people it was designed to serve, in the conditions it will actually operate in? HAX patterns make clear how well the system can do what it does (G2) and Act on the user's context and goals (G4) ensure the system responds to what users actually need — not just what they literally typed. These dimensions work together — and gaps in any one of them will surface in production, often in ways that are difficult to trace without a deliberate evaluation framework. Designing an evaluation framework before you ship Evaluation frameworks should be built alongside the solution. At the end, gaps are harder and costlier to close. The discipline mirrors the design-in approach that applies to security and governance: decisions made early shape what is measurable, what is improvable, and what is ready to ship. A well-structured evaluation framework defines five things: What to measure — the quality dimensions that matter most for this solution and its intended use cases. For AI apps and agents, this typically includes task adherence, response coherence, groundedness, and safety — alongside the fitness-for-purpose dimensions defined in the previous section. How to measure it — the methods, tools, and benchmarks used to assess quality consistently. Effective evaluation combines AI-assisted evaluators (which use a model as a judge to score outputs), rule-based evaluators (which apply deterministic logic), and human review for edge cases and safety-relevant responses that automated methods cannot fully capture. Who evaluates — the right combination of automated metrics, human review, and structured customer feedback. No single method is sufficient; the framework defines how each is applied and when human judgment takes precedence. When to evaluate — at defined milestones: during development to establish a baseline, pre-release to validate against acceptance thresholds, at rollout to catch regression, and continuously in production to detect drift as models, prompts, and data evolve. What triggers re-evaluation — model updates, prompt changes, new data sources, tool additions, or meaningful shifts in customer usage patterns. Re-evaluation should be a scheduled and triggered discipline, not an ad hoc response to visible failures. The framework becomes a shared artifact — used by the publisher to release safely, and by customers to understand what quality commitments they are adopting when they deploy the solution in their environment. Evaluate your AI agents - Microsoft Foundry | Microsoft Learn Evaluation methods for AI apps and agents Quality must be assessed across complementary approaches — each designed to surface a different category of risk, at a different stage of the solution lifecycle. Automated metric evaluation — evaluators assess agent responses against defined criteria at scale. Some use AI models as judges to score outputs like task adherence, coherence, and groundedness; others apply deterministic rules or text similarity algorithms. Automated evaluation is most effective when acceptance thresholds are defined upfront — for example, a minimum task adherence pass rate before a release proceeds. Safety evaluation — a dedicated evaluation category that identifies potential content risks, policy violations, and harmful outputs in generated responses. Safety evaluators should run alongside quality evaluators, not as a separate afterthought. Human-in-the-loop evaluation — structured expert review of edge cases, borderline outputs, and safety-relevant responses that automated metrics cannot fully capture. Human judgment remains essential for interpreting context, intent, and impact. Red-teaming and adversarial testing — probing the system with challenging, unexpected, or intentionally misused inputs (including prompt injection attempts and tool misuse) to surface failure modes before customers encounter them. Microsoft provides dedicated AI red teaming guidance for agent-based systems. Customer feedback loops — structured collection of real-world signals from users interacting with the system in production. Production feedback closes the gap between what was tested and what customers actually experience. Each method has a distinct role. The evaluation framework defines when and how each is applied — and which results are required before a release proceeds, a change is accepted, or a capability is expanded. Defining release criteria and ongoing quality gates Quality evaluation only drives improvement when it is connected to clear release criteria. In an LLMOps model, those criteria are automated gates embedded directly into the CI/CD pipeline, applied consistently at every stage of the release cycle. In continuous integration (CI), automated evaluations run with every change — whether that change is a prompt update, a model version, a new tool, or a data source modification. CI gates catch regressions early, before they reach customers, by validating outputs against predefined quality thresholds for task adherence, coherence, groundedness, and safety. In continuous deployment (CD), quality gates determine whether a build is eligible to proceed. Release criteria should define: Minimum acceptable thresholds for each quality dimension — a release does not proceed until those thresholds are met Known failure modes that block release outright versus those that are tracked, monitored, and accepted within defined risk tolerances Deployment constraints — conditions under which a release is paused, rolled back, or progressively expanded to a subset of users before full rollout Ongoing evaluation must be scheduled and triggered. As models, prompts, tools, and customer usage patterns evolve, the baseline shifts. LLMOps treats re-evaluation as a continuous discipline: run evaluations, identify weak areas, adjust, and re-evaluate before changes propagate. This connects directly to governance. Quality evidence — the record of what was measured, when, and against what criteria — is part of the audit trail that makes AI behavior accountable, explainable, and trustworthy over time. For more on the governance foundation this builds on, see Governing AI apps and agents for Marketplace readiness. Quality across the publisher-customer boundary Clear quality ownership reduces friction at onboarding, builds confidence during operation, and protects both parties when behavior deviates. In the Marketplace context, quality is a shared responsibility — but the boundaries are distinct. Publishers are responsible for: Designing and running the evaluation framework during development and release Defining quality dimensions and thresholds that reflect the solution's intended use Providing customers with transparency into what quality means for this solution — without exposing proprietary prompts or internal logic Customers are responsible for: Validating that the solution performs appropriately in their specific environment, with their data and their users Configuring feedback and monitoring mechanisms that surface quality signals in their tenant Treating quality evaluation as a shared ongoing responsibility, not a one-time publisher guarantee When both sides understand their role, quality stops being a handoff and becomes a foundation — one that supports adoption, sustains trust, and enables both parties to respond confidently when behavior shifts. What's next in the journey A strong quality framework sets the baseline — but keeping that quality visible as solutions scale is its own discipline. The next posts in this series explore what comes after the framework is in place: API resilience, performance optimization, and operational observability for AI apps and agents running in production environments. See the next post in the series: Designing a reliable environment strategy for Microsoft Marketplace AI apps and agents | Microsoft Community Hub. Key resources See curated, step-by-step guidance to help you build, publish, or sell your app or agent (no matter where you start) in App Advisor Quick-Start Development Toolkit can connect you with code templates for AI solution patterns Microsoft AI Envisioning Day Events How to build and publish AI apps and agents for Microsoft Marketplace Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success299Views0likes0CommentsSecuring AI agents at scale: Identity, governance, and zero trust
AI agents are reshaping enterprise operations. They reason over data, call APIs, trigger workflows, and act on behalf of users — often without human intervention. IDC projects 1.3 billion AI agents in production by 2028, and Capgemini research shows four out of five organizations plan to integrate agents within the next three years. The race to build is already underway. The harder question — one most organizations are not yet equipped to answer — is: how do you operate AI agents safely at scale? Deploying a demo-ready agent is table stakes. Making that same agent production-grade, with proper identity, access controls, governance, and runtime protections, is an entirely different engineering and operational challenge. Why agentic AI changes the security equation Traditional software executes deterministic logic. An AI agent reasons, plans, and takes action — dynamically, across systems it was never explicitly programmed to touch. When connected to enterprise APIs and data, an agent stops being a chatbot and starts functioning as a digital worker. That shift has direct security implications: Agents can trigger business processes and access sensitive data autonomously Their capabilities expand as new tools and Model Context Protocol (MCP) servers are connected They behave non-deterministically, making static policy controls insufficient A compromised or misconfigured agent can have a large blast radius Most organizations are deploying into this environment without the governance infrastructure to match. The result is growing agent sprawl, unclear accountability, and unmanaged risk — exactly the conditions that security and compliance teams lose sleep over. A three-pillar framework for secure agent operations Securing AI agents at enterprise scale requires a structured approach. The following three-pillar framework provides a practical foundation: Manage, Govern, and Protect. Pillar 1: Manage — build an agent registry and enforce identity The first pillar focuses on visibility. You cannot secure what you cannot see, and you cannot control what you cannot identify. Every agent in your environment must have a unique, managed identity — a first-class identity type distinct from human accounts, service principals, and workload identities. Without it, agents are anonymous automation: untraceable, un-auditable, and uncontrollable. With agent identity in place, organizations gain: Traceability — a full record of what each agent accessed, when, and in what context Auditability — the clean audit trail that security and compliance teams require Granular control — the ability to restrict, block, or revoke access based on policy The operational target is an agent registry: a centralized fleet view of every agent across the organization, including those built on non-Microsoft platforms. Operators should be able to inspect agent metadata, validate ownership, review configuration, and block agents directly from this registry. This fleet management capability is what separates enterprise-grade deployments from uncontrolled agent sprawl. Pillar 2: Govern — automate agent lifecycle management The second pillar addresses the operational reality that emerges once agents number in the hundreds or thousands: manual governance doesn't scale. Three foundational practices make lifecycle governance work at scale. Assign a sponsor to every agent Each agent must have a designated owner — a person or team accountable for its activity and access decisions from creation through retirement. Orphaned agents, where no human is accountable, represent a high-risk surface area. Sponsorship creates a clear chain of accountability that also supports incident response when something goes wrong. Automate lifecycle policies Agents change hands, move between teams, and eventually retire. Access granted at creation may be inappropriate six months later. Lifecycle policies — covering creation, handoff, access review, and decommissioning — must be automated to remain effective at scale. The goal is to remove routine human intervention from the process while ensuring every action is logged and auditable. Enforce just-in-time, time-bound access Agents should never hold standing permissions. Access should be intentionally scoped, time-limited, and removed when no longer required. This directly addresses permission accumulation — one of the most common and underappreciated risks in agentic deployments, where an agent gradually gains access to resources far beyond its original purpose. The full lifecycle flow runs from creation (sponsor assigned, access requested and approved) through active operation (periodic access reviews) to retirement (identity deleted, access expired). Every stage should generate an auditable record. Pillar 3: Protect — secure agents at runtime The third pillar covers production runtime security — what happens once agents are live and actively accessing corporate resources. This is where risk is most dynamic and where a defense-in-depth approach is non-negotiable. Agents require access to file stores, APIs, calendars, MCP servers, and third-party SaaS systems to deliver value. Each of those connections is also an exposure point for prompt injection, data exfiltration, or access to malicious content. Three mechanisms form the runtime protection stack. Conditional access policies for agents The same conditional access framework used for human identities can be applied to agents — enforcing policy based on agent identity and operating context. This enables broad baseline protections across the entire agent fleet, with stricter controls for high-impact agents or sensitive scenarios. Access decisions happen continuously and automatically, not as one-off reviews. Risk signal detection and automated response Risky agent behaviors — sudden access spikes, repeated resource access failures, or attempts to reach unfamiliar systems — generate signals that can trigger automated responses. Policies can block the agent, limit its blast radius, and alert security operations for investigation, all without manual intervention. Network-layer controls Identity and access controls alone are not sufficient. Content inspection, threat-intelligence-based URL filtering, and file-level network filtering add additional layers of protection, particularly against agents that interact with external systems or user-supplied inputs. No single control point should be relied upon exclusively. Understanding agent blueprints and identity hierarchy One of the most important architectural concepts for enterprise agent governance is the distinction between an agent blueprint and an agent identity. An agent blueprint is a reusable, tenant-scoped template that defines the security, authentication, and governance characteristics for a class of agents. It sets the operational boundary: what the agent is allowed to do, which resources it can access, and which policies govern its behavior. An agent identity is a specific instance derived from a blueprint. One blueprint can generate many identities, each inheriting the governance constraints defined in the template. This parent-child model delivers concrete benefits across roles: Developers get a predictable, consistent identity model for every agent they build IT and security teams gain class-level audit trails and lifecycle control without per-instance overhead End users and customers can trust that every agent instance operates within the boundaries set at design time Governance intent — authentication, authorization, execution boundaries — is defined once and applied consistently across the entire agent family derived from that blueprint. MCP tool governance: Managing agent interoperability Modern agents rarely operate in isolation. They connect to external tools and services through protocols like MCP (Model Context Protocol), enabling tasks like retrieving documents from SharePoint, sending email, scheduling meetings, and coordinating with other agents. The power of this connectivity is real. A single agent instruction can simultaneously invoke multiple MCP servers — retrieving a file, generating a shareable link, sending it via email, and creating a calendar event — all in one natural-language-triggered flow. That same power is also a governance surface. Not every available MCP tool is appropriate for every agent, and not every tool is safe for production use without review. The right approach is to treat tool access like data access: intentional, reviewed, and tied to the agent's identity. Access to specific MCP servers should be formally requested, approved by IT or security, and documented as part of the agent onboarding process — not left to developer discretion at build time. Five practices for production-ready agent security The framework above translates into concrete practices for teams building and operating agents today. Build identity in, not on. Retrofitting governance onto an existing agent fleet is significantly harder than starting with identity baked into the build process. Every new agent should have an identity, a sponsor, and a defined access scope before it ships. Design for the fleet, not the agent. The practices and templates you establish for one agent should scale to hundreds. Reusable blueprints, automated lifecycle policies, and standardized onboarding patterns are investments that compound as agent counts grow. Threat-model before you deploy. For each production agent, work through what data it can access, what actions it can take, what happens under a malicious prompt, and what the blast radius looks like if it is compromised. Threat modeling surfaces risks that capability-focused development naturally misses. Automate governance alongside capability. The instinct in AI development is to automate what the agent does. Apply that same instinct to how the agent is governed. Lifecycle policies, access reviews, risk signal responses, and audit log generation should all run without manual orchestration. Apply zero trust to agents explicitly. Least privilege, continuous verification, and assume-breach principles apply to agents just as they do to humans and workloads. Agents that accumulate permissions over time, or that operate without continuous policy enforcement, are a liability as the threat landscape evolves. Distributing governed agents through Microsoft Marketplace The governance architecture described in this post — Entra agent identity, blueprints, lifecycle policies, and zero trust controls — applies equally to software development companies distributing agents through Microsoft Marketplace. These constructs are what determine whether a third-party agent can be trusted and managed by the enterprise customers who purchase it. With Agent 365 now generally available, enterprise customers have a control plane for managing every agent in their Microsoft 365 tenant, including agents procured from Marketplace. A software company-built agent that arrives without a registered Entra agent identity will surface as unmanaged in the customer's registry, requiring manual remediation before it can be approved for use — a friction point that directly delays adoption. For software company agents to pass enterprise IT and security review in an Agent 365-enabled tenant, three things must be in place from day one: Registered Entra agent identity — enables the customer to apply their own conditional access and lifecycle policies to the agent A well-defined agent blueprint — tells the customer's IT and security teams what the agent is permitted to do, what resources it accesses, and what governance characteristics it carries Declared and scoped MCP tool access — allows the customer's admin to review and approve tool permissions before the agent goes live A software company agent that ships with these elements in place signals it was built for enterprise operations from the start — and that trust signal is increasingly what separates agents that clear procurement review from those that stall in it. The competitive case for getting this right Organizations that build secure, governed agent infrastructure do not just reduce risk — they accelerate adoption. When security and operations teams can trust the agents in their environment, the internal friction that slows deployment decreases. Partners and software companies that deliver governed, auditable agent solutions create durable differentiation compared to those shipping ungoverned agents into customer environments. The companies winning in the agentic AI space are not just building agents. They are building the platform, practices, and governance infrastructure around them. That is what enterprise customers are increasingly requiring, and it is what separates a compelling demo from a production-grade solution. The principles have not changed — identity, least privilege, auditability, and defense in depth. What has changed is that those principles now apply to a new class of actor in the enterprise: the AI agent. *This blog is based on the Security for SDC Series Season 2 Episode 4: Securing and Governing AI Agents Before They Go Live presented by Ryan Nguyen and Paul Woo View past and upcoming sessions in the Security for software development company series: Security for SDC Series: Securing the Agentic Era200Views0likes0CommentsGoverning AI apps and agents for Marketplace
Governing AI apps and agents Governance is what turns powerful AI functionality into a solution that enterprises can confidently adopt, operate, and scale. It establishes clear responsibility for actions taken by the system, defines explicit boundaries for acceptable behavior, and creates mechanisms to review, explain, and correct outcomes over time. Without this structure, AI systems can become difficult to manage as they grow more connected and autonomous. For publishers, governance is how trust is earned — and sustained — in enterprise environments. It signals that AI behavior is intentional, accountable, and aligned with customer expectations, not left to inference or assumption. As AI apps and agents operate across users, data, and systems, risk shifts away from what a model can generate and toward how its behavior is governed in real‑world conditions. Marketplace readiness reflects this shift. It is defined less by raw capability and more by control, accountability, and trust. You can always get a curated step-by-step guidance through building, publishing and selling apps for Marketplace through App Advisor. This post is part of a series on building and publishing well-architected AI apps and agents in Microsoft Marketplace. The series focuses on AI apps and agents that are architected, hosted, and operated on Azure, with guidance aligned to building and selling solutions through Microsoft Marketplace. What governance means for AI apps and agents Governance in AI systems is operational and continuous. It is not limited to documentation, checklists, or periodic reviews — it shapes how an AI app or agent behaves while it is running in real customer environments. For AI apps and agents, governance spans three closely connected dimensions: Policy What the system is allowed to do, what data it is allowed to access, what is restricted, and what is explicitly prohibited. Enforcement How those policies are applied consistently in production, even as context, inputs, and conditions change. Evidence How decisions and actions are traced, reviewed, and audited over time. Governance works when intent, behavior, and proof move together — turning expectations into outcomes that can be trusted and examined. These dimensions are interdependent. Policy without enforcement is aspiration. Enforcement without evidence is unverifiable. Governance in action Governance becomes real when responsibility is explicit. For AI apps and agents, this starts with clarity around who is responsible for what: Who the agent acts for — and how its use protects business value Ensuring the agent is used for its intended purpose, produces measurable value, and is not misused, over‑extended, or operating outside approved business contexts. Who owns data access and data quality decisions Governing how the agent consumes and produces data, whether access is appropriate, and whether the data used or generated is reliable, accurate, and aligned with business and integrity expectations. Who is accountable for outcomes when behavior deviates Defining responsibility when the agent’s behavior creates risk, degrades value, or produces unexpected outcomes — so corrective action is timely, intentional, and owned. When governance is left vague or undefined, accountability gaps surface and agent actions become difficult to justify and explain across the publisher, the customer, and the solution itself. In this model, responsibility is shared but distinct. The publisher is responsible for designing and implementing the governance capabilities within the solution — defining boundaries, enforcement points, and evidence mechanisms that protect business value by default. Marketplace customers expect to understand who is accountable before they adopt an AI solution, not after an incident forces the question. The customer is responsible for configuring, operating, and applying those capabilities within their own environment, aligning them to internal policies, risk tolerance, and day‑to‑day use. Governance works when both roles are clear: the publisher provides the structure, and the customer brings it to life in practice. Data governance for AI: beyond storage and access For Marketplace‑ready AI apps and agents, data governance must account for where data moves, not just where it resides. Understanding how data flows across systems, tools, and tenants is essential to maintaining trust as solutions scale. Data governance for AI apps and agents extends beyond where data is stored. These systems introduce new artifacts that influence behavior and outcomes, including prompts and responses, retrieval context and embeddings, and agent‑initiated actions and tool outputs. Each of these elements can carry sensitive information and shape downstream decisions. Effective data governance for AI apps and agents requires clear structure: Explicit data ownership — defining who owns the data and under what conditions it can be accessed or used Access boundaries and context‑aware authorization — ensuring access decisions reflect identity, intent, and environment, not just static permissions Retention, auditability, and deletion strategies — so data use remains traceable and aligned with customer expectations over time Relying on prompts or inferred intent to determine access is a governance gap, not a shortcut. Without explicit controls, data exposure becomes difficult to predict or explain. Runtime policy enforcement in production Policies are stress tested when the agent is responding to real prompts, touching real data, and taking actions that carry real consequences. For software companies building AI apps and agents for Microsoft Marketplace, runtime enforcement is also how you keep the system fit for purpose: aligned to its intended use, supported by evidence, and constrained when conditions change. At runtime, governance becomes enforceable through three clear lanes of behavior: Decisions that require human approval Use approval gates for higher‑impact steps (for example: executing a write operation, sending an external request, or performing an irreversible workflow). This protects the business value of the agent by preventing “helpful” behavior from turning into misuse. Actions that can proceed automatically — within defined limits Automation is earned through clarity: define the agent’s intended uses and keep tool access, data access, and action scope anchored to those uses. Fit‑for‑purpose isn’t a feeling — it’s something you support with defined performance metrics, known error types, and release criteria that you measure and re‑measure as the system runs. Behaviors that are never permitted — regardless of context or intent Block classes of behavior that violate policy (including jailbreak attempts that try to override instructions, expand tool scope, or access disallowed data). When an intended use is not supported by evidence — or new evidence shows it no longer holds — treat that as a governance trigger: remove or revise the intended use in customer‑facing materials, notify customers as appropriate, and close the gap or discontinue the capability. To keep runtime enforcement meaningful over time, pair it with ongoing evaluation: document how you’ll measure performance and error patterns, run those evaluations pre‑release and continuously, and decide how often re‑evaluation is needed as models, prompts, tools, and data shift. This is what keeps autonomy intentional. It allows AI apps and agents to operate usefully and confidently, while ensuring behavior remains aligned with defined expectations — and backed by evidence — as systems evolve and scale. Auditability, explainability, and evidence Guardrails are the points in the system where governance becomes observable: where decisions are evaluated, actions are constrained, and outcomes are recorded. As described in Designing AI guardrails for apps and agents in Marketplace, guardrails shape how AI systems reason, access data, and take action — consistently and by default. Guardrails may be embedded within the agent itself or implemented as a separate supervisory layer — another agent or policy service — that evaluates actions before they proceed. Guardrail responses exist on a spectrum. Some enforce in the moment — blocking an action or requiring approval before it proceeds — while others generate evidence for post‑hoc review. Marketplace‑ready AI apps and agents could implement both, with the response mode matched to the severity, reversibility, and business impact of the action in question. These expectations align with the governance and evidence requirements outlined in the Microsoft Responsible AI Standard v2 General Requirements. In practice, guardrails support auditability and explainability by: Constraining behavior at design time Establishing clear defaults around what the system can and cannot do, so intended use is enforced before the system ever reaches production. Evaluating actions at runtime Making decisions visible as they happen — which tools were invoked, which data was accessed, and why an action was allowed to proceed or blocked. When governance is unclear, even strong guardrails lose their effectiveness. Controls may exist, but without clear intent they become difficult to justify, unevenly applied across environments, or disconnected from customer expectations. Over time, teams lose confidence not because the system failed, but because they can’t clearly explain why it behaved the way it did. When governance and guardrails are aligned, the result is different. Behavior is intentional. Decisions are traceable. Outcomes can be explained without guesswork. Auditability stops being a reporting exercise and becomes a natural byproduct of how the system operates day to day. Aligning governance with Marketplace expectations Governance for AI apps and agents must operate continuously, across all in‑scope environments — in both the publisher’s and the customer’s tenants. Marketplace solutions don’t live in a single boundary, and governance cannot stop at deployment or certification. Runtime enforcement is what keeps governance active as systems run and evolve. In practice, this means: Blocking or constraining actions that violate policy — such as stopping jailbreak attempts that try to override system instructions, escalate tool access, or bypass safety constraints through crafted prompts Adapting controls based on identity, environment, and risk — applying stricter limits when an agent acts across tenants, accesses sensitive data, or operates with elevated permissions Aligning agent behavior with enterprise expectations in real time — ensuring actions taken on behalf of users remain within approved roles, scopes, and approval paths These controls matter because AI behavior is dynamic. The same agent may behave differently depending on context, inputs, and downstream integrations. Governance must be able to respond to those shifts as they happen. Runtime enforcement is distinct from monitoring. Enforcement determines what is allowed to continue. Monitoring explains what happened once it’s already done. Marketplace‑ready AI solutions need both, but governance depends on enforcement to keep behavior aligned while it matters most. Operational health through auditability and traceability Operational health is the combination of traceability (what happened) and intelligibility (how to use it responsibly). When both are present, governance becomes a quality signal customers can feel day to day — not because you promised it, but because the system consistently behaves in ways they can understand and trust. Healthy AI apps and agents are not only traceable — they are intelligible in the moments that matter. For Marketplace customers, operational trust comes from being able to understand what the system is intended to do, interpret its behavior well enough to make decisions, and avoid over‑relying on outputs simply because they are produced confidently. A practical way to ground this is to be explicit about who needs to understand the system: Decision makers — the people using agent outputs to choose an action or approve a step Impacted users — the people or teams affected by decisions informed by the system’s outputs Once those stakeholders are clear, governance shows up as three operational promises you can actually support: Clarity of intended use Customers can see what the agent is designed to do (and what it is not designed to do), so outputs are used in the right contexts. Interpretability of behavior When an agent produces an output or recommendation, stakeholders can interpret it effectively — not perfectly, but reasonably well — with the context they need to make informed decisions. Protection against automation bias Your UX, guidance, and operational cues help customers stay aware of the natural tendency to over‑trust AI output, especially in high‑tempo workflows. This is where auditability and traceability become more than logs. Well governed AI systems should still answer: Who initiated an action — a user, an agent acting on their behalf, or an automated workflow What data was accessed — under which identity, scope, and context What decision was made, and why — especially when downstream systems or people are affected The logs should show evidence that stakeholders can interpret those outputs in realistic conditions — and there is a method to evaluate this, with clear criteria for release and ongoing evaluation as the solution evolves. Explainability still needs balance. Customers deserve transparency into intended use, behavior boundaries, and how to interpret outcomes — without requiring you to expose proprietary prompts, internal logic, or implementation details. For more information on securing your AI apps and agents, visit Securing AI apps and agents on Microsoft Marketplace | Microsoft Community Hub. What's next in the journey Governance creates the conditions for AI apps and agents to operate with confidence over time. With clear policies, enforcement, and evidence in place, publishers are better prepared to focus on operational maturity — how solutions are observed, maintained, and evolved safely in production. The next post explores what it takes to keep AI apps and agents healthy as they run, change, and scale in real customer environments. See the next post in the series: Quality and evaluation framework for successful AI apps and agents in Microsoft Marketplace | Microsoft Community Hub. Key resources See curated, step-by-step guidance to help you build, publish, or sell your app or agent (no matter where you start) in App Advisor Quick-Start Development Toolkit can connect you with code templates for AI solution patterns Microsoft AI Envisioning Day Events How to build and publish AI apps and agents for Microsoft Marketplace Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success200Views4likes0CommentsDesigning a reliable environment strategy for Microsoft Marketplace AI apps and agents
Technical guidance for software companies Delivering an AI app or agent through Microsoft Marketplace requires more than strong model performance or a well‑designed user flow. Once your solution is published, both you and your customers must be able to update, test, validate, and promote changes without compromising production stability. A structured environment strategy—Dev, Stage, and Production—is the architectural mechanism that makes this possible. This post provides a technical blueprint for how software companies and Microsoft Marketplace customers should design, operate, and maintain environment separation for AI apps and agents. It focuses on safe iteration, version control, quality gates, reproducible deployments, and the shared responsibility model that spans publisher and customer tenants. You can always get a curated step-by-step guidance through building, publishing and selling apps for Marketplace through App Advisor. This post is part of a series on building and publishing well-architected AI apps and agents in Microsoft Marketplace. The series focuses on AI apps and agents that are architected, hosted, and operated on Azure, with guidance aligned to building and selling solutions through Microsoft Marketplace. Why environment strategy is a core architectural requirement Environment separation is not just a DevOps workflow. It is an architectural control that ensures your AI system evolves safely, predictably, and traceably across its lifecycle. This is particularly important for Marketplace solutions because your changes impact not just your own environment, but every tenant where the solution runs. AI‑driven systems behave differently from traditional software: Prompts evolve and drift through iterative improvements. Model versions shift, sometimes silently, affecting output behavior. Tools and external dependencies introduce new boundary conditions. Retrieval sources change over time, producing different Retrieval Augmented Generation (RAG) contexts. Agent reasoning is probabilistic and can vary across environments. Without explicit boundaries, an update that behaves as expected in Dev may regress in Stage or introduce unpredictable behavior in Production. Marketplace elevates these risks because customers rely on your solution to operate within enterprise constraints. A well‑designed environment strategy answers the fundamental operational question: How does this solution change safely over time? Publisher-managed environment (tenant) Software companies publishing to Marketplace must maintain a clear three‑tier environment strategy. Each environment serves a distinct purpose and enforces different controls. Development environment: Iterate freely, without customer impact In Dev, engineers modify prompts, adjust orchestration logic, integrate new tools, and test updated model versions. This environment must support: Rapid prompt iteration with strict versioning, never editing in place. Model pinning, ensuring inference uses a declared version. Isolated test data, preventing contamination of production RAG contexts. Feature‑flag‑driven experimentation, enabling controlled testing. Staging environment: Validate behavior before promotion Stage is where quality gates activate. All changes—including prompt updates, model upgrades, new tools, and logic changes—must pass structured validation before they can be promoted. This environment enforces: Integration testing Acceptance criteria Consistency and performance baselines Safety evaluation and limits enforcement Production environment: Serve customers with reliability and rollback readiness Solutions running in production environments, regardless of whether they are publisher hosted or deployed into a customer's tenant must provide: Stable, predictable behavior Strict separation from test data sources Clearly defined rollback paths Auditability for all environment‑specific configurations This model highlights the core environments required for Marketplace readiness; in practice, publishers may introduce additional environments such as integration, testing, or preproduction depending on their delivery pipeline. The customer tenant deployment model: Deploying safely across customer environments Once a Marketplace customer purchases and deploys your AI app or agent, they must be able to deploy and maintain your solution across all their environments without reverse engineering your architecture. A strong offer must provide: Repeatable deployments across all heterogeneous environments. Predictable configuration separation, including identity, data sources, and policy boundaries. Customer‑controlled promotion workflows—updates should never be forced. No required re‑creation of environments for each new version. Publishers should design deployment artifacts such that customers do not have to manually re‑establish trust boundaries, identity settings, or configuration details each time the publisher releases a solution update. Plan for AI‑specific environment challenges AI systems introduce behavioral variances that traditional microservices do not. Your environment strategy must explicitly account for them. Prompt drift Prompts that behave well in one environment may respond differently in another due to: Different user inputs, where production prompts encounter broader and less predictable queries than test environments Variation in RAG contexts, driven by differences in indexed content, freshness, and data access Model behavior shifts under scale, including concurrency effects and token pressure Tool availability differences, where agents may have access to different tools or permissions across environments This requires explicit prompt versioning and environment-based promotion. Model version mismatches If one environment uses a different model version or even a different checkpoint, behavior divergence will appear immediately. Publishers should account for the following model management best practices: Model version pinning per environment Clear promotion paths for model updates RAG context variation Different environments may retrieve different documents unless seeded on purpose. Publishers should ensure their solutions avoid: Test data appearing in production environments Production data leaking into non-production environments Cross contamination of customer data in multi-tenant SaaS solutions Make sure your solution accounts for stale-data and real-time data. Agent variability Agents exhibit stochastic reasoning paths. Environments must enforce: Controlled tool access Reasoning step boundaries Consistent evaluation against expected patterns Publisher–customer boundary: Shared responsibilities Marketplace AI solutions span publisher and customer tenants, which means environment strategy is jointly owned. Each side has well-defined responsibilities. Publisher responsibilities Publishers should: Design an environment model that is reproducible inside customer tenants. Provide clear documentation for environment-specific configuration. Ensure updates are promotable, not disruptive, by default. Capture environment‑specific logs, traces, and evaluation signals to support debugging, audits, and incident response. Customer responsibilities Customers should: Maintain environment separation using their governance practices. Validate updates in staging before deploying them in production. Treat environment strategy as part of their operational contract with the publisher. Environment strategies support Marketplace readiness A well‑defined environment model is a Marketplace accelerator. It improves: Onboarding Customers adopt faster when: Deployments are predictable Configurations are well scoped Updates have controlled impact Long-term operations Strong environment strategy reduces: Regression risk Customer support escalations Operational instability Solutions that support clear environment promotion paths have higher retention and fewer incidents. What’s next in the journey The next architectural decision after environment separation is identity flow across these environments and across tenant boundaries, especially for AI agents acting on behalf of users. The follow‑up post will explore tenant linking, OAuth consent patterns, and identity‑plane boundaries in Marketplace AI architectures. See the next post in the series: Designing Tenant Linking to Scale Microsoft Marketplace AI Apps. Key Resources See curated, step-by-step guidance to help you build, publish, or sell your app or agent (no matter where you start) in App Advisor Quick-Start Development Toolkit can connect you with code templates for AI solution patterns Microsoft AI Envisioning Day Events How to build and publish AI apps and agents for Microsoft Marketplace Get over $126K USD in benefits and technical consultations to help you replicate and publish your app with ISV Success199Views1like0Comments