Production‑ready architecture is what separates an AI idea from an AI product customers can safely trust and run at scale. In Microsoft Marketplace, those architectural choices determine not just how your solution works—but whether it can be trusted, operated, and supported in real enterprise environments.
Why “production‑ready” architecture matters for Marketplace AI apps and agents
A working AI prototype is not the same as a production‑ready AI app in Microsoft Marketplace. Marketplace solutions are expected to operate reliably in real customer environments, alongside mission‑critical workloads and under enterprise constraints. As a result, AI apps published through Marketplace must meet a higher bar than “it works in a demo.”
Production‑ready Marketplace AI apps must account for:
- Alignment with enterprise expectations and the Azure Well‑Architected Framework, including cost optimization, security, reliability, operational excellence, and performance efficiency
- The difficulty of reversing early architectural decisions, especially once customers, tenants, and billing relationships are in place
- A higher trust bar from customers, who expect Marketplace solutions to be Microsoft‑vetted, certified, and safe to run in production
Customers come to Marketplace expecting solutions that are ready to run, ready to scale, and ready to be supported—not experiments.
This post focuses on the architectural principles and patterns required to meet those expectations. Specific services and implementation details are covered later in the series.
This post is part of a series on building and publishing well-architected AI apps and agents on Microsoft Marketplace.
Aligning offer type and architecture early sets you up for success
A strong indicator of a smooth Marketplace journey is early alignment between offer type and solution architecture. Offer type defines more than how an AI app is listed—it establishes clear roles and responsibilities between publishers and customers, which in turn shape architectural boundaries. Regardless of offer type, architecture must clearly answer three questions:
- Who owns the runtime?
- Where does the AI execute?
- Who controls updates and ongoing operations?
How these questions are answered depends on whether the solution resides in the customer's or the publisher's tenant, which in turn follows from the attributes of each transactable Marketplace offer type:
- SaaS offers, where the AI runtime lives in the publisher’s environment and architecture must support multi‑tenancy, strong isolation, and centralized operations
- Container offers, where workloads run in the customer’s Kubernetes environment and architecture emphasizes portability and clear operational assumptions
- Virtual Machine offers, where preconfigured environments run in the customer’s subscription and architecture is more tightly coupled to the OS and infrastructure footprint
- Azure Managed Applications, where the solution is deployed into the customer's subscription and architecture must balance customer control with defined lifecycle boundaries. What makes this model distinctive is its flexibility: a managed application can package containers, virtual machines, or a combination of both, making it a natural fit for solutions that require customer-controlled infrastructure without sacrificing publisher-managed operations. The packaging choice shapes the underlying architecture, but the managed application wrapper defines how the solution is deployed, updated, and governed within the customer's environment.
Architecture decisions that align with the chosen offer type reinforce Marketplace requirements and reduce certification and operational friction later. Key factors that benefit from early alignment include:
- Roles and responsibilities, such as who operates the AI runtime and who is responsible for uptime, patching, scaling, and ongoing operations
- Proximity to data, particularly for AI solutions that rely on customer‑specific or proprietary data, where placement affects performance, data movement, and compliance
Core architectural building blocks of AI apps
Designing a production‑ready AI app starts with treating the solution as a system, not a single service. AI apps—especially agent‑based solutions—are composed of multiple cooperating layers that together enable reasoning, action, and safe operation at scale.
At a high level, most production‑ready AI apps include the following building blocks:
- Interaction layer, which serves as the entry point for users or systems and is responsible for authentication, request shaping, and consistent responses
- Orchestration layer, which coordinates reasoning, tool selection, workflow execution, and retrieval‑augmented generation (RAG) flows across multi‑step interactions
- Model endpoints, which provide inference and generation capabilities and introduce distinct latency, cost, and dependency characteristics
- Data sources, including vector stores, operational data, documents, and logs that the AI system reasons over
- Control planes, such as identity, configuration, policy enforcement, feature flags, and secrets management, which govern behavior without redeploying core logic
- Observability, which enables tracing, monitoring, and diagnosis of agent decisions, actions, and outcomes
- Networking, which connects components using a zero‑trust posture where every call is authenticated and outbound access is explicitly controlled
Together, these components form the foundation of most Marketplace‑ready AI architectures. How they are composed—and where boundaries are drawn—varies by offer type, tenancy model, and customer requirements. Specific services, patterns, and implementation guidance for each layer are explored later in the series.
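To make the layering concrete, here is a minimal sketch of how these building blocks compose into a simple RAG flow. All class and function names are illustrative assumptions for this post, not Marketplace or Azure APIs, and the retrieval step is a naive keyword match standing in for a real vector search.

```python
from dataclasses import dataclass, field

@dataclass
class ModelEndpoint:
    """Model endpoint: provides generation with its own latency/cost profile."""
    name: str

    def generate(self, prompt: str) -> str:
        # A real implementation would call a hosted model here.
        return f"[{self.name}] answer for: {prompt}"

@dataclass
class VectorStore:
    """Data source: documents the AI system reasons over."""
    documents: list = field(default_factory=list)

    def retrieve(self, query: str, k: int = 2) -> list:
        # Naive keyword overlap standing in for vector similarity search.
        words = query.lower().split()
        hits = [d for d in self.documents if any(w in d.lower() for w in words)]
        return hits[:k]

@dataclass
class Orchestrator:
    """Orchestration layer: coordinates retrieval and generation (a RAG flow)."""
    model: ModelEndpoint
    store: VectorStore

    def answer(self, user_query: str) -> dict:
        context = self.store.retrieve(user_query)
        prompt = f"Context: {context}\nQuestion: {user_query}"
        return {"answer": self.model.generate(prompt), "sources": context}

def handle_request(orchestrator: Orchestrator, user_query: str) -> dict:
    """Interaction layer: the entry point that shapes requests and responses."""
    if not user_query.strip():
        return {"error": "empty query"}
    return orchestrator.answer(user_query)
```

The control-plane, observability, and networking layers are deliberately omitted here; the point is that even the smallest useful AI app is already a system of cooperating parts with distinct responsibilities.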
Tenancy design choices as an early architectural decision
One of the earliest and most consequential architectural decisions is where the AI solution is hosted. Does it run in the publisher’s tenant, or is it deployed into the customer’s tenant? This choice establishes foundational boundaries and is difficult to change later without significant redesign.
If the solution runs in the publisher’s tenant, it is inherently multi‑tenant and must be designed with strong logical isolation across customers. If it runs in the customer’s tenant, deployments are typically single‑tenant by default, with isolation provided through infrastructure boundaries. Many Marketplace AI apps fall between these extremes, making it essential to define the tenancy model early.
Common tenancy approaches include:
- Publisher‑hosted, multi‑tenant solutions, where a shared AI runtime serves multiple customers and requires strict isolation of customer data, inference requests, identity, and cost attribution
- Customer‑hosted, single‑tenant deployments, where each customer operates an isolated instance within their own Azure subscription, often preferred for regulated or tightly controlled environments
- Hybrid models, which combine centralized AI services with customer‑hosted data or execution layers and require carefully defined trust and access boundaries
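For the publisher-hosted, multi-tenant case, isolation and cost attribution are easiest to get right when they are structural rather than conventions in calling code. A minimal sketch, assuming a hypothetical in-memory store (not a real Azure SDK):

```python
import threading
from collections import defaultdict

class TenantScopedStore:
    """Illustrative multi-tenant store: every read/write is keyed by tenant,
    and inference usage is attributed per tenant for cost allocation."""

    def __init__(self):
        self._data = defaultdict(dict)        # tenant_id -> key -> value
        self._token_usage = defaultdict(int)  # tenant_id -> tokens consumed
        self._lock = threading.Lock()

    def put(self, tenant_id: str, key: str, value: str) -> None:
        with self._lock:
            self._data[tenant_id][key] = value

    def get(self, tenant_id: str, key: str):
        # A tenant can only ever see its own partition: isolation is
        # enforced by the data model, not by discipline in calling code.
        with self._lock:
            return self._data[tenant_id].get(key)

    def record_usage(self, tenant_id: str, tokens: int) -> None:
        with self._lock:
            self._token_usage[tenant_id] += tokens

    def usage_report(self) -> dict:
        with self._lock:
            return dict(self._token_usage)
```

In a real solution the same principle applies to vector indexes, caches, and logs: the tenant identifier is part of every key, partition, and metric from day one.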
Tenancy decisions influence several core architectural dimensions, including:
- Identity and access boundaries, which define how users and agents authenticate and act across tenants
- Data isolation, including how customer data is stored, processed, and protected
- Model usage patterns, such as shared models versus tenant‑specific models
- Cost allocation and scale, including how usage is tracked and attributed per customer
These considerations are not implementation details—they shape how the AI system behaves, scales, and is governed in production. Reference architecture guidance for multi‑tenant AI and machine learning solutions in the Azure Architecture Center explores these tradeoffs in more detail.
Understanding your customer’s needs
Designing a production‑ready AI architecture starts with understanding the environment your customers expect your solution to operate in. Marketplace customers vary widely in their security posture, compliance obligations, operational practices, and tolerance for change. Architectures that reflect those realities reduce friction during onboarding, certification, and long‑term operation.
Key customer considerations that shape architecture include:
- Security and compliance expectations, such as industry regulations, internal governance policies, or regional data requirements
- Target environments, including whether customers expect solutions to run in their own Azure subscription or are comfortable consuming centrally hosted services
- Change and outage windows, where operational constraints or seasonal restrictions require predictable and controlled updates
Architectural alignment with customer needs is not about designing for every edge case. It is about making intentional tradeoffs that reflect how customers will deploy, operate, and depend on your AI solution in production.
Specific security controls, compliance enforcement mechanisms, and operational policies are explored later in the series. This section establishes the architectural mindset required to support them.
Separating environments for safe iteration
Production AI systems must evolve continuously while remaining stable for customers. Separating environments is how publishers enable safe iteration without destabilizing live usage—and how customers maintain confidence when adopting and operating AI solutions in their own environments.
From the publisher’s perspective, environment separation enables:
- Iteration on prompts, models, and orchestration logic without impacting production customers
- Validation of behavior changes before rollout, especially for AI‑driven systems where small changes can produce materially different outcomes
- Controlled release strategies that reduce operational risk
From the customer’s perspective, environment separation shapes how the solution fits into their own development and operational practices:
- Where the solution is deployed across development, staging, and production environments
- How deployments are repeated or promoted, particularly when the solution runs in the customer’s tenant
- Whether environments can be recreated predictably, or whether customers are forced to manually reconfigure deployments with each iteration
When AI solutions are deployed into the customer’s tenant, environment design becomes especially important. Customers should not be required to reverse‑engineer deployment logic, recreate environments from scratch, or re‑establish trust boundaries every time the solution evolves. These concerns should be addressed architecturally, not deferred to operational workarounds.
Environment separation is therefore not just a DevOps choice—it is an architectural decision. It influences identity boundaries, deployment topology, validation strategies, and the shared operational contract between publisher and customer.
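One practical expression of this is promoting the same artifact across environments while varying only declarative configuration. A minimal sketch, with hypothetical environment names and deployment identifiers:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvironmentConfig:
    """Illustrative per-environment configuration: the deployed artifact is
    identical everywhere; only this declarative config differs."""
    name: str
    model_deployment: str
    log_level: str
    allow_prompt_experiments: bool

ENVIRONMENTS = {
    "dev": EnvironmentConfig("dev", "gpt-experimental", "DEBUG", True),
    "staging": EnvironmentConfig("staging", "gpt-candidate", "INFO", False),
    "prod": EnvironmentConfig("prod", "gpt-stable", "WARNING", False),
}

def load_environment(name: str) -> EnvironmentConfig:
    # Failing fast on an unknown environment keeps deployments predictable
    # and reproducible instead of silently falling back to defaults.
    try:
        return ENVIRONMENTS[name]
    except KeyError:
        raise ValueError(f"Unknown environment: {name!r}") from None
```

The same idea scales up to infrastructure-as-code templates: environments that can be recreated from declarations, rather than reconstructed by hand, are what make promotion and rollback safe in a customer's tenant.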
Designing for AI‑specific scalability patterns
AI workloads do not scale like traditional web or CRUD‑based applications. While front‑end and API layers may follow familiar scaling patterns, AI systems introduce behaviors that require different architectural assumptions.
Production‑ready AI architectures must account for:
- Bursty inference demand, where usage can spike unpredictably based on user behavior or downstream automation
- Long‑running or multi‑step agent workflows, which may span tools, data sources, and time
- Model‑driven latency and cost characteristics, which influence throughput and responsiveness independently of application logic
As a result, scalability decisions often vary by layer. Horizontal scaling is typically most effective in interaction, orchestration, and retrieval components, while model endpoints may require separate capacity planning, isolation, or throttling strategies.
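A common pattern is to let the interaction and orchestration layers scale freely while capping concurrency against the model endpoint, so bursts queue instead of overwhelming inference capacity. A minimal sketch using a semaphore; the simulated latency and limits are placeholders:

```python
import asyncio

async def call_model(prompt: str) -> str:
    """Stand-in for a model endpoint call with model-driven latency."""
    await asyncio.sleep(0.01)  # simulated inference time
    return f"response to: {prompt}"

async def handle_burst(prompts, max_concurrent: int = 3):
    """Cap concurrent inference calls: requests beyond the cap wait in line
    rather than hitting the endpoint all at once."""
    slots = asyncio.Semaphore(max_concurrent)

    async def throttled(prompt: str) -> str:
        async with slots:
            return await call_model(prompt)

    return await asyncio.gather(*(throttled(p) for p in prompts))
```

In production the cap would typically reflect provisioned model throughput, and rejected or delayed work would surface back to callers with backpressure rather than unbounded queuing.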
Treating identity as an architectural boundary
Identity is foundational to Marketplace AI apps, but architecture must plan for it explicitly. Identity decisions define trust boundaries across users, agents, and services, and shape how the solution scales, secures access, and meets compliance requirements.
Key architectural considerations include:
- Microsoft Entra ID as a foundation, where identity is treated as a core control plane rather than a late‑stage integration
- How users sign in, including:
- Their own corporate Microsoft Entra ID tenant
- B2B scenarios where one Entra ID tenant trusts another
- B2C identity providers for customer‑facing experiences
- How tenants authenticate, particularly in multi‑tenant or cross‑organization scenarios
- How AI agents act on behalf of users, including delegated access, authorization scope, and auditability
- How services communicate securely, using a zero‑trust posture where every call is authenticated and authorized
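When an agent acts on behalf of a user, the delegated scopes should be explicit data that every action is checked against and every decision is recorded. A minimal sketch of that idea; the context object and audit format here are hypothetical, not a real Microsoft Entra ID API:

```python
from dataclasses import dataclass

audit_log = []  # every authorization decision is recorded for later review

@dataclass(frozen=True)
class DelegatedContext:
    """Illustrative delegated-access context: the agent carries the user's
    identity and an explicit, bounded set of delegated scopes."""
    user_id: str
    tenant_id: str
    agent_id: str
    scopes: frozenset

def authorize(ctx: DelegatedContext, required_scope: str) -> bool:
    # The agent may only do what the user delegated; the check is explicit,
    # and the record captures who acted, on whose behalf, and what was asked.
    allowed = required_scope in ctx.scopes
    audit_log.append({
        "user": ctx.user_id,
        "tenant": ctx.tenant_id,
        "agent": ctx.agent_id,
        "scope": required_scope,
        "allowed": allowed,
    })
    return allowed
```

In an actual solution the scopes would come from token claims issued by the identity platform rather than an in-process object, but the architectural point stands: delegation is data, and every use of it is auditable.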
Treating identity as an architectural boundary helps ensure that trust relationships remain explicit, enforceable, and consistent across tenants and environments. This foundation is critical for supporting secure operation, compliance enforcement, and future tenant‑linking scenarios.
Designing for observability and auditability
Production‑ready AI apps must be observable and auditable by design. Marketplace customers expect visibility into how systems behave in production, and publishers need clear insight to diagnose issues, operate reliably, and meet enterprise trust and compliance expectations.
Key architectural considerations include:
- End‑to‑end observability, covering user interactions, agent reasoning steps, tool invocations, and downstream service calls
- Clear audit trails, capturing who initiated an action, what the AI system did, and how decisions were executed—especially when agents act on behalf of users
- Tenant‑aware visibility, ensuring logs, metrics, and traces are correctly attributed without exposing data across tenants
- Operational transparency, enabling effective troubleshooting, incident response, and continuous improvement without ad‑hoc instrumentation
For AI systems, observability goes beyond infrastructure health. It must also account for AI‑specific behavior, such as prompt execution, model selection, retrieval outcomes, and tool usage. Without this visibility, diagnosing failures, validating changes, or explaining outcomes becomes difficult in real customer environments.
Auditability is equally critical. Identity, access, and action histories must be traceable to support security reviews, regulatory obligations, and customer trust—particularly in regulated or enterprise settings.
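The key structural ingredients are a correlation id spanning all steps of one interaction, tenant attribution on every record, and a serializable event shape any log pipeline can consume. A minimal sketch; the field names are illustrative assumptions, not an OpenTelemetry or Azure Monitor schema:

```python
import json
import time
import uuid

def new_trace_id() -> str:
    """One id per user interaction, so multi-step agent work can be correlated."""
    return str(uuid.uuid4())

def trace_event(trace_id: str, tenant_id: str, step: str, detail: dict) -> str:
    """Emit one structured trace event as a JSON line."""
    return json.dumps({
        "trace_id": trace_id,    # correlates all steps of one interaction
        "timestamp": time.time(),
        "tenant_id": tenant_id,  # tenant-aware attribution, never mixed
        "step": step,            # e.g. "retrieval", "tool_call", "generation"
        "detail": detail,        # model selected, tool invoked, outcome
    })
```

With events shaped like this, "why did the agent do that?" becomes a query over one trace id rather than an archaeology exercise across disconnected logs.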
Common architectural pitfalls in Marketplace AI apps
Even experienced teams run into similar challenges when moving from an AI prototype to a production‑ready Marketplace solution. The following pitfalls often surface when architectural decisions are deferred or made implicitly.
Common pitfalls include:
- Treating AI as a single service instead of a system, where model inference is implemented without considering orchestration, data access, identity, observability, and operational boundaries
- Hard‑coding tenant assumptions, such as assuming a single tenant, identity model, or deployment topology, which becomes difficult to unwind as customer requirements diversify
- Not planning for a resilient model strategy, leaving the architecture fragile when model versions change, capabilities evolve, or providers introduce breaking behavior
- Assuming data lives within the same boundary as the solution, when in practice it may reside in a different tenant, subscription, or control plane
- Tightly coupling prompt logic to application code, making it harder to iterate on AI behavior, validate changes, or manage risk without full redeployments
- Assuming issues can be fixed after go‑live, which underestimates the cost and complexity of changing architecture once customers, subscriptions, and trust relationships are in place
These pitfalls rarely stem from a lack of technical skill. They typically emerge when architectural decisions are postponed in favor of speed, or when AI behavior is treated as an isolated concern rather than part of a production system.
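As one example of avoiding the prompt-coupling pitfall above, prompts can be treated as versioned configuration loaded at runtime rather than string literals in application code. A minimal sketch, assuming a hypothetical `prompts.json` manifest and schema:

```python
import json
from pathlib import Path

def load_prompt(prompts_dir: Path, name: str, version: str) -> str:
    """Prompts live in a versioned manifest alongside (not inside) the app,
    so AI behavior can be iterated and validated without redeploying code."""
    manifest = json.loads((prompts_dir / "prompts.json").read_text())
    return manifest[name][version]

def render(template: str, **values) -> str:
    """Fill a prompt template with request-specific values."""
    return template.format(**values)
```

Versioning the manifest also gives you a natural unit for validation and rollback: a behavior change becomes a reviewed prompt version, not a code release.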
What’s next in the journey
The architectural decisions made early—around offer type, tenancy, identity, environments, and observability—establish the foundation on which everything else is built. When these choices are intentional, they reduce friction as the solution evolves, scales, and adapts to real customer needs. The next set of posts builds on this foundation, exploring different dimensions of operating, securing, and evolving Marketplace AI apps in production.
Key resources
- App Advisor: curated, step-by-step guidance to help you build, publish, or sell your app or agent, no matter where you start
- Quick-Start Development Toolkit: code templates for common AI solution patterns
- Microsoft AI Envisioning Day events
- How to build and publish AI apps and agents for Microsoft Marketplace