Build Agent Architecture using AI Landing Zones
Establish a shared Governance Hub to centralize model access, MCP catalogs, and policy enforcement, then give every agent a traceable identity, runtime protection, and data governance through Agent 365. Layer in Microsoft Fabric’s Ontologies so your agents reason over real business context, not just raw data. Choose your runtime (no-code, hosted container, or custom) and deploy a production-grade environment in minutes using the AI Landing Zone accelerator. Matt McSpirit, Microsoft Azure Expert, joins Jeremy Chapman, Microsoft 365 Director, to share how to architect, govern, and scale a full agent mesh across the Microsoft stack.

One model gateway governs all AI traffic. Azure API Management enforces policies, caps workload quotas, and attributes costs per team. Pair it with Azure API Center for a full MCP catalog.

Improve AI output quality across every query. Fabric IQ in Microsoft Fabric treats knowledge models as shared, governed assets. Ontologies give agents an explicit map of your business entities, processes, and relationships.

No-code, hosted, or custom container: pick the runtime that fits your agent. Foundry and Copilot Studio handle orchestration, telemetry, and identity. External runtimes like AKS or EKS plug in through the governance hub.

QUICK LINKS:
00:00 - Build secure AI agents
01:07 - Accelerate development, reduce risks
02:51 - Model Gateway + MCP Gateway
03:24 - Agent 365 Unified Control Plane
03:56 - Azure Policy + Azure Monitor
04:27 - Intelligent data platform
05:37 - OneLake in Microsoft Fabric
06:39 - Microsoft Purview data governance
07:13 - Fabric IQ with Ontologies
07:51 - Landing Zones
09:23 - Three Hosting Runtimes
11:06 - AI Landing Zone Accelerator
13:16 - Scalability
14:16 - Wrap up

Link References
Check out https://aka.ms/AIArchitecture

Unfamiliar with Microsoft Mechanics?
Microsoft Mechanics is Microsoft’s official video series for IT, where you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast

Keep getting this insider knowledge, join us on social:
Follow us on Twitter: https://twitter.com/MSFTMechanics
Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:
- How do you design an enterprise-ready AI agent architecture that scales fast without bypassing security and control? It’s possible, and it doesn’t need to slow down your developers. In fact, in the next few minutes, we’ll show you how you can implement this architecture across the Microsoft stack and demonstrate what you can do right now to get started. And to explain the mechanics behind it, I’m joined once again by Azure expert Matt McSpirit. Welcome back!
- It’s great to be back!
- So, there’s a lot of agent development going on right now, and the raw speed of that can mean people aren’t always optimizing for scale, reusability, or potential security risks. It’s a bit like the wild, wild west out there.
- That’s right, and that’s part of what makes it so exciting right now. But it also creates some real challenges. You’ve got teams building agents in silos, often without a shared approach or standards. On top of that, there’s frequently direct, ungoverned access to models and data, which raises concerns.
And when you layer in inconsistent identity management and limited auditability, it becomes harder to scale things responsibly or maintain control as all of this grows.
- Okay, so what can be put in place to accelerate agent development but also reduce our risks, so we get a win-win?
- Well, you’re setting a high bar! But it’s totally possible. One of the best things you can do to unblock developers and agent coding is to establish a catalog of shared services and resources, which exists in two parts. The first is a shared platform that centralizes AI model registries, knowledge sources such as MCP servers and APIs, and agents, making them easy to discover and use, along with shared identity and access management, policy engines, and reporting for consistency and observability. We call this the governance hub; it’s directly tied to Agent 365 and is architecturally part of what we call the Platform Landing Zone. The second part, which we call the Intelligent Data Platform, consists of the shared data components themselves, including core operational data as well as the underlying data that agents connect to via MCP servers and APIs. These are managed data sources spanning structured and unstructured data, each of which can be leveraged as a tool by agents. Then there’s a third component, the Application Plane, with defined AI Landing Zones that are specific to individual agents or to sets of agents with similar requirements and policy controls. These matter because not all AI apps and agents should have the same resources or controls applied to them. Together, this forms an enterprise-wide pattern for how agents discover and invoke models, tools, data, and other agents. At the same time, it lets teams operate in a decentralized way and choose the runtime that best fits their agent. And for peace of mind, everything is wrapped in a common set of reporting with shared policy and access controls.
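To make the catalog of shared services more concrete, here is a minimal Python sketch of a registry that lets each landing zone discover only the models, MCP servers, APIs, and agents it is allowed to use. It is not the API of Agent 365, Azure API Center, or any other Microsoft service; `GovernanceCatalog`, `CatalogEntry`, and the zone and asset names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class AssetKind(Enum):
    """Kinds of shared assets the governance hub catalogs, per the transcript."""
    MODEL = "model"
    MCP_SERVER = "mcp_server"
    API = "api"
    AGENT = "agent"


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    kind: AssetKind
    owner: str                     # accountable team, used for attribution
    allowed_zones: frozenset[str]  # landing zones permitted to consume it


@dataclass
class GovernanceCatalog:
    """Hypothetical in-memory stand-in for the hub's shared registry."""
    entries: dict[str, CatalogEntry] = field(default_factory=dict)

    def register(self, entry: CatalogEntry) -> None:
        # One authoritative entry per name avoids shadow copies of a tool.
        if entry.name in self.entries:
            raise ValueError(f"duplicate catalog entry: {entry.name}")
        self.entries[entry.name] = entry

    def discover(self, kind: AssetKind, zone: str) -> list[str]:
        """Return, sorted, the names of assets of `kind` that `zone` may use."""
        return sorted(
            e.name for e in self.entries.values()
            if e.kind is kind and zone in e.allowed_zones
        )


catalog = GovernanceCatalog()
catalog.register(CatalogEntry("sales-mcp", AssetKind.MCP_SERVER, "data-team", frozenset({"lz-sales"})))
catalog.register(CatalogEntry("hr-mcp", AssetKind.MCP_SERVER, "hr-team", frozenset({"lz-hr"})))
catalog.register(CatalogEntry("chat-model", AssetKind.MODEL, "platform-team", frozenset({"lz-sales", "lz-hr"})))
print(catalog.discover(AssetKind.MCP_SERVER, "lz-sales"))  # ['sales-mcp']
```

Keeping discovery scoped by landing zone means a team sees the shared model but not another team’s MCP server, which mirrors the idea that not every agent should get the same resources.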
- Okay, so why don’t we make this real for everyone watching and go through each of these components and their corresponding architecture?
- Yeah, sounds good. Let’s start with the shared governance hub, which, as mentioned, is made up of a few components. First, the AI gateway in Azure API Management enforces policies, routes AI traffic, and enables transparent per-workload cost attribution. Next, the MCP gateway provides a central catalog of MCP servers, enabling agent access to knowledge sources using the Model Context Protocol while managing inventory and access permissions. The governance hub also includes Agent 365 as the unified control plane for agents, which itself provides agent management as part of the Microsoft 365 admin center. Then Microsoft Entra enforces strong identity, with least-privilege access and clear ownership for agents. Microsoft Defender adds runtime protection and deep observability, with end-to-end activity tracing and detection. And Microsoft Purview enforces data protection and posture by monitoring how agents interact with sensitive information. Finally, at the platform level, you’ll use services like Azure Policy to manage resources running in Azure directly, or via Azure Arc for anything outside of the Microsoft cloud, including on-premises or other cloud platforms. Then there’s Azure Monitor for unified reporting across infrastructure, application, and agent services, along with the resources or tokens consumed.
- So, the governance hub is like a central set of shared enabling services and plumbing that you connect to, and use to view and control agents and all of their related activities. It really glues everything together, but it doesn’t include the agent frontend, logic, or backend, so where do those parts come in?
- Those all come next. Moving on to the Intelligent Data Platform: this provides the managed data for agents to reason over, which is valuable information unique to the business.
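The AI gateway behavior described in this answer (enforcing policy, capping workload quotas, and attributing cost per workload) can be modeled in a few lines. This is a hypothetical Python sketch, not Azure API Management policy syntax; the `AIGateway` class, the workload names, and the price per 1,000 tokens are made up for illustration, and a real gateway would also reset quotas on a time window.

```python
from collections import defaultdict
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when a call would push a workload past its token allowance."""


@dataclass
class AIGateway:
    """Toy model of a gateway that meters model tokens per workload.

    quotas: tokens each workload may consume in the current window.
    price_per_1k: illustrative price per 1,000 tokens (a made-up figure).
    """
    quotas: dict[str, int]
    price_per_1k: float
    used: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def admit(self, workload: str, tokens: int) -> None:
        """Record the call's tokens, or refuse it if the quota would be exceeded."""
        if workload not in self.quotas:
            raise PermissionError(f"unregistered workload: {workload}")
        if self.used[workload] + tokens > self.quotas[workload]:
            raise QuotaExceeded(workload)
        self.used[workload] += tokens

    def cost_report(self) -> dict[str, float]:
        """Attribute cost to each workload from the tokens it actually used."""
        return {w: round(t / 1000 * self.price_per_1k, 4) for w, t in self.used.items()}


gateway = AIGateway(quotas={"sales-agent": 5000, "hr-agent": 2000}, price_per_1k=0.01)
gateway.admit("sales-agent", 3000)
gateway.admit("hr-agent", 1500)
try:
    gateway.admit("hr-agent", 1000)  # 1500 + 1000 > 2000, so this call is refused
except QuotaExceeded as exc:
    print(f"refused: {exc}")
print(gateway.cost_report())
```

Because usage is recorded only after the quota check passes, the refused call adds nothing to the hr-agent total.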
For agents to work well, connected data needs to be trusted and clearly understood. The platform comprises grounding sources optimized for agent reasoning, with several options depending on what agents are designed to do. For example, the Work IQ API provides the context for how you work, with connections to your email, calendar, previous meetings, Teams chats, files, and more. Foundry IQ then lets you combine multiple knowledge sources for your agents, where everything from structured data in databases to unstructured data in your cloud stores, even images, can be retrieved by agentic processes. And Fabric IQ can add context over your connected business operations, like sales data, customer records, logistics, and more. You can combine the output of all of these IQs to generate richer, more relevant responses.
- Right, and speaking of Microsoft Fabric, it really forms the foundation for operational data in the Intelligent Data Platform, along with things like governance and security, so why don’t we spend some time explaining all of this further? Let’s start with the data lake and how we can manage the underlying operational data.
- Sure, this is where OneLake in Microsoft Fabric plays an important role. As a unified enterprise data lake, it brings in operational data from multiple sources. Database mirroring continuously replicates data from on-premises and cloud databases. Shortcuts enable virtual access to external platforms like Databricks, Snowflake, and major cloud object stores without data movement, so everything can be queried as a single unified data source. For batch ingestion and continuous streaming data feeds, Fabric Pipelines and Real-Time Intelligence can be leveraged by your agents. Everything in Microsoft Fabric is stored as Delta files, so it can be queried with SQL or Spark. And all access is secured through Microsoft Entra, down to the row and column level.
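One common way to combine ranked results from several grounding sources, such as the Work IQ, Foundry IQ, and Fabric IQ outputs mentioned above, is reciprocal rank fusion (RRF). The transcript does not say how the IQ services merge results, so treat this as a generic sketch of RRF rather than their implementation; `k = 60` is a conventional default, and the source names and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: dict[str, list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. Ties are broken by document ID so output is stable.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists.values():
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from three grounding sources for the same question.
results = {
    "work_iq": ["email-42", "meeting-7"],
    "foundry_iq": ["contract-3", "email-42"],
    "fabric_iq": ["sales-q3", "contract-3"],
}
print(reciprocal_rank_fusion(results))
# ['contract-3', 'email-42', 'sales-q3', 'meeting-7']
```

RRF only needs each source’s ordering, not comparable scores, which is why it suits sources as different as mailboxes, document stores, and business data.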
- Okay, so as agents scale and start acting on business information autonomously, controls over that data are really important to prevent sensitive data from getting into the wrong hands.
- They are, and ideally your data should be organized as domain-owned products with clear ownership, data quality controls, and observable policies. This is where Microsoft Purview can help. It has built-in data governance capabilities across the lifecycle, from discovery and cataloging to automated classification, end-to-end lineage, policy-based access controls, and auditing. And if your operational data is running on Fabric, the context for agents gets better. Most sources, and even the best MCP servers, do not encode business meaning, processes, relationships, and priorities. That’s where Fabric IQ comes in: it treats knowledge models as shared, governed assets, explicitly representing business entities, relationships, and metrics in a machine-readable form to improve AI output quality and relevance. It uses Ontologies to represent operational models by defining business entities and the relationships that connect them. These give agents an explicit, machine-readable representation of business analytics. All of this ensures that your agents have consistency, control, and traceability.
- Okay, so now with our data and control backends in place for secured access to business context, why don’t we move on to agent-specific controls?
- Yeah, and that’s where AI Application Landing Zones come in. We did a whole Mechanics show on this a while back, and they can be used for individual agents or for multiple agents with similar policy requirements. Each landing zone is a spoke subscription boundary that hosts per-workload resources under the ownership of an AI project team. It’s a best practice for apps, and it translates to agents as well. So here, each project team is responsible for solution delivery and operations within the boundary of its application landing zone.
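The Fabric IQ ontology idea in this answer, business entities plus explicit relationships between them, can be illustrated with a small typed graph. This is a hypothetical Python sketch, not the Fabric IQ data model or API; the entity types and relationship names are invented, and it only shows how schema-checked relationships let an agent answer a question such as which shipments belong to a customer by traversal rather than guesswork.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    id: str
    type: str  # business entity type, e.g. "Customer", "Order", "Shipment"


@dataclass
class Ontology:
    """Hypothetical minimal ontology: typed entities joined by named relationships."""
    # relationship name -> (required source type, required target type)
    schema: dict[str, tuple[str, str]] = field(default_factory=dict)
    entities: dict[str, Entity] = field(default_factory=dict)
    # (source entity id, relationship name) -> target entity ids
    edges: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def add(self, *entities: Entity) -> None:
        for entity in entities:
            self.entities[entity.id] = entity

    def relate(self, source: str, relation: str, target: str) -> None:
        """Add an edge, rejecting any edge the schema does not allow."""
        expected = self.schema[relation]
        actual = (self.entities[source].type, self.entities[target].type)
        if actual != expected:
            raise ValueError(f"'{relation}' must link {expected[0]} -> {expected[1]}")
        self.edges.setdefault((source, relation), []).append(target)

    def follow(self, start: str, *relations: str) -> list[str]:
        """Walk a chain of relationships outward from one entity."""
        frontier = [start]
        for relation in relations:
            frontier = [t for node in frontier for t in self.edges.get((node, relation), [])]
        return frontier


onto = Ontology(schema={"placed": ("Customer", "Order"), "shipped_via": ("Order", "Shipment")})
onto.add(Entity("c1", "Customer"), Entity("o1", "Order"), Entity("o2", "Order"), Entity("s9", "Shipment"))
onto.relate("c1", "placed", "o1")
onto.relate("c1", "placed", "o2")
onto.relate("o2", "shipped_via", "s9")
# Which shipments belong to customer c1? Answered by following explicit relationships.
print(onto.follow("c1", "placed", "shipped_via"))  # ['s9']
```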
These landing zones still leverage the shared set of resources in the platform landing zone, including identity and security controls. This separation matches how enterprises actually structure their teams and operate: the central team enforces baseline controls, while project teams build and run workloads. Each landing zone is itself a standardized foundation, with its own private networking configuration, AI-specific resources, monitoring and telemetry, and application services, including containers. Everything is managed using Azure resource groups, so there are clear RBAC boundaries. And each landing zone is integrated with the governance hub and the Intelligent Data Platform. This way, developers have the structure to deploy production-ready agent solutions without violating corporate policies or provisioning new foundational infrastructure for every project.
- That said, there always seem to be new ways to build agents these days, so how does the Application Plane support development?
- Well, enterprise scenarios have different trade-offs between convenience, control, extensibility, and operational responsibility, so we support three hosting formats as runtimes. First, there are no-code and low-code agents, implemented as prompt agents and workflows within Microsoft Copilot Studio and Microsoft Foundry Agent Service. These use managed services in the platform for orchestration, evaluation, and telemetry, each leveraging MCP tools, knowledge grounding, and project-scoped observability. Then you have hosted container agents running in Microsoft Foundry. These are built in code, and they benefit from core capabilities like conversation history storage, identity and access control, and private connectivity, along with agent- and fleet-level telemetry and tracing.
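The split described above, where the central team enforces baseline controls while project teams run workloads in their own landing zones, can be expressed as a merge that lets a project team tighten controls but never loosen them. This is a hypothetical sketch; the `Controls` fields and region names are illustrative and do not correspond to Azure Policy definitions or landing zone template parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    """Hypothetical subset of controls a central platform team might baseline."""
    public_network_access: bool
    content_filtering: bool
    max_tokens_per_minute: int
    allowed_regions: frozenset[str]


def effective_controls(baseline: Controls, requested: Controls) -> Controls:
    """Combine the central baseline with a project team's landing-zone request.

    The project team may tighten any control but never loosen one:
    access grants are AND-ed, safeguards are OR-ed, numeric limits take
    the minimum, and allowed regions are intersected.
    """
    return Controls(
        public_network_access=baseline.public_network_access and requested.public_network_access,
        content_filtering=baseline.content_filtering or requested.content_filtering,
        max_tokens_per_minute=min(baseline.max_tokens_per_minute, requested.max_tokens_per_minute),
        allowed_regions=baseline.allowed_regions & requested.allowed_regions,
    )


baseline = Controls(False, True, 100_000, frozenset({"eastus", "westeurope"}))
requested = Controls(True, False, 250_000, frozenset({"westeurope", "japaneast"}))
print(effective_controls(baseline, requested))
```

Here the project team’s request for public access, no content filtering, a higher limit, and an extra region is each overridden by the baseline, which is the behavior the central team wants regardless of what a landing zone asks for.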
When they’re invoked, they operate under project-level orchestration and centrally applied governance, ensuring that policies and guardrails, such as content filtering, prompt injection protection, and copyright safeguards, are consistently enforced. Finally, there are custom container-hosted agents, which run externally, outside of Foundry, in places like Azure Container Apps, Azure Kubernetes Service, or non-Azure environments like EKS. All three still consume hub-mediated models and tools and use identity patterns to avoid unmanaged credential distribution, while sending telemetry for end-to-end observability. A key part of this is that model and tool access is mediated by the governance hub, so agents can use template-driven patterns instead of unique embedded endpoints and credentials for each agent.
- So, we’ve shown these before for apps. Is there an accelerator to deploy AI Landing Zones?
- Yeah, the landing zone accelerator provides infrastructure-as-code templates and continuous delivery pipelines. It’s a production-grade reference architecture, and I’ll show you where to find it and how to set one up. First, you can find the AI Landing Zones page on GitHub at aka.ms/AILandingZone. This site contains all of our documentation. Below, you’ll see the Reference Architectures and the automated Extensible Implementations to get the underlying services up and running. You can deploy these using the Azure CLI, Terraform, or the portal, so I’ll choose the Azure portal option so I can show you what gets deployed. You’ll start with the normal project and instance details. Here, you can choose whether you want Platform Landing Zone integration; I’ve chosen no in this case. Next, you’ll see all of the selectable AI services, data backend, Key Vault, and storage. Then come the application services, including container infrastructure, followed by the DevOps configuration with jumpbox options.
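The point about hub-mediated access, where agents use template-driven patterns instead of embedding endpoints and credentials, can be sketched as a lookup the agent performs at call time. Everything here is hypothetical: `GovernanceHub`, the route URL, and the random token, which merely stands in for a short-lived credential that in practice would come from the identity platform rather than from Python’s `secrets` module.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    endpoint: str      # resolved by the hub at call time, never embedded in the agent
    token: str         # stand-in for a short-lived, identity-issued credential
    expires_at: float  # Unix timestamp after which the token should be rejected


class GovernanceHub:
    """Hypothetical hub mapping logical tool/model names to endpoints per agent."""

    def __init__(self, routes: dict[str, str], permissions: dict[str, set[str]], ttl_s: float = 300.0):
        self._routes = routes            # logical name -> current endpoint URL
        self._permissions = permissions  # agent identity -> logical names it may use
        self._ttl_s = ttl_s

    def resolve(self, agent_id: str, logical_name: str) -> Grant:
        if logical_name not in self._permissions.get(agent_id, set()):
            raise PermissionError(f"{agent_id} is not permitted to use {logical_name}")
        return Grant(
            endpoint=self._routes[logical_name],
            token=secrets.token_urlsafe(16),
            expires_at=time.time() + self._ttl_s,
        )


# The agent template only ever references the logical name "crm-tools".
hub = GovernanceHub(
    routes={"crm-tools": "https://hub.example.internal/mcp/crm"},
    permissions={"agent-sales-01": {"crm-tools"}},
)
grant = hub.resolve("agent-sales-01", "crm-tools")
print(grant.endpoint)  # https://hub.example.internal/mcp/crm
```

Pointing `crm-tools` at a new backend then becomes a change to the hub’s routes, with no edit to any agent built from the template.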
Next, you’ll make sure monitoring is configured with Log Analytics and Application Insights. Everything uses OpenTelemetry signals and standardized telemetry pipelines so that traces and metrics can be correlated across your agent activities. Then there are all of your network configurations, with firewall and virtual network settings. These define a private networking posture, with virtual networks, private endpoints, network security groups, and private DNS, to eliminate public ingress paths between agents, gateways, telemetry, and managed retrieval stores for data. Of course, you’ll want to add the right tagging for this specific project. After the service runs a quick validation check, you can create all of the selected resources. Since we’re creating dozens of new resources and more than a hundred deployments, counting dependencies and configurations, this will take several minutes, so we’ll jump ahead to the results. In the corresponding resource group, you can see all of the resources deployed: networking, Foundry AI, container apps, a Cosmos DB instance, monitoring, storage, and our jump VMs. Everything deployed uses managed agent identities as the default mechanism for service-to-service access, avoiding any credential distribution.
- Okay, so now we’ve got everything configured and deployed. How would all of this scale over time?
- Yeah, this is really important so that you can keep managing agents as they proliferate. The first thing is unique agent identities, which ensure that every agent has clear ownership, permissions, and lifecycle status, so actions can be attributed even during autonomous execution. Next are your shared policy baselines and guardrails, where, for example, you enforce authentication, authorization, data access, and content safety controls so that they’re consistent across the different agent runtimes.
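The unique agent identities and shared policy baselines described here can be sketched as one authorization function that every runtime calls before an agent acts. This is a hypothetical Python illustration, not an Entra, Defender, or Purview API; the `AgentIdentity` fields, the permission strings, and the `content_flagged` input (standing in for a content safety verdict) are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Lifecycle(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    RETIRED = "retired"


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    owner: Optional[str]          # accountable person or team; None means orphaned
    lifecycle: Lifecycle
    permissions: frozenset[str]   # actions this specific agent has been granted


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str                   # recorded so every denial can be audited


def authorize(agent: AgentIdentity, action: str, content_flagged: bool) -> Decision:
    """Shared baseline check, applied the same way whichever runtime hosts the agent.

    `content_flagged` stands in for the verdict of a content safety service.
    """
    if agent.owner is None:
        return Decision(False, "no accountable owner")
    if agent.lifecycle is not Lifecycle.ACTIVE:
        return Decision(False, f"agent is {agent.lifecycle.value}")
    if action not in agent.permissions:
        return Decision(False, f"{action} not granted")
    if content_flagged:
        return Decision(False, "blocked by content safety")
    return Decision(True, "ok")


sales_agent = AgentIdentity("agent-sales-01", "sales-team", Lifecycle.ACTIVE, frozenset({"crm.read"}))
print(authorize(sales_agent, "crm.read", content_flagged=False))   # Decision(allowed=True, reason='ok')
print(authorize(sales_agent, "crm.write", content_flagged=False))  # Decision(allowed=False, reason='crm.write not granted')
```

Returning a reason with every decision, rather than a bare boolean, is what lets actions be attributed and audited per agent even during autonomous execution.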
Then you have consistent, defined observability, where your organization can trace the interactions of each agent, along with the tools and data it uses, as well as correlate failures and attribute costs and risks to each agent via monitoring. It also helps to create agent templates for developers, just like you would for applications, so they can reuse what works instead of reinventing the wheel. With that in place, you can speed up deployments while still maintaining the right controls and visibility across your agents.
- Now that we’ve got our architecture, for anyone watching who’s looking to learn more and get started, what do you recommend?
- Well, first, you don’t have to wait. You can get started today by expanding your governance hub and pulling together your registry of data products, MCP servers, APIs, models, and agents. Also, make sure you have the right shared resources and policies at the platform level, and start building! Check out aka.ms/AIArchitecture for more information.
- Thanks, Matt, for joining us today and showing how you can build out an agent mesh in your organization. Of course, keep checking back to Microsoft Mechanics for the latest updates, subscribe if you haven’t already, and thanks for watching.