Agents League: The Esports-Inspired Hackathon Where AI Agents Battle for Glory
Ready to put your AI skills to the ultimate test? Agents League is here: a dynamic, esports-inspired developer challenge that brings the thrill of live competition to the world of agentic AI. Whether you're a seasoned AI developer or just getting started, this is your chance to build, compete, and win.

What is Agents League?

Agents League is a week-long hackathon running as part of AI Skills Fest (June 4-14, 2026). Unlike traditional hackathons, Agents League combines live AI coding battles, asynchronous project submissions, and a thriving Discord community, with participants competing for a total prize pool of $55,000 USD. This isn't just about building; it's about showcasing what's possible with agentic AI in a format that's fast, competitive, and globally accessible.

Three Challenge Tracks: Pick One or Compete in All

1. Creative Apps: Build innovative applications using GitHub Copilot for AI-assisted development. Show off your creativity and demonstrate how AI can accelerate app creation from concept to code.
2. Reasoning Agents: Create intelligent agents using Microsoft Foundry that solve complex problems through multi-step reasoning. This track is all about building agents that can think, plan, and execute.
3. Enterprise Agents: Build business-ready knowledge agents integrated with Microsoft 365 Copilot, authored in Copilot Studio. Perfect for developers focused on real-world enterprise solutions.

Live Microsoft Reactor Events: Don't Miss the Battles!

The heart of Agents League beats through live Microsoft Reactor events.
Watch experts go head-to-head in live coding battles, learn cutting-edge techniques, and get inspired for your own submissions:

Event | What You'll Learn
Creative Apps Battle | See GitHub Copilot in action as experts build innovative apps live
Reasoning Agents Battle | Watch multi-step reasoning agents come to life with Microsoft Foundry
Enterprise Agents Battle | Learn to build M365-integrated agents with Copilot Studio

View the full event series

Key Dates

- Registration Deadline: June 12, 2026, 12:00 PM PT
- Hacking Period: June 4-14, 2026
- Submission Deadline: June 14, 2026, 11:59 PM PT

What You Get

- Live coding battles with expert demonstrations
- Curated technical experiences and on-demand content
- Learning resources on Microsoft Learn and AI Skills Navigator
- Community support through Discord
- GitHub-based submissions for transparent, collaborative judging

Why Participate?

Agents League isn't just another hackathon. It's designed as a streamlined, competitive format that:

- Fits into your schedule with focused, time-boxed challenges
- Provides real-world product innovation experience
- Offers global accessibility: participate from anywhere
- Demonstrates the latest capabilities of agentic AI, including new IQ tools
- Connects you with a passionate developer community

Ready to Enter the Arena?

Register Now for Agents League

Before you register:

- Review the Hackathon Rules and Regulations for prize categories and judging criteria
- Join the Microsoft Reactor event series for live battles and learning
- Check out the Microsoft Event Code of Conduct

Join the Conversation

Have questions? Want to connect with fellow competitors? Join the Agents League community on Discord and start strategizing with developers from around the world. Whether you're building creative apps, reasoning agents, or enterprise solutions, the arena awaits. May the best agent win!

Agents League hackathon is open to the public and offered at no cost.
Government employees should check with their employers to ensure participation is permitted in accordance with applicable policies.

Related Links: Agents League Hackathon Registration, Microsoft Reactor Series, AI Skills Fest

Copilot, Microsoft 365 & Power Platform Community call
The Copilot, Microsoft 365 & Power Platform Development bi-weekly community call focuses on different use cases and features within Microsoft 365 and Power Platform - across Microsoft 365 Copilot, Copilot Studio, SharePoint, Power Apps and more. Demos in this call are presented by community members.

If you're looking to catch up on the latest news and updates, including cool community demos, this call is for you!

On the 11th of June we'll have the following agenda:

- Latest on SharePoint Framework (SPFx)
- Latest on Copilot prompt of the week
- PnPjs
- CLI for Microsoft 365
- Dev Proxy
- Reusable Controls for SPFx
- SPFx Toolkit VS Code extension
- PnP Search Solution

Demos this time:

- Mike Fortgens (Ichicraft) - Personalized SharePoint pages with configurable widgets
- Vipul Jain (Bosch Global Software Technologies) - Creating Smart Export to PDF in SharePoint Online using SPFx
- João Mendes (Kuehne & Nagel) & Hugo Bernier - Creating a custom events web part with React and SharePoint Framework (SPFx)

Download the recurring invite from https://aka.ms/community/m365-powerplat-dev-call-invite

Join the Microsoft Teams meeting live at https://aka.ms/community/m365-powerplat-dev-call-join

Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc)? We are always looking for presenters - Volunteer for a community call demo at https://aka.ms/community/request/demo

See you in the call!

Resources:

- Previous community call recordings and demos from the Microsoft Community Learning YouTube channel at https://aka.ms/community/youtube
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

Sharing is caring!

Copilot, Microsoft 365 & Power Platform product updates call
The Copilot, Microsoft 365 & Power Platform product updates call concentrates on the different use cases and features within Microsoft 365 and Power Platform. The call includes topics like Microsoft 365 Copilot, Copilot Studio, Microsoft Teams, Power Platform, Microsoft Graph, Microsoft Viva, Microsoft Search, Microsoft Lists, SharePoint, Power Automate, Power Apps and more.

The weekly Tuesday call is for all community members to see Microsoft PMs, engineering and Cloud Advocates showcasing the art of the possible with Microsoft 365 and Power Platform.

On the 9th of June we'll have the following agenda:

- News and updates from Microsoft
- Together mode group photo
- Vishal Anil - Announcing the Communicator App in Microsoft Teams
- Steve Pucelik - From Versions to Insights: AI-Powered Document Intelligence in SharePoint Embedded
- Anshul Jethwani & Harish Swaminathan - Introduction to 8 new Agent Builder templates for Microsoft 365 Copilot

Join the Microsoft Teams meeting live at https://aka.ms/community/ms-speakers-call-join

Download the recurring invite for this weekly call from https://aka.ms/community/ms-speakers-call-invite

See you in the call!

Building something cool for Microsoft 365 or Power Platform (Copilot, SharePoint, Power Apps, etc)? We are always looking for presenters - Volunteer for a community call demo at https://aka.ms/community/request/demo

Resources:

- Previous community call recordings and demos from the Microsoft Community Learning YouTube channel at https://aka.ms/community/youtube
- Microsoft 365 & Power Platform samples from Microsoft and community - https://aka.ms/community/samples
- Microsoft 365 & Power Platform community details - https://aka.ms/community/home

Sharing is caring!

From insight to action: how Adobe and Microsoft are helping marketers move faster with AI
Today's marketing leaders are under pressure to do more than ever: deliver meaningful personalization, accelerate execution, and prove measurable business impact. At the same time, teams are navigating increasing complexity: fragmented data, disconnected tools, and insights that arrive too late to act on.

AI can change this, but only when it's embedded directly into how people already work. That's why Microsoft and Adobe are deepening our partnership: bringing customer experience intelligence, AI-powered workflows, and enterprise-grade AI directly into Microsoft 365 Copilot, so teams can move from insight to alignment to execution in one continuous workflow. The result is faster decisions, more coordinated execution, and clearer business outcomes, without breaking flow or context.

Bringing customer experience intelligence into the flow of work

Marketing teams don't struggle because they lack data. They struggle because insights live in one place, collaboration in another, and execution somewhere else entirely. That disconnect slows teams down and creates unnecessary friction between analysis and action.

Together, Adobe and Microsoft are changing that dynamic by connecting Adobe's customer experience capabilities with Microsoft 365 Copilot and Copilot Cowork, so insight, collaboration, and next-best action can happen where work already happens: in Copilot Chat and in everyday apps like Teams, Word, and PowerPoint. Marketers can ask questions, explore insights, align with teammates, and take action without jumping between tools, turning intelligence into impact at the moment it matters.

Adobe Marketing Agent for Microsoft 365 Copilot: now generally available

A major milestone in this journey is the general availability of the Adobe Marketing Agent for Microsoft 365 Copilot, now available via Microsoft Commercial Marketplace.
The Adobe Marketing Agent brings Adobe customer experience intelligence directly into Copilot, enabling marketing teams to:

- Accelerate time from insight to decision
- Move seamlessly from analysis to execution
- Keep humans firmly in control, with AI supporting, not replacing, decision-making

Importantly, the agent is enterprise-ready by design. IT administrators can deploy and manage the experience through the Microsoft 365 admin center, ensuring security, governance, and compliance at scale.

Expanding executive experiences with Copilot Cowork

Looking ahead, Adobe skills designed for customer experience orchestration will be accessible in Copilot Cowork in a future release. This upcoming experience will enable customer experience leaders to engage with customer experience insights in a more direct, conversational way, bringing strategic visibility into the same Copilot environments where decisions are made and actions are coordinated.

Built on Azure to scale securely and responsibly

The technology foundation of this innovation is Azure. Adobe Experience Platform, Adobe Experience Platform Agent Orchestrator, and Adobe AI Agents are built on Azure and leverage Azure AI models, providing the scalability, security, and reliability enterprises require. By running on Azure, these agentic experiences benefit from Microsoft's global infrastructure, enterprise-grade security, and responsible AI commitments, supporting customer trust as organizations scale AI across their business.

Designed for interoperability across agent ecosystems

Modern enterprises don't operate in a single ecosystem, and their agents shouldn't either. Adobe agents are built to interoperate with agents created using Microsoft Azure AI Foundry or Copilot Studio, enabling customers to orchestrate richer, cross-functional workflows across marketing, sales, service, and operations.
This architecture is designed to enable organizations to compose agentic solutions that reflect how work actually happens: across systems, teams, and business processes.

Moving from experimentation to execution

This partnership reflects a broader shift in how organizations adopt AI: moving from experimentation to embedded, enterprise-ready execution. By bringing the full power of Adobe Experience Platform together with Microsoft's AI platform, cloud infrastructure, and Copilot experiences, we're helping teams move faster with clarity, confidence, and control. This is how AI becomes not just powerful, but practical.

Learn more

- Adobe + Microsoft partnership page
- Adobe Marketing Agent for Microsoft Copilot page

Driving AI-Powered Healthcare: A Data & AI Webinar and Workshop Series
Across these sessions, you'll learn how healthcare organizations are using Microsoft Fabric, advanced analytics, and AI to unify fragmented data, modernize analytics, and enable intelligent, scalable solutions, from enterprise reporting to AI-powered use cases. Whether you're just getting started or looking to accelerate adoption, these sessions offer practical guidance, real-world examples, and hands-on learning to help you build a strong data foundation for AI in healthcare.

Date | Topic | Details | Location | Registration Link
May 6 | Webinar: Microsoft Fabric Foundations - A Simple Path to Modern Analytics and AI | Discover how Microsoft Fabric consolidates fragmented analytics into a single integrated data platform, making it easier to deliver trusted insights and adopt AI without added complexity. | Virtual | Register
May 13 | Webinar: Reduce BI Sprawl, Cut Cost and Build an AI-Ready Analytics Foundation | Learn how Power BI enables enterprise BI consolidation, consistent metrics, and secure, scalable analytics that support both operational reporting and emerging AI use cases. | Virtual | Register
May 19-20 | In Person Workshop: Driving AI-Powered Healthcare: Advanced Analytics, AI, and Real-World Impact | Attend this two-day, in-person event to learn how healthcare organizations use Microsoft Fabric to unify data, accelerate AI adoption, and deliver measurable clinical and operational value. Day 1 focuses on strategy, architecture, and real-world healthcare use cases, while Day 2 offers hands-on workshops to apply those concepts through guided labs and agent-powered solutions. | Chicago | Register
May 27 | Webinar: Unified Data Foundation for AI & Analytics - Leveraging OneLake and Microsoft Fabric | This session shows how organizations can simplify fragmented data architectures by using Microsoft Fabric and OneLake as a single, governed foundation for analytics and AI. | Virtual | Register
June 3-4 | In Person Workshop: Driving AI-Powered Healthcare: Advanced Analytics, AI, and Real-World Impact | Attend this two-day, in-person event to learn how healthcare organizations use Microsoft Fabric to unify data, accelerate AI adoption, and deliver measurable clinical and operational value. Day 1 focuses on strategy, architecture, and real-world healthcare use cases, while Day 2 offers hands-on workshops to apply those concepts through guided labs and agent-powered solutions. | New York | Register
June 10 | Webinar: From Data to Decisions: How AI Data Agents in Microsoft Fabric Redefine Analytics | Join us to learn how Fabric Data Agents enable users to interact with enterprise data through AI-powered, governed agents that understand both data and business context. | Virtual | Register

Foundry Toolkit for VS Code at //build: Hosted Agents End-to-End, a Smarter Toolbox, and More
We're excited to share what's new for Foundry Toolkit for Visual Studio Code at //build 2026. Since going generally available, the toolkit has kept moving fast, and this release is a big one. The headline: a complete, end-to-end Hosted Agent experience. Scaffold, run, deploy, and observe without ever leaving VS Code. On top of that, we've expanded the Toolbox with native enterprise integrations and shipped a wave of LangGraph samples so every developer has a clear path from idea to production. From your first prompt to a production-grade, observable agent, Foundry Toolkit meets you where you are.

Hosted Agents, End to End

Building an agent is the easy part; getting it from a first draft to a production-grade, observable service is what matters. This release makes the full Hosted Agent lifecycle available in VS Code, and it follows the way you actually work: scaffold, run, deploy, observe.

Scaffold: start from a rich set of samples

Hosted Agent creation now opens with a refreshed scaffolding experience and a rich sample selection, so you start from a working, framework-appropriate template instead of a blank file. Creation is smarter, too: we auto-select your subscription when there's only one, gate tabs more clearly, and have tightened spacing for a cleaner setup flow.

Run (F5): inspect as you build

Press F5 and your agent runs locally with the Agent Inspector, now aligned with the rest of the extension and featuring Copilot SDK visualization so you can see what the agent is doing as it executes. It's the fastest loop from change to verification before anything leaves your machine.

Deploy: a new UX and new ways to ship

Different teams ship differently, so deployment got a refreshed UX and two new options for Hosted Agents:

- ZIP Code Deploy: Package your agent source as a ZIP and deploy it directly to Microsoft Foundry Agent Service.
- Bring-Your-Own-Image (BYOI): Already have a pre-built container in your own Azure Container Registry? Deploy straight from it.

Observe: know it works in production

Once deployed, the full observability story is now available:

- Hosted Agent Tracing: Inspect end-to-end traces of Hosted Agent invocations directly from VS Code (tool calls, delegation chains, and timing) for real debugging instead of guesswork.
- Continuous Evaluation Settings: A new page to configure ongoing evaluation for deployed Hosted Agents, so quality is measured continuously, not just at ship time.
- Evaluations Node: One-click access to evaluation runs and results right from the Foundry project tree.

A Smarter, More Connected Toolbox

What it is, and why it matters

A Toolbox is how your agent gets its capabilities: the curated set of tools, knowledge sources, and integrations it can call at runtime. Instead of hand-wiring each connection, you assemble a Toolbox once and your agent consumes it consistently across local runs and production. The result: agents that can act on real enterprise data and systems, with the connections managed in one place.

From what to how: create, connect, consume

- Create: Start a new Toolbox from the Foundry Toolkit sidebar "Tools Catalog" and pick the capabilities your agent needs.
- Connect: Configure and wire in enterprise systems through native, first-class connections once, and use them for all your agents.
- Consume: Reference the Toolbox from your Hosted Agent so its tools are available the moment the agent runs, locally (F5) and once deployed.

New this release

Building on that flow, the Toolbox is now richer and more enterprise-ready:

- WorkIQ as a Built-in Tool: A first-class WorkIQ experience powered by A2A connections, with no MCP fallback required. End-to-end toolbox creation with WorkIQ works out of the box.
- Fabric IQ (OneLake Catalog) Integration: Connect your agents to Microsoft Fabric OneLake catalogs directly from the Toolbox.
- Toolbox Guardrails: Apply content-safety guardrails to your Toolbox for safer agent execution.
- Faster discovery: A new Toolbox Search Toggle and Agent Tool Multi-Select let you find and wire in multiple tools in a single action.

LangGraph Reaches Parity

LangGraph developers, this one is for you. We've added five new Hosted Agent samples that bring LangGraph to full parity with the Agent Framework Responses learning path, so you get an equivalent, end-to-end walkthrough no matter which framework you prefer:

- MCP: tool loading from a remote MCP server (defaults to GitHub Copilot MCP) via MultiServerMCPClient.
- Workflows: a custom StateGraph chaining three specialized LLM nodes: slogan writer, legal reviewer, and formatter.
- Files: local filesystem tools plus the Foundry-Toolbox code_interpreter working over session-uploaded files.
- Human-in-the-Loop: a StateGraph that drafts a proposal and pauses for approval via langgraph.types.interrupt.
- Observability: GenAI OpenTelemetry tracing with enable_auto_tracing(); spans, metrics, and logs flow to Application Insights.

We've also refreshed the existing bring-your-own LangGraph samples against the new hosting layer (chat with local tools, Foundry-managed Toolbox loading, and SSE-streamed multi-turn sessions backed by a MemorySaver checkpointer), so every sample reflects how Hosted Agents work today.

Polish Across the Board

A release is more than headline features. This one also includes a redesigned Prompt Builder "Improve an Instruction" dialog for faster iteration, fixes for MCP toolbox tool icons, clearer ZIP-deploy error surfacing, and assorted Agent Builder and Playground regression fixes, so the whole experience feels tighter end to end.

Get Started Today

- Install: Foundry Toolkit on the VS Code Marketplace
- Quick Start: Follow our getting-started tutorial to build your first Hosted Agent
- Deep Dive: Explore the documentation, samples, and LangGraph parity walkthroughs

Join the Community

Share your projects, file issues, or suggest features on our GitHub repository. We can't wait to see what you build.
Welcome to the next chapter of AI development!

DevOps for Microsoft Hosted Agents: From Terraform Apply to Production-Grade Agent Delivery
A companion piece to Infrastructure as Code for AI: Building and Deploying Microsoft Hosted Agents with Terraform.

Just announced: source-code deploy (preview). Foundry has just added a second Hosted Agent deploy path alongside the container path this post covers. Instead of a container image, you upload a .zip of your source plus a requirements.txt (Python 3.13 / 3.14) or a .csproj (.NET 10), and the Agent Service either builds dependencies for you (remote_build) or runs a prebuilt bundle (bundled). The version definition uses code_configuration instead of container_configuration; the two are mutually exclusive on a given version. Versioning is content-addressable on the zip's SHA-256, so the dedup behaviour described below still applies. Required roles shift slightly: deploying the agent needs Foundry Project Manager at project scope, and the platform-assigned agent identity gets Foundry User (both handled automatically by azd and the Foundry VS Code Toolkit). The DevOps loop in this post (immutable versions, eval gating, manifest-driven promotion, traffic-split canary, per-version observability) transfers directly; only the build-and-push stage changes (no Dockerfile, no ACR for remote_build). The container path covered here remains fully supported and is still the right choice if you need custom base images, system packages, or non-Python/.NET runtimes. Full details: Deploy a hosted agent from source code (preview).

What this post assumes. It describes recommended enterprise DevOps patterns on top of Microsoft Foundry Hosted Agents. Some patterns (evaluation gating, traffic-based rollout, manifest-driven promotion) are best practices and may not be enforced by the platform itself. Hosted Agents and several related capabilities (A2A, certain deployment and routing controls) are in preview and may evolve.

TL;DR

- Terraform provisions the platform: Foundry account, project, model deployment, ACR, App Insights, RBAC.
- DevOps pipelines ship agent versions, not source branches: the deploy artifact is a container image digest plus an immutable version spec.
- Evaluation should be treated as a release gate, not a dashboard. Quality regressions should fail the build the same way unit-test failures do.
- Traffic split between versions is the rollout and rollback primitive. Rollback typically avoids rebuilding or redeploying artifacts.
- Observability is sliced per version: during canary, two versions serve simultaneously and aggregate metrics lie.

The Delivery Pipeline at a Glance

    Terraform --> Foundry project (AIServices) + model deployment + ACR + App Insights
                      |
    PR opened         v
       +--> docker build --> push to ACR --> capture image digest
                      |
                      v
            Foundry SDK: create agent version (image digest + cpu/mem + env + protocols)
                      |
                      v
            Evaluation gate --> fail -> stop
                      |
                      v  pass
            Promote via manifest -> staging -> prod
                      |
                      v
            Traffic-split canary (0% -> 10% -> 100%)
                      |
                      v
            App Insights: per-version latency, cost, sampled quality, sandbox sizing

Infrastructure as Code gets the platform stood up. It does not, on its own, ship an agent. The gap between terraform apply succeeding and a customer-facing agent reliably serving requests in production is where DevOps lives, and for Microsoft Hosted Agents on Microsoft Foundry, that gap has its own shape.

A Hosted Agent is not a prompt and a tool list. It is your own code, packaged as a container image, pushed to Azure Container Registry, and deployed to a Foundry project. The Foundry Agent Service pulls the image, provisions an isolated execution environment per agent session, assigns the agent its own dedicated Microsoft Entra ID (agent identity), and exposes a dedicated endpoint. An agent supports up to four protocols, any of which can be combined in a single deployment:

- Responses (.../protocols/openai/responses): OpenAI-compatible chat-style API. Implemented in the container.
- Invocations (.../protocols/invocations): arbitrary JSON in / arbitrary JSON out for webhook receivers and non-conversational workloads. Implemented in the container.
- A2A (.../protocols/a2a, preview): the open Agent2Agent protocol for agent-to-agent delegation across frameworks and vendors. Surfaced on its own endpoint path by the platform.
- Activity: the Teams / M365 channel protocol. The platform bridges Responses to Activity automatically when an agent is published to a Microsoft 365 channel.

Microsoft manages the runtime, scaling, session state, and lifecycle. You ship the image and the version definition.

Important: Foundry version compatibility. Hosted Agents are supported on the new Microsoft Foundry project resource model (azurerm_cognitive_account_project under a Cognitive Services account of kind = "AIServices"). The older Azure AI Foundry Hub model (azurerm_ai_foundry / azurerm_ai_foundry_project, kind = "Hub"), the Azure ML-derived workspace surface, does not expose Hosted Agent capabilities. They are two distinct Azure resource types with different APIs. Everything in this post assumes the new Foundry project.

That shape drives three things every DevOps loop for Hosted Agents has to handle:

1. The deploy artifact is a container image plus an immutable agent version. A version snapshots the image digest, CPU/memory, environment variables, and protocol configuration. To change anything, you create a new version. The platform supports weighted traffic between versions, which is your blue/green and canary primitive.
2. The agent identity is created for you, per agent. You don't pick one or wire managed-identity references manually. Each agent is assigned a dedicated Microsoft Entra ID (agent identity) at deploy time; RBAC to downstream resources is granted to that identity.
3. Quality is non-deterministic. Two terraform apply runs against the same configuration produce identical resources. Two agent runs against the same input can produce different outputs. Your pipeline has to gate on evaluation, not only on tests passing and HTTP 200s.

This post lays out an end-to-end DevOps loop on top of that shape: how to structure the repository, what runs in CI versus CD, how to gate releases on evaluation, how to promote across environments, how to use version traffic split for safe rollouts and instant rollback, and what observability is worth wiring beyond the defaults.

A Quick Tour of Microsoft Foundry

If you've spent more time in Azure OpenAI or AI Studio than in Foundry, a short orientation helps before the DevOps patterns make sense. Microsoft Foundry is Microsoft's unified platform for building, evaluating, deploying, and operating AI applications and agents. It consolidates what used to be spread across Azure OpenAI, Azure AI Studio, and the AI Hub model into a single resource and a single portal at ai.azure.com. Three pieces are worth knowing up front.

The resource model

Foundry is built on two Azure resources:

- Foundry account: an azurerm_cognitive_account with kind = "AIServices", project_management_enabled = true, a custom_subdomain_name, and a managed identity. This is the top-level container: it holds your model deployments (Azure OpenAI and the broader Foundry model catalog), connections to backing services, and the Foundry-managed Toolbox MCP endpoint.
- Foundry project: an azurerm_cognitive_account_project under that account. A project is the scope for agents, evaluations, conversation history, indexes, and per-app connections. One project per app or per environment is the usual shape.

This is the new Foundry model, and it is the only model that supports Hosted Agents. The older Azure AI Foundry Hub (azurerm_ai_foundry + azurerm_ai_foundry_project, kind = "Hub") is a separate Azure ML-derived workspace and cannot host Hosted Agents.
The two surfaces look superficially similar in the portal but are distinct Azure resource types with different APIs and feature sets. If a tutorial, sample, or piece of Terraform you find online creates an azurerm_ai_foundry Hub, it is targeting the classic surface and the Hosted Agents APIs (/agents, agent versions, traffic split, dedicated endpoints) will not be available against it. To use Hosted Agents you must provision a new Foundry account + project as described above. There is no in-place upgrade from a Hub.

What Foundry gives you

A Foundry project is more than a container. Out of the box it provides:

- A model catalog and deployment surface: Azure OpenAI models (GPT-4.1, GPT-4o, o-series, embeddings), plus open and partner models, all deployed and invoked through the same project endpoint with the same auth model.
- Two agent execution modes: prompt-based agents (defined entirely by instructions + tool configuration in the portal, suitable for conversational assistants) and Hosted Agents (your own containerized code, the subject of this post).
- A managed Toolbox: a project-level MCP endpoint that exposes Foundry-curated tools (Code Interpreter, Web Search, Azure AI Search, OpenAPI, custom MCP, A2A) with consolidated auth. Hosted Agent code connects to the Toolbox using standard MCP client libraries.
- First-class evaluation: datasets, graders (similarity, LLM-as-judge, safety, groundedness), and evaluation runs as a built-in concept, not a bolt-on.
- Built-in tracing: OpenTelemetry traces from agents land in a linked Application Insights resource automatically. No manual instrumentation needed to get the basics.
- Per-agent identity: when you deploy a Hosted Agent, the platform creates a dedicated Microsoft Entra ID (agent identity) for it and gives it a dedicated endpoint. RBAC to downstream resources is granted to that identity.
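To make the "evaluation as a release gate" idea concrete, here is a minimal sketch in plain Python. It does not call the Foundry evaluation APIs; it only shows the gating logic a CI step could apply to scores that an evaluation run has already produced. The record shape (one dict of grader scores per evaluated example), the threshold values, and the function names gate and exit_code are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical minimum mean score per grader; tune these per agent.
THRESHOLDS = {"similarity": 0.80, "groundedness": 0.85}


def gate(results, thresholds):
    """Return a list of failure messages; an empty list means the gate passes.

    `results` is a list of dicts, one per evaluated example, mapping a
    grader name to a score in [0, 1] (an assumed, illustrative shape).
    """
    failures = []
    for metric, minimum in thresholds.items():
        scores = [row[metric] for row in results if metric in row]
        if not scores:
            # A grader that produced no scores counts as a failure, so a
            # misconfigured evaluation cannot silently pass the gate.
            failures.append(f"{metric}: no scores reported")
            continue
        average = mean(scores)
        if average < minimum:
            failures.append(f"{metric}: mean {average:.3f} is below {minimum:.3f}")
    return failures


def exit_code(results, thresholds=THRESHOLDS):
    """Map the gate outcome to a process exit code (0 = pass, 1 = fail)."""
    failures = gate(results, thresholds)
    for message in failures:
        print(f"EVAL GATE FAILED: {message}")
    return 1 if failures else 0
```

A CI step could finish with sys.exit(exit_code(results)), so that a quality regression stops the pipeline in the same way a failing unit test would. Gating on the mean is only one possible policy; a per-example minimum or a comparison against the currently serving version's scores may suit some agents better.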
How the pieces line up for Hosted Agents

For the rest of this post, the mental model is:

    Resource group
    +-- Foundry account (Cognitive Services, kind=AIServices)
        +-- Model deployments (e.g. gpt-4.1)
        +-- Foundry project
            +-- Hosted Agent: customer-support
            |   +-- Version v1 (image digest A, 100% traffic)
            |   +-- Version v2 (image digest B, 0% traffic, canary)
            +-- Hosted Agent: webhook-handler
            +-- Evaluations
            +-- Connections (ACR, AI Search, Key Vault...)
            +-- Toolbox (MCP)

Terraform provisions the account, project, model deployments, ACR, App Insights, and RBAC. Hosted Agents (images, versions, traffic weights) are managed through azd or the Foundry SDK. That boundary is what the rest of this post automates.

The minimal Terraform shape

For Hosted Agents you need the new-model shape. The skeleton below is the minimum that lets you deploy a Hosted Agent on top of it; storage, Key Vault, monitoring, networking, and OIDC for CI live alongside it. For more details, see Infrastructure as Code for AI: Building and Deploying Microsoft Hosted Agents with Terraform (Microsoft Community Hub).
    # Foundry account (new model, required for Hosted Agents)
    resource "azurerm_cognitive_account" "foundry" {
      name                       = "ai-${local.name}"
      resource_group_name        = azurerm_resource_group.main.name
      location                   = azurerm_resource_group.main.location
      kind                       = "AIServices"
      sku_name                   = "S0"
      project_management_enabled = true
      custom_subdomain_name      = "ai-${local.name}" # required for AAD auth

      identity {
        type = "SystemAssigned"
      }
    }

    # Model deployment the agent will call
    resource "azurerm_cognitive_deployment" "gpt" {
      name                 = "gpt-4.1" # stable name; agents pin to this
      cognitive_account_id = azurerm_cognitive_account.foundry.id

      model {
        format  = "OpenAI"
        name    = "gpt-4.1"
        version = "2025-04-14"
      }

      sku {
        name     = "GlobalStandard"
        capacity = 10
      }
    }

    # Foundry project: the scope for Hosted Agents, evals, conversations
    resource "azurerm_cognitive_account_project" "main" {
      name                 = "proj-${local.name}"
      cognitive_account_id = azurerm_cognitive_account.foundry.id
      location             = azurerm_resource_group.main.location

      identity {
        type = "SystemAssigned"
      }
    }

    # Container registry the agent image is pushed to and pulled from
    resource "azurerm_container_registry" "acr" {
      name                = "acr${replace(local.name, "-", "")}"
      resource_group_name = azurerm_resource_group.main.name
      location            = azurerm_resource_group.main.location
      sku                 = "Standard"
      admin_enabled       = false # use RBAC, not admin user
    }

    # The project's managed identity needs to pull the agent image
    resource "azurerm_role_assignment" "project_acr_pull" {
      scope                = azurerm_container_registry.acr.id
      role_definition_name = "AcrPull" # use Container Registry Repository Reader if the ACR has ABAC enabled
      principal_id         = azurerm_cognitive_account_project.main.identity[0].principal_id
    }

A few things worth calling out:

- kind = "AIServices" + project_management_enabled = true + custom_subdomain_name are what make this a new-model Foundry account. Omit project_management_enabled and azurerm_cognitive_account_project will not provision; omit custom_subdomain_name and you lose the Foundry endpoint shape that Entra-authenticated access depends on.
- azurerm_cognitive_account_project is the new-Foundry project resource. Do not use azurerm_ai_foundry_project; that targets the Hub model and does not host agents.
- Keep the model deployment name stable. Agent code (and your agent.yaml) pins to the deployment name, not the model version. Changing the version is safe; changing the name forces a new agent version.
- The project MI needs ACR pull, not push. CI pushes the image (via its own identity); the platform pulls it on the project's behalf when the agent runs. ABAC-enabled ACR is supported but requires --source-acr-auth-id [caller] on az acr build in your CI script, a common gotcha.

A note on the provider. Everything above uses the hashicorp/azurerm provider. Foundry's surface evolves quickly, and you will occasionally hit a property or child resource that AzureRM hasn't caught up with yet; project connections, capability hosts, and some newer agent-related fields are common examples. When that happens, reach for azure/azapi: use azapi_update_resource to patch a missing property on an AzureRM-owned resource, and azapi_resource for resources AzureRM doesn't model at all. Keep AzureRM as the default and use AzAPI as a targeted gap-filler, so you don't fork ownership of mainstream resources.

The Hosted Agent Delivery Loop

A working delivery loop has five stages. Each maps to a specific artifact, a specific tool, and a specific failure mode.
| Stage | Artifact | Tool | Primary failure mode |
|---|---|---|---|
| Infra provisioning | Terraform state | terraform apply | Quota, RBAC propagation, ACR not reachable |
| Image build & push | OCI image in ACR (ACR must remain publicly reachable today) | docker build / az acr build | Image too large, base image CVEs |
| Agent version create | Immutable version (image digest + config) | azd or Foundry SDK | Bad env var, wrong protocol declared |
| Evaluation | Eval dataset + grader | Foundry evaluators | Quality / safety regression |
| Traffic shift & observe | Version weights, App Insights traces | Foundry SDK + Azure Monitor | Silent quality decay, sandbox over/under-sizing |

The first stage is where the prior post left off. The remaining four are this post.

Infra provisioning assumes the standard pattern: terraform plan runs on every PR as a review gate (posted as a PR comment) and terraform apply runs only on merge to the environment branch. Everything below assumes the platform is already applied.

Repository Shape

A repository that supports the loop end-to-end looks roughly like this:

agent-platform/
├── infra/                        # Terraform from the prior post (AIServices + project)
│   ├── modules/foundry-project/
│   └── environments/
│       ├── dev.tfvars
│       ├── staging.tfvars
│       └── prod.tfvars
├── agents/
│   ├── customer-support/
│   │   ├── Dockerfile
│   │   ├── src/                  # Agent code (Python or C#)
│   │   ├── agent.yaml            # Version spec: image, cpu/memory, protocols, env
│   │   ├── evals/
│   │   │   ├── dataset.jsonl
│   │   │   └── graders.yaml
│   │   └── README.md
│   └── webhook-handler/
│       └── ...
├── scripts/
│   ├── deploy_agent_version.py   # Build -> push -> create version -> optional weight shift
│   ├── run_evals.py
│   └── promote_version.py        # Shifts traffic between versions
└── .github/workflows/
    ├── infra.yml                 # Terraform plan/apply
    ├── agent-pr.yml              # Build, push to ACR, deploy candidate version, run evals
    └── agent-release.yml         # Promote a tested version to staging / prod

Two deliberate choices.
First, infrastructure and agents live in the same repo but in separate top-level directories with separate pipelines. They have different cadences and different reviewers. Second, each agent is its own folder with its own Dockerfile , code, version spec, and eval suite. A single PR touches one agent's directory cleanly; a code-review diff stays focused. The Agent Version as the Deploy Unit A Hosted Agent is deployed as a version. A version is immutable β once created it captures: the container image digest (not just the tag β the digest, so it cannot drift), CPU and memory allocation for the per-session sandbox (e.g. 1 vCPU / 2 GiB), the container protocols the image implements β responses , invocations , or both, environment variables passed to the container at runtime, any other version-scoped configuration (e.g. base model deployment name). The container's container_protocol_versions only declares responses and/or invocations β the two protocols the container itself implements. A2A (preview) is surfaced by the platform on its own endpoint path, and Activity is bridged from Responses automatically when the agent is published to a Microsoft 365 channel. Under the hood, agent versions run on Azure Container Apps with VM-isolated sandboxes, which is also why you may see the term revision in some Container Appsβsurfaced APIs and limits β a Hosted Agent version corresponds to one such revision. To change any of those, you create a new version. The platform keeps the old one and shifts traffic between them by weight. This is the primitive you use for canary rollouts and for rollback β both reduce to a traffic-weight change, not a redeploy. 
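To make the "immutable version plus traffic weights" idea concrete, here is a minimal local sketch. It is not the Foundry SDK: the AgentVersion class, its field names, and the split_traffic helper are hypothetical, and the model is simplified to two versions whose weights sum to 100.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AgentVersion:
    """Hypothetical local model of an immutable agent version (not an SDK type)."""
    version_id: str
    image_digest: str          # pinned by digest, never by tag
    cpu: float                 # vCPU for the per-session sandbox
    memory_gib: float
    protocols: tuple[str, ...]  # "responses" and/or "invocations"


def split_traffic(stable: AgentVersion, candidate: AgentVersion,
                  candidate_percent: int) -> dict[str, int]:
    """Return version weights that send candidate_percent of traffic to the candidate."""
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    if stable.version_id == candidate.version_id:
        raise ValueError("stable and candidate must be different versions")
    return {
        stable.version_id: 100 - candidate_percent,
        candidate.version_id: candidate_percent,
    }


v1 = AgentVersion("v1", "sha256:aaa", 1.0, 2.0, ("responses",))
# Changing anything means creating a new version; v1 itself is untouched.
v2 = replace(v1, version_id="v2", image_digest="sha256:bbb")

canary = split_traffic(v1, v2, 10)    # canary: 10% to v2
rollback = split_traffic(v1, v2, 0)   # rollback: same call, weight back to 0
```

In this sketch both canary and rollback are the same operation with a different number, which is the property the platform's weighted versions give you.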
An agent.yaml per agent makes the version reproducible from source:

# agents/customer-support/agent.yaml
name: customer-support
container:
  image: ${ACR_LOGIN_SERVER}/customer-support  # digest resolved at deploy time
  cpu: 1
  memory: 2Gi
  protocols:  # container_protocol_versions
    - responses  # add `invocations` here if the container also handles webhook-style payloads
  env:
    # The platform automatically injects FOUNDRY_PROJECT_ENDPOINT,
    # AZURE_AI_MODEL_DEPLOYMENT_NAME, and APPLICATIONINSIGHTS_CONNECTION_STRING;
    # you only set what's specific to your agent.
    LOG_LEVEL: info
metadata:
  owner: support-team
  source_commit: ${GITHUB_SHA}

scripts/deploy_agent_version.py is the executable form of this spec. Its job per agent is:

1. Build the container image (docker build locally, or az acr build server-side for ABAC ACRs).
2. Push to ACR and capture the resulting image digest, not the :latest tag.
3. Resolve environment variables from the target environment's config.
4. Call the Foundry SDK to create a new agent version pinned to that digest.
5. Emit a deployment-manifest.json containing the agent name, version ID, image digest, source commit SHA, and the eval dataset hash used.

One gotcha: the platform deduplicates. A create version call with no change to the version parameters (same image digest, same env, same CPU/memory, same protocols) will not produce a new version object. Write the script to treat "no new version returned" as success and reuse the existing version ID in the manifest, not as a failure to retry.

That manifest is the cross-pipeline contract. PR pipelines produce one. Promotion pipelines consume one. Rollback consumes a previous one.

Evaluation as a Release Gate

Foundry ships evaluators (datasets, graders, evaluation runs) as a first-class platform feature. Whether to block a release on their results is a team decision, not a platform mandate, but it is the recommended pattern for any agent serving real users.
A pipeline that promotes an agent because the image built, the container started, and the version was created with HTTP 200 will eventually ship a regression that an integration test cannot catch. Treat the eval suite the way you treat unit tests: failures stop the pipeline.

A minimal but honest evaluation setup has three pieces.

A reference dataset. Twenty to fifty representative scenarios are enough to start. Each row is an input plus either a reference answer, a set of must-include facts, or a rubric. Store as JSONL alongside the agent:

{"id":"refund-1","input":"How do I get a refund for order 12345?","must_include":["return window","14 days","original payment method"]}
{"id":"escalate-1","input":"This is the third time my package is late.","rubric":"Agent should acknowledge, apologize, offer escalation, not promise compensation."}

Graders. Foundry's evaluators library ships templates: exact match, similarity, LLM-as-judge for rubric scoring, and built-in safety and groundedness graders. Pick what matches your dataset shape. LLM-as-judge is the workhorse for open-ended responses; pin its model deployment explicitly so the grader itself does not drift between runs.

Thresholds. Decide what "passing" means before the first run. A common pattern:

- Hard floor on safety / groundedness: any regression fails the build.
- Relative threshold on quality: no more than X% drop versus the last known-good version.
- Absolute floor on must-include coverage, for example ≥ 90%.
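The three threshold rules above can be sketched as a small gate function. This is an illustration rather than Foundry's evaluator API: the Scores fields, the gate and must_include_coverage helpers, the substring-based coverage check, and the 5% quality tolerance are all assumptions; only the 90% coverage floor comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Aggregate scores for one eval run (hypothetical shape, values in 0..1)."""
    safety: float
    groundedness: float
    quality: float
    must_include_coverage: float


def must_include_coverage(response: str, must_include: list[str]) -> float:
    """Fraction of required facts found in the response (case-insensitive substring)."""
    if not must_include:
        return 1.0
    text = response.lower()
    hits = sum(1 for fact in must_include if fact.lower() in text)
    return hits / len(must_include)


def gate(candidate: Scores, baseline: Scores,
         max_quality_drop: float = 0.05,   # assumed value for "X%"
         min_coverage: float = 0.90) -> list[str]:
    """Return the list of failed rules; an empty list means the candidate passes."""
    failures = []
    # Hard floor: any safety or groundedness regression fails the build.
    if candidate.safety < baseline.safety:
        failures.append("safety regressed")
    if candidate.groundedness < baseline.groundedness:
        failures.append("groundedness regressed")
    # Relative threshold: quality may drop at most max_quality_drop vs. baseline.
    if candidate.quality < baseline.quality * (1 - max_quality_drop):
        failures.append("quality dropped beyond tolerance")
    # Absolute floor on must-include coverage.
    if candidate.must_include_coverage < min_coverage:
        failures.append("must-include coverage below floor")
    return failures


baseline = Scores(safety=1.0, groundedness=0.95, quality=0.80, must_include_coverage=0.95)
ok = gate(Scores(1.0, 0.95, 0.78, 0.92), baseline)
bad = gate(Scores(0.99, 0.95, 0.70, 0.92), baseline)
```

A script like run_evals.py could exit non-zero whenever the returned list is non-empty, which is what turns the eval from advisory into a gate.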
Wire it into the PR pipeline:

# .github/workflows/agent-pr.yml (excerpt)
- name: Build, push, and create candidate version
  run: |
    python scripts/deploy_agent_version.py \
      --agent customer-support \
      --project $EVAL_PROJECT \
      --version-suffix pr-${{ github.event.number }} \
      --traffic 0  # create the version, do not route traffic yet

- name: Run evaluations against candidate endpoint
  run: |
    python scripts/run_evals.py \
      --agent customer-support \
      --version pr-${{ github.event.number }} \
      --baseline last-known-good \
      --fail-on-regression

The PR creates a candidate version with zero traffic weight against a long-lived "eval" Foundry project, runs evaluations against the candidate version's dedicated endpoint, and then deletes the candidate version on PR close. A standing eval project beats a per-PR Foundry project: provisioning a project per PR is slow and adds RBAC overhead that does not earn its keep.

Environment Promotion

Three environments is the floor: dev, staging, prod. Each is its own Foundry project, ideally its own Foundry account in its own resource group. What promotes between them is the image digest and the version spec, not source code, and not "redeploy from main."

A workable model:

- dev: every push to a feature branch builds an image and creates a dev version. Loose evaluation thresholds. Used for human poking and end-to-end debugging.
- staging: merges to main create a staging version. Full eval suite, strict thresholds. Same sandbox sizing, same env vars, same protocols as prod.
- prod: manually approved promotion from staging. Promotion script reads the staging manifest, finds the image digest that passed, and creates the prod version pointed at that exact digest. No rebuild.

The "same digest" rule is the recommended pattern for safe promotion. If staging passed evaluations on customer-support@sha256:abc… running gpt-4.1, prod should get that exact image.
Re-building from main in the prod pipeline reintroduces the risk you spent staging trying to eliminate β a different base-image patch level, a different transitive dependency, a different build clock β even though nothing in your source changed. GitHub Actions environments make the approval concrete: jobs: promote-prod: needs: deploy-staging environment: production # requires reviewer approval runs-on: ubuntu-latest steps: - name: Create prod version from staging manifest run: | python scripts/deploy_agent_version.py \ --agent customer-support \ --project $PROD_PROJECT \ --from-manifest staging-manifest.json \ --traffic 10 # canary at 10% The canary weight is the second half of safe promotion: create the prod version, give it a small fraction of traffic, watch the App Insights traces, then shift the rest with promote_version.py . Traffic-Split Rollout and Instant Rollback Weighted version traffic changes the rollback model entirely. Rollback typically avoids rebuilding or redeploying artifacts β the previous version is still there, ready to take traffic. A typical canary flow: Create new version v42 at 0% traffic. Endpoint exists; no production calls reach it. Shift to 10%. Observe for an hour or a day, depending on traffic volume. Shift to 50%, then 100%. Old version stays at 0% but is not deleted. After a stability window (commonly a week), delete the previous version to free quota. Rollback is the reverse: shift weights back to the previous version. It is a control-plane call, not a deploy. The agent's endpoint URL does not change, sessions in flight continue on whichever version they started on, and new sessions land on whatever the weights say. Two consequences worth internalizing: Keep at least the last two known-good versions live. Rollback is only as fast as your ability to flip weights to a version that already exists. Do not skip the canary step under deadline pressure. A 0%β100% cutover gives you the same blast radius as a non-canaried deploy. 
The platform supports incremental rollout; use it. For a destructive change β a removed protocol, a renamed agent, an env var the previous version cannot tolerate β rollback may not be safe. Forward-fix is the answer. Identify those changes in PR review and require an explicit "rollback path: forward-fix" note in the PR. Handling Model Version Changes A model deployment bump is the highest-blast-radius runtime change you can make to a Hosted Agent: the agent's behaviour on every input can shift. Treat it like a dependency upgrade. Open a PR that changes only the AZURE_AI_MODEL_DEPLOYMENT_NAME (or the model version on the deployment, via Terraform). Build a new image if needed, create a new agent version, run the full eval suite at 0% traffic. Run a larger regression dataset if you have one. Require a human reviewer who is not the PR author. Promote through staging, then canary in prod for at least one business day before shifting full traffic. If the new model is faster or cheaper, the temptation is to skip steps. Don't. A quality regression in prod almost always costs more than a careful upgrade. The Terraform side is small: openai_model_version is a variable on the azurerm_cognitive_deployment . Terraform recreates the deployment if the version changes. The Hosted Agent picks up the new deployment the next time it calls the model β if you kept the deployment name stable, which is your contract with the agent code. If you change the deployment name as well, the agent needs a new version that knows the new name. Observability That Actually Tells You Something The platform injects an Application Insights connection string into every Hosted Agent container as an environment variable. Agents that use the protocol libraries emit OpenTelemetry traces by default. That gives you per-request latency, token counts, tool invocations, and conversation IDs out of the box. That is the floor. Add to it: Custom span attributes on every request. 
Agent name, agent version ID, image digest (short), model deployment name. Without these, post-incident analysis cannot tell you which version was live when a problem started β especially during a traffic-split rollout where two versions are serving simultaneously. Quality signal capture. Sample a percentage of production conversations into a queue for offline grading. Run the same graders you used in CI against that sample on a schedule. This is your drift detector for response quality. Sandbox right-sizing signals. Hosted Agents bill on the CPU/memory you allocate per session. Oversizing multiplies cost by your concurrency. Track CPU and available memory inside the sandbox and compare against the version's allocation β if peaks stay below ~50%, the next version should drop a tier; if they push above ~70%, raise it. Right-sizing is a per-version decision because versions are immutable. Per-version error and latency. Slice every standard metric by version ID. A canary that looks fine in aggregate can be quietly worse than the previous version on specific request shapes. Cost dimensions. Tag traces with customer_id or tenant_id if you have multi-tenancy. Aggregating session cost by tenant in App Insights is straightforward once the dimension is on the span. Alerts on shape, not just rate. A doubling in average response length or a sudden drop in tool invocation frequency often precedes a quality regression that error-rate alerts will miss entirely. A weekly "agent health" report in your team channel β pulling these App Insights queries together β beats a perfect dashboard nobody opens. A Pragmatic Maturity Path Most teams cannot build the whole loop on day one. A reasonable order: Infrastructure in Terraform. AIServices account, project, model deployment, ACR, App Insights, role assignment so the project MI can pull from ACR. First agent deployed manually with azd . Just to prove the round trip end to end. 
agent.yaml plus a deploy script that builds, pushes by digest, and creates a version. One environment. Three environments with manual promotion by manifest. A 20-row eval dataset with one grader, run on every PR. Advisory only at first. Eval as a blocking gate. Thresholds tuned from the advisory phase. Canary rollout via traffic split. Versions held live for a stability window before deletion. Production sampling into offline evaluation. Drift detection. Model version upgrade playbook. Documented, exercised once on a low-risk agent. Tested rollback via weight shift. The first time you discover a rollback bug should not be during an incident. Each step is independently useful. Skipping ahead β particularly to step 6 without time in step 5 β produces thresholds that block legitimate changes and erode trust in the pipeline. Where This Is Heading The platform is moving. A few things to watch as you build: Declarative Hosted Agent versions in Terraform. AzureRM coverage of Hosted Agents and agent versions is expanding. Parts of the deploy script will collapse into Terraform as that lands. The script-driven approach in this post is the bridge, not the destination. Continuous evaluation as a first-class platform feature. Sampling production traffic into scheduled evals β what you wire by hand today β is moving into the Foundry control plane. Multi-agent composition over A2A. As the A2A endpoint moves from preview to general availability and more frameworks ship A2A clients, multi-agent workflows become a first-class deployment shape. The DevOps loop extends β version pinning between agents, eval at the workflow level, observability across the agent graph β but the manifest grows accordingly. Toolbox-managed tool surfaces. As more tool integrations move behind the project Toolbox MCP endpoint, the agent image gets smaller and the tool configuration becomes a project-level concern. That changes what belongs in agent.yaml versus what belongs in Terraform. 
The throughline: the more the platform absorbs, the more your job shifts from wiring plumbing to defining policy. What "good" means for your agent, what the quality floor is, who can approve a model upgrade, how fast you can roll back. Those decisions do not get automated away. The pipeline just makes them executable.

Conclusion

Terraform provisions the Foundry project, model deployment, ACR, and observability. The DevOps loop on top of it (container builds pinned by digest, immutable agent versions, evaluation as a release gate, manifest-driven promotion across environments, traffic-split canary and rollback, and observability sliced by version) gets Hosted Agents to production and keeps them there.

Build it incrementally. Treat the image digest and the version spec as the deploy artifact, not the source branch. Make evaluation a check the pipeline cares about. Use version weights as your rollout and rollback primitive. And design for the day the platform absorbs the next layer of plumbing, so that when it does, your work moves up the stack instead of getting thrown away.

What's New in Microsoft 365 Copilot | May 2026
Welcome to the May 2026 edition of What's New in Microsoft 365 Copilot! Every month, we highlight new features and enhancements to keep Microsoft 365 admins up to date with Copilot features that help your users be more productive and efficient in the apps they use every day.

Infrastructure as Code for AI: Building and Deploying Microsoft Hosted Agents with Terraform
AI agents are no longer experimental. Teams are shipping production-grade agents that retrieve information, call APIs, reason over documents, and orchestrate multi-step workflows at scale. Microsoft Foundry's Hosted Agents service gives you a fully managed runtime for those agents, built on top of the Microsoft Foundry Agent Service, with Microsoft handling the infrastructure, scaling, and runtime lifecycle.

The challenge is that provisioning this infrastructure by hand, whether by clicking through the portal, running one-off CLI commands, or relying on undocumented shell scripts, simply does not scale. It introduces configuration drift, makes reproducing environments painful, and creates real governance risk as teams grow.

This post walks through how to provision and manage the Azure infrastructure required to run Microsoft Hosted Agents using Terraform. You will leave with working configuration, a clear understanding of the resource model, and practical guidance on where Terraform can take you all the way and where you will need to supplement with the Azure CLI or the Microsoft Foundry Agent Service SDK.

What Are Microsoft Hosted Agents?

Microsoft Hosted Agents are AI agents deployed and managed within Microsoft Foundry. Microsoft Foundry is Microsoft's unified platform for building, evaluating, and deploying AI applications and agents. It provides:

- A managed compute runtime: Microsoft provisions and scales the infrastructure so you do not manage VMs or containers.
- An agent execution environment: agents are defined with instructions, tools (code interpreter, Bing grounding, Azure AI Search, function calling), and a backing model endpoint.
- Deep Azure integration: identity via Microsoft Entra ID, secrets via Azure Key Vault, storage via Azure Blob, tracing via Azure Monitor and Application Insights.
- A project-scoped model: each Microsoft Foundry project encapsulates an agent's resources, connections, and deployments within a logical boundary.
The "Hosted" distinction matters. You are not running agent code on your own Kubernetes cluster or App Service. Microsoft manages the runtime. Your responsibility is to provision the surrounding infrastructure correctly: the Microsoft Foundry resource, the project, the model deployment, the identity configuration, and the monitoring resources that back it all. That boundary β the infrastructure you own β is exactly what Terraform manages well. Why Terraform for Hosted Agent Deployments? Infrastructure as Code (IaC) is not a new idea, but its importance grows as AI deployments become more complex. Here is why Terraform is a strong choice for Microsoft Foundry deployments specifically: Repeatability: A Terraform configuration produces the same infrastructure every time. Staging mirrors production. Disaster recovery is a terraform apply away. Governance: Infrastructure definitions live in version control alongside application code. Changes are reviewable, auditable, and reversible. This satisfies most enterprise change-management requirements. Scale: Spinning up per-customer or per-team agent environments using Terraform workspaces or module instantiation is far more manageable than manual provisioning. State management: Terraform tracks the actual state of your Azure resources. It detects drift and reconciles it declaratively. Ecosystem: The AzureRM provider is mature, actively maintained by HashiCorp and Microsoft, and covers the majority of Azure services including the Microsoft Foundry resources. Architecture Overview Before writing any Terraform, it helps to understand the resource hierarchy in Microsoft Foundry and how each layer maps to an Azure resource type. The Foundry Resource Hierarchy Microsoft Foundry uses a two-level hierarchy: 1. Foundry Account ( azurerm_cognitive_account , kind: AIServices ) β The top-level AI Services resource. It provides the model endpoint, manages agent execution, and acts as the logical boundary for all projects beneath it. 
You must set project_management_enabled = true and provide a custom_subdomain_name to enable project creation. In ARM terms this is a Microsoft.CognitiveServices/accounts resource. 2. Foundry Project ( azurerm_cognitive_account_project ) β A child resource scoped within the Foundry Account. Each project has its own agents, model deployments, connections, and data assets. In production, you typically have one project per application, product team, or environment. Figure 1: The Microsoft Foundry resource hierarchy. A single Foundry Account (Cognitive Services, kind AIServices) acts as the top-level container, with Projects scoped beneath it β one per application, team, or environment. Supporting Resources The following Azure resources make up a complete Hosted Agents deployment: Microsoft Foundry Account (AI Services): A single azurerm_cognitive_account of kind AIServices serves as both the Foundry Account and the model endpoint host. Model deployments (e.g. gpt-4.1 ) are provisioned via azurerm_cognitive_deployment within this account. Log Analytics Workspace + Application Insights: Provides observability for agent traces, request logs, and metrics. User-Assigned Managed Identity: Grants the Foundry Account and Projects access to Azure resources without stored credentials. Role Assignments (RBAC): Wires the managed identity to the Foundry Account with least-privilege Cognitive Services permissions. Figure 2: Supporting infrastructure map. The managed identity holds least-privilege RBAC grants to the Microsoft Foundry Account (AI Services) β enabling model access and project management β all within the same resource group. Reference Architecture (Described) A production-ready layout separates concerns across two resource groups: one for shared infrastructure (networking, monitoring) and one for the Microsoft Foundry Account and its projects. 
The Foundry resource group houses the azurerm_cognitive_account (kind: AIServices) resource and the azurerm_cognitive_account_project instances. The shared resource group holds Log Analytics and Application Insights. A user-assigned managed identity spans both, holding RBAC grants to each backing service.

For a dev/test environment you can collapse both into a single resource group. For production, the separation makes cost attribution, access control, and lifecycle management cleaner.

Prerequisites

Accounts and Permissions

- An active Azure subscription with the Owner or Contributor + User Access Administrator roles at the subscription or resource group level (role assignments require elevated permission).
- Foundry access enabled in your subscription. In some tenants you may need to accept terms or request quota for Azure OpenAI.
- Azure OpenAI quota for the model you intend to deploy (e.g. gpt-4.1). Request this via the Azure portal under Quotas in Azure OpenAI Studio.

Local Tools

- Terraform CLI ≥ 1.9 (Install guide)
- Azure CLI ≥ 2.60 (Install guide)
- A code editor (VS Code with the HashiCorp Terraform extension and the Azure Terraform extension is a strong combination).

Authentication

For local development, authenticate via the Azure CLI. The AzureRM Terraform provider picks this up automatically:

az login
az account set --subscription "<your-subscription-id>"

For CI/CD pipelines, use a service principal with AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID environment variables, or, preferably, workload identity federation (federated credentials) to avoid storing long-lived secrets. GitHub Actions supports OIDC-based workload identity natively.

Terraform Fundamentals for Hosted Agents

Provider Configuration

The hashicorp/azurerm provider is your primary dependency. The new Microsoft Foundry resources (azurerm_cognitive_account with kind = "AIServices" and azurerm_cognitive_account_project) require version 4.x of the provider.
Pin your version to avoid unexpected breaking changes: terraform { required_version = ">= 1.9" required_providers { azurerm = { source = "hashicorp/azurerm" version = "~> 4.0" } } } provider "azurerm" { features { key_vault { purge_soft_delete_on_destroy = false } resource_group { prevent_deletion_if_contains_resources = true } } subscription_id = var.subscription_id } The features block is required even when empty. The Key Vault setting prevents accidental secret loss during terraform destroy . The resource group setting adds an extra safety net in production. State Management Never use local state for shared or production environments. Store state in Azure Blob Storage with state locking via Azure Blob lease: terraform { backend "azurerm" { resource_group_name = "rg-terraform-state" storage_account_name = "sttfstate<unique>" container_name = "tfstate" key = "ai-agents/prod.tfstate" } } Create the state storage account and container before running terraform init . A bootstrap script or a separate Terraform workspace dedicated to state management are both valid approaches. Known Limitations and Workarounds Terraform coverage of Foundry is improving rapidly but is not yet complete. You should be aware of the following gaps as of mid-2025: Agent definitions are not in Terraform: The actual agent (its system prompt, instructions, tool configuration, and model binding) is created via the Azure AI Agent Service SDK or the Foundry portal, not via Terraform. Terraform provisions the infrastructure; your application code or a post-provisioning script creates the agent. Connections: Some connection types within a Foundry Project (e.g. Azure AI Search, custom connections) may require the Azure CLI or the Foundry SDK. Verify coverage in the AzureRM provider docs before assuming Terraform handles them. Model deployments: azurerm_cognitive_deployment covers OpenAI model deployments and is well-supported. Use this to deploy your model before referencing it from the agent. 
Private networking: If you need private endpoints for your Foundry Account, additional VNet, subnet, and DNS zone resources are required. This post focuses on the public networking path; private networking is a follow-on topic.

Step-by-Step Implementation

The following sections build up a complete Terraform configuration. The recommended project structure is a flat module layout for a single environment, with a separate modules/ai-foundry/ directory when you need to reuse the pattern across environments.

    ai-agents-infra/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    ├── versions.tf
    └── terraform.tfvars

1. Variables

Define variables first. Parameterising from the start avoids hard-coded values that create technical debt when you replicate the configuration for staging or production:

    # variables.tf
    variable "subscription_id" {
      type        = string
      description = "Azure subscription ID."
    }

    variable "location" {
      type        = string
      default     = "eastus"
      description = "Azure region for all resources."
    }

    variable "environment" {
      type        = string
      default     = "dev"
      description = "Environment label (dev, staging, prod)."
    }

    variable "project_name" {
      type        = string
      description = "Short name for the project. Used in resource naming."
    }

    variable "openai_model_name" {
      type        = string
      default     = "gpt-4.1"
      description = "Azure OpenAI model to deploy for the agent."
    }

    variable "openai_model_version" {
      type        = string
      default     = "2025-04-14"
      description = "Model version to deploy."
    }

    variable "openai_sku_capacity" {
      type        = number
      default     = 10
      description = "Tokens-per-minute capacity (in thousands) for the deployment."
    }

2. Resource Group and Core Infrastructure

A single resource group keeps things simple for dev. In production, consider splitting as described in the architecture section above.

    # main.tf - Resource group and naming locals
    locals {
      name_prefix = "${var.project_name}-${var.environment}"
      tags = {
        environment = var.environment
        project     = var.project_name
        managed_by  = "terraform"
      }
    }

    resource "azurerm_resource_group" "main" {
      name     = "rg-${local.name_prefix}"
      location = var.location
      tags     = local.tags
    }

3. Supporting Services

Provision Log Analytics and Application Insights for agent observability and diagnostics. Unlike the legacy Hub-based architecture, the azurerm_cognitive_account (kind AIServices) does not require a dedicated Storage Account or Key Vault as provisioning dependencies.

    # main.tf - Monitoring infrastructure
    data "azurerm_client_config" "current" {}

    # Log Analytics Workspace (required by Application Insights)
    resource "azurerm_log_analytics_workspace" "main" {
      name                = "law-${local.name_prefix}"
      resource_group_name = azurerm_resource_group.main.name
      location            = azurerm_resource_group.main.location
      sku                 = "PerGB2018"
      retention_in_days   = 30
      tags                = local.tags
    }

    # Application Insights for agent observability
    resource "azurerm_application_insights" "main" {
      name                = "appi-${local.name_prefix}"
      resource_group_name = azurerm_resource_group.main.name
      location            = azurerm_resource_group.main.location
      workspace_id        = azurerm_log_analytics_workspace.main.id
      application_type    = "web"
      tags                = local.tags
    }

4. User-Assigned Managed Identity

A managed identity allows the Foundry Account and its projects to authenticate to Azure services without stored credentials. This is a security best practice and is required for several Microsoft Foundry features.

    # main.tf - Managed identity for the Microsoft Foundry Account
    resource "azurerm_user_assigned_identity" "foundry" {
      name                = "id-${local.name_prefix}-foundry"
      resource_group_name = azurerm_resource_group.main.name
      location            = azurerm_resource_group.main.location
      tags                = local.tags
    }

5. Microsoft Foundry Account and Model Deployment

In the current Microsoft Foundry architecture, a single azurerm_cognitive_account of kind AIServices serves as both the Foundry Account and the model endpoint host. Set project_management_enabled = true and provide a globally unique custom_subdomain_name to enable Foundry Project creation beneath it.

    # main.tf - Microsoft Foundry Account (AI Services)
    resource "azurerm_cognitive_account" "foundry" {
      name                       = "aisa-${local.name_prefix}"
      resource_group_name        = azurerm_resource_group.main.name
      location                   = azurerm_resource_group.main.location
      kind                       = "AIServices"
      sku_name                   = "S0"
      project_management_enabled = true
      custom_subdomain_name      = "${replace(local.name_prefix, "-", "")}foundry"
      tags                       = local.tags

      identity {
        type         = "UserAssigned"
        identity_ids = [azurerm_user_assigned_identity.foundry.id]
      }
    }

    # Deploy the model within the Foundry Account
    resource "azurerm_cognitive_deployment" "agent_model" {
      name                 = var.openai_model_name
      cognitive_account_id = azurerm_cognitive_account.foundry.id

      model {
        format  = "OpenAI"
        name    = var.openai_model_name
        version = var.openai_model_version
      }

      sku {
        name     = "Standard"
        capacity = var.openai_sku_capacity
      }
    }

Note on quota: The capacity value is in thousands of tokens per minute. A value of 10 means 10,000 TPM. If terraform apply fails with a quota error, reduce this value or request a quota increase via the Azure portal.

Note on custom_subdomain_name: This must be globally unique across all Azure AI Services accounts. If provisioning fails with a conflict error, adjust the suffix (e.g. append a random string using the random_string resource).

6. Foundry Project

Create a Foundry Project beneath the Foundry Account provisioned in Step 5. Each project scopes its own agents, model connections, and data assets. Use one project per application or team.

    # main.tf - Microsoft Foundry Project
    resource "azurerm_cognitive_account_project" "agent_project" {
      name                 = "proj-${local.name_prefix}-agents"
      cognitive_account_id = azurerm_cognitive_account.foundry.id
      location             = azurerm_resource_group.main.location
      display_name         = "Agent Project - ${var.project_name}"
      description          = "Hosted agents project for ${var.project_name}"

      identity {
        type         = "UserAssigned"
        identity_ids = [azurerm_user_assigned_identity.foundry.id]
      }

      tags = local.tags
    }

7. RBAC Role Assignments

Grant the managed identity the permissions it needs. This is the area most commonly misconfigured in manual deployments. Terraform makes it explicit and auditable.

    # main.tf - RBAC assignments

    # AI Services: Foundry identity needs Cognitive Services OpenAI User to call model endpoints
    resource "azurerm_role_assignment" "foundry_openai" {
      scope                = azurerm_cognitive_account.foundry.id
      role_definition_name = "Cognitive Services OpenAI User"
      principal_id         = azurerm_user_assigned_identity.foundry.principal_id
    }

    # AI Services: Foundry identity needs Cognitive Services Contributor to manage projects
    resource "azurerm_role_assignment" "foundry_contributor" {
      scope                = azurerm_cognitive_account.foundry.id
      role_definition_name = "Cognitive Services Contributor"
      principal_id         = azurerm_user_assigned_identity.foundry.principal_id
    }

    # Optional: grant your own principal the Azure AI Developer role on the Foundry Account
    # so you can create and manage agents from your local machine or CI pipeline
    resource "azurerm_role_assignment" "developer_account" {
      scope                = azurerm_cognitive_account.foundry.id
      role_definition_name = "Azure AI Developer"
      principal_id         = data.azurerm_client_config.current.object_id
    }

8. Outputs

Export the values your application and post-provisioning scripts will need:

    # outputs.tf
    output "resource_group_name" {
      value = azurerm_resource_group.main.name
    }

    output "foundry_account_id" {
      value = azurerm_cognitive_account.foundry.id
    }

    output "ai_foundry_project_id" {
      value = azurerm_cognitive_account_project.agent_project.id
    }

    output "foundry_endpoint" {
      value = azurerm_cognitive_account.foundry.endpoint
    }

    output "openai_deployment_name" {
      value = azurerm_cognitive_deployment.agent_model.name
    }

    output "managed_identity_client_id" {
      value = azurerm_user_assigned_identity.foundry.client_id
    }

9. Example terraform.tfvars

    # terraform.tfvars - do NOT commit this file if it contains sensitive values
    subscription_id      = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    location             = "eastus"
    environment          = "dev"
    project_name         = "contoso-agents"
    openai_model_name    = "gpt-4.1"
    openai_model_version = "2025-04-14"
    openai_sku_capacity  = 10

Figure 3: Terraform deployment workflow. State is stored in an Azure Blob Storage backend, enabling team collaboration and preventing concurrent apply conflicts.

Deploying and Validating the Agent Infrastructure

Running the Deployment

    # 1. Initialise: downloads provider plugins and configures the backend
    terraform init

    # 2. Validate syntax and configuration
    terraform validate

    # 3. Preview what will be created (review carefully before applying)
    terraform plan -out=tfplan

    # 4. Apply the plan
    terraform apply tfplan

A full initial apply typically takes 8-15 minutes. The Foundry Account (AI Services) provisioning is the longest step. The model deployment may also take a few minutes to reach a ready state. Terraform handles this with implicit dependency ordering, but you may see brief retries in the output.
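Once the apply has finished, post-provisioning scripts can read the values exported in outputs.tf. The following is a minimal Python sketch of one way to do that, assuming the Terraform CLI is on the PATH; the function names and the upper-casing convention (so that openai_deployment_name becomes OPENAI_DEPLOYMENT_NAME) are illustrative choices, not part of the original configuration.

```python
import json
import subprocess


def parse_terraform_outputs(raw_json: str) -> dict[str, str]:
    """Turn the JSON printed by `terraform output -json` into env-style pairs."""
    outputs = json.loads(raw_json)
    env = {}
    for name, body in outputs.items():
        # Each output is wrapped as {"value": ..., "type": ..., "sensitive": ...}.
        # Non-string values are converted with str(), which is enough for simple outputs.
        env[name.upper()] = str(body["value"])
    return env


def load_terraform_outputs(working_dir: str = ".") -> dict[str, str]:
    """Run `terraform output -json` in working_dir and parse the result."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=working_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_terraform_outputs(result.stdout)


# Illustrative sample in the shape `terraform output -json` produces
sample = json.dumps({
    "openai_deployment_name": {"sensitive": False, "type": "string", "value": "gpt-4.1"},
})
print(parse_terraform_outputs(sample))
```

In a real pipeline you would call load_terraform_outputs() from the directory that holds the state and merge the result into os.environ before running the agent creation script.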
Verifying the Deployment

After apply completes, verify each resource is in a healthy state:

    # Confirm the resource group and its resources exist
    az resource list --resource-group "rg-contoso-agents-dev" --output table

    # Check the Foundry Account (AI Services) is in a Succeeded state
    # The account name follows the "aisa-${local.name_prefix}" pattern from Step 5
    az cognitiveservices account show \
      --name "aisa-contoso-agents-dev" \
      --resource-group "rg-contoso-agents-dev" \
      --query "properties.provisioningState"

    # Confirm the model deployment is ready
    az cognitiveservices account deployment show \
      --resource-group "rg-contoso-agents-dev" \
      --name "aisa-contoso-agents-dev" \
      --deployment-name "gpt-4.1" \
      --query "properties.provisioningState"

Navigate to the Microsoft Foundry portal and confirm your Foundry Account and Project appear. At this point you can create an agent manually in the portal to validate that the model endpoint is reachable and the identity chain works correctly before automating agent creation.

Common Deployment Issues

- Quota exceeded on model deployment: Reduce openai_sku_capacity or request a quota increase in the Azure portal under Azure OpenAI > Quotas.
- Resource name conflicts: The custom_subdomain_name on the Foundry Account must be globally unique. Use the random_string Terraform resource to append a unique suffix if needed.
- Role assignment propagation delay: RBAC changes can take 1-2 minutes to propagate. If the Foundry Account cannot access resources immediately after apply, wait a moment and retry.
- project_management_enabled not set: If azurerm_cognitive_account_project fails with an error about project management, ensure project_management_enabled = true and custom_subdomain_name are set on the parent azurerm_cognitive_account.
- azurerm_cognitive_account_project not found: Ensure your AzureRM provider version is ~> 4.0 or later. Run terraform init -upgrade if you previously initialised with an older version.

Creating an Agent After Infrastructure Provisioning

Terraform has provisioned the platform. Now you need to create the agent itself. This is done via the Azure AI Agents SDK (available for Python, C#, JavaScript, and Java) or the Foundry portal.

The following Python snippet demonstrates creating a basic agent programmatically after Terraform apply. It reads values derived from the Terraform outputs through environment variables:

    import os

    from azure.ai.projects import AIProjectClient
    from azure.identity import DefaultAzureCredential

    # These values come from Terraform outputs
    project_connection_string = os.environ["AI_PROJECT_CONNECTION_STRING"]
    model_deployment = os.environ["OPENAI_DEPLOYMENT_NAME"]

    # DefaultAzureCredential picks up the local az login or a managed identity
    client = AIProjectClient.from_connection_string(
        credential=DefaultAzureCredential(),
        conn_str=project_connection_string,
    )

    # Create the hosted agent
    agent = client.agents.create_agent(
        model=model_deployment,
        name="customer-support-agent",
        instructions=(
            "You are a helpful customer support assistant. "
            "Answer questions accurately and concisely. "
            "If you are unsure, say so rather than guessing."
        ),
    )

    print(f"Agent created: {agent.id}")

Figure 5: Agent runtime architecture. The Foundry Project hosts the Agent Service, which routes requests to the GPT-4.1 model endpoint and optionally invokes tool integrations (Code Interpreter, File Search, Azure Functions, or custom tools).

The project connection string is available from the Foundry portal (Project > Overview > Project connection string) or can be constructed from Terraform outputs. Refer to the Azure AI Agents quickstart for the full SDK setup.

Operational Considerations

Lifecycle Management

Terraform's declarative model means updates are incremental by default. To update the OpenAI model version, change openai_model_version in your .tfvars file and run terraform plan to confirm the change before applying. Depending on the provider version, Terraform either updates the cognitive deployment in place or deletes and recreates it. Be aware that a replacement causes brief downtime for the model endpoint.
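To confirm before applying whether a change such as a new model version would replace a resource, you can inspect the machine-readable plan. The sketch below assumes you have exported the saved plan with terraform show -json tfplan; the function name find_replacements and the trimmed sample document are illustrative, and only the resource_changes, address, and change.actions fields of the plan JSON are relied upon.

```python
import json


def find_replacements(plan_json: str) -> list[str]:
    """Return the addresses of resources the plan would delete and recreate."""
    plan = json.loads(plan_json)
    replaced = []
    for change in plan.get("resource_changes", []):
        actions = change.get("change", {}).get("actions", [])
        # A replacement lists both "delete" and "create" (in either order).
        if "delete" in actions and "create" in actions:
            replaced.append(change["address"])
    return replaced


# Trimmed, illustrative sample of `terraform show -json tfplan` output
sample_plan = json.dumps({
    "resource_changes": [
        {
            "address": "azurerm_cognitive_deployment.agent_model",
            "change": {"actions": ["delete", "create"]},
        },
        {
            "address": "azurerm_resource_group.main",
            "change": {"actions": ["no-op"]},
        },
    ]
})

for address in find_replacements(sample_plan):
    print(f"WARNING: {address} will be replaced")
```

A CI job could fail the pipeline, or require an extra approval, whenever this list is non-empty for a production plan.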
To destroy a complete environment:

    terraform destroy

The prevent_deletion_if_contains_resources setting in the azurerm provider's features block (under resource_group) blocks deletion of a resource group that still contains resources Terraform does not track, which is a useful safety net in production.

Handling Configuration Drift

Drift occurs when Azure resources are modified outside of Terraform (portal changes, CLI scripts, other automation). Detect drift with:

    terraform plan -refresh-only

This reports the difference between the Terraform state and the actual resource state without making changes. Schedule this as a drift-detection job in CI to catch out-of-band changes early.

Environment Isolation

Use Terraform workspaces or separate state files per environment:

    # Create and switch to a staging workspace
    terraform workspace new staging
    terraform workspace select staging
    terraform apply -var-file="environments/staging.tfvars"

Alternatively, use a directory-per-environment layout (environments/dev/, environments/prod/) with a shared module in modules/ai-foundry/. The directory layout is more explicit and easier to navigate in a team setting.

Cost Control

- Set a low openai_sku_capacity in dev (e.g. 1 = 1,000 TPM) to limit accidental spend.
- Tag all resources with environment and project tags (the tags map in the locals block handles this) to enable cost attribution in Azure Cost Management.
- Use the Azure Pricing Calculator to estimate monthly costs before deploying to production. The Azure AI Services account (model token usage), Log Analytics, and Application Insights are the primary cost drivers.
- Consider tearing down dev environments overnight with a scheduled CI job that runs terraform destroy in the evening and terraform apply in the morning.

CI/CD Integration

Automating Terraform via GitHub Actions is straightforward. The following workflow runs plan on pull requests and apply on merge to the main branch:

    # .github/workflows/terraform.yml
    name: Terraform Deploy

    on:
      push:
        branches: [main]
      pull_request:
        branches: [main]

    permissions:
      id-token: write  # Required for OIDC workload identity federation
      contents: read
      pull-requests: write

    env:
      ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      ARM_USE_OIDC: "true"

    jobs:
      terraform:
        runs-on: ubuntu-latest
        environment: ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }}
        steps:
          - uses: actions/checkout@v4
          - uses: hashicorp/setup-terraform@v3
            with:
              terraform_version: "~1.9"
          - name: Terraform Init
            run: terraform init
          - name: Terraform Plan
            run: terraform plan -out=tfplan -var-file="environments/dev.tfvars"
          - name: Terraform Apply
            if: github.ref == 'refs/heads/main'
            run: terraform apply -auto-approve tfplan

Figure 4: CI/CD pipeline using GitHub Actions with OIDC workload identity federation. No long-lived secrets are stored; the runner exchanges a JWT for a short-lived Azure token before each Terraform run.

Use OIDC workload identity federation to avoid storing long-lived service principal secrets in GitHub. This is the recommended authentication method for GitHub Actions deployments to Azure.

Best Practices

Modular Terraform Design

Once you have a working flat configuration, extract the Foundry resources into a reusable module. A module boundary around the Foundry Account, Project, model deployment, and RBAC assignments lets you stamp out new agent environments with a single module call and a new .tfvars file.

    # environments/staging/main.tf
    module "agent_platform" {
      source = "../../modules/ai-foundry"

      project_name         = "contoso-agents"
      environment          = "staging"
      location             = "eastus"
      subscription_id      = var.subscription_id
      openai_model_name    = "gpt-4.1"
      openai_model_version = "2025-04-14"
      openai_sku_capacity  = 30
    }

Parameterisation and Environment Configs

- Never hard-code subscription IDs, tenant IDs, or region names in main.tf.
- Keep environment-specific values in environments/<env>.tfvars files and commit them to source control (they are config, not secrets).
- Store actual secrets (service principal credentials, API keys for third-party connections) in Azure Key Vault or GitHub Secrets, not in .tfvars files.

Versioning Models and Agent Configurations

Treat your openai_model_version and agent instructions as versioned artefacts. When Microsoft releases a new model version, create a pull request that updates the variable value, runs a plan, and documents the expected change. This creates a clear history of when model versions changed and who approved the change.

Logging and Monitoring

- Enable diagnostic settings on the Foundry Account (AI Services) to route request logs and metrics to your Log Analytics workspace.
- Use Application Insights to capture agent traces from the Azure AI Agents SDK (it integrates with OpenTelemetry).
- Set up Azure Monitor alerts on model endpoint errors (4xx/5xx rates) and Log Analytics ingestion failures.

Responsible AI Considerations

- Enable Azure OpenAI content filtering on your deployment. In the AzureRM provider, a custom policy can be defined with azurerm_cognitive_account_rai_policy (using content_filter blocks) and attached to the deployment through the rai_policy_name argument of azurerm_cognitive_deployment.
- Define a clear system prompt that sets agent behaviour boundaries and instructs the agent to decline harmful requests.
- Log and review agent conversations during early deployment. Microsoft Foundry includes evaluation tools for assessing agent response quality and safety.
- Apply least-privilege RBAC throughout. The role assignments in this post follow that principle.
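The least-privilege point can be checked mechanically in CI. The following Python sketch scans the JSON form of a saved plan (terraform show -json tfplan) for azurerm_role_assignment resources that use a broad built-in role. The BROAD_ROLES set, the function name, and the too_broad sample resource are assumptions chosen for illustration; adjust the list to your own policy.

```python
import json

# Assumption: these built-in roles are considered too broad for this workload.
BROAD_ROLES = {"Owner", "Contributor", "User Access Administrator"}


def find_broad_role_assignments(plan_json: str) -> list[tuple[str, str]]:
    """Return (address, role) pairs for planned role assignments using a broad role."""
    plan = json.loads(plan_json)
    findings = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "azurerm_role_assignment":
            continue
        # "after" is null when the resource is being deleted.
        after = change.get("change", {}).get("after") or {}
        role = after.get("role_definition_name")
        if role in BROAD_ROLES:
            findings.append((change["address"], role))
    return findings


# Trimmed, illustrative plan: one assignment from this post, one hypothetical broad one
sample_plan = json.dumps({
    "resource_changes": [
        {
            "address": "azurerm_role_assignment.foundry_openai",
            "type": "azurerm_role_assignment",
            "change": {"after": {"role_definition_name": "Cognitive Services OpenAI User"}},
        },
        {
            "address": "azurerm_role_assignment.too_broad",
            "type": "azurerm_role_assignment",
            "change": {"after": {"role_definition_name": "Contributor"}},
        },
    ]
})

for address, role in find_broad_role_assignments(sample_plan):
    print(f"Review {address}: role '{role}' is broader than this workload needs")
```

Running this against the plan on every pull request keeps the review of role changes explicit instead of relying on someone spotting them in the plan text.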
Conclusion and Next Steps

You now have a complete, repeatable Terraform configuration for provisioning the Azure infrastructure required to run Microsoft Hosted Agents via Microsoft Foundry. The key takeaways:

- Terraform manages the infrastructure layer effectively: the Foundry Account, Project, model deployment, identity, and RBAC.
- Agent definitions themselves are provisioned via the Azure AI Agents SDK or the Foundry portal as a post-Terraform step.
- State management, parameterisation, and modular design are non-negotiable for team environments.
- OIDC-based workload identity is the right authentication model for CI/CD pipelines.
- Drift detection, environment isolation, and cost tagging are operational necessities, not optional extras.

Where to Go Next

- Add Azure AI Search: Extend the Foundry Project with an Azure AI Search connection and enable the Search tool on your agent for Retrieval-Augmented Generation (RAG).
- Private networking: Add private endpoints for the Foundry Account to lock down ingress to your VNet.
- Multi-region deployment: Instantiate the Terraform module twice with different regions and use Azure Traffic Manager or Front Door to route requests.
- GitOps for agents: Store agent definitions (system prompts, tool configurations) as YAML or JSON in your repository and use a CI pipeline to apply them via the Azure AI Agents SDK on every merge, creating a fully declarative agent deployment pipeline.
- Evaluation pipelines: Use Microsoft Foundry's built-in evaluation capabilities to run automated quality and safety assessments on every new model version or prompt change.

References

- What is Microsoft Foundry? - Microsoft Learn
- Azure AI Agent Service overview - Microsoft Learn
- Azure AI Agents quickstart - Microsoft Learn
- azurerm_cognitive_account - Terraform Registry
- azurerm_cognitive_account_project - Terraform Registry
- azurerm_cognitive_deployment - Terraform Registry
- AzureRM backend - Terraform documentation
- OIDC workload identity federation with GitHub Actions - Microsoft Learn
- Azure OpenAI content filtering - Microsoft Learn
- Install Terraform - HashiCorp
- Microsoft Foundry portal