hosted agents

15 Topics

Hosted Agent ZIP deployment fails because adduser is missing for fuse-zip
Hi, I am encountering a reproducible issue with a Microsoft Foundry Hosted Agent deployed through the Source Code ZIP deployment path. Environment: - Region: West Europe - Runtime: dotnet_10 - Target framework: net10.0 - Publish runtime: linux-x64 - Dependency resolution: bundled - Protocol: Responses 2.0.0 - API version: 2025-11-15-preview The agent version reaches the "active" status. However, the first invocation fails with: session_not_ready The relevant session logs are: Selecting previously unselected package fuse. Unpacking fuse (2.9.9-5ubuntu3) ... Selecting previously unselected package fuse-zip. Unpacking fuse-zip (0.6.0-0ubuntu3) ... dpkg: dependency problems prevent configuration of fuse: fuse depends on adduser; however: Package adduser is not installed. dpkg: error processing package fuse (--install): dependency problems - leaving unconfigured dpkg: dependency problems prevent configuration of fuse-zip: fuse-zip depends on fuse; however: Package fuse is not configured yet. dpkg: error processing package fuse-zip (--install): dependency problems - leaving unconfigured Errors were encountered while processing: fuse fuse-zip Successfully connected to container No .NET application startup logs appear after this. The application entry point does not appear to be executed, and /readiness never becomes healthy. I have already verified: A clean linux-x64 publish is used. The entry DLL is located directly at the ZIP root. The ZIP contains no nested bin, obj, or artifacts directories. All dependencies are bundled. appsettings.json, .deps.json, and .runtimeconfig.json are present. The exact published application starts successfully locally. The local /readiness endpoint returns HTTP 200. The issue is reproducible with newly deployed agent versions and new sessions. The current Foundry hosting packages and Responses Protocol 2.0 are used. The same Source Code ZIP deployment path worked successfully approximately two weeks ago. Is this a known regression in the managed source-code runtime in West Europe? Is there a workaround other than switching to the container-based deployment path? Session and request IDs can be provided privately to the Microsoft product team if required.
TristanRe
Jul 20, 2026 Place Microsoft Foundry Discussions
16Views
0likes
0Comments
Building AI Agents from Zero to Production
Building AI Agents from Zero to Production Most agent demos stop at "it answered my question." Production doesn't. The gap between a notebook that calls an LLM and a governed, observable, multi-agent system your organisation can actually depend on is where real engineering happens, evaluation, deployment, data sovereignty, tool governance, and cross-team interoperability. Microsoft's open-source course Building AI Agents from Zero to Production walks that entire arc in seven lessons, using one realistic use case and the Microsoft Agent Framework (MAF) plus Microsoft Foundry. This post is a developer-focused tour of what it teaches, the architecture decisions behind each stage, and the code patterns that matter when you move from prototype to production. Who this is for AI engineers building their first or first production, agent system. Backend and full-stack developers integrating agents into real applications and CI/CD. Cloud architects who need data sovereignty, private networking, and governance around agent workloads. Technical leads deciding how to standardise tools and orchestration across multiple teams. The samples are Python 3.12+, served through Microsoft Foundry using GPT-5 series models (for example gpt-5.1 ). Lesson 4 adds a TypeScript/React frontend. You will want an Azure subscription and the Azure CLI. The AI Agent Development Lifecycle The course is organised around a lifecycle rather than a feature list. Each lesson is a stage, and each stage assumes the previous one is solved: # Stage The production question it answers 1 Agent Design What should each agent do, and how do they hand off? 2 Agent Development How do I build and run them with the Agent Framework? 3 Agent Evaluations How do I know they actually work — and keep working? 4 Agent Deployment How do I ship one as a hosted service with a UI and CI gate? 5 Production Hosted Agents How do I meet enterprise data, network, and governance needs? 6 Microsoft Toolbox How do I govern tools once, and reuse them across teams? 7 Multi-Agent & A2A How do agents from different teams interoperate safely? The thread running through all seven is a single scenario: a Developer Onboarding agent system that helps a new hire find the right teammates, get a sensible first task, and pull learning resources and code snippets. It is deliberately mundane, which is exactly why it exposes the production concerns that flashy demos hide. Lesson 1 — Agent Design: three components, one graph The course defines an agent by three parts: an LLM for reasoning, tools to act, and memory to retain context. The design work is context engineering — making sure the right information reaches the model at the right moment, no more and no less. Rather than one monolithic assistant, the onboarding system is split into specialists coordinated by a triage agent using handoff orchestration: Agent Job Tool Employee Search Answer org and people questions Foundry file search over an employee-directory vector store Task Recommendation Suggest 1–3 GitHub issues for the new dev GitHub MCP Server (reads recent commits + open issues) Code Assistant Provide resources and runnable snippets Microsoft Learn MCP + Code Interpreter Architecturally this is a directed graph: User → Triage → [Employee, Learning, Coding] . Splitting responsibilities early pays off later, each agent gets a tightly scoped prompt (less hallucination), can be evaluated independently, and can be upgraded without touching its peers. Lesson 2 — Development: standalone agents with MAF Here the design becomes code. Each specialist is a small, independently runnable service built with the Microsoft Agent Framework, authenticated to Foundry with your Azure CLI login. Setup is deliberately boring: az login az account set --subscription "<your-subscription-id>" cp .env.example .env # Fill FOUNDRY_PROJECT_ENDPOINT and FOUNDRY_MODEL (e.g. gpt-5.1) # Create the employee-directory vector store once; note the printed VECTOR_STORE_ID python lesson-2-agent-development/setup_vector_store.py # Start an agent — serves on http://localhost:8090 python lesson-2-agent-development/employee-search-agent.py The FoundryChatClient auto-reads any FOUNDRY_ -prefixed environment variables and uses AzureCliCredential , so there are no keys in code. The lesson ships six samples, each on its own port, so you can chat with them individually in the local DevUI before wiring them together: Sample Tool Port employee-search-agent.py Foundry file search / vector store 8090 task-recommendation-agent.py GitHub MCP Server 8095 azure-learning-agent.py Microsoft Learn MCP 8092 coding-agent.py Code Interpreter 8093 learning-recommendation-agent.py Learn MCP + reasoning 8091 agent-orchestration.py Multi-agent handoff 8094 Why this matters: keeping each agent as its own process with its own port is a testability decision, not an accident. You can smoke-test one specialist in isolation, then compose them in agent-orchestration.py . Lesson 3 — Evaluation: you can't unit-test a probability distribution This is the lesson that separates a demo from a product. Agents are non-deterministic, so traditional assertions don't fit. The course uses three complementary layers: Observability / tracing — always on, via OpenTelemetry to Application Insights. Smoke tests — fast, run on every deploy. Evaluations — deeper, model-based scoring run on-demand or nightly. Turning on tracing is a single call: from agent_framework.foundry import FoundryChatClient client = FoundryChatClient() client.configure_azure_monitor() # export traces + metrics to Application Insights For quality it uses Foundry's built-in "LLM-as-a-judge" evaluators against real persisted responses (identified by response_id ), not freshly regenerated ones: Evaluator evaluator_name Measures Relevance builtin.relevance Does the response address the request? Groundedness builtin.groundedness Is it supported by retrieved data (no hallucination)? Tool-call accuracy builtin.tool_call_accuracy Were the right tools called with the right arguments? Tool-output utilization builtin.tool_output_utilization Did the agent actually use tool results? The judge model is set independently via AZURE_AI_MODEL_DEPLOYMENT_NAME , so you can evaluate a cheap production model with a stronger one. The run prints a report_url that deep-links into the Foundry portal. Lesson 4 — Deployment: a hosted agent, a UI, and a CI gate Now the agent becomes a managed service. It is deployed as a Foundry Hosted Agent a Microsoft-managed execution environment and fronted by an OpenAI ChatKit React UI talking to a FastAPI backend: ChatKit React (3000) → FastAPI backend (8001) → Foundry Hosted Agent → tools Building the agent is declarative attach tools, name it, serve it: agent = client.as_agent( name="DevOnboardingAgent", instructions="...", tools=[file_search_tool, learn_mcp_tool], ) # served with: from_agent_framework(agent).run() The recommended deploy path is the Azure Developer CLI: cd hosted-agent azd auth login azd agent deploy The genuinely production-minded part is the smoke test as a post-deploy CI gate. Six cases cover reachability, each scenario, off-topic prompt adherence, and multi-turn threading (verifying state via previous_response_id ). The GitHub Action runs them against the freshly deployed agent: export FOUNDRY_TOKEN=$(az account get-access-token \ --resource https://ai.azure.com/ --query accessToken -o tsv) python runner.py \ --project-endpoint "https://<account>.services.ai.azure.com/api/projects/<project>" \ --agent-name dev-onboarding \ --tests-file tests/smoke-tests.json Pitfall to remember: the token audience must be https://ai.azure.com/ . A cognitiveservices.azure.com token is rejected by the Responses API — a mistake that costs many engineers an afternoon. Lesson 5 — Production: separating where an agent runs from where its data lives The pivotal concept for enterprise readiness is the distinction between a Hosted Agent (compute, scaling, identity) and a Capability Host (where conversation history, files, and embeddings actually reside): Concern Hosted Agent Capability Host Compute / scaling / identity ✅ Provided — Conversation history Microsoft-managed default Redirect to your Azure Cosmos DB File uploads Microsoft-managed default Redirect to your Azure Storage Vector embeddings Microsoft-managed default Redirect to your Azure AI Search Required to run the agent? ✅ Yes ❌ Optional Required for data sovereignty? ❌ Not sufficient ✅ Yes "Basic" setup uses Microsoft-managed storage and is perfect for getting started. "Standard" setup redirects each data plane to your own Azure resources through a project-level capability host, this is how you keep customer data in your tenant, inside your network boundary: PUT .../accounts/{account}/projects/{project}/capabilityHosts/{name}?api-version=2025-06-01 { "properties": { "capabilityHostKind": "Agents", "threadStorageConnections": ["my-cosmosdb-connection"], "vectorStoreConnections": ["my-ai-search-connection"], "storageConnections": ["my-storage-connection"] } } Operational constraints worth internalising before you provision: there is one capability host per scope (a second attempt returns 409 Conflict ), configuration is immutable (delete and recreate to change it), deletion is destructive, and the account-level host must exist before the project-level one. Lesson 6 — Toolbox: govern tools once, reuse everywhere Left unchecked, every team re-implements the same tools, scatters credentials, and loses governance visibility. The Microsoft Foundry Toolbox solves this by exposing a curated, versioned set of tools behind a single MCP-compatible endpoint, with credentials held in Foundry connections rather than agent code. You build a toolbox version once: from azure.ai.projects.models import MCPTool, ToolboxSearchPreviewTool, WebSearchTool toolbox_version = project.toolboxes.create_toolbox_version( name="agent-tools", description="Web search + an MCP server + tool search", tools=[ WebSearchTool(), MCPTool( server_label="myserver", server_url="https://your-mcp-server.example.com", require_approval="never", project_connection_id="my-key-auth-connection", # credentials live in Foundry ), ToolboxSearchPreviewTool(), ], ) And every agent consumes it through one endpoint, no per-team tool code: from agent_framework import MCPStreamableHTTPTool mcp_tool = MCPStreamableHTTPTool( name="toolbox", url=TOOLBOX_ENDPOINT, # {project_endpoint}/toolboxes/{name}/mcp?api-version=v1 http_client=http_client, load_prompts=False, ) agent = chat_client.as_agent(name="my-toolbox-agent", instructions="...", tools=[mcp_tool]) Versioning is blue/green: create a new version, test it on its version-specific endpoint, then promote it to default and every consumer picks it up with zero code changes. A Guardrail (RAI) policy can be applied at the toolbox layer, independent of model-level content filters. Note the toolbox management APIs are currently preview; the portal or VS Code Foundry Toolkit are practical alternatives for creation today. Lesson 7 — Multi-Agent & A2A: agents as networked peers The final lesson contrasts two ways agents collaborate: Handoff / Workflow — in-process, same codebase, fastest, tightest coupling. Agent-to-Agent (A2A) — cross-process over an open protocol, so agents from different teams, orgs, or frameworks interoperate. A2A gives each agent a discoverable Agent Card at /.well-known/agent-card.json and a task lifecycle (submitted → working → completed/failed). The elegant part: A2AExecutor wraps an existing MAF agent with no changes to that agent's code. from agent_framework.a2a import A2AExecutor from a2a.server.apps import A2AStarletteApplication from a2a.server.tasks import InMemoryTaskStore agent_card = AgentCard( name="Coding Assistant", url="http://localhost:9000/", version="1.0.0", capabilities=AgentCapabilities(streaming=True), skills=[AgentSkill(id="generate-code", name="Generate code", tags=["code"])], ) request_handler = DefaultRequestHandler( agent_executor=A2AExecutor(agent), # wraps your existing MAF agent unchanged task_store=InMemoryTaskStore(), ) app = A2AStarletteApplication(agent_card=agent_card, http_handler=request_handler).build() Consuming a remote agent then looks exactly like calling a local one: from agent_framework.a2a import A2AAgent remote_agent = A2AAgent(name="remote-coding-assistant", url="http://localhost:9000") result = await remote_agent.run("Write a Python function that reverses a string.") Because an A2AAgent can be a participant inside a HandoffBuilder workflow, you can mix in-process routing with remote services in the same orchestration. For enterprise use, A2AAgent accepts an auth_interceptor for bearer tokens, and the Agent Card carries security_schemes . Responsible and secure by design Production readiness in this course is not just uptime, it is governance: Identity over keys — AzureCliCredential and managed identity throughout; no secrets in code. Least privilege — CI runners get a scoped Azure AI User role assignment on the specific project. Data sovereignty — capability hosts keep conversation history, files, and embeddings in your own Cosmos DB, Storage, and AI Search. Tool approval and guardrails — MCP approval_mode and toolbox-level RAI policy gate what agents can do. Grounded evaluation — groundedness and tool-utilization scoring catch hallucination and unused-tool behaviour before users do. Cost hygiene — the lessons create real Azure resources; delete the resource group when done: az group delete --name <rg> --yes --no-wait . Key takeaways Design as a graph of specialists. Handoff orchestration with tightly scoped agents beats one monolith on reliability and testability. One .run() contract, many backends. The Agent Framework keeps orchestration code stable from local dev to hosted production. Evaluate continuously. Tracing + smoke tests + model-based evaluators are three layers, not alternatives. Separate compute from data. Hosted Agents run the agent; Capability Hosts give you sovereignty — you need both for enterprise. Govern tools centrally. A versioned toolbox behind one MCP endpoint kills tool sprawl and credential duplication. Open protocols for interop. A2A lets agents cross team, org, and framework boundaries without rewrites. Get started Clone the repo (skip the 50+ translations for a faster download) and work through the lessons in order: git clone --filter=blob:none --sparse https://github.com/microsoft/Building-AI-Agents-From-Zero-To-Production.git cd Building-AI-Agents-From-Zero-To-Production git sparse-checkout set --no-cone '/*' '!translations' '!translated_images' References Building AI Agents from Zero to Production — course repo Microsoft Agent Framework Microsoft Foundry documentation Agent-to-Agent (A2A) protocol specification a2a-python SDK AI Agents for Beginners MCP for Beginners Microsoft Foundry Discord
Lee_Stott
Jul 14, 2026 Place Microsoft Developer Community Blog
558Views
0likes
0Comments
Deploying Foundry Hosted Agents from Source Code
Introduction At Microsoft Build, it was announced that Foundry Hosted Agents now support source-code deployments. Previously, Hosted Agents required application code to be packaged in a container for deployment. This new functionality allows you to deploy the agent from a `.zip` file instead of from a container image. This post walks through the process of deploying a source-code Hosted Agent, briefly compares that approach to container-based Hosted Agent deployment, and provides a reusable GitHub Action for CI/CD deployments. It is part of a series of post whose source code is housed in simple-hosted-agent-responses repository. If Hosted Agents are new to you, read the previous posts, "Deploying Foundry Hosted Agents via REST API" and "GitHub Actions for Deploying Hosted Agents." Background A Foundry Hosted Agent helps abstract the management of the compute tier for your agent. It runs in a self-contained Micro-VM sandbox, meaning the Hosted Agent sandbox provides the CPU and memory allocation used to run your agent. Previously, this Micro-VM would download your code from an Azure Container Registry (ACR) and run it on the virtualized platform. Not all customers use container-based workloads today and, let's face it, not everything needs to be a container. So how do those customers and platforms take advantage of Foundry Hosted Agents? The answer is through source-code deployments of Foundry Hosted Agents. What is a Source Code Agent? Source Code Agents are like other Foundry Hosted Agents. The key deployment difference is that the code asset is a .zip file instead of a container image. This also changes the Agent Development Lifecycle compared with the containerized version of Foundry Hosted Agents. An important point of clarity: the way the agent is configured is a data plane operation. As such, taking advantage of Source Code Agent functionality does not require changes to the Foundry infrastructure itself when your Infrastructure as Code (IaC) is only provisioning the supporting resources in Bicep, Terraform, or PowerShell. The deployment change happens through the Foundry data plane. First, let's look at a container-based Foundry Hosted Agent: Now, let's compare it to the source-code version: Deployment Process Now that we've looked at the end result, let's talk through the steps required to deploy a Foundry Hosted Agent via source code. So in Foundry, what does the difference between a container-based and a source-code-based Foundry Hosted Agent look like? The Microsoft Learn docs outline this well: Every source-code deployment follows the same sequence: package -> create or update -> poll until active -> invoke. The source-code path uses `code_configuration` in the agent definition; the image-based path uses `container_configuration` instead--the two are mutually exclusive on a single version. If wanting to confirm and see in more detail one can refer to the Foundry Agent REST API documentation. The source layout can stay familiar, but the deployed artifact changes to a `.zip` file. Packaging the source code into a ZIP is the piece that differs from the container-image flow. The agent deployment to Foundry is also slightly different because it uses source-code configuration instead of container configuration. You can run this via `azd` with a command structured like the following: azd ai agent init --no-prompt --project-id "<project-resource-id>" --deploy-mode code --runtime python_3_13 --entry-point main.py This assumes `azd` is installed and authenticated, and that the authenticated identity has access to the Foundry project. The command initializes a code deployment for the project. However, we recognize that the majority of enterprise organizations will want to use other deployment methods. As such, REST API deployments are supported, as are the Python and C# SDKs for creating the agent. Taking this a step further, and similar to "GitHub Actions for Deploying Hosted Agents," let's create a reusable GitHub Action for deploying source-code-based Hosted Agents. GitHub Action If you are wanting to see the entire action it is part of the repository simple-hosted-agent-responses, which contains source code, IaC, and deployment options. Background First, we need to understand that we cannot reuse the GitHub Action from "GitHub Actions for Deploying Hosted Agents" because, as noted above, the REST API uses mutually exclusive options. In theory, we could add conditional logic across the parameters; however, it is cleaner to create a separate action. Before invoking this action, the workflow must authenticate to Azure because the action calls `az account get-access-token` to acquire a token for the Foundry data plane. Inputs inputs: project_endpoint: description: Foundry project endpoint URL required: true agent_name: description: Name of the hosted agent required: true source_code_zip: description: Path to the local source-code zip artifact required: true model_deployment_name: description: Name of the AI model deployment required: true cpu: description: CPU allocation for the hosted agent container required: false default: '0.25' memory: description: Memory allocation for the hosted agent container required: false default: '0.5Gi' runtime: description: Source-code runtime for the hosted agent required: false default: 'python_3_13' entry_point: description: Source-code entry point command for the hosted agent required: false default: '["python", "main.py"]' dependency_resolution: description: How Agent Service resolves dependencies for the source-code deployment required: false default: 'remote_build' max_polling_seconds: description: Maximum time to wait for the source-code deployment to reach active status required: false default: '600' For our inputs, `project_endpoint`, `agent_name`, `source_code_zip`, and `model_deployment_name` are required. The CPU, memory, runtime, entry point, dependency resolution, and max polling values are configurable properties with defaults set in the action. The source-code-specific inputs populate the `code_configuration` properties of the REST payload. These include `source_code_zip`, `runtime`, `entry_point`, and `dependency_resolution`. This information tells Foundry how to run the code from the `.zip` package. Outputs We should output values that make sense for downstream workflows. Every workflow may not use them, but it is useful to expose non-secret values when they can support later steps. In this case, we are creating a new version of the agent, so let's output that version ID. outputs: agent_version: description: Version ID returned by the Foundry data plane value: ${{ steps.post.outputs.agent_version }} Action The action maps the inputs to environment variables as the first step. After that, it gets an access token from Azure and calls the REST API endpoint. Once we have this, we prepare the body of the call. Verify against the API for all valid properties. For this example, I chose not to set `rai_config` and `tools` to keep things simple. runs: using: composite steps: - name: Create source-code metadata id: metadata shell: bash env: AGENT_NAME: ${{ inputs.agent_name }} MODEL_DEPLOYMENT_NAME: ${{ inputs.model_deployment_name }} CPU: ${{ inputs.cpu }} MEMORY: ${{ inputs.memory }} RUNTIME: ${{ inputs.runtime }} ENTRY_POINT: ${{ inputs.entry_point }} DEPENDENCY_RESOLUTION: ${{ inputs.dependency_resolution }} run: | METADATA_FILE=$(mktemp) ENTRY_POINT_JSON=$(python3 -c 'import json,sys; print(json.dumps(json.loads(sys.argv[1])))' "$ENTRY_POINT") jq -n \ --arg model "$MODEL_DEPLOYMENT_NAME" \ --arg cpu "$CPU" \ --arg memory "$MEMORY" \ --arg runtime "$RUNTIME" \ --arg dep_resolution "$DEPENDENCY_RESOLUTION" \ --argjson entry_point "$ENTRY_POINT_JSON" \ '{ description: "Hosted agent deployed from source code", definition: { kind: "hosted", protocol_versions: [{protocol: "responses", version: "1.0.0"}], cpu: $cpu, memory: $memory, code_configuration: { runtime: $runtime, entry_point: $entry_point, dependency_resolution: $dep_resolution }, environment_variables: {AZURE_AI_MODEL_DEPLOYMENT_NAME: $model} } }' > "$METADATA_FILE" echo "metadata_file=${METADATA_FILE}" >> "$GITHUB_OUTPUT" echo "Metadata file created at ${METADATA_FILE}" - name: Post source-code agent deployment to Foundry data plane id: post shell: bash env: PROJECT_ENDPOINT: ${{ inputs.project_endpoint }} AGENT_NAME: ${{ inputs.agent_name }} SOURCE_CODE_ZIP: ${{ inputs.source_code_zip }} METADATA_FILE: ${{ steps.metadata.outputs.metadata_file }} MAX_POLLING_SECONDS: ${{ inputs.max_polling_seconds }} run: | if [[ ! -f "$SOURCE_CODE_ZIP" ]]; then echo "Error: Source code zip not found at ${SOURCE_CODE_ZIP}" exit 1 fi CODE_ZIP_SHA256=$(sha256sum "$SOURCE_CODE_ZIP" | awk '{print $1}') echo "Source code SHA256: ${CODE_ZIP_SHA256}" FOUNDRY_TOKEN=$(az account get-access-token \ --resource "https://ai.azure.com/" \ --query accessToken -o tsv) # POST /agents/{name}/versions auto-creates the agent if it doesn't # exist and adds a new version if it does, so a single call covers # both first-deploy and update scenarios (matches update-agent). HTTP_STATUS=$(curl -s -o /tmp/source_code_response.json \ -w "%{http_code}" \ -X POST \ "${PROJECT_ENDPOINT}/agents/${AGENT_NAME}/versions?api-version=2025-11-15-preview" \ -H "Authorization: Bearer ${FOUNDRY_TOKEN}" \ -H "Accept: application/json" \ -H "Foundry-Features: CodeAgents=V1Preview,HostedAgents=V1Preview" \ -H "x-ms-agent-name: ${AGENT_NAME}" \ -H "x-ms-code-zip-sha256: ${CODE_ZIP_SHA256}" \ -F "metadata=@${METADATA_FILE};type=application/json" \ -F "code=@${SOURCE_CODE_ZIP};type=application/zip;filename=${AGENT_NAME}.zip") echo "HTTP ${HTTP_STATUS}: $(cat /tmp/source_code_response.json)" if [[ "$HTTP_STATUS" -lt 200 || "$HTTP_STATUS" -ge 300 ]]; then echo "Error: Foundry data plane returned HTTP ${HTTP_STATUS}" exit 1 fi RESPONSE=$(cat /tmp/source_code_response.json) AGENT_VERSION=$(echo "$RESPONSE" | python3 -c 'import sys,json; print(json.load(sys.stdin)["version"])') echo "agent_version=${AGENT_VERSION}" >> "$GITHUB_OUTPUT" echo "Agent version resolved as ${AGENT_VERSION}" START_TIME=$(date +%s) while true; do ELAPSED=$(($(date +%s) - START_TIME)) if [[ $ELAPSED -gt $MAX_POLLING_SECONDS ]]; then echo "Error: Agent version did not reach active state within ${MAX_POLLING_SECONDS} seconds" exit 1 fi VERSION_STATUS=$(curl -s \ -X GET \ "${PROJECT_ENDPOINT}/agents/${AGENT_NAME}/versions/${AGENT_VERSION}?api-version=2025-11-15-preview" \ -H "Authorization: Bearer ${FOUNDRY_TOKEN}" \ -H "Accept: application/json" \ -H "Foundry-Features: CodeAgents=V1Preview,HostedAgents=V1Preview" \ | python3 -c 'import sys,json; data=json.load(sys.stdin); print(data.get("status", "unknown"))' 2>/dev/null) echo "Current status: ${VERSION_STATUS} (elapsed ${ELAPSED}s)" if [[ "$VERSION_STATUS" == "active" ]]; then echo "Agent version ${AGENT_VERSION} is active" break fi if [[ "$VERSION_STATUS" == "failed" ]]; then echo "Error: Agent version reached failed status" exit 1 fi sleep 5 done Building the Source-Code Artifact Before calling the source-code Hosted Agent action, create the ZIP artifact that will be passed into `source_code_zip`. source-code: name: Build source-code artifact runs-on: ubuntu-latest permissions: contents: read steps: - name: Checkout uses: actions/checkout@v6 - name: Create source-code zip artifact run: | git archive --format=zip --output=source-code.zip HEAD:src/agent-framework/responses/basic - name: Upload source-code artifact uses: actions/upload-artifact@v7 with: name: source-code path: source-code.zip Calling the Action Now that we have the action, how can we scale this across multiple workflows? We pass in the required parameters and the ZIP artifact path. - name: Update agent with source code uses: ./.github/actions/update-agent-source-code with: project_endpoint: ${{ needs.deploy-iac.outputs.project_endpoint }} # Source-code agent shares the same Foundry project as the image-based # agent; the `-src` suffix keeps them as distinct agent versions. agent_name: ${{ inputs.agent_name }}-src source_code_zip: ./.artifacts/source-code/source-code.zip model_deployment_name: ${{ needs.deploy-iac.outputs.model_deployment_name }} And just to show we can call the same action multiple times, here are two examples that do just that: Deploy (Bicep) and Deploy (Terraform). Conclusion Source-code deployments give Foundry Hosted Agents another deployment path for teams that do not want, or do not need, to package every agent as a container image. By using a .zip artifact, teams can keep a familiar source-code packaging flow while still taking advantage of the managed compute abstraction that Hosted Agents provide. The reusable GitHub Action shown in this post turns that deployment process into a repeatable CI/CD step: package the source code, post the deployment to the Foundry data plane, poll until the new version is active, and expose the resulting agent version for downstream workflow steps. This keeps the deployment flexible while fitting into existing enterprise pipeline patterns. For organizations already using container-based Hosted Agents, source-code deployments do not replace that model; they expand the options available. Choose the deployment approach that best fits how your teams package, govern, and operate their agent workloads.
j_folberth
Jun 10, 2026 Place Microsoft Developer Community Blog
348Views
2likes
0Comments
Foundry Toolkit for VS Code at //build: Hosted Agents End-to-End, a Smarter Toolbox, and More
We’re excited to share what’s new for Foundry Toolkit for Visual Studio Code at //build 2026. Since going generally available, the toolkit has kept moving fast, and this release is a big one. The headline: a complete, end-to-end Hosted Agent experience, scaffold, run, deploy, and observe without ever leaving VS Code. On top of that, we’ve expanded the Toolbox with native enterprise integrations and shipped a wave of LangGraph samples so every developer has a clear path from idea to production. From your first prompt to a production-grade, observable agent, Foundry Toolkit meets you where you are. Hosted Agents, End to End Building an agent is the easy part; getting it from a first draft to a production-grade, observable service is what matters. This release makes the full Hosted Agent lifecycle available in VS Code, and it follows the way you actually work — scaffold, run, deploy, observe. Scaffold — start from a rich set of samples Hosted Agent creation now opens with a refreshed scaffolding experience and a rich sample selection, so you start from a working, framework-appropriate template instead of a blank file. Creation is smarter, too: we auto-select your subscription when there’s only one, gate tabs more clearly, and tightened spacing for a cleaner setup flow. Run (F5) — inspect as you build Press F5 and your agent runs locally with the Agent Inspector, now aligned with the rest of the extension and featuring Copilot SDK visualization so you can see what the Inspector visualizes as the agent executes. It’s the fastest loop from change to verification before anything leaves your machine. Deploy — a new UX and new ways to ship Different teams ship differently, so deployment got a refreshed UX and two new options for Hosted Agents: ZIP Code Deploy: Package your agent source as a ZIP and deploy it directly to Microsoft Foundry Agent Service. Bring-Your-Own-Image (BYOI): Already have a pre-built container in your own Azure Container Registry? Deploy straight from it. Observe — know it works in production Once deployed, the full observability story is now available: Hosted Agent Tracing: Inspect end-to-end traces of Hosted Agent invocations directly from VS Code — tool calls, delegation chains, and timing for real debugging instead of guesswork. Continuous Evaluation Settings: A new page to configure ongoing evaluation for deployed Hosted Agents, so quality is measured continuously — not just at ship time. Evaluations Node: One-click access to evaluation runs and results right from the Foundry project tree. A Smarter, More Connected Toolbox What it is, and why it matters A Toolbox is how your agent gets its capabilities — the curated set of tools, knowledge sources, and integrations it can call at runtime. Instead of hand-wiring each connection, you assemble a Toolbox once and your agent consumes it consistently across local runs and production. The result: agents that can act on real enterprise data and systems, with the connections managed in one place. From what to how: create, connect, consume Create: Start a new Toolbox from the Foundry Toolkit sidebar “Tools Catalog” and pick the capabilities your agent needs. Connect: Configure and wire in enterprise systems through native, first-class connections once, and use it for all your agents. Consume: Reference the Toolbox from your Hosted Agent so its tools are available the moment the agent runs, locally (F5) and once deployed. New this release Building on that flow, the Toolbox is now richer and more enterprise-ready: WorkIQ as a Built-in Tool: A first-class WorkIQ experience powered by A2A connections — no MCP fallback required. End-to-end toolbox creation with WorkIQ works out of the box. Fabric IQ (OneLake Catalog) Integration: Connect your agents to Microsoft Fabric OneLake catalogs directly from the Toolbox. Toolbox Guardrails: Apply content-safety guardrails to your Toolbox for safer agent execution. Faster discovery: A new Toolbox Search Toggle and Agent Tool Multi-Select let you find and wire in multiple tools in a single action. LangGraph Reaches Parity LangGraph developers, this one is for you. We’ve added five new Hosted Agent samples that bring LangGraph to full parity with the Agent Framework Responses learning path — so you get an equivalent, end-to-end walkthrough no matter which framework you prefer: MCP — tool loading from a remote MCP server (defaults to GitHub Copilot MCP) via MultiServerMCPClient. Workflows — a custom StateGraph chaining three specialized LLM nodes: slogan writer, legal reviewer, and formatter. Files — local filesystem tools plus the Foundry-Toolbox code_interpreter working over session-uploaded files. Human-in-the-Loop — a StateGraph that drafts a proposal and pauses for approval via langgraph.types.interrupt. Observability — GenAI OpenTelemetry tracing with enable_auto_tracing(); spans, metrics, and logs flow to Application Insights. We’ve also refreshed the existing bring-your-own LangGraph samples against the new hosting layer (chat with local tools, Foundry-managed Toolbox loading, and SSE-streamed multi-turn sessions backed by a MemorySaver checkpointer), so every sample reflects how Hosted Agents work today. Polish Across the Board A release is more than headline features. This one also includes a redesigned Prompt Builder “Improve an Instruction” dialog for faster iteration, fixes for MCP toolbox tool icons, clearer ZIP-deploy error surfacing, and assorted Agent Builder and Playground regression fixes — the whole experience feels tighter end to end. Get Started Today Install: Foundry Toolkit on the VS Code Marketplace Quick Start: Follow our getting-started tutorial to build your first Hosted Agent Deep Dive: Explore the documentation, samples, and LangGraph parity walkthroughs Join the Community Share your projects, file issues, or suggest features on our GitHub repository. We can’t wait to see what you build. Welcome to the next chapter of AI development!
leoyao
Jun 04, 2026 Place Microsoft Developer Community Blog
263Views
0likes
0Comments
GitHub Action for Deploying Hosted Agents
Introduction With Microsoft's introduction to Hosted Agents comes a next logical question. How to implement this? Organizations need a method that is quick, repeatable, and requires minimal adjustments to their existing tooling and processes. Thus, we will walk through how to deploy a Hosted Agent through a repeatable GitHub Action. If this is new to you this blog is a follow up to Deploying Foundry Hosted Agents via REST API | Microsoft Community Hub. Before You Start This action assumes the following are already in place in the workflow that calls it: An existing Microsoft Foundry project with a deployed model. A container image already pushed to Azure Container Registry (ACR). An identity with the **Foundry User** role on the Foundry project. See [hosted agent permissions](https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/hosted-agent-permissions) for the full permissions reference. A runner with `az`, `jq`, and `python3` installed. This is true on `ubuntu-latest`; if you self-host, install them explicitly. azure/login configured in the caller workflow **before** this action runs. ⚠️ *Identity prerequisite This action assumes `azure/login` has already run in the caller workflow and that the resulting identity holds a Foundry data-plane role (e.g., Foundry User). Without that, `az account get-access-token` will fail before the REST call is made. Requirements Grounding ourselves in our requirements to implement the deployment processes, in the quickest way that leverages minimal adjustments and a repeatable process, we will leverage GitHub Action and Bash. The Bash script will take a series of arguments that will be used to call the REST API. The action requires four inputs: `project_endpoint`, `agent_name`, `image`, and `model_deployment_name`. The example pipeline wires these from the outputs of a preceding IaC step, but the action itself takes plain strings. These strings can come from any tool that can hand them off as workflow inputs. This keeps it flexible and limits adjustments to existing CI/CD processes. If interested, one can use the Azure Developer CLI (`azd up`) command which is documented via Microsoft official examples and MS Learn. This blog chose not to cover this as the majority of enterprise customers already have tooling they are leveraging other than `azd`. Also, one could use the `azure.ai.projects` library to create an agent. This blog made the decision not to go down this route as not all organizations have adopted the philosophy of allowing application code to create underlying compute infrastructure. Additionally, some organizations desire teams outside of developers to control and set the size of the Micro VM (referred to as the "sandbox" in the Foundry docs) that the Hosted Agent is running on. If your organization does not use GitHub Actions this step should be duplicatable in Azure DevOps leveraging the Bash task. Deployment Steps For us to do this appropriately let's take a step back and evaluate a CI/CD workflow for an Agent whose definition is stored in a container. Ideally a pipeline should follow steps outlined in CI/CD for AI Agents on Microsoft Foundry. Those pipelines typically take the shape build/push → IaC → update agent → smoke test. For our purposes, since we are hyper-focusing on the Hosted Agent Deployment via REST API we are going to focus on the repeatable GitHub Action of deploying the agent. To emphasize this our workflow will focus on the step called "Update agent — Foundry data plane POST `agents/NAME/versions`". Based on organization preference, I can understand the need to break out the update agent step into a separate workflow. We traditionally don't recommend this as keeping everything in one pipeline means one set of failures to triage, one history to read, and one CI/CD surface to keep current. but This action though is structured to support a split if your release process requires it. Hosted Agent REST Deployment Action This is the crux of why the article exists. If you've followed my style of repeatable DevOps process for YAML Pipelines, this action follows similar principles. We will parametrize with defaults to empower minimal configuration while also optimizing for flexibility. To view the full example check out the Update Foundry Agent action . The Inputs, Outputs, and `runs:` blocks shown below all live in a single file: `.github/actions/update-agent/action.yml`. Inputs Here are those parameters with descriptions and defaults: inputs: project_endpoint: description: Foundry project endpoint URL required: true agent_name: description: Name of the hosted agent required: true image: description: Full container image reference (registry/name:tag) required: true model_deployment_name: description: Name of the AI model deployment required: true cpu: description: CPU allocation for the agent container required: false default: '0.25' memory: description: Memory allocation for the agent container required: false default: '0.5Gi' Verify the latest sandbox sizes at hosted-agents#sandbox-sizes There is also guidance on right-sizing your Micro VMs. At the time of this writing here are the available combinations: Outputs We should output values that make sense for subsequent steps in the workflow. Every instance that calls this action may not use them, but it's always good to expose non-secret values just in case. In our case we are creating a new version of the agent, so let's output that agent version: outputs: agent_version: description: Version ID returned by the Foundry data plane value: ${{ steps.post.outputs.agent_version }} `agent_version` is the version identifier returned by the data plane. Capture this in your pipeline (artifact, release tag, etc.) so you have an audit trail and a target to re-deploy against if a future version needs to be rolled back. Subsequent steps in the workflow can reference it via `${{ steps.<step-id>.outputs.agent_version }}`. Action The action will need to map our environment variables being passed into the input as the first step. After that we will need to get an access token from Azure so we can then call the REST API endpoint. Once we have this, we will need to prepare the body of our call. Verify against the API for all valid properties. For our example I chose not to set `rai_config` (Responsible AI overview) and `tools` (function/tool bindings) to keep things simple. runs: using: composite steps: - name: Post agent version to Foundry data plane id: post shell: bash env: PROJECT_ENDPOINT: ${{ inputs.project_endpoint }} AGENT_NAME: ${{ inputs.agent_name }} IMAGE: ${{ inputs.image }} MODEL_DEPLOYMENT_NAME: ${{ inputs.model_deployment_name }} CPU: ${{ inputs.cpu }} MEMORY: ${{ inputs.memory }} run: | FOUNDRY_TOKEN=$(az account get-access-token \ --resource "https://ai.azure.com/" \ --query accessToken -o tsv) AGENT_REQUEST_BODY=$(jq -n \ --arg cpu "$CPU" \ --arg memory "$MEMORY" \ --arg model "$MODEL_DEPLOYMENT_NAME" \ --arg image "$IMAGE" \ '{ definition: { kind: "hosted", container_protocol_versions: [{protocol: "responses", version: "1.0.0"}], cpu: $cpu, memory: $memory, environment_variables: {AZURE_AI_MODEL_DEPLOYMENT_NAME: $model}, image: $image ⚠️ **Heads up on logs.** The line that echoes `HTTP ${HTTP_STATUS}: $(cat /tmp/agent_response.json)` dumps the full response body to the job log. If your request body contains sensitive `environment_variables`, the API may return them in the response, where they will appear in plain text in the workflow log. Either scrub the response before echoing, or echo only the `version` field on success. A 2xx response confirms the data plane accepted the new agent version. Confirming the agent behaves as intended is a separate step. This is done typically with a smoke test against the deployed agent in a later workflow job. If something goes wrong the most common failures are: 401/403- `azure/login` didn't run, the identity is missing a Foundry data-plane role, or the wrong subscription is selected. Check the `azure/login` step and confirm the identity holds **Foundry User** (or higher) on the Foundry project (see the *Before You Start* callout above). 404 - wrong `project_endpoint`, or the agent named in `agent_name` does not yet exist on the project. The agent must exist before posting a new version. 400 - body or model issue: invalid `cpu` / `memory` shape, a required field missing, or `model_deployment_name` pointing at a deployment that isn't reachable from this project. Calling the Action So now that we have the action, how can we scale this across multiple workflows? Simple, we just need to pass in the required parameters. Here is an example, with a stubbed `deploy-iac` step so can the outputs passed into the action as inputs: - name: Deploy Bicep infrastructure id: deploy-iac uses: ./.github/actions/deploy-bicep with: environment_name: ${{ inputs.environment_name || 'main' }} location: ${{ inputs.location || 'swedencentral' }} - name: Update agent uses: ./.github/actions/update-agent with: project_endpoint: ${{ steps.deploy-iac.outputs.project_endpoint }} agent_name: ${{ inputs.agent_name }} image: ${{ steps.deploy-iac.outputs.acr_endpoint }}/${{ inputs.image_name }}:${{ inputs.image_tag }} model_deployment_name: ${{ steps.deploy-iac.outputs.model_deployment_name }} And just to show we can call the same action multiple times here are two examples that do just that: Deploy (Bicep) and Deploy (Terraform). Conclusion The composite action shown above gives organizations what the introduction called for: a quick, repeatable way to deploy a Hosted Agent that requires minimal adjustments to the GitHub Actions tooling and processes already in use. With it wired into a workflow, deploying a new Hosted Agent version becomes a standard step in your pipeline.
j_folberth
Jun 03, 2026 Place Microsoft Developer Community Blog
369Views
0likes
0Comments
Deploying Foundry Hosted Agents via REST API
Learn how to deploy Microsoft Foundry Hosted Agents via REST API, with a practical walkthrough of the request flow, deployment pattern, and key implementation considerations for integrating hosted agents into real-world applications.
j_folberth
Jun 02, 2026 Place Microsoft Developer Community Blog
379Views
0likes
0Comments
Infrastructure as Code for AI: Building and Deploying Microsoft Hosted Agents with Terraform
AI agents are no longer experimental. Teams are shipping production-grade agents that retrieve information, call APIs, reason over documents, and orchestrate multi-step workflows at scale. Microsoft Foundry's Hosted Agents service gives you a fully managed runtime for those agents, built on top of the Microsoft Foundry Agent Service, with Microsoft handling the infrastructure, scaling, and runtime lifecycle. The challenge is that provisioning this infrastructure by hand or clicking through the portal, running one-off CLI commands, or relying on undocumented shell scripts, simply does not scale. It introduces configuration drift, makes reproducing environments painful, and creates real governance risk as teams grow. This post walks through how to provision and manage the Azure infrastructure required to run Microsoft Hosted Agents using Terraform. You will leave with working configuration, a clear understanding of the resource model, and practical guidance on where Terraform can take you all the way and where you will need to supplement with the Azure CLI or the Microsoft Foundry Agent Service SDK. What Are Microsoft Hosted Agents? Microsoft Hosted Agents are AI agents deployed and managed within Microsoft Foundry. Microsoft Foundry is Microsoft's unified platform for building, evaluating, and deploying AI applications and agents. It provides: A managed compute runtime — Microsoft provisions and scales the infrastructure so you do not manage VMs or containers. An agent execution environment — agents are defined with instructions, tools (code interpreter, Bing grounding, Azure AI Search, function calling), and a backing model endpoint. Deep Azure integration — identity via Microsoft Entra ID, secrets via Azure Key Vault, storage via Azure Blob, tracing via Azure Monitor and Application Insights. A project-scoped model — each Microsoft Foundry project encapsulates an agent's resources, connections, and deployments within a logical boundary. The "Hosted" distinction matters. You are not running agent code on your own Kubernetes cluster or App Service. Microsoft manages the runtime. Your responsibility is to provision the surrounding infrastructure correctly: the Microsoft Foundry resource, the project, the model deployment, the identity configuration, and the monitoring resources that back it all. That boundary — the infrastructure you own — is exactly what Terraform manages well. Why Terraform for Hosted Agent Deployments? Infrastructure as Code (IaC) is not a new idea, but its importance grows as AI deployments become more complex. Here is why Terraform is a strong choice for Microsoft Foundry deployments specifically: Repeatability: A Terraform configuration produces the same infrastructure every time. Staging mirrors production. Disaster recovery is a terraform apply away. Governance: Infrastructure definitions live in version control alongside application code. Changes are reviewable, auditable, and reversible. This satisfies most enterprise change-management requirements. Scale: Spinning up per-customer or per-team agent environments using Terraform workspaces or module instantiation is far more manageable than manual provisioning. State management: Terraform tracks the actual state of your Azure resources. It detects drift and reconciles it declaratively. Ecosystem: The AzureRM provider is mature, actively maintained by HashiCorp and Microsoft, and covers the majority of Azure services including the Microsoft Foundry resources. Architecture Overview Before writing any Terraform, it helps to understand the resource hierarchy in Microsoft Foundry and how each layer maps to an Azure resource type. The Foundry Resource Hierarchy Microsoft Foundry uses a two-level hierarchy: 1. Foundry Account ( azurerm_cognitive_account , kind: AIServices ) — The top-level AI Services resource. It provides the model endpoint, manages agent execution, and acts as the logical boundary for all projects beneath it. You must set project_management_enabled = true and provide a custom_subdomain_name to enable project creation. In ARM terms this is a Microsoft.CognitiveServices/accounts resource. 2. Foundry Project ( azurerm_cognitive_account_project ) — A child resource scoped within the Foundry Account. Each project has its own agents, model deployments, connections, and data assets. In production, you typically have one project per application, product team, or environment. Figure 1: The Microsoft Foundry resource hierarchy. A single Foundry Account (Cognitive Services, kind AIServices) acts as the top-level container, with Projects scoped beneath it — one per application, team, or environment. Supporting Resources The following Azure resources make up a complete Hosted Agents deployment: Microsoft Foundry Account (AI Services): A single azurerm_cognitive_account of kind AIServices serves as both the Foundry Account and the model endpoint host. Model deployments (e.g. gpt-4.1 ) are provisioned via azurerm_cognitive_deployment within this account. Log Analytics Workspace + Application Insights: Provides observability for agent traces, request logs, and metrics. User-Assigned Managed Identity: Grants the Foundry Account and Projects access to Azure resources without stored credentials. Role Assignments (RBAC): Wires the managed identity to the Foundry Account with least-privilege Cognitive Services permissions. Figure 2: Supporting infrastructure map. The managed identity holds least-privilege RBAC grants to the Microsoft Foundry Account (AI Services) — enabling model access and project management — all within the same resource group. Reference Architecture (Described) A production-ready layout separates concerns across two resource groups: one for shared infrastructure (networking, monitoring) and one for the Microsoft Foundry Account and its projects. The Foundry resource group houses the azurerm_cognitive_account (kind: AIServices) resource and the azurerm_cognitive_account_project instances. The shared resource group holds Log Analytics and Application Insights. A user-assigned managed identity spans both, holding RBAC grants to each backing service. For a dev/test environment you can collapse both into a single resource group. For production, the separation makes cost attribution, access control, and lifecycle management cleaner. Prerequisites Accounts and Permissions An active Azure subscription with the Owner or Contributor + User Access Administrator roles at the subscription or resource group level (role assignments require elevated permission). Foundry access enabled in your subscription. In some tenants you may need to accept terms or request quota for Azure OpenAI. Azure OpenAI quota for the model you intend to deploy (e.g. gpt-4.1 ). Request this via the Azure portal under Quotas in Azure OpenAI Studio. Local Tools Terraform CLI ≥ 1.9 — Install guide Azure CLI ≥ 2.60 — Install guide A code editor (VS Code with the HashiCorp Terraform extension and the Azure Terraform extension is a strong combination). Authentication For local development, authenticate via the Azure CLI. The AzureRM Terraform provider picks this up automatically: az login az account set --subscription "<your-subscription-id>" For CI/CD pipelines, use a service principal with AZURE_CLIENT_ID , AZURE_CLIENT_SECRET , AZURE_TENANT_ID , and AZURE_SUBSCRIPTION_ID environment variables, or — preferably — a workload identity federation (federated credentials) to avoid storing long-lived secrets. GitHub Actions supports OIDC-based workload identity natively. Terraform Fundamentals for Hosted Agents Provider Configuration The hashicorp/azurerm provider is your primary dependency. The new Microsoft Foundry resources ( azurerm_cognitive_account with kind = "AIServices" and azurerm_cognitive_account_project ) require version 4.x of the provider. Pin your version to avoid unexpected breaking changes: terraform { required_version = ">= 1.9" required_providers { azurerm = { source = "hashicorp/azurerm" version = "~> 4.0" } } } provider "azurerm" { features { key_vault { purge_soft_delete_on_destroy = false } resource_group { prevent_deletion_if_contains_resources = true } } subscription_id = var.subscription_id } The features block is required even when empty. The Key Vault setting prevents accidental secret loss during terraform destroy . The resource group setting adds an extra safety net in production. State Management Never use local state for shared or production environments. Store state in Azure Blob Storage with state locking via Azure Blob lease: terraform { backend "azurerm" { resource_group_name = "rg-terraform-state" storage_account_name = "sttfstate<unique>" container_name = "tfstate" key = "ai-agents/prod.tfstate" } } Create the state storage account and container before running terraform init . A bootstrap script or a separate Terraform workspace dedicated to state management are both valid approaches. Known Limitations and Workarounds Terraform coverage of Foundry is improving rapidly but is not yet complete. You should be aware of the following gaps as of mid-2025: Agent definitions are not in Terraform: The actual agent (its system prompt, instructions, tool configuration, and model binding) is created via the Azure AI Agent Service SDK or the Foundry portal, not via Terraform. Terraform provisions the infrastructure; your application code or a post-provisioning script creates the agent. Connections: Some connection types within a Foundry Project (e.g. Azure AI Search, custom connections) may require the Azure CLI or the Foundry SDK. Verify coverage in the AzureRM provider docs before assuming Terraform handles them. Model deployments: azurerm_cognitive_deployment covers OpenAI model deployments and is well-supported. Use this to deploy your model before referencing it from the agent. Private networking: If you need private endpoints for your Foundry Account, additional VNet, subnet, and DNS zone resources are required. This post focuses on the public networking path; private networking is a follow-on topic. Step-by-Step Implementation The following sections build up a complete Terraform configuration. The recommended project structure is a flat module layout for a single environment, with a separate modules/ai-foundry/ directory when you need to reuse the pattern across environments. ai-agents-infra/ ├── main.tf ├── variables.tf ├── outputs.tf ├── versions.tf └── terraform.tfvars 1. Variables Define variables first. Parameterising from the start avoids hard-coded values that create technical debt when you replicate the configuration for staging or production: # variables.tf variable "subscription_id" { type = string description = "Azure subscription ID." } variable "location" { type = string default = "eastus" description = "Azure region for all resources." } variable "environment" { type = string default = "dev" description = "Environment label (dev, staging, prod)." } variable "project_name" { type = string description = "Short name for the project. Used in resource naming." } variable "openai_model_name" { type = string default = "gpt-4.1" description = "Azure OpenAI model to deploy for the agent." } variable "openai_model_version" { type = string default = "2025-04-14" description = "Model version to deploy." } variable "openai_sku_capacity" { type = number default = 10 description = "Tokens-per-minute capacity (in thousands) for the deployment." } 2. Resource Group and Core Infrastructure A single resource group keeps things simple for dev. In production, consider splitting as described in the architecture section above. # main.tf — Resource group and naming locals locals { name_prefix = "${var.project_name}-${var.environment}" tags = { environment = var.environment project = var.project_name managed_by = "terraform" } } resource "azurerm_resource_group" "main" { name = "rg-${local.name_prefix}" location = var.location tags = local.tags } 3. Supporting Services Provision Log Analytics and Application Insights for agent observability and diagnostics. Unlike the legacy Hub-based architecture, the azurerm_cognitive_account (kind AIServices ) does not require a dedicated Storage Account or Key Vault as provisioning dependencies. # main.tf — Monitoring infrastructure data "azurerm_client_config" "current" {} # Log Analytics Workspace (required by Application Insights) resource "azurerm_log_analytics_workspace" "main" { name = "law-${local.name_prefix}" resource_group_name = azurerm_resource_group.main.name location = azurerm_resource_group.main.location sku = "PerGB2018" retention_in_days = 30 tags = local.tags } # Application Insights for agent observability resource "azurerm_application_insights" "main" { name = "appi-${local.name_prefix}" resource_group_name = azurerm_resource_group.main.name location = azurerm_resource_group.main.location workspace_id = azurerm_log_analytics_workspace.main.id application_type = "web" tags = local.tags } 4. User-Assigned Managed Identity A managed identity allows the Foundry Account and its projects to authenticate to Azure services without stored credentials. This is a security best practice and is required for several Microsoft Foundry features. # main.tf — Managed identity for the Microsoft Foundry Account resource "azurerm_user_assigned_identity" "foundry" { name = "id-${local.name_prefix}-foundry" resource_group_name = azurerm_resource_group.main.name location = azurerm_resource_group.main.location tags = local.tags } 5. Microsoft Foundry Account and Model Deployment In the current Microsoft Foundry architecture, a single azurerm_cognitive_account of kind AIServices serves as both the Foundry Account and the model endpoint host. Set project_management_enabled = true and provide a globally unique custom_subdomain_name to enable Foundry Project creation beneath it. # main.tf — Microsoft Foundry Account (AI Services) resource "azurerm_cognitive_account" "foundry" { name = "aisa-${local.name_prefix}" resource_group_name = azurerm_resource_group.main.name location = azurerm_resource_group.main.location kind = "AIServices" sku_name = "S0" project_management_enabled = true custom_subdomain_name = "${replace(local.name_prefix, "-", "")}foundry" tags = local.tags identity { type = "UserAssigned" identity_ids = [azurerm_user_assigned_identity.foundry.id] } } # Deploy the model within the Foundry Account resource "azurerm_cognitive_deployment" "agent_model" { name = var.openai_model_name cognitive_account_id = azurerm_cognitive_account.foundry.id model { format = "OpenAI" name = var.openai_model_name version = var.openai_model_version } sku { name = "Standard" capacity = var.openai_sku_capacity } } Note on quota: The capacity value is in thousands of tokens per minute. A value of 10 means 10,000 TPM. If terraform apply fails with a quota error, reduce this value or request a quota increase via the Azure portal. Note on custom_subdomain_name : This must be globally unique across all Azure AI Services accounts. If provisioning fails with a conflict error, adjust the suffix (e.g. append a random string using the random_string resource). 6. Foundry Project Create a Foundry Project beneath the Foundry Account provisioned in Step 5. Each project scopes its own agents, model connections, and data assets. Use one project per application or team. # main.tf — Microsoft Foundry Project resource "azurerm_cognitive_account_project" "agent_project" { name = "proj-${local.name_prefix}-agents" cognitive_account_id = azurerm_cognitive_account.foundry.id location = azurerm_resource_group.main.location display_name = "Agent Project - ${var.project_name}" description = "Hosted agents project for ${var.project_name}" identity { type = "UserAssigned" identity_ids = [azurerm_user_assigned_identity.foundry.id] } tags = local.tags } 7. RBAC Role Assignments Grant the managed identity the permissions it needs. This is the area most commonly misconfigured in manual deployments. Terraform makes it explicit and auditable. # main.tf — RBAC assignments # AI Services: Foundry identity needs Cognitive Services OpenAI User to call model endpoints resource "azurerm_role_assignment" "foundry_openai" { scope = azurerm_cognitive_account.foundry.id role_definition_name = "Cognitive Services OpenAI User" principal_id = azurerm_user_assigned_identity.foundry.principal_id } # AI Services: Foundry identity needs Cognitive Services Contributor to manage projects resource "azurerm_role_assignment" "foundry_contributor" { scope = azurerm_cognitive_account.foundry.id role_definition_name = "Cognitive Services Contributor" principal_id = azurerm_user_assigned_identity.foundry.principal_id } # Optional: grant your own principal the Azure AI Developer role on the Foundry Account # so you can create and manage agents from your local machine or CI pipeline resource "azurerm_role_assignment" "developer_account" { scope = azurerm_cognitive_account.foundry.id role_definition_name = "Azure AI Developer" principal_id = data.azurerm_client_config.current.object_id } 8. Outputs Export the values your application and post-provisioning scripts will need: # outputs.tf output "resource_group_name" { value = azurerm_resource_group.main.name } output "foundry_account_id" { value = azurerm_cognitive_account.foundry.id } output "ai_foundry_project_id" { value = azurerm_cognitive_account_project.agent_project.id } output "foundry_endpoint" { value = azurerm_cognitive_account.foundry.endpoint } output "openai_deployment_name" { value = azurerm_cognitive_deployment.agent_model.name } output "managed_identity_client_id" { value = azurerm_user_assigned_identity.foundry.client_id } 10. Example terraform.tfvars # terraform.tfvars — do NOT commit this file if it contains sensitive values subscription_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" location = "eastus" environment = "dev" project_name = "contoso-agents" openai_model_name = "gpt-4.1" openai_model_version = "2025-04-14" openai_sku_capacity = 10 Figure 3: Terraform deployment workflow. State is stored in an Azure Blob Storage backend, enabling team collaboration and preventing concurrent apply conflicts. Deploying and Validating the Agent Infrastructure Running the Deployment # 1. Initialise — downloads provider plugins and configures the backend terraform init # 2. Validate syntax and configuration terraform validate # 3. Preview what will be created (review carefully before applying) terraform plan -out=tfplan # 4. Apply the plan terraform apply tfplan A full initial apply typically takes 8–15 minutes. The Foundry Account (AI Services) provisioning is the longest step. The model deployment may also take a few minutes to reach a ready state — Terraform handles this with implicit dependency ordering, but you may see brief retries in the output. Verifying the Deployment After apply completes, verify each resource is in a healthy state: # Confirm the resource group and its resources exist az resource list --resource-group "rg-contoso-agents-dev" --output table # Check the Foundry Account (AI Services) is in a Succeeded state az cognitiveservices account show \ --name "aisacontosoagentsdevfoundry" \ --resource-group "rg-contoso-agents-dev" \ --query "properties.provisioningState" # Confirm the model deployment is ready az cognitiveservices account deployment show \ --resource-group "rg-contoso-agents-dev" \ --name "aisacontosoagentsdevfoundry" \ --deployment-name "gpt-4.1" \ --query "properties.provisioningState" Navigate to the Microsoft Foundry portal and confirm your Foundry Account and Project appear. At this point you can create an agent manually in the portal to validate that the model endpoint is reachable and the identity chain works correctly before automating agent creation. Common Deployment Issues Quota exceeded on model deployment: Reduce openai_sku_capacity or request a quota increase in the Azure portal under Azure OpenAI → Quotas. Resource name conflicts: The custom_subdomain_name on the Foundry Account must be globally unique. Use the random_string Terraform resource to append a unique suffix if needed. Role assignment propagation delay: RBAC changes can take 1–2 minutes to propagate. If the Foundry Account cannot access resources immediately after apply, wait a moment and retry. project_management_enabled not set: If azurerm_cognitive_account_project fails with an error about project management, ensure project_management_enabled = true and custom_subdomain_name are set on the parent azurerm_cognitive_account . azurerm_cognitive_account_project not found: Ensure your AzureRM provider version is ~> 4.0 or later. Run terraform init -upgrade if you previously initialised with an older version. Creating an Agent After Infrastructure Provisioning Terraform has provisioned the platform. Now you need to create the agent itself. This is done via the Azure AI Agents SDK (available for Python, C#, JavaScript, and Java) or the Foundry portal. The following Python snippet demonstrates creating a basic agent programmatically after Terraform apply. It uses the outputs from Terraform directly: import os from azure.ai.projects import AIProjectClient from azure.identity import DefaultAzureCredential # These values come from Terraform outputs project_connection_string = os.environ["AI_PROJECT_CONNECTION_STRING"] model_deployment = os.environ["OPENAI_DEPLOYMENT_NAME"] client = AIProjectClient.from_connection_string( credential=DefaultAzureCredential(), conn_str=project_connection_string, ) # Create the hosted agent agent = client.agents.create_agent( model=model_deployment, name="customer-support-agent", instructions=( "You are a helpful customer support assistant. " "Answer questions accurately and concisely. " "If you are unsure, say so rather than guessing." ), ) print(f"Agent created: {agent.id}") Figure 5: Agent runtime architecture. The Foundry Project hosts the Agent Service, which routes requests to the GPT-4.1 model endpoint and optionally invokes tool integrations (Code Interpreter, File Search, Azure Functions, or custom tools). The project connection string is available from the Foundry portal (Project → Overview → Project connection string) or can be constructed from Terraform outputs. Refer to the Azure AI Agents quickstart for the full SDK setup. Operational Considerations Lifecycle Management Terraform's declarative model means updates are incremental by default. To update the OpenAI model version, change openai_model_version in your .tfvars file and run terraform plan to confirm the change before applying. Terraform will delete and recreate the cognitive deployment in-place — be aware this causes brief downtime for the model endpoint. To destroy a complete environment: terraform destroy The prevent_deletion_if_contains_resources feature on the resource group will block destruction if any untracked resources exist, which is a useful safety net in production. Handling Configuration Drift Drift occurs when Azure resources are modified outside of Terraform (portal changes, CLI scripts, other automation). Detect drift with: terraform plan -refresh-only This reports the difference between the Terraform state and the actual resource state without making changes. Schedule this as a drift-detection job in CI to catch out-of-band changes early. Environment Isolation Use Terraform workspaces or separate state files per environment: # Create and switch to a staging workspace terraform workspace new staging terraform workspace select staging terraform apply -var-file="environments/staging.tfvars" Alternatively, use a directory-per-environment layout ( environments/dev/ , environments/prod/ ) with a shared module in modules/ai-foundry/ . The directory layout is more explicit and easier to navigate in a team setting. Cost Control Set a low openai_sku_capacity in dev (e.g. 1 = 1,000 TPM) to limit accidental spend. Tag all resources with environment and project tags (the locals.tags block handles this) to enable cost attribution in Azure Cost Management. Use the Azure Pricing Calculator to estimate monthly costs before deploying to production. The Azure AI Services account (model token usage), Log Analytics, and Application Insights are the primary cost drivers. Consider destroying dev environments overnight using a scheduled CI job that runs terraform destroy and terraform apply on a schedule. CI/CD Integration Automating Terraform via GitHub Actions is straightforward. The following workflow runs plan on pull requests and apply on merge to the main branch: # .github/workflows/terraform.yml name: Terraform Deploy on: push: branches: [main] pull_request: branches: [main] permissions: id-token: write # Required for OIDC workload identity federation contents: read pull-requests: write env: ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }} ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }} ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }} ARM_USE_OIDC: "true" jobs: terraform: runs-on: ubuntu-latest environment: ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }} steps: - uses: actions/checkout@v4 - uses: hashicorp/setup-terraform@v3 with: terraform_version: "~1.9" - name: Terraform Init run: terraform init - name: Terraform Plan run: terraform plan -out=tfplan -var-file="environments/dev.tfvars" - name: Terraform Apply if: github.ref == 'refs/heads/main' run: terraform apply -auto-approve tfplan Figure 4: CI/CD pipeline using GitHub Actions with OIDC workload identity federation. No long-lived secrets are stored — the runner exchanges a JWT for a short-lived Azure token before each Terraform run. Use OIDC workload identity federation to avoid storing long-lived service principal secrets in GitHub. This is the recommended authentication method for GitHub Actions deployments to Azure. Best Practices Modular Terraform Design Once you have a working flat configuration, extract the Foundry resources into a reusable module. A module boundary around the Hub, Project, OpenAI account, and RBAC assignments lets you stamp out new agent environments with a single module call and a new .tfvars file. # environments/staging/main.tf module "agent_platform" { source = "../../modules/ai-foundry" project_name = "contoso-agents" environment = "staging" location = "eastus" subscription_id = var.subscription_id openai_model_name = "gpt-4.1" openai_model_version = "2025-04-14" openai_sku_capacity = 30 } Parameterisation and Environment Configs Never hard-code subscription IDs, tenant IDs, or region names in main.tf . Keep environment-specific values in environments/<env>.tfvars files and commit them to source control (they are config, not secrets). Store actual secrets (service principal credentials, API keys for third-party connections) in Azure Key Vault or GitHub Secrets — not in .tfvars files. Versioning Models and Agent Configurations Treat your openai_model_version and agent instructions as versioned artefacts. When Microsoft releases a new model version, create a pull request that updates the variable value, runs a plan, and documents the expected change. This creates a clear history of when model versions changed and who approved the change. Logging and Monitoring Enable diagnostic settings on the Azure OpenAI account to route request logs and metrics to your Log Analytics workspace. Use Application Insights to capture agent traces from the Azure AI Agents SDK (it integrates with OpenTelemetry). Set up Azure Monitor alerts on OpenAI account errors (4xx/5xx rates) and Log Analytics ingestion failures. Responsible AI Considerations Enable Azure OpenAI content filtering on your deployment. Terraform supports this via the content_filter block in azurerm_cognitive_deployment where the policy allows. Define a clear system prompt that sets agent behaviour boundaries and instructs the agent to decline harmful requests. Log and review agent conversations during early deployment. Microsoft Foundry includes evaluation tools for assessing agent response quality and safety. Apply least-privilege RBAC throughout — the role assignments in this post follow that principle. Conclusion and Next Steps You now have a complete, repeatable Terraform configuration for provisioning the Azure infrastructure required to run Microsoft Hosted Agents via Microsoft Foundry. The key takeaways: Terraform manages the infrastructure layer effectively — the Foundry Account, Project, model deployment, identity, and RBAC. Agent definitions themselves are provisioned via the Azure AI Agents SDK or the Foundry portal as a post-Terraform step. State management, parameterisation, and modular design are non-negotiable for team environments. OIDC-based workload identity is the right authentication model for CI/CD pipelines. Drift detection, environment isolation, and cost tagging are operational necessities, not optional extras. Where to Go Next Add Azure AI Search: Extend the Foundry Project with an Azure AI Search connection and enable the Search tool on your agent for Retrieval-Augmented Generation (RAG). Private networking: Add private endpoints for the Foundry Hub and OpenAI account to lock down ingress to your VNet. Multi-region deployment: Instantiate the Terraform module twice with different regions and use Azure Traffic Manager or Front Door to route requests. GitOps for agents: Store agent definitions (system prompts, tool configurations) as YAML or JSON in your repository and use a CI pipeline to apply them via the Azure AI Agents SDK on every merge, creating a fully declarative agent deployment pipeline. Evaluation pipelines: Use Microsoft Foundry's built-in evaluation capabilities to run automated quality and safety assessments on every new model version or prompt change. References What is Microsoft Foundry? — Microsoft Learn Azure AI Agent Service overview — Microsoft Learn Azure AI Agents quickstart — Microsoft Learn azurerm_cognitive_account — Terraform Registry azurerm_cognitive_account_project — Terraform Registry azurerm_cognitive_deployment — Terraform Registry AzureRM backend — Terraform documentation OIDC workload identity federation with GitHub Actions — Microsoft Learn Azure OpenAI content filtering — Microsoft Learn Install Terraform — HashiCorp Microsoft Foundry portal
Lee_Stott
Jun 02, 2026 Place Microsoft Developer Community Blog
671Views
0likes
0Comments
Building and Operating a Microsoft Foundry Hosted Agent with GitOps and GitHub Tasks
The Gap Between Prototype and Production Most AI engineering teams can build a working agent in a day. The hard part is not building it; the hard part is operating it. Prompts drift. Tool configurations change without review. Deployments happen from someone's laptop. There is no audit trail, no rollback plan, and no consistent way to promote a change from a development environment to production. GitOps closes that gap. By treating your agent definition, configuration, and infrastructure as version-controlled source code, you get the same delivery discipline that software engineering teams have applied to application code for years. Every change is reviewed, every deployment is automated, and every environment state is traceable to a specific commit. This post shows you how to apply GitOps principles to a Microsoft Foundry Hosted Agent using GitHub as the source of truth and GitHub Tasks and Actions as the automation layer. The result is a repeatable, governed, production-ready delivery model for AI agents. What Is a Microsoft Foundry Hosted Agent? Microsoft Foundry is Microsoft's platform for building, deploying, and operating AI applications and agents. A Hosted Agent is an agent runtime managed by the Foundry platform rather than self-hosted by your team. You supply the agent logic, configuration, and tools; Foundry handles the runtime lifecycle, scaling, and managed infrastructure. In practical terms, a Foundry Hosted Agent is a containerised agent application. You package your agent code, prompt definitions, tool bindings, and environment configuration into a container image. Foundry deploys and manages that container within a Foundry project, connected to models, tools, and observability infrastructure that the platform provides. Teams choose Hosted Agents over self-hosting because: The platform manages runtime infrastructure, patching, and scaling Integration with Azure AI models, managed identity, and observability is built in You can focus engineering effort on agent logic rather than cluster management Foundry projects provide environment and resource isolation without requiring you to provision and manage separate Azure resources for each environment Hosted Agents are a good fit when your team wants strong operational support with minimal platform overhead, when you need clear separation between environments, and when your agents depend on Azure AI capabilities such as Azure OpenAI Service, Azure AI Search, or Model Context Protocol integrations. Why GitOps Matters Specifically for AI Agents GitOps is straightforward for stateless web services: the code changes, the pipeline runs, the container is deployed. AI agents are more complex because there are multiple distinct artefacts that all affect agent behaviour: System prompts and instruction files Tool definitions and external integrations Model selection and configuration (temperature, max tokens, safety settings) Model Context Protocol (MCP) server definitions Orchestration logic and agent workflow code Safety and policy settings Infrastructure and deployment configuration Any one of these can change the behaviour of your agent in ways that are difficult to detect without structured review. A prompt change that looks harmless can alter tone, scope, or factual grounding. A tool configuration change can expose data to unintended callers. A model upgrade can shift response quality unpredictably. Git gives you a single place to version, review, and approve all of these artefacts together. Pull requests give you a structured review gate. Workflow automation gives you validation before anything reaches a deployed environment. Tags and releases give you deployment markers you can roll back to. The discipline of GitOps turns what is often an ad-hoc AI delivery process into a repeatable engineering practice. Reference Architecture The following diagram shows a practical reference architecture for delivering a Microsoft Foundry Hosted Agent through a GitOps model using GitHub. +---------------------------+ | GitHub Repository | | /src /agents /tools | | /prompts /infra | | /.github/workflows | +---------------------------+ | | Pull Request / Push to main v +---------------------------+ | GitHub Actions | | 1. Validate agent config | | 2. Lint and scan code | | 3. Run unit tests | | 4. Build container image | | 5. Push to registry | +---------------------------+ | | Image tag (SHA or semver) v +---------------------------+ | Azure Container Registry | | myregistry.azurecr.io | | my-agent:<sha> | +---------------------------+ | +------+------+ | | v v +----------+ +----------+ | Foundry | | Foundry | | Dev | | Test | | Project | | Project | +----------+ +----------+ | Approval gate (GitHub env) | v +----------+ | Foundry | | Prod | | Project | +----------+ | v +---------------------------+ | Observability | | Azure Monitor / App | | Insights / Foundry Logs | +---------------------------+ Key design decisions in this architecture: The GitHub repository is the single source of truth for all agent artefacts No human deploys directly to any Foundry project; all changes flow through automation Environment promotion requires a GitHub environment approval, creating a governance gate The container image is built once and promoted across environments; the image is not rebuilt per environment Secrets are stored in Azure Key Vault and accessed by the Foundry agent at runtime via managed identity Figure: GitOps delivery pipeline stages from commit to production Repository Structure A well-structured repository separates agent logic from infrastructure and tooling from prompts. The following structure works well in practice: my-foundry-agent/ ├── .github/ │ ├── workflows/ │ │ ├── validate.yml # Runs on every PR │ │ ├── build-deploy.yml # Runs on merge to main │ │ └── rollback.yml # Manual trigger workflow │ └── CODEOWNERS # Review assignments by path ├── src/ │ ├── agents/ │ │ ├── agent.py # Agent entry point and orchestration │ │ └── agent_config.json # Agent metadata and settings │ ├── tools/ │ │ ├── search_tool.py # Tool implementations │ │ └── data_tool.py │ └── prompts/ │ ├── system.txt # System prompt (versioned as plain text) │ └── instructions.txt # Supplementary instructions ├── tests/ │ ├── unit/ # Unit tests for tools and logic │ ├── integration/ # Integration tests against a running agent │ └── smoke/ # Post-deployment smoke tests ├── infra/ │ ├── main.bicep # Foundry project and resource definitions │ └── environments/ │ ├── dev.parameters.json │ ├── test.parameters.json │ └── prod.parameters.json ├── scripts/ │ ├── validate_agent.py # Config validation script │ └── smoke_test.py # Smoke test runner ├── Dockerfile # Container image definition └── docs/ └── architecture.md # Architecture and runbook documentation What belongs where and why: /src/prompts - System prompts as plain text files. Versioning prompts as files means every change goes through a pull request with a diff review, just as code does. /src/agents - Agent orchestration logic and configuration. Keeps the entry point and agent metadata co-located. /src/tools - Tool implementations separated from agent logic. Tool logic changes independently and should be reviewable in isolation. /infra - Infrastructure as code with per-environment parameter files. Environment-specific values live here, never in source files. /tests - Three layers of testing: unit tests for tools, integration tests for the full agent, and smoke tests that run against a deployed environment. /.github/workflows - All automation defined as code. There should be no manual deployment steps that live outside this directory. GitHub Tasks Across the Delivery Lifecycle GitHub Tasks and Issues provide the work tracking layer on top of the GitOps delivery model. Used well, they connect the intention behind a change to its implementation and deployment history. Practical patterns for using GitHub Tasks with agent delivery: Prompt change task - Open an issue to describe why the system prompt is changing. The pull request that changes system.txt closes that issue, creating a permanent link between the rationale and the diff. Tool integration task - When adding a new MCP server or external tool integration, create a task that captures the design decision, security review outcome, and test evidence before the pull request is merged. Model upgrade task - When upgrading the underlying model version, create a task that includes evaluation results and comparison data. The task becomes part of your change audit trail. Rollback task - If a deployment causes quality regressions, create a task to track the rollback, root cause investigation, and corrective action. Automation can open this task automatically when a deployment fails health checks. Dependency on approval - GitHub Tasks can be linked to environment approvals in GitHub Actions. A task in a specific milestone or project column can gate a promotion workflow. The key insight is that GitHub Tasks are not just work management; they are part of your audit trail. A regulatory or security reviewer can follow the chain from a production deployment back through workflow runs, pull request reviews, and the original task that described the intent of the change. End-to-End GitOps Flow The following walk-through describes a realistic developer experience for changing an agent prompt and promoting it to production. A developer opens a GitHub Issue describing the prompt change required and the expected behaviour improvement. The developer creates a feature branch, edits src/prompts/system.txt , and updates any related unit tests. A pull request is opened. The validate workflow runs immediately, checking prompt length, configuration schema, and lint rules. Unit tests run against the changed files. A code reviewer approves the pull request. The CODEOWNERS file ensures that prompt changes require review from the AI engineering team, not just any contributor. On merge to main, the build workflow runs: the container image is built with the new prompt baked in, tagged with the commit SHA, and pushed to Azure Container Registry. The deployment workflow deploys the new image to the Foundry Dev project automatically. Integration and smoke tests run against the deployed dev agent. If tests pass, the workflow pauses at the Test environment gate and requests approval from a named reviewer. After approval, the same image is deployed to Foundry Test. Smoke tests run again. A second approval gate controls promotion to Foundry Prod. If at any point a health check or smoke test fails, the rollback workflow redeploys the previous image tag from the registry. The image tag of the last known-good deployment is stored as a GitHub environment variable. This flow means that no human ever deploys directly to any environment. Every environment state is traceable to a specific commit, image tag, and workflow run. Security and Governance AI agents often have access to sensitive data and external systems. Security and governance cannot be an afterthought. Identity and Access Use managed identity for the Foundry Hosted Agent to access Azure resources. Avoid service principal secrets where Microsoft Entra Workload Identity or managed identity is available. Apply the principle of least privilege: the agent identity should have read access to data sources and limited write access only where the use case requires it. Tool integrations that require API keys or external credentials should retrieve them from Azure Key Vault at runtime, never from environment variables baked into the image. Secrets and Configuration Store secrets in Azure Key Vault. Reference them in your Foundry project configuration using Key Vault references. Store GitHub Actions secrets using repository or environment-scoped secrets. Never echo secrets in workflow logs. Separate environment configuration (endpoints, resource names, capacity settings) from agent logic. Use the /infra/environments/ parameter files for this. Auditability and Review Enforce pull request reviews for all changes to /src/prompts , /src/agents , and /infra via CODEOWNERS. Require status checks to pass before merging. Blocked merges prevent untested changes reaching production. GitHub's workflow run history gives you a complete deployment audit trail. You can answer "what was deployed to prod on Tuesday and who approved it" in seconds. For regulated environments, consider branch protection rules that require signed commits. Safe Rollout Use canary or blue-green patterns where Foundry supports them for high-traffic agents. Always keep the previous image tag available in the registry. Do not delete images on deployment. Document and test your rollback procedure before you need it in production. Observability and Operational Readiness A deployed agent that you cannot observe is an agent you cannot operate. Build observability in from the start. What to Monitor Deployment health - Track whether each Foundry deployment succeeded and the agent is responding. Wire deployment outcomes back to GitHub workflow run status. Model and tool errors - Log tool call failures, model timeout errors, and safety filter activations. Aggregate these in Azure Monitor or Application Insights. Latency - Track end-to-end response latency per agent version. A latency increase after a model or prompt change is an early signal of a quality regression. Token consumption - Monitor token usage per request and per session. Unexpected increases can indicate prompt injection or runaway orchestration loops. Traceability - Log which agent version handled each request. Correlation between the image tag and request traces is essential for debugging production issues. Debugging and Alerting Use structured logging with a consistent schema. Include fields for agent version, session ID, tool called, and outcome. Set up alerts for error rate thresholds and latency percentiles. Alert before users notice the problem. For failed agent runs, ensure logs capture the full conversation context (within your data retention policy) so that developers can reproduce and diagnose the failure. Microsoft Foundry Toolboxes One of the most important additions to the Foundry platform is Toolboxes, currently in Public Preview. If you have ever seen an agent codebase where three different agents each wire the same search tool with their own credentials and slightly different configurations, you already understand the problem Toolboxes solve. A Toolbox is a named, versioned bundle of tools managed centrally in Microsoft Foundry. You define the tools once, configure authentication and access centrally, and publish a single MCP-compatible endpoint. Any agent in any runtime consumes that endpoint without per-tool wiring, custom SDK integration, or duplicated credential management. Figure: Before and after Foundry Toolboxes. Each agent previously managed its own tool connections. With Toolboxes, agents connect to one governed endpoint. The Four Pillars Discover (coming soon) - Find approved tools without browsing long catalogues. Reduces duplication by surfacing what already exists before developers build something new. Build (available today) - Select tools into a named toolbox. Supported types include built-in tools (Web Search, Code Interpreter, File Search, Azure AI Search), MCP servers, Agent-to-Agent (A2A) endpoints, and OpenAPI-defined services. Consume (available today) - A single MCP-compatible endpoint exposes every tool in the toolbox to any agent runtime. Agents that can speak MCP can use a Foundry Toolbox without any Foundry-specific SDK dependency. Govern (coming soon) - Centralised authentication and observability applied to every tool call flowing through the toolbox. Security and platform teams get consistent controls without asking developers to bolt governance onto every agent individually. Toolboxes and GitOps: A Natural Fit Toolboxes are particularly well-suited to a GitOps delivery model because the toolbox definition is a discrete, versioned artefact. Instead of credentials and tool configuration scattered across agent codebases, the toolbox becomes its own managed entity with its own version history. The key design property is that the toolbox endpoint URL is stable. When you promote a new toolbox version to be the default, agents consuming the endpoint pick up the update without any code changes. This means you can update tool configuration, add a new MCP server, or rotate credentials in the toolbox without redeploying every agent that uses it. Figure: Toolbox versioning in a GitOps model. Commits trigger CI validation and deployment of new toolbox versions. The stable endpoint URL allows agents to consume updates without redeployment. Adding a Toolbox to Your Repository In your GitOps repository, toolbox definitions belong in /src/tools/toolbox_config.py or as a declarative configuration file checked into version control. The following example creates a toolbox that combines web search, Azure AI Search over internal documentation, and a GitHub MCP server: # src/tools/toolbox_config.py # Run this via CI to create or update a toolbox version in Foundry. from azure.identity import DefaultAzureCredential from azure.ai.projects import AIProjectClient import os client = AIProjectClient( endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"], credential=DefaultAzureCredential() ) toolbox_version = client.beta.toolboxes.create_toolbox_version( toolbox_name="customer-feedback-toolbox", description="Tools for triaging customer feedback: search, docs, and GitHub.", tools=[ { "type": "web_search", "description": "Search approved public documentation sites.", "custom_search_configuration": { "project_connection_id": os.environ["BING_CONNECTION_NAME"], "instance_name": os.environ["BING_INSTANCE_NAME"] } }, { "type": "azure_ai_search", "name": "product-manuals-search", "description": "Search internal product documentation.", "azure_ai_search": { "indexes": [ { "index_name": os.environ["SEARCH_INDEX_NAME"], "project_connection_id": os.environ["SEARCH_CONNECTION_ID"] } ] } }, { "type": "mcp", "server_label": "github", "server_url": "https://api.githubcopilot.com/mcp", "project_connection_id": os.environ["GITHUB_CONNECTION_ID"] } ], ) print(f"Toolbox version created: {toolbox_version.version}") print(f"MCP endpoint: {toolbox_version.mcp_endpoint}") To promote a toolbox version to be the default (the endpoint agents use without specifying a version), add this to your deployment workflow: # Promote toolbox version to default after validation toolbox = client.beta.toolboxes.update( toolbox_name="customer-feedback-toolbox", default_version=toolbox_version.version, ) print(f"Default version is now: {toolbox.default_version}") The stable endpoint for agents consuming this toolbox is: https://<your-project>.services.ai.azure.com/api/projects/<project>/toolbox/customer-feedback-toolbox/mcp?api-version=v1 Attaching the Toolbox to Your Hosted Agent In your agent code, connect to the toolbox via a single MCP tool definition. The agent gains access to every tool in the toolbox without knowing their individual configurations: # src/agents/agent.py (relevant excerpt) from agent_framework import MCPStreamableHTTPTool import httpx, os toolbox_endpoint = os.environ["FOUNDRY_TOOLBOX_ENDPOINT"] http_client = httpx.AsyncClient( auth=_ToolboxAuth(token_provider), # Microsoft Entra bearer token timeout=120.0, ) mcp_tool = MCPStreamableHTTPTool( name="toolbox", url=toolbox_endpoint, http_client=http_client, load_prompts=False, ) # Agent now has access to web search, AI Search, and GitHub MCP # through one tool definition and one authenticated connection. GitOps Workflow Extension for Toolboxes Add a dedicated job to your build-deploy workflow to create and promote toolbox versions as part of the same CI/CD pipeline: deploy-toolbox: name: Deploy Toolbox Version needs: validate runs-on: ubuntu-latest environment: dev permissions: id-token: write contents: read steps: - uses: actions/checkout@v4 - name: Azure login (OIDC) uses: azure/login@v3 with: client-id: ${{ secrets.AZURE_CLIENT_ID_DEV }} tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - name: Create toolbox version in Foundry env: FOUNDRY_PROJECT_ENDPOINT: ${{ vars.FOUNDRY_PROJECT_ENDPOINT_DEV }} BING_CONNECTION_NAME: ${{ vars.BING_CONNECTION_NAME }} BING_INSTANCE_NAME: ${{ vars.BING_INSTANCE_NAME }} SEARCH_INDEX_NAME: ${{ vars.SEARCH_INDEX_NAME }} SEARCH_CONNECTION_ID: ${{ vars.SEARCH_CONNECTION_ID }} GITHUB_CONNECTION_ID: ${{ vars.GITHUB_CONNECTION_ID }} run: python src/tools/toolbox_config.py Key points to note: Toolbox configuration is Python code in source control, reviewed through pull requests like any other change Connection IDs and index names are environment variables from GitHub Actions variables, not hardcoded in the script The same script runs for dev, test, and prod with different environment variable bindings Toolbox version promotion is a separate step from agent deployment, so you can update tools independently of the agent container Because the toolbox endpoint is stable, rolling back a toolbox version does not require rolling back the agent image Common Pitfalls Teams adopting this pattern commonly make the following mistakes. Identifying them early saves significant operational pain later. Treating prompts as unmanaged text. If your system prompt lives in a portal text box rather than a versioned file, you have no history, no review process, and no rollback capability. Move prompts into source control on day one. Deploying manually from the portal. Even one manual deployment breaks the GitOps contract. Your repository no longer reflects the true state of the environment. Automate everything and remove portal deployment permissions from individuals. Mixing environment configuration into source files. Hardcoded endpoint URLs or model deployment names in agent_config.json mean your dev and prod configurations diverge at the source level. Use parameter files and environment variables resolved at deployment time. Poor separation between agent logic and tool logic. When agents and tools are tightly coupled in a single file, a tool change requires a full agent review and redeployment. Keep them separate so they can evolve independently. Not versioning your Toolbox definition. Defining a Foundry Toolbox interactively through the portal gives you no audit trail and no rollback path. The toolbox configuration script belongs in source control alongside your agent code. Skipping evaluation before promotion. Deploying a prompt change without running a structured evaluation against a representative test set is how regressions reach production. Build evaluation into the pull request workflow, not just the deployment workflow. No rollback plan. If your first rollback is unplanned and urgent, it will be slow and stressful. Test your rollback procedure in a non-production environment and document the steps. Ignoring token and cost signals. AI workloads have variable cost profiles. A change that doubles average token consumption per request may be functionally correct but economically unsustainable. Monitor consumption as a first-class signal. Example GitHub Actions Workflow The following workflow runs on pull request validation and on merge to main. It covers the core delivery lifecycle: validate, build, deploy to dev, and smoke test. # .github/workflows/build-deploy.yml name: Build and Deploy Foundry Hosted Agent on: push: branches: - main pull_request: branches: - main env: REGISTRY: myregistry.azurecr.io IMAGE_NAME: my-foundry-agent jobs: validate: name: Validate Agent Configuration runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - name: Set up Python uses: actions/setup-python@v5 with: python-version: "3.12" - name: Install dependencies run: pip install -r requirements.txt - name: Validate agent config schema run: python scripts/validate_agent.py - name: Run unit tests run: pytest tests/unit/ -v - name: Lint code run: ruff check src/ build: name: Build and Push Container Image needs: validate runs-on: ubuntu-latest if: github.ref == 'refs/heads/main' permissions: id-token: write contents: read outputs: image_tag: ${{ steps.meta.outputs.version }} steps: - uses: actions/checkout@v4 - name: Azure login (OIDC) uses: azure/login@v3 with: client-id: ${{ secrets.AZURE_CLIENT_ID }} tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - name: Log in to Azure Container Registry run: az acr login --name ${{ env.REGISTRY }} - name: Extract metadata id: meta uses: docker/metadata-action@v5 with: images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }} tags: | type=sha,format=short - name: Build and push image uses: docker/build-push-action@v7 with: context: . push: true tags: ${{ steps.meta.outputs.tags }} deploy-dev: name: Deploy to Foundry Dev needs: build runs-on: ubuntu-latest environment: dev permissions: id-token: write contents: read steps: - uses: actions/checkout@v4 - name: Azure login (OIDC) uses: azure/login@v3 with: client-id: ${{ secrets.AZURE_CLIENT_ID_DEV }} tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - name: Deploy agent to Foundry Dev project run: | az ai foundry agent deploy \ --project ${{ vars.FOUNDRY_PROJECT_DEV }} \ --image ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ needs.build.outputs.image_tag }} \ --environment dev - name: Run smoke tests against dev run: pytest tests/smoke/ -v --base-url ${{ vars.AGENT_URL_DEV }} deploy-test: name: Deploy to Foundry Test needs: deploy-dev runs-on: ubuntu-latest environment: test permissions: id-token: write contents: read steps: - uses: actions/checkout@v4 - name: Azure login (OIDC) uses: azure/login@v3 with: client-id: ${{ secrets.AZURE_CLIENT_ID_TEST }} tenant-id: ${{ secrets.AZURE_TENANT_ID }} subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }} - name: Deploy agent to Foundry Test project run: | az ai foundry agent deploy \ --project ${{ vars.FOUNDRY_PROJECT_TEST }} \ --image ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ needs.build.outputs.image_tag }} \ --environment test - name: Run smoke tests against test run: pytest tests/smoke/ -v --base-url ${{ vars.AGENT_URL_TEST }} Key decisions in this workflow: Validation runs on every pull request, not just on merge. Fast feedback catches problems before review. The container image is built once and the image tag is passed forward to deployment jobs. The same artefact is promoted across environments. Authentication uses OIDC federated credentials via azure/login@v3 with id-token: write permissions. No long-lived secrets are stored in GitHub for Azure authentication. The environment: test directive in the deploy-test job triggers a GitHub environment approval gate. A named reviewer must approve before the job runs. Smoke tests run after every deployment. A failed smoke test prevents further promotion. Best Practices Checklist Use this checklist when adopting the GitOps pattern for a Microsoft Foundry Hosted Agent: All agent artefacts, including prompts, tool definitions, model configuration, and Toolbox configuration scripts, are committed to source control No manual deployments to any environment; all changes flow through GitHub Actions workflows Pull request reviews are enforced for all changes to agent logic, prompts, and infrastructure via CODEOWNERS Unit tests cover tool logic; integration tests cover end-to-end agent behaviour; smoke tests cover deployed environments Container images are built once per commit and promoted across environments; images are not rebuilt per environment Environment configuration (endpoints, resource names) lives in parameter files, never in source code Secrets are stored in Azure Key Vault and accessed via managed identity at runtime GitHub environment approval gates control promotion from dev to test to prod Foundry Toolboxes are used to centralise tool definitions, credentials, and access governance across all agents; the toolbox configuration script is version-controlled and deployed through CI/CD Toolbox versions are promoted via the update default_version API step in the deployment workflow, not manually through the portal Latency, error rate, and token consumption are monitored with alerting thresholds The rollback procedure is documented, automated, and has been tested in a non-production environment GitHub Issues are used to record the intent behind significant changes and link to the pull requests that implement them Branch protection rules prevent direct pushes to main and require status checks to pass before merge The previous image tag is retained in the registry and stored as a GitHub environment variable for rollback Conclusion A Microsoft Foundry Hosted Agent is not something you deploy once and forget. Prompts evolve, tools change, models are upgraded, and policy requirements shift. Every one of those changes has the potential to alter agent behaviour in ways that affect users, costs, and compliance posture. GitOps, implemented through GitHub and GitHub Tasks, gives you the operational discipline to manage that complexity. Source control for all artefacts. Pull request review for every change. Automated validation, build, and deployment. Environment promotion gates. A complete audit trail from task to production. These are not bureaucratic overhead; they are the foundation of reliable, trustworthy AI agent operations. The teams that operate AI agents well are the ones that treat them like production software from the start. The investment in pipeline, structure, and governance pays back every time a change goes smoothly, every time a rollback takes minutes rather than hours, and every time a security or compliance reviewer can answer their question from a pull request history rather than a support ticket. Build the discipline in early. Your future self, and your production environment, will benefit from it. References Microsoft Foundry documentation Microsoft Foundry Agent Service documentation Microsoft Foundry Toolboxes documentation Introducing Toolboxes in Foundry (Microsoft Developer Blog) GitHub Actions documentation GitHub Projects and Tasks documentation Azure Container Registry documentation Azure Key Vault documentation Microsoft Entra Managed Identities documentation OpenGitOps Principles
Lee_Stott
Jun 02, 2026 Place Microsoft Developer Community Blog
707Views
0likes
0Comments
Building a hands-free voice concierge with Microsoft Foundry Voice Live and a Hosted Agent
This post walks through a small, working sample that wires the browser microphone to Azure AI Speech Voice Live, binds the realtime session to a Foundry hosted agent, and lets the agent answer travel questions using tool calls. The full source, infrastructure, and labs live in the repository linked at the end. Why this combination matters Voice user interfaces have historically been hard to build well. Streaming audio, partial transcripts, barge-in, voice activity detection, tool dispatch, and audio playback have traditionally meant stitching together five or six services. The combination of Voice Live and a Foundry hosted agent collapses that into one realtime WebSocket session with a single binding field. Voice Live owns the audio loop: speech to text, neural text to speech, semantic turn detection, noise suppression, and echo cancellation. The Foundry hosted agent owns the brain: instructions, memory, model selection, evaluators, and tool calling. The link between them is one query parameter on the WebSocket URL. What this means in practice: the browser never sees a model API key, never instantiates a tool, and never owns the agent prompt. The browser does microphone capture and audio playback. Everything else lives server-side. The scenario The sample is called Contoso Travel Concierge. The user is mid-journey, hands and eyes busy, and wants to ask things like: What is the weather in Tokyo this weekend? Is BA005 from Heathrow on time? What time is check-in at the Marriott Marquis? Each question triggers a tool call on the hosted agent. The reply is short, speakable, and synthesised back to the user in under a second on a warm connection. Architecture There are four moving parts. Three of them are managed Azure services. Only the broker is your code. Browser client – captures PCM16 audio at 24 kHz and streams it over a WebSocket to the broker. Plays back audio chunks the broker forwards from Voice Live. Session broker (FastAPI) – authenticates to Azure with DefaultAzureCredential , builds the Voice Live WebSocket URL with a short-lived bearer token, and relays frames in both directions. Voice Live – the Azure AI Speech realtime endpoint. Transcribes the user, hands the text to the bound agent, and synthesises the agent’s reply. Foundry hosted agent – a prompt-kind agent in Azure AI Foundry with instructions, tool definitions, and the microsoft.voice-live.enabled metadata flag set to true . Two design choices are worth calling out. The broker is small on purpose. It does authentication, URL construction, and WebSocket relay. It does not transcode audio, run business logic, or hold conversation state. Voice Live and the agent already do those things well. The agent binding is a URL query parameter, not an SDK call. There is no per-turn HTTP request to the agent runtime. Voice Live opens a session against the agent once and streams turns through it for the lifetime of the WebSocket. That is what keeps latency low. The Voice Live URL contract This is the single most important thing to get right. The public Microsoft sample that ships under liupeirong/ai-foundry-voice-agent targets a different URL shape ( services.ai.azure.com host, agent-id + agent-access-token parameters, an Authorization header). That shape is rejected by Foundry resources that expose voice-live-enabled agents. The shape below is the one the portal itself uses, and the one this sample dials. Three details cause most failures: The host must be <resource>.cognitiveservices.azure.com , not services.ai.azure.com . The broker rewrites this automatically from VOICE_LIVE_ENDPOINT . The bearer token travels in the authorization query parameter, URL-encoded, with a literal Bearer prefix and a + (or %20 ) before the token. No Authorization header is sent. agent-name and model are both the agent’s display name. agent-version is empty when you want the latest published version. Walkthrough: from clone to spoken reply Prerequisites Python 3.11 or later (the sample is developed on 3.13). The Azure CLI, signed in with az login --tenant <your-tenant-id> . An Azure AI Foundry project in a Voice Live region ( eastus2 , swedencentral , or westus2 ). A deployed prompt-kind agent in that project with Enable Voice Live turned on. The Cognitive Services User role on the Foundry resource for the identity the broker will use. Configure the broker Copy .env.sample to .env and fill in four values: AZURE_AI_PROJECT_ENDPOINT=https://<your-resource>.services.ai.azure.com AZURE_AI_PROJECT_NAME=<your-foundry-project-name> VOICE_LIVE_ENDPOINT=wss://<your-resource>.services.ai.azure.com/voice-live/realtime VOICE_LIVE_API_VERSION=2025-10-01 FOUNDRY_AGENT_ID=<your-agent-name> The agent name is what the Foundry portal shows on the agent card. The broker uses it for both the agent-name and model query parameters. Install and run python -m venv .venv .\.venv\Scripts\Activate.ps1 pip install -r requirements.txt .\scripts\start-local.ps1 The broker exposes three endpoints: GET /healthz – liveness probe. GET /config – returns the session.update the browser sends as its first frame. WS /ws – the bi-directional relay to Voice Live. Smoke test .\scripts\test-session.ps1 A successful run prints: [OK] /ws upgraded -> sent session.update <- {"type":"session.created",…} <- {"type":"session.updated",…} [OK] session.updated received -- E2E works This confirms the entire chain: local broker, DefaultAzureCredential token, Foundry Portal URL shape, Voice Live handshake, and the bound agent acknowledging the session. Open the browser UI Browse to http://localhost:8000/ , click Start talking, and ask one of the sample questions. Transcripts appear in real time and the spoken reply plays back through the audio context. Inside the broker The relay logic is tiny – the heavy lifting is the URL construction. The function below is the canonical reference; copy it if you are porting the pattern to another language. def build_voice_live_ws_url(agent_access_token: str) -> str: """ Build the Foundry Portal style Voice Live WebSocket URL. Auth lives in the query string only. No Authorization header is sent. """ host = _ws_host_from_endpoint(VOICE_LIVE_ENDPOINT) qs = urlencode( { "trafficType": "FoundryPortal", "agent-name": FOUNDRY_AGENT_ID, "agent-version": "", "agent-project-name": AZURE_AI_PROJECT_NAME, "api-version": VOICE_LIVE_API_VERSION, "model": FOUNDRY_AGENT_ID, "client-request-id": str(uuid.uuid4()), "authorization": f"Bearer {agent_access_token}", }, quote_via=quote, ) return f"wss://{host}/voice-live/realtime?{qs}" The relay itself is a pair of asyncio tasks: one forwarding browser frames upstream, one forwarding Voice Live frames back. Audio bytes are passed straight through – the broker never decodes them. Deploying the hosted agent The most reliable way to create a voice-live-enabled agent is the Foundry portal. Agents created via the Assistants v2 SDK do not carry the required metadata by default and will be rejected by the Voice Live URL shape above. The portal steps are: Open the Foundry project, go to Agents, and click New agent. Choose Prompt agent as the kind, name it (for example travel-concierge ), and pick a model deployment. Paste the contents of agent/src/prompts/system.txt into the instructions box. On the Voice tab, switch Enable Voice Live on. This is what sets the microsoft.voice-live.enabled = true metadata. Add the three tools ( get_weather , get_flight_status , get_hotel_info ) from agent/agent.yaml on the Tools tab. Publish the version and write the agent name back to .env as FOUNDRY_AGENT_ID . The full deployment guide, including how to host the broker on Azure Container Apps with a managed identity, is in docs/deployment.md in the repository. Three lessons from getting this to production 1. Voice output must be written for speech, not for screens Foundry agents tend to format answers in markdown with citations like ([data.jma.go.jp](https://…)) . When Voice Live synthesises that text, the user hears the URL read aloud, character by character. The fix is to write the agent instructions so the spoken text never contains URLs, markdown, or symbols. A short block at the end of the agent instructions does the job: Voice output rules - This output is read aloud by TTS. Never include URLs, domain names, or citation markers like "(source.com)" in your reply. Cite by speakable source name only. - Never use markdown for formatting. No asterisks, brackets, backticks, bullets, or hashes. Write in plain spoken sentences. - Keep numbers speakable: say "thirty degrees Celsius", not "30C / 86F". - Keep replies under about 40 words unless the user asks for detail. The browser transcript can still render markdown for the eyes. The sample does so with a small, escaping markdown renderer that whitelists bold, italic, code, and http(s) links only, so the same agent reply looks polished on screen even though the spoken version contains none of it. 2. Identity is simpler than it looks The broker uses DefaultAzureCredential and requests the https://ai.azure.com/.default scope. Locally that resolves to your az login credentials. In Azure Container Apps it resolves to the user-assigned managed identity. In both cases the only role assignment you need on the Foundry account is Cognitive Services User. There is no API key path on the working URL shape – it is bearer tokens all the way down. 3. The wrong sample wastes a day If you start from the public liupeirong/ai-foundry-voice-agent repository against a portal-provisioned voice-live agent, the WebSocket either returns HTTP 400 or closes silently with code 1006. The cause is the URL shape, not your code. The reference probe in scripts/probe_portal_shape.py is the single source of truth for the working contract – keep it as a regression test. Responsible AI and security notes Credentials never reach the browser. Tokens are minted server-side and travel only on the upstream Voice Live URL. No secrets in source. The .env file is gitignored. The .env.sample contains only placeholders. Markdown rendering is escape-first. The browser HTML-escapes the agent reply before applying its small markdown whitelist, and links are restricted to http(s) URLs so the rule cannot emit javascript: hrefs. Tool calls are auditable. Every turn shows up as a run in the Foundry portal under the agent, with the prompt, model output, and tool inputs and outputs visible for review. Voice biometric considerations. If you plan to handle account verification by voice, plug in dedicated speaker recognition rather than relying on the conversational model. Key takeaways Voice Live plus a Foundry hosted agent is a session-level integration, not an API integration. One URL, one binding field, one WebSocket. The browser is a thin client. Authentication, URL construction, and relay all live in a small FastAPI broker. Get the URL shape right ( cognitiveservices.azure.com , token in the query string, agent-name equals model equals the agent display name) and the rest is plumbing. Use the Foundry portal to create the agent so the voice-live metadata is set correctly. Write agent instructions for the ear, not the eye, then layer screen formatting on top in the browser. Get the code and try it Repository: github.com/microsoft/foundry-agent-voice-mode-sample Deployment guide: docs/deployment.md in the repository. Labs: three progressive workshops under labs/ – basic voice, adding tools, and binding a hosted agent. Reference docs: Voice Live in Azure AI Speech and Agents in Microsoft Foundry. If you build something on top of this pattern, open an issue or pull request on the repository. The sample is intentionally small so it stays easy to fork.
Lee_Stott
May 29, 2026 Place Educator Developer Blog
272Views
0likes
0Comments
CI/CD for AI Agents on Microsoft Foundry
Introduction Building an AI agent is the straightforward part. Shipping it reliably to production with version control, evaluation-driven quality gates, multi-environment promotion, and enterprise governance is where most teams run into friction. Microsoft Foundry changes this. It is Microsoft's AI app and agent factory: a fully managed platform for building, deploying, and governing AI agents at scale. It provides a first-class agent runtime with built-in lifecycle management, making it possible to apply the same CI/CD rigour you already use for application software to AI agents — regardless of whether you are building containerised hosted agents or declarative prompt-based agents. This post walks through a complete, production-ready reference architecture for doing exactly that. You will find the GitHub Actions workflow, the Azure DevOps pipeline YAML, and the architecture diagram linked throughout. Reference implementation repository: foundry-agents-lifecycle and CI/CD for AI Agents on Microsoft Foundry Why Agent CI/CD Is Different Traditional software pipelines gate releases on test pass/fail. Agent pipelines require an additional, critical layer: evaluation-driven quality gates. Before any agent version can be promoted to the next environment, it must pass three categories of evaluation: Quality — answer correctness, task completion rate, hallucination rate Safety — grounded responses, policy compliance, tool usage validation Performance — token usage per query, p95 response latency A second key difference is the deployment unit. You are not deploying a binary or a container tag in isolation. You are deploying an agent version — an immutable artefact that bundles the model selection, system instructions, tool definitions, and configuration together. This is what enables deterministic promotion and full auditability across environments. "Agents follow a standard CI/CD pattern, but with a critical shift: promotion happens at the agent version level, and release gates are driven by evaluation outcomes, not just test results." Reference Architecture Figure 1: End-to-end CI/CD reference architecture for hosted and prompt-based agents on Microsoft Foundry. The architecture has five logical layers, flowing from developer commit to production monitoring: Layer 1 — Developer Layer The developer layer is a standard source-controlled repository in GitHub or Azure DevOps. It contains: Agent code written in Python or .NET agent.yaml or prompt definition files for prompt-based agents Tool configurations: MCP servers, REST API connectors, or other integrations Infrastructure as Code: Bicep or ARM templates for provisioning the Foundry project and dependencies Layer 2 — CI Pipeline (Build · Validate · Evaluate) Every push or pull request triggers the CI pipeline. It performs five steps: Docker build — for hosted agents, build and tag the container image Static checks — lint with ruff , security scan with bandit , agent YAML schema validation Unit and tool tests — pytest suites covering agent logic and tool integrations Evaluation gate — run evaluation datasets; fail the pipeline if thresholds are breached Image push — push the validated container to Azure Container Registry (ACR) Prompt-based agents skip the Docker build step. Instead, the YAML definition and prompt bundle are validated against schema and evaluated against golden datasets. Layer 3 — CD Pipeline (Multi-stage Promotion) A single agent version is promoted through three Foundry project environments: Stage Environment Activities Gate Stage 1 Dev Foundry Project Deploy vNext version, smoke tests, developer evals Eval quality thresholds Stage 2 Test / QA Foundry Project Scenario tests, HITL validation, safety evaluation Eval gates + human approval Stage 3 Production Foundry Project Promote version, enable endpoint, post-deploy smoke test Required reviewer approval Rollback is straightforward: switch the active version pointer back to the previous agent version. No re-deployment is needed. Layer 4 — Microsoft Foundry Agent Service The Foundry Agent Service runtime provides: Hosted agent runtime — managed container execution supporting Agent Framework, LangGraph, Semantic Kernel, or custom code Prompt-based agent runtime — declarative agent definitions, no container required Built-in lifecycle operations — version, start, stop, rollback Entra Agent Identity — each deployed version receives a dedicated Microsoft Entra managed identity RBAC and policy enforcement — Azure role-based access controls per project Observability — distributed traces, structured logs, and evaluation signals Layer 5 — Monitoring, Governance, and Control Plane Foundry control plane: agent registry, environment configuration, version history OpenTelemetry forwarded to Azure Monitor and Application Insights Continuous evaluation pipelines for ongoing quality, grounding, and safety monitoring Azure Policy and RBAC enforcement at the platform level Environment Topology There are two topology options. We recommend Option A for all production workloads: Option Structure Best for Trade-off A — Recommended Dev Project → Test Project → Prod Project (separate Foundry projects) Enterprise workloads Full isolation, clean RBAC boundaries, easier governance B — Lightweight Single Foundry project with agent version tags (dev/test/prod) Small teams, prototyping Simpler setup, but weaker environment separation Separate projects mean separate RBAC policies, separate connection strings, and separate evaluation signals. A developer service principal has access only to the Dev project; the CI/CD identity has restricted access to promote to Test and Production. Evaluation Gates — The Core Difference Evaluation gates transform a standard software pipeline into an AI-safe deployment pipeline. They run at two points: pre-merge (CI) and pre-promotion (CD). Defining the Gates Category Metric CI threshold Prod threshold Quality Hallucination rate < 5% < 3% Quality Task completion rate > 90% > 95% Safety Grounded response rate > 95% > 98% Safety Policy violations 0 0 Performance p95 latency < 4 000 ms < 3 000 ms Cost Token usage per query Track only Alert on > 20% regression Gate Enforcement (Python) import json import sys def check_gates(results_path: str) -> None: with open(results_path) as f: results = json.load(f) failures = [] if results["hallucination_rate"] > 0.05: failures.append(f"Hallucination rate {results['hallucination_rate']:.1%} exceeds 5% threshold") if results["task_completion_rate"] < 0.90: failures.append(f"Task completion {results['task_completion_rate']:.1%} below 90% threshold") if results["latency_p95_ms"] > 4000: failures.append(f"p95 latency {results['latency_p95_ms']}ms exceeds 4000ms threshold") if results.get("policy_violations", 0) > 0: failures.append(f"Policy violations detected: {results['policy_violations']}") if failures: for f in failures: print(f"GATE FAILED: {f}", file=sys.stderr) sys.exit(1) print("All evaluation gates passed — proceeding to deployment") if __name__ == "__main__": check_gates(sys.argv[1]) Hosted vs Prompt-Based Agents — Pipeline Differences Capability Hosted Agents Prompt-Based Agents Deployment unit Container image + agent definition YAML / prompt configuration bundle Build step required Yes — Docker build + ACR push No — YAML validation only Supported frameworks Agent Framework, LangGraph, Semantic Kernel, custom Foundry declarative runtime Promotion artefact Versioned agent with container image reference Versioned prompt/config bundle CI focus Code quality, tool tests, evaluation Prompt schema validation, evaluation Rollback mechanism Switch active agent version Switch active agent version Runtime management Foundry manages container lifecycle Foundry manages declarative runtime CI Pipeline Walkthrough The following steps are representative of the full GitHub Actions workflow available in github-actions-pipeline.yml alongside this post. Hosted Agent CI # 1. Static checks ruff check . bandit -r src/ -ll python scripts/validate_agent_config.py --config agent.yaml # 2. Tests pytest tests/unit/ -v --tb=short pytest tests/tools/ -v --tb=short # 3. Evaluation gate python scripts/run_evaluations.py \ --dataset eval/datasets/golden_set.jsonl \ --output eval/results/results.json python scripts/check_eval_gates.py \ --results eval/results/results.json \ --max-hallucination 0.05 \ --min-task-completion 0.90 \ --max-latency-p95 4000 # 4. Push container image az acr build \ --registry myregistry.azurecr.io \ --image "myagent:$SHA" \ --file Dockerfile . Prompt-Based Agent CI # Validate YAML / prompt definitions python scripts/validate_agent_config.py --config agent.yaml # Evaluation against golden dataset python scripts/run_evaluations.py \ --dataset eval/datasets/golden_set.jsonl \ --output eval/results/results.json python scripts/check_eval_gates.py \ --results eval/results/results.json CD Pipeline Walkthrough Stage 1 — Dev Deployment python scripts/deploy_agent.py \ --env dev \ --image "myregistry.azurecr.io/myagent:$SHA" \ --foundry-endpoint $FOUNDRY_ENDPOINT_DEV \ --agent-config agent.yaml # Returns the new agent version ID, stored for promotion AGENT_VERSION=$(python scripts/get_active_version.py --env dev) Stage 2 — Promote to Test (after approval gate) python scripts/promote_agent.py \ --from-env dev \ --to-env test \ --agent-version $AGENT_VERSION \ --foundry-endpoint $FOUNDRY_ENDPOINT_TEST # Run scenario tests and safety evaluation python scripts/run_evaluations.py \ --dataset eval/datasets/scenario_set.jsonl \ --output eval/results/test-results.json python scripts/check_eval_gates.py \ --results eval/results/test-results.json \ --max-hallucination 0.03 \ --min-task-completion 0.95 Stage 3 — Promote to Production (after required reviewer approval) python scripts/promote_agent.py \ --from-env test \ --to-env prod \ --agent-version $AGENT_VERSION \ --foundry-endpoint $FOUNDRY_ENDPOINT_PROD # Enable the production endpoint python scripts/enable_agent_endpoint.py \ --agent-version $AGENT_VERSION \ --foundry-endpoint $FOUNDRY_ENDPOINT_PROD Rollback # Switch the active version to the previous known-good version python scripts/promote_agent.py \ --from-env prod \ --to-env prod \ --agent-version $PREVIOUS_AGENT_VERSION \ --foundry-endpoint $FOUNDRY_ENDPOINT_PROD # OR delete the failing version python scripts/delete_agent_version.py \ --agent-version $AGENT_VERSION \ --foundry-endpoint $FOUNDRY_ENDPOINT_PROD Deployment Using the Azure AI Projects SDK The azure-ai-projects SDK provides programmatic control over the full agent lifecycle. This is the recommended approach for CI/CD scripts where you need deterministic, scriptable deployment. from azure.identity import DefaultAzureCredential from azure.ai.projects import AIProjectClient # Connect to the Foundry project client = AIProjectClient( endpoint=FOUNDRY_PROJECT_ENDPOINT, credential=DefaultAzureCredential() ) # List existing agents (useful for idempotent deploy scripts) for agent in client.agents.list(): print(f"Agent: {agent.name} version: {agent.id}") # Create a new agent version (hosted agent) agent = client.agents.create_agent( model="gpt-4o", name="my-enterprise-agent", instructions="You are a helpful assistant ...", tools=[...], # tool definitions metadata={"version": GIT_SHA, "environment": "dev"} ) print(f"Created agent version: {agent.id}") For hosted agents, the SDK call also references the container image pushed to ACR. Refer to the Deploy a hosted agent — Microsoft Foundry documentation for the full SDK flow including container image registration and version polling. Reference Implementation Stack Concern Technology Source control and pipelines GitHub Actions or Azure DevOps Pipelines Infrastructure and agent deployment Azure Developer CLI ( azd up ) Programmatic agent lifecycle azure-ai-projects Python SDK Agent evaluation azure-ai-evaluation Python SDK Agent runtime Microsoft Foundry Agent Service Container registry Azure Container Registry (hosted agents only) Observability OpenTelemetry, Azure Monitor, Application Insights Identity and access Microsoft Entra (Agent ID, OIDC workload identity federation) Governance Azure Policy, RBAC, Foundry control plane Governance and Responsible AI Shipping AI agents at enterprise scale requires governance beyond what a traditional CI/CD pipeline provides. Microsoft Foundry addresses this at the platform level: RBAC per environment — each Foundry project has independent access controls. Developers deploy to Dev; only CI/CD service principals (with audited OIDC tokens) can promote to Test and Production. Agent registry and audit trail — the Foundry control plane records which agent version is active in each environment, who deployed it, and when. This satisfies enterprise audit requirements without additional tooling. Content safety and policy enforcement — Azure Policy governs model access, data handling, and content safety rules at the infrastructure level, not just at the application code level. Policy violations block deployment automatically. Entra Agent Identity — each deployed agent version receives a dedicated, short-lived managed identity. Agents authenticate to downstream services using least-privilege credentials scoped to that specific deployment. Continuous evaluation in production — evaluation pipelines run on sampled production traffic, alerting when quality, safety, or cost metrics drift from their baseline. A key trade-off to be transparent about: evaluation datasets must be maintained and updated as the agent's tasks evolve. Stale datasets produce misleading pass/fail signals. Treat your golden evaluation set as a first-class engineering artefact alongside the agent code itself. Pipeline Files Two pipeline files accompany this reference architecture. Both implement the same four-stage pipeline (CI Build, CI Evaluate, CD Dev, CD Test, CD Production) with environment-appropriate approval gates. github-actions-pipeline.yml — GitHub Actions workflow. Uses GitHub Environments for approval gates and OIDC Workload Identity Federation for passwordless Azure authentication. No stored Azure credentials required. azure-devops-pipeline.yml — Azure DevOps multi-stage YAML pipeline. Uses ADO Environments with required approvers and variable groups per environment. Both pipelines share these security practices: OIDC / Workload Identity Federation — no long-lived Azure credentials stored in pipeline secrets Per-environment variable groups, each with scoped connection strings and endpoints Evaluation quality gates enforced before every promotion step Mandatory human approval before production deployment Summary The full pipeline in one view: Developer commit | CI Pipeline ├── Docker build (hosted agents) / YAML validation (prompt agents) ├── Static checks + unit tests + tool tests └── Evaluation gate ← quality · safety · performance | Agent Version created ← immutable, versioned artefact | CD Pipeline ├── Deploy to Dev → smoke tests + eval gate ├── Promote to Test → scenario tests + HITL + approval gate └── Promote to Prod → enable endpoint + monitoring | Microsoft Foundry Agent Service └── Versioned runtime · Entra identity · RBAC · Observability | Control Plane └── Agent registry · Governance · Continuous evaluation Microsoft Foundry provides the platform primitives — versioned agent deployments, multi-environment Foundry projects, built-in lifecycle management, and an enterprise observability stack — needed to operate AI agents with the same confidence as any production software system. The key takeaway: treat the agent version as your deployment artefact, and evaluation outcomes as your release gate. The rest follows familiar CI/CD patterns you already know and trust. Next Steps Clone the CI/CD Repo at leestott/foundry-cicd Clone the reference demo: foundry-agents-lifecycle on GitHub Set up your environment: Set up your environment for Foundry Agent Service Deploy your first hosted agent: Quickstart: Deploy your first hosted agent Understand hosted agent concepts: Foundry Hosted Agents concepts Automate deployments in CI/CD: Automate deployment of Microsoft Foundry agents Manage agent versions: Manage hosted agents — Microsoft Foundry Deploy via SDK: Deploy a hosted agent — Microsoft Foundry SDK and endpoint reference: Microsoft Foundry SDK and Endpoints reference Azure AI Projects SDK: azure-ai-projects Python SDK Azure Developer CLI: Azure Developer CLI (azd) overview Microsoft Foundry documentation hub: Microsoft Foundry on Microsoft Learn
Lee_Stott
May 22, 2026 Place Educator Developer Blog
13KViews
8likes
0Comments