
From Multi-Model Chaos to a Governed AI Gateway: Cost Optimization on Azure
What is Multi-Model Chaos, and what cost and security challenges does it pose?

Multi-model chaos describes the sprawl that emerges when an organization rapidly adopts many large language and foundation models (OpenAI, Anthropic, Meta Llama, Mistral, and a long tail of open-source and fine-tuned variants) across teams and applications without any unifying control plane. Instead of a single governed entry point, each team wires its own keys, endpoints, SDKs, and prompts directly to whichever provider it prefers, leaving the enterprise with a fragmented, duplicated, and largely invisible AI estate.

On the cost side, this fragmentation makes spend almost impossible to predict or contain: identical workloads run against premium models when cheaper ones would suffice, token consumption goes unmeasured, redundant calls and missing caching inflate bills, and finance teams have no consolidated view to attribute usage back to a team, product, or customer.

On the security and governance side, the risks compound: API keys are scattered across code and config files, sensitive or regulated data flows to external endpoints with no data-loss prevention or residency guarantees, prompt-injection and jailbreak attempts go unmonitored, and there is no centralized authentication, rate limiting, auditing, or content filtering. The net effect is an uncontrolled attack surface and a compliance blind spot, precisely the conditions that motivate consolidating model access behind a governed AI gateway. In short, multi-model chaos trades short-term speed for runaway costs and unmanaged security risk.

What is a Governed AI Gateway, and how does it help reduce cost and improve security?

A governed AI gateway is an enterprise control plane built on Azure API Management (APIM) that consolidates every model behind a single, governed endpoint. 
It unifies Azure OpenAI (the gpt-5.4 family) and Azure AI Foundry (open-source and partner models such as grok-4.3 and DeepSeek-V4-Pro), so consumers reach any of them through one consistent, policy-enforced entry point rather than a tangle of direct connections. Every backend is password-less, authenticated through managed identity, which eliminates scattered API keys. On top of this foundation, the gateway enforces per-consumer model permissions, token-based rate limits, and cost-based budget downgrade—automatically routing teams to more economical models as they approach their spend limits—all administered from a self-service Admin UI. One governed endpoint for every backend. Azure OpenAI and Azure AI Foundry (OSS and partner) models are bundled behind a single governance endpoint. Each backend is reachable only over a private endpoint with key authentication disabled, so APIM authenticates using its own managed identity—no model keys ever live on the gateway. Per-consumer governance, edited live in the Admin UI with no redeployment: Allowed models — a consumer can call only the models explicitly granted to it; anything else returns a 403. Rate limits — per-consumer TPM and token-quota tiers (small / medium / large), returning a 429 once exceeded. Cost budget — a daily USD spend limit; when it is exceeded, requests are automatically downgraded to a cheaper model along a configured ladder, including cross-backend downgrades (e.g. gpt → OSS or OSS → gpt). Self-service Admin UI (React + FastAPI, Entra ID login, gated to an admin group) to issue consumer keys, set model, limit, and budget policies, and review the usage dashboard and request logs. Built-in observability — per-call token metrics, broken down by consumer and model, stream to Application Insights, surfaced through the Admin UI's usage dashboard and a request / blocked & downgrade-event log. Flexible client authentication — an APIM subscription key by default, or an Entra ID JWT (client_auth_mode). 
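The per-consumer checks listed above can be pictured as a single decision function. The following is a minimal Python sketch, not the gateway's actual APIM policy: the Consumer fields, the decide function, and the rule of stepping down one rung once the daily budget is exceeded are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Consumer:
    """Hypothetical per-consumer policy record (field names are illustrative)."""
    allowed_models: Set[str]
    tpm_limit: int                # tokens per minute
    daily_budget_usd: float
    ladder: List[str]             # downgrade order, most to least expensive


def decide(consumer: Consumer, requested_model: str,
           tokens_this_minute: int, spend_today_usd: float
           ) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, effective model) for one request."""
    # 1. Allowed models: anything not explicitly granted is rejected.
    if requested_model not in consumer.allowed_models:
        return 403, None
    # 2. Rate limit: reject once the per-minute token allowance is used up.
    if tokens_this_minute >= consumer.tpm_limit:
        return 429, None
    # 3. Cost budget: once exceeded, step down one rung on the ladder
    #    (assumed rule; the last rung has nowhere cheaper to go).
    effective = requested_model
    if spend_today_usd >= consumer.daily_budget_usd and requested_model in consumer.ladder:
        idx = consumer.ladder.index(requested_model)
        effective = consumer.ladder[min(idx + 1, len(consumer.ladder) - 1)]
    return 200, effective


team = Consumer(
    allowed_models={"gpt-5.4", "gpt-5.4-mini"},
    tpm_limit=1000,
    daily_budget_usd=1.80,
    ladder=["gpt-5.4", "gpt-5.4-mini"],
)
print(decide(team, "gpt-5.4", tokens_this_minute=10, spend_today_usd=2.00))
```

Ordering the checks this way means a blocked or throttled caller never triggers a downgrade, and an over-budget caller still gets an answer from a cheaper model instead of an error.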
How is it different from APIM AI Gateway? APIM already provides useful GenAI gateway primitives: token rate limiting, token-usage metrics, semantic caching, backend routing, endpoint import, authentication, authorization, and monitoring. The difference is that APIM enables the enforcement runtime and policy control point, but not the full operating model required to run a shared, multi-tenant AI platform across teams, models, and budgets. Inside the policy pipeline, APIM remains the load-bearing layer: llm-token-limit enforces per-consumer token-per-minute and quota limits, llm-emit-token-metric streams token usage into our metrics namespace, and standard APIM capabilities handle endpoint exposure, access control, and platform monitoring. The governed AI Gateway adds the governance layer APIM does not provide out of the box: Self-service onboarding — a platform team can issue or revoke consumer keys and manage access from the Admin UI, without raising a pull request or redeploying infrastructure. Per-consumer model entitlements — every consumer has an explicit allow-list of model deployments. The gateway calculates the effective allowed set per request and returns 403 when a caller asks for a model it is not entitled to use. Live configuration without redeployment — entitlements, rate tiers, token quotas, budgets, and downgrade levels live in the configuration store. A sync worker projects those values into APIM named values continuously, so operational changes can take effect without a terraform apply while the policy logic stays version-controlled in IaC. Managed-identity-only, private backends — key-based authentication is disabled on Azure OpenAI and Azure AI Foundry. APIM injects a managed identity token on every backend call, and the backends are reachable only over private endpoints. 
Cost-based downgrade across backends — when a consumer approaches its budget, the gateway can route to a cheaper model while preserving availability, including cross-backend downgrades between Azure OpenAI and Azure AI Foundry. APIM’s AI gateway is the enforcement runtime while the governed AI Gateway is the platform operating model around it. APIM handles the gateway primitives extremely well, while our governance layer adds identity, self-service administration, entitlement management, live configuration, cost controls, and cross-model routing so teams can safely consume multiple models without creating new cost, security, or compliance sprawl. Solution overview Figure 1 shows the end-to-end architecture of the governed AI gateway. Client applications never talk to the models directly; instead, every request passes through Azure API Management, which acts as the single governed entry point that authenticates callers, applies per-consumer policy, and routes traffic privately to the appropriate model backend. Around this gateway sit the supporting planes for administration, identity, and observability, giving the organization one consistent place to control access, contain cost, and monitor usage across both Azure OpenAI and Azure AI Foundry models. This solution is also completely serverless. Key components: Client / consumer applications — the apps and services that call for model inference, each identified by its own consumer key or Entra ID identity. Azure API Management (the gateway) — the single governance endpoint that handles authentication, allowed-model checks, rate limiting, and cost-based budget downgrade before any request reaches a model. Model backends — Azure OpenAI (the gpt-5.4 family) and Azure AI Foundry (OSS and partner models such as grok-4.3 and DeepSeek-V4-Pro), each reachable only over a private endpoint. 
- Microsoft Entra ID: provides identity for both clients (optional JWT auth) and the gateway's own managed identity used to reach the backends without password credentials.
- Admin UI (React + FastAPI): the self-service control plane for issuing consumer keys and setting model, rate-limit, and budget policies.
- Application Insights: collects per-call token metrics by consumer and model, powering the usage dashboard and request / blocked-event logs.

Figure 1: Solution architecture diagram

Request flow

1. Authenticate: a client calls the gateway with an APIM subscription key (or an Entra ID JWT) instead of any model key.
2. Authorize the model: APIM checks whether the consumer is permitted to call the requested model; if not, it returns 403.
3. Enforce limits: the gateway applies the consumer's TPM and token-quota tier, returning 429 when the limit is exceeded.
4. Apply the cost budget: if the consumer's daily USD budget is exhausted, the request is automatically downgraded to a cheaper model along the configured ladder.
5. Route to the backend: APIM forwards the request over a private endpoint, authenticating with its managed identity to Azure OpenAI or Azure AI Foundry.
6. Return and record: the model response is returned to the client while per-call token metrics are emitted to Application Insights and surfaced in the Admin UI dashboard and logs.

Implement the solution

This section describes how to deploy the solution architecture. 
In this post, you'll perform the following tasks:

- Create APIM
- Create Cosmos DB
- Create Microsoft Foundry with gpt-5.4, gpt-5.4-mini, DeepSeek-V4-Pro, and grok-4.3 deployed
- Create the Admin UI on Container Apps
- Create a consumer with an APIM subscription key in the Admin UI
- Integrate the APIM endpoint with GitHub Copilot Chat and Copilot CLI
- Create a budget and rate limit in the Admin UI
- Simulate and validate the auto-downgrade feature

Ensure that you have the following prerequisites in place before moving to the next section:

- An Azure subscription with model quota (Azure OpenAI and, optionally, Azure AI Foundry models).
- Tools: Git, Terraform ≥ 1.7, and Azure CLI, signed in to the subscription with az login. Container images are built remotely in Azure Container Registry, so Docker is not required.
- VS Code and Copilot CLI

Deploy the Azure AI Gateway

Clone the repository from https://github.com/microsoft/apim-foundry-governance and switch to the english branch:

    git clone https://github.com/microsoft/apim-foundry-governance
    cd apim-foundry-governance
    git checkout english

By default the solution deploys in the koreacentral region. Export your custom variables if needed. Shell variable names cannot contain hyphens, so use underscores:

    export location=eastus2
    export backend_rg=rg-aigw-tfstate-dev-eastus2
    export storage_prefix=staigwtfstate
    export state_key=ai-gateway-eus2.tfstate

Bootstrap the Terraform state backend (once per subscription)

This creates a resource group and storage account in the chosen region (eastus2 here) for remote state (Entra auth, public blob access blocked).

    ./scripts/bootstrap-backend.sh \
      --location "$location" \
      --backend-rg "$backend_rg" \
      --storage-prefix "$storage_prefix" \
      --state-key "$state_key"

Set Terraform variables

    cp infra/terraform.tfvars.example infra/terraform.tfvars
    # Edit infra/terraform.tfvars: prefix, location, owner, cost_center, apim_publisher_*, budget_*

Create the Gateway Core

On the first apply, leave worker_image and admin_ui_image empty (default ""). The images don't exist yet, and the worker Job / Admin UI app are count-gated on these variables.
    cd infra
    terraform init
    # If you are moving an existing state from another backend, run `terraform init -migrate-state` instead.
    terraform apply

Build and push the container images with app registrations

After the registry is created, build the worker and Admin UI images remotely (no local Docker needed).

    acr=$(terraform output -raw registry_login_server)
    reg=$(terraform output -raw registry_name)
    az acr build --registry $reg --image config-sync-worker:latest ../app/config-sync-worker
    az acr build --registry $reg --image admin-ui:latest ../app/admin-ui

The worker and Admin UI require Entra app registrations before a user can access the frontend. Create the admin security group, the BFF API app registration, and the SPA public-client app registration (run from the repository root):

    ./scripts/app-registration.sh

Enable the worker and Admin UI

From the output above, populate the image references and the Entra values in infra/terraform.tfvars and apply again.

    worker_image          = "<registry_login_server>/config-sync-worker:latest"
    admin_ui_image        = "<registry_login_server>/admin-ui:latest"
    admin_ui_public       = true   # external FQDN (still Entra-gated). false = VNet-only
    admin_group_object_id = "<entra security group object id>"
    bff_api_audience      = "api://<bff app id>"
    spa_client_id         = "<spa app id>"
    entra_tenant_id       = "<tenant id>"

Cosmos DB seed configuration

Cosmos DB is private with key auth disabled, so the initial config is seeded from a jumpbox inside the VNet. The default setting of enable_jumpbox = true in infra/terraform.tfvars triggers Terraform to provision the jumpbox VM, grant its managed identity the Cosmos DB Built-in Data Contributor role (scoped to the config container), and run a VM run-command that seeds both documents automatically:

- Global config (id=global): allowed models + token limits.
- Per-model pricing (id=pricing): prompt/completion rates for cost-based budgeting.
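The pricing document is what lets the gateway turn token counts into spend. As a rough illustration of that arithmetic, here is a small Python sketch; the field names and the rates are hypothetical placeholders, not the repository's schema or real prices.

```python
# Hypothetical per-model rates in USD per 1,000 tokens (placeholders only).
RATES = {
    "gpt-5.4": {"prompt_per_1k": 0.002, "completion_per_1k": 0.008},
    "gpt-5.4-mini": {"prompt_per_1k": 0.0004, "completion_per_1k": 0.0016},
}


def call_cost_usd(prompt_tokens: int, completion_tokens: int, rates: dict) -> float:
    """Estimate the cost of one call from its token counts and per-1k rates."""
    return (
        prompt_tokens * rates["prompt_per_1k"]
        + completion_tokens * rates["completion_per_1k"]
    ) / 1000


# Accumulate spend for a consumer across calls and compare it to a daily budget.
daily_budget_usd = 1.80
calls = [("gpt-5.4", 1000, 500), ("gpt-5.4", 4000, 1200)]
spend = sum(call_cost_usd(p, c, RATES[m]) for m, p, c in calls)
print(f"spend so far: ${spend:.4f} of ${daily_budget_usd:.2f}")
```

Prompt and completion tokens are priced separately because models typically charge different rates for each, which is why the seeded document carries both.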
To seed manually instead (jumpbox connected via Bastion), the same scripts can be run directly:

    # Global allowed models + limits
    ./scripts/seed-cosmos-jumpbox.sh https://<cosmos-account>.documents.azure.com:443/
    # Per-model pricing (for cost-based budgeting)
    ./scripts/seed-pricing-jumpbox.sh https://<cosmos-account>.documents.azure.com:443/

Access the Admin UI

Update the SPA app registration with your Container Apps URL:

    spa_app_id="$(az ad app list --display-name "AI Gateway SPA" --query "[].appId" -o tsv)"   # spa_client_id
    fqdn=$(terraform output -raw admin_ui_fqdn)   # run from infra/
    oid=$(az ad app show --id "$spa_app_id" --query id -o tsv)
    az rest --method PATCH \
      --uri "https://graph.microsoft.com/v1.0/applications/$oid" \
      --headers 'Content-Type=application/json' \
      --body "{\"spa\":{\"redirectUris\":[\"https://$fqdn\"]}}"

Browse to the admin_ui_fqdn, which is also the Container Apps FQDN. You will need to log in via Entra ID (users must be added to the Entra admin group before they can log in). Register the consumer with a name and issue the API key. The API key is the APIM subscription key and is shown only once in the UI, so copy it somewhere safe.

Figure 2: AI Gateway Consumers and Keys

Next, in the left-hand tab, click Budgets. This sets the daily budget a user is allowed to consume and is also where the model downgrade logic resides. For demonstration purposes, set a low budget of $1.80 and select the model priority order in which the downgrade should occur. In this case, gpt-5.4 will be used first, followed by gpt-5.4-mini, DeepSeek, then Grok.

Figure 3: AI Gateway Budgets

Lastly, in the left-hand tab, select Rate limits. This sets the number of tokens a user can consume in a day; the limit resets after 24 hours. Select the large tier.

Figure 4: AI Gateway Rate Limits

Browse to the Dashboard, which shows token usage and request status codes grouped by consumer and model. 
You can also view the budget downgrade for a specific user.

Figure 5: AI Gateway Captions

Integrate the endpoint with GitHub Copilot Chat in VS Code

In VS Code, press Ctrl + Shift + P and select "Chat: Manage Language Model". Select Add Models and choose Azure.

Figure 6: Add models to GitHub Copilot Chat

Follow the prompts. This creates or edits a chatLanguageModels.json file, which should look like the following. Note that you need to use the /vscode path.

    [
        {
            "name": "Azure",
            "vendor": "azure",
            "models": [
                {
                    "id": "gpt-5.4",
                    "name": "gpt-5.4 (APIM)",
                    "url": "https://<REPLACE WITH YOUR APIM ENDPOINT>.azure-api.net/vscode/openai/deployments/gpt-5.4/chat/completions?api-version=2025-01-01-preview",
                    "toolCalling": true,
                    "vision": true,
                    "maxInputTokens": 128000,
                    "maxOutputTokens": 16000,
                    "requestHeaders": {
                        "Ocp-Apim-Subscription-Key": "<REPLACE WITH YOUR SUBSCRIPTION KEY>"
                    }
                }
            ]
        }
    ]

Now select the gpt-5.4 (APIM) model and ask it a question.

Integrate the endpoint with Copilot CLI

Because Copilot CLI only accepts the api-key header, a separate API is used. Replace and export the following variables before using Copilot CLI.

    export COPILOT_PROVIDER_TYPE="azure"
    export COPILOT_PROVIDER_BASE_URL="<REPLACE WITH YOUR APIM ENDPOINT>"
    export COPILOT_PROVIDER_API_KEY="<REPLACE WITH YOUR SUBSCRIPTION KEY>"
    export COPILOT_MODEL="gpt-5.4"
    export COPILOT_PROVIDER_AZURE_API_VERSION="2025-01-01-preview"
    export COPILOT_PROVIDER_MODEL_ID="gpt-5.4"

You should see a response similar to the following.

Figure 7: Integration of APIM to Copilot CLI

Simulate the downgrade feature

Continue to ask questions to consume more tokens. Once spend hits the 80% cost threshold, the tag switches to "Auto-switch level 1", meaning future requests are downgraded to gpt-5.4-mini. 
Figure 8: AI Gateway Downgrade Feature

Validate by running this command in your terminal with your own endpoint and API key.

    curl -sS -i -X POST "https://<REPLACE>.azure-api.net/openai/deployments/gpt-5.4/chat/completions?api-version=2025-01-01-preview" \
      -H "api-key: <REPLACE WITH API KEY>" \
      -H "Content-Type: application/json" \
      -d '{"messages":[{"role":"user","content":"hi"}],"max_completion_tokens":8}'

Inspect the response headers: the downgrade level should be 1 and the effective model should be gpt-5.4-mini, even though the request targeted the gpt-5.4 endpoint.

Figure 9: Model downgrade

Conclusion

This post started with the problem of multi-model chaos: teams moving quickly with different models, endpoints, SDKs, keys, quotas, and cost profiles, but without a common control plane, resulting in ineffective cost control and potential leaks of model API keys. The governed AI Gateway addresses that by putting Azure OpenAI and Azure AI Foundry behind a single APIM-based entry point, where access, limits, routing, identity, telemetry, and budget behavior can be applied consistently for every consumer.

We also walked through how the gateway differs from APIM's native AI gateway capabilities. APIM provides the enforcement runtime and the GenAI policy primitives, such as token limits, token metrics, semantic caching, and backend routing. The governed AI Gateway builds the operating model around those primitives: self-service onboarding, per-consumer model entitlements, live configuration without redeployment, managed-identity-only private backends, per-call cost telemetry, and cost-based downgrade across model providers. From there, we integrated the APIM endpoint with GitHub Copilot Chat and Copilot CLI, and validated the downgrade behavior when spend thresholds were reached. 
The result is not just an AI proxy, but a reusable enterprise pattern for running AI access as a governed platform: developers keep a simple model endpoint experience, while the platform team keeps control over security, cost, observability, and operational change. Overall, this post helps organizations bring multi-model AI usage under one governed entry point, reducing sprawl across endpoints, keys, policies, and cost controls. It also gives platform teams centralized control over model access, rate limits, budgets, telemetry, and private backend access while preserving a simple endpoint experience for developers. References AI gateway capabilities in Azure API Management Policies in Azure API Management Azure API Management policy reference - llm-emit-token-metric Using GitHub Copilot CLI - GitHub Docs AI language models in VS Code
jisunchoi
Jul 13, 2026 Place Microsoft Developer Community Blog
Microsoft Leads a New Era of Software Supply Chain Transparency
Microsoft announces the general availability of Microsoft’s Signing Transparency (MST) – a first-of-its-kind capability that brings unprecedented visibility and trust to our software supply chain. With this release, Microsoft is leading the industry by recording the build of critical cloud services into a publicly readable and verifiable SCITT standard (Supply Chain Integrity, Transparency, and Trust) compliant ledger. This means every production software build for in scope services like Azure Attestation and Azure Managed HSM (Hardware Security Module), Azure confidential ledger, Microsoft Signing Transparency itself (and others over time) – is now logged in an immutable, tamper-evident record. Only builds that are in the MST ledger are deployed to production; this gives customers confidence that the supply chain for these critical services can be audited at anytime. Notably, the MST ledger is fully open source and built to align with the emerging IETF SCITT standard. By embracing SCITT’s principles and open protocols, Microsoft ensures that MST not only secures our own ecosystem but also contributes to a broader industry movement toward standardized supply chain transparency. The open-source MST ledger serves as a verifiable trust anchor that any organization or researcher can inspect, audit, or even integrate with their own tooling. MST itself meets the highest levels of transparency, backed by a tamper-proof confidential ledger, open-source, and independently verified. Specifically, we are making the foundation of our trust model transparent and accessible to everyone – reinforcing that trust must be earned through proof, not just promises. This launch marks a major milestone in our commitment to Zero Trust principles, extending “never trust, always verify” all the way into the build itself. Building on a public preview introduced late last year, MST’s general availability delivers verifiable transparency at the software level. 
It transforms traditional code signing with an additive trust layer that is accessible via an open verification model. Every new software update is accompanied by a publicly auditable proof of integrity, enabling security teams to proactively confirm that each update is authentic and unaltered. To help organizations get the most out of this capability, we are also introducing a free tool to explore the contents – Ledger Explorer – an offline tool that allows security teams to examine MST ledger entries, verify cryptographic proofs, and even validate the ledger’s integrity independently. This tool, combined with MST’s open design, ensures that every Microsoft customer – and the broader community – can hold us accountable in real time for the software we run on their behalf. Key Benefits of Microsoft’s Signing Transparency (MST) Verified Code Integrity – Every software release is cryptographically logged in MST’s ledgers. This makes each build tamper-evident and traceable. If an attacker attempts to inject malicious code or sign an unauthorized update, it will be evident through the well-defined validation step built into the SCITT standard. Organizations gain the assurance that code integrity can be independently confirmed at any time. Independent Verification & Zero Trust – MST enables customers and auditors to verify software authenticity on their own, without having to solely rely on vendor attestations. For each update, Microsoft provides a transparency “receipt” (proof of logging) that you can use to prove the update was officially published and unaltered. This fosters a “don’t just trust, verify” approach, empowering security teams to double-check everything running in their environment aligns with what Microsoft intended. Audit-Trail & Compliance – The transparency ledger creates a permanent, auditable timeline of code deployments. Every entry is a record of what was released and when, backed by cryptographic proofs. 
This simplifies compliance reporting and accelerates forensic analysis. In the event of an incident, you can quickly audit the ledger to see if any unexpected code was introduced. For highly regulated industries, MST offers concrete evidence of software integrity and policy compliance over time. Leadership & Open Standards – We are delivering real transparency now, encouraging a future where all critical software is released with verifiable integrity. MST’s open source implementation and SCITT-compliant design exemplify our commitment to openness and collaboration. We believe widespread adoption of these standards will strengthen supply chain security for everyone, making trust verification a universal practice. Next Steps Microsoft’s Signing Transparency is more than a new security feature and shapes the advances in trust technology. As threats grow more sophisticated, we must evolve the way we assure our customers about the software they depend on. With MST now generally available, we are leading by example: proving that it is possible to open up the traditionally opaque process of software deployment and turn it into a source of strength and trust, i.e. empowering each person with verifiable transparency. We invite the industry to join us on this journey and get started by reading the documentation and exploring Ledger Explorer today! Together, by embracing transparency and open standards, we can turn “trust but verify” from a slogan into an everyday reality for digital infrastructure.
ShubhraS
Jun 19, 2026 Place Microsoft Developer Community Blog
We Gave Ourselves 20 Minutes to Build an AI Agent for a Lumber Company. The Timer's Still on Screen.
Here's a confession: most "build with AI" webinars are 60 minutes of slides, 5 minutes of a polished demo someone rehearsed for a week, and a closing CTA. You leave inspired but not really sure what you saw. So we tried something different. We put a visible countdown timer on the screen and gave ourselves 20 minutes to do two things, live: Build an AI agent that solves a real business problem Deploy a working AI application to Azure No edits to hide the awkward parts. No "and here's one I prepared earlier." Just the timer, the screen, and a working app at the end. The on-demand recording is up now. Here's what's in it and why you should carve out 20 minutes for it this week. The setup: why lumber? 🏘️ We needed a real business problem, not a toy one. So for the demo, we role-play as the owner of Contoso Lumber — a regional lumber business with a very specific, very real headache: Should we sell our inventory now, or hold it longer? Sell too early, miss a better price. Hold too long, eat storage costs. Lumber prices fluctuate with global competition, macro shifts, even the weather. In the past, decisions like this came from morning meetings and gut instinct, or maybe the occasional ad-hoc spreadsheet that nobody could reuse a month later. It's the kind of decision that should have an analyst behind it — except most growing businesses can't afford to hire one full-time. So we build the AI agent that does. (Yes, lumber. We know. Stick with us — the boring industry is exactly the point. If it works here, it works for your business too.) What we actually build (in 20 minutes flat) The webinar walks through the entire flow, end to end: Part 1 — The agent. 
We open Microsoft Foundry at ai.azure.com, browse the model leaderboard (there are over 11,000 models to choose from — we compare a few on the cost-vs-quality chart), pick one, write a plain-English instruction for the agent, upload a CSV of historical lumber pricing, and ask it a real question: "If I cannot sell one of my products today unless I offer my clients a 35% discount, and knowing the historical pricing data, should I still sell it?" The agent runs a break-even analysis and comes back with a reasoned recommendation — hold for 3–6 months, here's the math on why, here's where storage costs start eating the upside. Then we add voice mode (now you can ask the agent for pricing recs from a coffee shop on your phone), and lock down guardrails to block jailbreaks, prompt injection, data leakage, and — because we're feeling fancy — profanity in responses. Part 2 — The app. With the agent done, we pivot to deploying a full AI chat application to Azure. From scratch. Using exactly five commands in Azure Cloud Shell: azd auth login git clone <repo> cd <folder> azd up azd down # (this one's for when you're done — kills everything to avoid surprise bills) That's it. The template handles the Container Apps setup, the architecture-aligned-to-Well-Architected-Framework stuff, all the boilerplate that usually eats half a sprint. By the end of the segment, there's a working AI chatbot running on a real Azure URL. We even pause the timer when we're just explaining things, so you know the 20-minute clock is honest about build time, not talk time. Why this format is more useful than another slide deck A few things this webinar shows that a written tutorial can't: The Foundry UI is super navigable. You watch someone do it. You see where the buttons are. You see what the leaderboard looks like when you're comparing GPT-5.3 Codex against Kimi K2.5 on a cost-to-quality chart. (Spoiler: Kimi wins this particular trio. Your mileage will vary depending on your workload.) 
The "no-stitching" claim is real. Models, data, agents, guardrails, deployment — all in one place. You don't need to leave Foundry to wire seven products together. The webinar makes that concrete by showing you the actual flow without cutting. Five commands really is five commands. This is the part people are most skeptical about until they see it. azd up does the work. The infrastructure provisioning, the container app, the AI service hookup — all of it. You can delete it just as fast. azd down tears everything back down. Useful when you're experimenting and don't want a $40 surprise on your Azure bill next month. What's on screen at the end By the 20-minute mark: A published AI agent named for the lumber business, with guardrails, voice mode enabled, ready to be called from Teams, Microsoft 365 Copilot, or any application via endpoint A separate AI chat application deployed to Azure Container Apps, with a live URL Logs, observability, the full Foundry control plane — all available out of the box And in the closing minutes, four very concrete next steps for what you do next if this sparked an idea for your own business — including Azure Accelerate (if you want Microsoft experts in the room with you), the partner network, and the Microsoft marketplace if you'd rather buy than build. Watch the recording The on-demand recording is available now. Block 20 minutes — that's literally all it takes — and ideally watch with your Azure portal open in another tab so you can follow along. If you're the kind of person who learns by doing, pause at the agent-building section and try it yourself in parallel. Foundry is free to explore; the agent we build in the webinar costs cents to run. → Watch the on-demand webinar A few things we'd love feedback on If you watch it, we'd genuinely love to know: Did the timer help or distract? (We thought it would feel gimmicky. It turned out to be the most-mentioned thing in early feedback.) 
What use case from your business would you want to see in the next one? We're picking the next demo problem from comments. Was the lumber thing weirdly compelling or were you just here for the Azure parts? Drop a comment, tag us, or grab a partner and try building your own version this week. The timer's reset. Your 20 minutes start whenever you press play. Want to go deeper than the webinar? Two companion reads: From Idea to Impact: How Growing Businesses Scale with Azure (five real customer stories with the full architectures) and AI Made Simple: 3 Practical Moves for Growing Businesses (the structured playbook for figuring out what to build first).
JoshuaHuang
Jun 10, 2026 Place Microsoft Developer Community Blog
Building an On-Device Voice Assistant with Microsoft Foundry Local
Why on-device voice still matters

Most "voice AI" tutorials assume your audio leaves the machine. You ship a WAV to Whisper-API, your transcript to GPT-4, and a synthesized response back over the wire. That works, but it also means three round trips, three per-token bills, and three places your user's voice gets logged.

The new wave of small, hardware-optimised models changes the trade-off. NVIDIA's Nemotron Speech Streaming En 0.6B is a 600M-parameter streaming ASR model published into the Microsoft Foundry Local catalog. Paired with a small chat model like qwen2.5-0.5b or phi-4-mini, you can run the entire capture → transcribe → reason → respond loop in-process on a developer laptop, with no API keys and no network egress. This post walks through how the fl-nemotron sample does it, the SDK pitfalls we hit on the way, and the design decisions that made the pipeline reliable.

What we're building

A browser-hosted assistant served by FastAPI at http://127.0.0.1:8000. The page captures microphone audio, posts it to /api/transcribe, then streams the chat reply back over Server-Sent Events from /api/chat. All inference runs locally through two Foundry Local models loaded into the same process.

The shape of the pipeline:

    Microphone (browser MediaRecorder)
            │  WebM/Opus blob
            ▼
    Client-side WAV encoder (16 kHz, mono, PCM-16)
            │  multipart/form-data
            ▼
    FastAPI /api/transcribe
            │
            ▼
    Nemotron Speech Streaming En 0.6B (Foundry Local audio client)
            │  transcript text
            ▼
    Chat LLM e.g. qwen2.5-0.5b (Foundry Local chat client)
            │  streamed tokens
            ▼
    FastAPI /api/chat  →  SSE  →  browser bubble

The version that bit us: foundry-local-sdk >= 1.1.0

Before any code, the single most important fact about this project:

- The Nemotron Speech Streaming model only appears in the Foundry Local 1.1.x catalog. Older SDKs (0.5.x / 0.6.x) cannot resolve the alias nemotron-speech-streaming-en-0.6b and fail with model not found.
The module name also changed in 1.1.0: it is now foundry_local_sdk (with the _sdk suffix), not foundry_local. The pip wheel for foundry-local-core is bundled, so there is no separate MSI / winget install to worry about. Pin it explicitly:

    pip install --upgrade "foundry-local-sdk>=1.1.0,<2"

And verify before anything else:

    python -c "import importlib.metadata as m; print('sdk', m.version('foundry-local-sdk'))"
    # expect: sdk 1.1.0

Loading both models from one manager

The 1.1.x SDK exposes a single FoundryLocalManager that owns the runtime. Each loaded model gives you back a per-model OpenAI-compatible client: get_chat_client() for text models and get_audio_client() for ASR. There is no need to bring your own openai Python package; the SDK ships its own thin client. The wrapper used in the repo (src/foundry_client.py) does this:

    from foundry_local_sdk import Configuration, FoundryLocalManager

    FoundryLocalManager.initialize(Configuration(app_name="fl-nemotron"))
    manager = FoundryLocalManager.instance

    chat_model = manager.load_model("qwen2.5-0.5b")
    stt_model = manager.load_model("nemotron-speech-streaming-en-0.6b")

    chat_client = chat_model.get_chat_client()
    audio_client = stt_model.get_audio_client()

Both models are downloaded on first use into the Foundry Local cache and stay resident for the lifetime of the process. On a laptop with 16 GB RAM, the combined working set sits comfortably under 4 GB.

The transcription surprise

The first naive approach was the obvious one:

    with open(wav_path, "rb") as f:
        result = audio_client.transcribe(file=f, model="nemotron-speech-streaming-en-0.6b")

That call fails on Nemotron. The bundled ONNX Runtime GenAI in foundry-local-core does not register the nemotron_speech multi-modal model type that the standard AudioClient.transcribe() path tries to instantiate. The error surfaces as a cryptic model-type registration failure deep inside the native runtime.
The fix is to use the streaming session API instead: a different native entry point (core_interop.start_audio_stream) that the streaming model does support. The repo isolates this in src/_nemotron_live.py:

    import threading
    import wave

    def transcribe_wav_live(audio_client, wav_path, *, language="en"):
        # Read the WAV header fields and the raw PCM frames.
        with wave.open(str(wav_path), "rb") as w:
            sample_rate = w.getframerate()
            channels = w.getnchannels()
            sample_width = w.getsampwidth()
            pcm = w.readframes(w.getnframes())

        session = audio_client.create_live_transcription_session()
        session.settings.sample_rate = sample_rate
        session.settings.channels = channels
        session.settings.bits_per_sample = sample_width * 8
        session.settings.language = language
        session.start()

        # Feed PCM in ~100 ms chunks from a worker thread, then stop.
        bytes_per_sec = sample_rate * channels * sample_width
        chunk_bytes = max(bytes_per_sec // 10, 1024)

        def _pusher():
            try:
                for offset in range(0, len(pcm), chunk_bytes):
                    session.append(pcm[offset:offset + chunk_bytes])
            finally:
                # Always stop, even on error, so the reader below terminates.
                session.stop()

        threading.Thread(target=_pusher, daemon=True).start()

        # Drain transcript fragments on the calling thread.
        parts = []
        for resp in session.get_stream():
            for cp in getattr(resp, "content", []) or []:
                text = getattr(cp, "text", "") or getattr(cp, "transcript", "") or ""
                if text:
                    parts.append(text)
        return " ".join(p.strip() for p in parts if p.strip()).strip()

Three things to notice:

- Push from a thread, read from the main thread. session.append() is a blocking write into the native stream and session.get_stream() is a blocking generator. Run one in a worker thread so the other can drain in parallel; otherwise you deadlock the session.
- Chunk to ~100 ms. Smaller chunks (e.g. 10 ms) spend more time crossing the FFI boundary than transcribing; larger chunks (e.g. 1 s) hold back partial results and hurt perceived latency.
- Always session.stop(). Without it the generator never terminates and the request hangs.

The other transcription surprise: browsers don't send WAV

Inside the browser, MediaRecorder defaults to audio/webm; codecs=opus.
That's great for size but bad for our STT model, which expects a 16-bit mono PCM WAV at a known sample rate. Decoding WebM/Opus server-side would require ffmpeg as a runtime dependency, which is exactly the kind of friction this project exists to remove.

The cleaner solution is to encode WAV on the client. AudioContext.decodeAudioData already understands WebM/Opus, so the page can decode the recording, resample to 16 kHz, mix to mono, and emit a PCM-16 WAV blob in a few dozen lines of JavaScript:

    // Inside src/static/index.html
    async function webmToWav(blob) {
      // Creating the context at 16 kHz makes decodeAudioData resample for us.
      const ctx = new (window.AudioContext || window.webkitAudioContext)({ sampleRate: 16000 });
      const buf = await ctx.decodeAudioData(await blob.arrayBuffer());
      // Mix to mono
      const ch = buf.numberOfChannels;
      const mono = new Float32Array(buf.length);
      for (let c = 0; c < ch; c++) {
        const data = buf.getChannelData(c);
        for (let i = 0; i < data.length; i++) mono[i] += data[i] / ch;
      }
      return encodeWav(mono, 16000);
    }

    // Helper assumed here (not shown in the original excerpt): write an ASCII tag byte by byte.
    function writeStr(view, offset, str) {
      for (let i = 0; i < str.length; i++) view.setUint8(offset + i, str.charCodeAt(i));
    }

    function encodeWav(samples, sampleRate) {
      const buffer = new ArrayBuffer(44 + samples.length * 2);
      const view = new DataView(buffer);
      // RIFF header
      writeStr(view, 0, "RIFF");
      view.setUint32(4, 36 + samples.length * 2, true);
      writeStr(view, 8, "WAVE");
      // fmt chunk
      writeStr(view, 12, "fmt ");
      view.setUint32(16, 16, true);             // fmt chunk size for PCM
      view.setUint16(20, 1, true);              // PCM format
      view.setUint16(22, 1, true);              // mono
      view.setUint32(24, sampleRate, true);
      view.setUint32(28, sampleRate * 2, true); // byte rate
      view.setUint16(32, 2, true);              // block align
      view.setUint16(34, 16, true);             // bits per sample
      // data chunk
      writeStr(view, 36, "data");
      view.setUint32(40, samples.length * 2, true);
      // PCM-16 samples, clamped to [-1, 1]
      let o = 44;
      for (let i = 0; i < samples.length; i++, o += 2) {
        const s = Math.max(-1, Math.min(1, samples[i]));
        view.setInt16(o, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
      }
      return new Blob([view], { type: "audio/wav" });
    }

Now the server's /api/transcribe endpoint just writes the bytes to a temp file and hands them to transcribe_wav_live(), with no audio decoding libraries on the Python side.

Wiring it into FastAPI

The server (src/app.py) is deliberately small. The notable detail is that the same process holds both Foundry Local model handles for its entire lifetime, so there is no warm-up cost per request:

    @app.post("/api/transcribe")
    async def transcribe(audio: UploadFile = File(...)):
        data = await audio.read()
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            f.write(data)
            path = f.name
        text = _ai_client.transcribe(path)
        return {"text": text}

    @app.post("/api/chat")
    async def chat(req: ChatRequest):
        if req.stream:
            return StreamingResponse(
                _sse(_ai_client.stream_completion(req.messages)),
                media_type="text/event-stream",
            )
        return {"text": _ai_client.chat_completion(req.messages)}

Streaming uses Server-Sent Events because they are trivially supported in both fetch() and the FastAPI runtime, and they don't require a WebSocket upgrade through any proxy a developer might have in front of localhost.

What it looks like

The repo includes screenshots of the running UI: a welcome screen with both models loaded, a streamed haiku reply, an inline code block with copy-to-clipboard, and the recording state for the microphone.

Performance, honestly

This is a small-model, CPU-friendly stack. On an Arm64 Surface running the x64 SDK under emulation:

- First model load (cold cache): tens of seconds, including downloads of ~600 MB for Nemotron and ~400 MB for qwen2.5-0.5b.
- Subsequent loads (warm cache): a few seconds per model.
- End-to-end transcription of a 5-second utterance: well under a second after warm-up.
- First chat token from qwen2.5-0.5b: typically 200–500 ms; full short reply within 1–2 s.
On x64 silicon with a recent CPU the numbers improve substantially, and the SDK will pick the best execution provider it finds (CPU / DirectML / CUDA) for each model.

Trade-offs to know about

- Model quality. qwen2.5-0.5b is a 500M-parameter model. It is fast and small enough to ship on a laptop, but it is not GPT-4. Swap in phi-4-mini or mistral-nemo-12b-instruct if you have the RAM and want better reasoning; the wrapper accepts any chat alias in the Foundry Local catalog.
- STT is English-only here. The current Nemotron streaming model in the catalog is ...-en-0.6b. Multilingual variants are likely to follow.
- Browser microphone needs a real browser. Headless / automated browsers (Playwright, Puppeteer) deny getUserMedia by default. Open the page in Edge / Chrome / Firefox to grant the permission and capture audio for real.
- No agent framework yet. This sample is deliberately a single-turn loop over a chat client: there is no tool calling, planning, or multi-agent orchestration. Adding the Microsoft Agent Framework on top would be a natural next step for richer behaviour.

Responsible AI considerations

Running locally removes the cloud-egress class of privacy concerns, but it does not remove responsibility:

- Disclose recording. The browser prompts for mic permission; your UI should make it obvious when capture is active. The sample shows a red ⏹ button and a "Recording…" banner for that reason.
- Don't log raw audio. The sample writes audio to a per-request NamedTemporaryFile and deletes it after transcription. Treat the WAV as sensitive data even when it never leaves the device.
- Small models hallucinate. A 0.5B chat model is great for snappy local replies, but unsuitable for high-stakes answers. Pair it with retrieval, ground it on your own data, or escalate to a larger model when accuracy matters.

Try it

1. Clone github.com/leestott/fl-nemotron.
2. Run ./setup.ps1 (or ./setup.sh) to create a virtualenv and install the pinned SDK.
3. Run python scripts/prefetch.py nemotron-speech-streaming-en-0.6b qwen2.5-0.5b to download both models.
4. Start the server: .venv\Scripts\uvicorn.exe app:app --app-dir src --port 8000
5. Open http://127.0.0.1:8000 in a real browser and click the 🎤 button.

Where to go next

- Foundry Local documentation: official docs for the runtime, catalog, and SDK.
- microsoft/Foundry-Local: upstream samples and issue tracker.
- NVIDIA Nemotron model family: background on the speech and language models being published into the catalog.
- leestott/fl-nemotron: the full source for this post.

Key takeaways

- Pin foundry-local-sdk >= 1.1.0. Earlier SDKs cannot see the Nemotron Speech Streaming model.
- Use the LiveAudioTranscriptionSession API for Nemotron, not AudioClient.transcribe().
- Encode WAV in the browser. It eliminates a heavy server-side ffmpeg dependency for a few lines of JS.
- Push audio chunks on a worker thread and drain the response generator on the main one to avoid deadlocks.
- A small Foundry Local chat model plus Nemotron STT gives you a credible local voice loop in a single Python process: no cloud, no keys, no data egress.
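The "Don't log raw audio" point and the 16 kHz mono PCM-16 expectation can both be enforced in one small server-side helper. The following is a minimal sketch, not the repo's code: transcribe_upload and its transcribe callback are hypothetical names, and the strict format check is an assumption about what you would want to reject before calling the model.

```python
import os
import tempfile
import wave
from typing import Callable


def transcribe_upload(data: bytes, transcribe: Callable[[str], str]) -> str:
    """Write uploaded WAV bytes to a temp file, validate it, transcribe, then delete it.

    `transcribe` is a hypothetical callback that takes a file path and returns text,
    for example a thin wrapper around transcribe_wav_live().
    """
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Reject anything that is not 16 kHz, mono, 16-bit PCM before it reaches the model.
        with wave.open(path, "rb") as w:
            fmt = (w.getframerate(), w.getnchannels(), w.getsampwidth())
        if fmt != (16000, 1, 2):
            raise ValueError(f"expected 16 kHz mono PCM-16 WAV, got {fmt}")
        return transcribe(path)
    finally:
        # Runs on success and on failure, so raw audio never lingers on disk.
        os.unlink(path)
```

Putting the delete in a finally block means a failed transcription or a rejected upload still removes the recording, which is the behaviour the responsible-AI section describes.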
Lee_Stott
May 26, 2026 Place Microsoft Developer Community Blog
480Views
0likes
0Comments
You Asked—We Answered: Inside the Agent 365 AMA
Watch the replay of the live Agent 365 demo and expert panelist answers to our Microsoft Tech Community's real-world questions about licensing, third-party agents, governance and security.
Nichole_Peterson
May 13, 2026 Place Agent 365 Blog
628Views
0likes
0Comments
From Prompt to Production: Open in VS Code for Terraform in Azure Copilot
We’re excited to introduce a new step in the Terraform on Azure experience: Open in VS Code, now available directly from Azure Copilot in the Azure Portal. This capability helps you move seamlessly from AI‑generated Terraform code to real Azure deployments - within a connected, guided workflow designed for enterprise scenarios. Why This Matters Infrastructure as Code with Terraform is powerful, but moving from generated configuration to a deployed environment typically involves multiple tools and handoffs. Teams need to understand Terraform state, work with remote backends, and integrate their code into version‑controlled CI/CD pipelines - often backed by Terraform Cloud or Azure‑native backends in enterprise environments. Open in VS Code brings these steps together. It bridges the gap between AI‑assisted authoring in the Azure Portal and the real‑world workflows required to validate, manage state, and deploy infrastructure with confidence. Continue Your Workflow in VS Code With Azure Copilot, you can describe your infrastructure in natural language and generate Terraform configurations in seconds. For example: “Create an Azure Container App using Terraform with a managed environment, Log Analytics enabled, and a system‑assigned managed identity to securely pull images from Azure Container Registry.” Copilot generates the Terraform configuration for you. From there, you can select Open full view to enter a full‑screen Terraform editor, and then choose Open in VS Code to launch the configuration in an Azure‑hosted VS Code environment. There’s no need to download files or set up a local development environment. VS Code for the web opens with Azure authentication already configured, along with commonly used extensions, so you can immediately focus on refining, validating, and preparing your infrastructure for deployment. 
Built‑in Guidance for Real Deployments

Beyond editing, the VS Code experience includes built‑in, step‑by‑step guidance to help you deploy your Terraform configuration into your own Azure environment - whether you’re experimenting or preparing for production. Because Terraform relies on state management, the workflow starts by helping you choose and configure a backend.

Backend Options

Option 1: Azure Storage Account as a remote backend
A natural fit for Azure‑native and enterprise environments. The experience guides you through creating or selecting a storage account and configuring Terraform to store state securely in Azure.

Option 2: HCP Terraform (Terraform Cloud) as a remote backend
Ideal for teams already using Terraform Cloud. The guided flow helps you authenticate, connect to an existing organization and workspace, and generate the required backend configuration directly into your Terraform files.

Option 3: Temporary workspace for quick validation
Designed for learning and experimentation. You can run terraform plan and terraform apply directly in the Azure workspace with temporary state, without committing to a long‑term backend - ideal for quick validation, but not intended for production use.

Each option includes an end‑to‑end walkthrough, so you can complete backend setup and run Terraform commands without leaving the VS Code environment or searching through external documentation.

Connecting Code, State, and Deployment

This experience connects three essential parts of the Terraform workflow:

- AI‑assisted code generation in Azure Portal Copilot
- Interactive editing and guided execution in VS Code for the web
- Flexible backend options for managing Terraform state

Together, these pieces make it easier to move from idea to infrastructure in a structured, supported way, whether you’re new to Terraform or managing production workloads with established CI/CD pipelines.
Available Now - and What’s Next

The Open in VS Code experience for Terraform is now in public preview in Azure Portal Copilot. We’re continuing to invest in this workflow, including clearer deployment guidance, future integration with GitHub Actions and other CI/CD pipelines, and deeper enhancements to the full‑screen Terraform editor experience.

If you haven’t tried it yet, generate a Terraform configuration with Azure Copilot and open it in VS Code to go from prompt to production in one connected workflow.
Jingwei_Wang
May 12, 2026 Place Azure Tools Blog
1.2KViews
0likes
0Comments
Ask questions: May 12 live Agent 365 AMA
May 12, 2026 at 9 AM PT is our next "Ask Microsoft Anything" with the Agent 365 team.
Nichole_Peterson
May 06, 2026 Place Agent 365 Blog
732Views
0likes
0Comments
From Terminal to Autonomous Coding: Mastering GitHub Copilot CLI ACP Server
Introduction

The rise of AI-powered development is no longer just about autocomplete; it’s about autonomous agents that can think, act, and collaborate. At the center of this transformation is the Agent Client Protocol (ACP) and its integration with GitHub Copilot CLI.

If you’ve ever wanted to:

- Integrate Copilot into your own tools
- Build custom AI-driven developer workflows
- Orchestrate coding agents in CI/CD

Then understanding the GitHub Copilot CLI ACP Server is a game-changer. This article will take you from zero to advanced, covering concepts, architecture, setup, and real-world use cases.

What Is Agent Client Protocol (ACP)?

The Agent Client Protocol (ACP) is an open standard designed to connect clients (like IDEs or tools) with AI agents in a consistent and interoperable way.

Why ACP Exists

Before ACP:

- Every IDE needed custom integration for each AI agent
- Every agent needed custom APIs per editor

ACP solves this by introducing a universal communication layer.

Key Idea

“Any editor can talk to any agent.”

Core Capabilities

ACP enables:

- Standardized messaging between client and agent
- Streaming responses
- Tool execution with permissions
- Session lifecycle management
- Multi-agent coordination

This makes ACP a foundation layer for the agentic developer ecosystem.

What Is GitHub Copilot CLI ACP Server?

GitHub Copilot CLI can run as an ACP-compatible server, exposing its AI capabilities programmatically.

👉 In simple terms: It turns Copilot into a backend AI agent service that any tool can connect to.
According to GitHub Docs:

- Copilot CLI can run in ACP mode using a flag
- It supports standardized communication via ACP
- It enables integration with IDEs, pipelines, and custom tools

Architecture: How ACP + Copilot CLI Works

Components

Component | Role
Client | Sends prompts, receives responses
ACP Protocol | Standard communication layer
Copilot CLI | AI agent executing tasks
System | Files, repos, tools

Getting Started (Beginner Level)

Install GitHub Copilot CLI

Ensure:

- Copilot subscription is active
- CLI installed and authenticated

Start ACP Server

Default (stdio mode – recommended)

TCP Mode (for remote systems)

- stdio: Best for IDE integration
- TCP: Best for distributed systems

Connect Using ACP Client (Example)

Using the TypeScript SDK, you:

1. Start Copilot as a process
2. Create streams
3. Initialize connection
4. Send prompt
5. Receive streaming response

ACP uses:

- NDJSON streams over stdin/stdout
- Event-driven communication

ACP Workflow Explained

A typical flow looks like this:

Step-by-step lifecycle

1. Initialize connection
2. Create session
3. Send prompt
4. Agent processes task
5. Streaming updates returned
6. Optional tool execution (with permissions)
7. Session ends

ACP supports:

- Text + multimodal inputs
- Incremental responses
- Cancellation and control

Real-World Use Cases

IDE Integration (Custom Editors)

Build your own AI-powered editor:

- Connect via ACP
- Send code context
- Receive suggestions

CI/CD Automation

Imagine:

Use ACP to:

- Auto-fix bugs
- Generate tests
- Refactor code

Multi-Agent Systems

ACP enables:

- Copilot + other agents working together
- Task delegation
- Workflow orchestration

Custom Developer Tools

Examples:

- AI code review dashboards
- Internal dev assistants
- ChatOps integrations

Advanced Concepts

Session Management

ACP allows:

- Isolated sessions
- Custom working directories
- Context persistence

Streaming Responses

Instead of waiting:

- Receive responses in chunks
- Build real-time UIs

Permission Handling

ACP includes:

- Tool execution approvals
- Security boundaries
- Controlled automation

Extensibility

ACP supports:

- Multiple SDKs (TypeScript, Python, Rust, Kotlin)
- Custom clients
- Future protocol evolution

ACP vs Traditional Integration

Feature | Traditional APIs | ACP
Integration | Custom per tool | Standardized
Streaming | Limited | Native
Multi-agent | Hard | Built-in
Extensibility | Low | High
Interoperability | Poor | Excellent

Why ACP + Copilot CLI Is a Big Deal

This combination unlocks:

✅ Platform-level AI integration: no more vendor lock-in per editor
✅ True agentic workflows: agents don’t just suggest, they act
✅ Ecosystem growth: any tool can plug into Copilot

Challenges & Considerations

- ACP is still in public preview
- Requires understanding of streams and async communication
- Debugging agent workflows can be complex

Future of Developer Experience

ACP represents a shift toward “AI-native development platforms”.

Future possibilities:

- Fully autonomous CI/CD pipelines
- Cross-agent collaboration
- Self-healing codebases

Final Thoughts

The GitHub Copilot CLI ACP Server is not just a feature; it’s a foundation for the next generation of software development.

If you are:

- A developer → build smarter tools
- A tech lead → design AI-driven workflows
- A CTO aspirant → understand this deeply

Then ACP is something you must master early.

Quick Summary

- ACP = Standard protocol for AI agents
- Copilot CLI = Can run as ACP server
- Enables = IDEs, CI/CD, multi-agent systems
- Key power = interoperability + automation
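The article mentions that ACP exchanges NDJSON streams over stdin/stdout. As a rough illustration of what that framing means in practice, here is a minimal Python sketch of newline-delimited JSON encoding and decoding. The helper names are hypothetical, and the JSON-RPC-style envelope and the method names in the demo ("initialize", "session/prompt") are assumptions for illustration only; check the ACP specification and the official SDKs for the actual message shapes.

```python
import io
import json
from typing import Any, Iterator, TextIO


def write_message(stream: TextIO, message: dict[str, Any]) -> None:
    """Serialize one message as a single JSON line (NDJSON framing)."""
    # json.dumps escapes embedded newlines, so each message occupies exactly one line.
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    stream.flush()


def read_messages(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one decoded message per non-empty line until the stream ends."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Demo against an in-memory stream; a real client would use the agent
# subprocess's stdin and stdout instead. Envelope fields are assumed.
demo = io.StringIO()
write_message(demo, {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}})
write_message(demo, {"jsonrpc": "2.0", "id": 2, "method": "session/prompt",
                     "params": {"text": "Explain this repo"}})
demo.seek(0)
for msg in read_messages(demo):
    print(msg["id"], msg["method"])
```

Because every message is one line, a client can process streamed updates as they arrive instead of waiting for a complete response, which is what makes the incremental, event-driven behaviour described above straightforward to build on.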
Ravindra_Kumar_Vishwakarma
May 01, 2026 Place Microsoft Developer Community Blog
693Views
2likes
0Comments
Get to know these Agent 365 community all-stars
Microsoft MVPs and influencers are posting helpful content for IT and Security admins who want to learn more about Microsoft Agent 365. We've selected a few favorites to get you started.
Nichole_Peterson
Apr 30, 2026 Place Agent 365 Blog
453Views
1like
1Comment
If You're Building AI on Azure, ECS 2026 is Where You Need to Be
Let me be direct: there's a lot of noise in the conference calendar. Generic cloud events. Vendor showcases dressed up as technical content. Sessions that look great on paper but leave you with nothing you can actually ship on Monday. ECS 2026 isn't that.

As someone who will be on stage in Cologne this May, I can tell you the European Collaboration Summit, combined with the European AI & Cloud Summit and European BizApps Summit, is one of the few events I've seen where engineers leave with real, production-applicable knowledge. Three days. Three summits. 3,000+ attendees. One of the largest Microsoft-focused events in Europe, and it keeps getting better.

If you're building AI systems on Azure, designing cloud-native architectures, or trying to figure out how to take your AI experiments to production, this is where the conversation is happening.

What ECS 2026 Actually Is

ECS 2026 runs May 5–7 at Confex in Cologne, Germany. It brings together three co-located summits under one roof:

- European Collaboration Summit: Microsoft 365, Teams, Copilot, and governance
- European AI & Cloud Summit: Azure architecture, AI agents, cloud security, responsible AI
- European BizApps Summit: Power Platform, Microsoft Fabric, Dynamics

For Azure engineers and AI developers, the European AI & Cloud Summit is your primary destination. But don't ignore the overlap: some of the most interesting AI conversations happen at the intersection of collaboration tooling and cloud infrastructure.

The scale matters here: 3,000+ attendees, 100+ sessions, multiple deep-dive tracks, and a speaker lineup that includes Microsoft executives, Regional Directors, and MVPs who have built, broken, and rebuilt production systems.

The Azure + AI Track - What's Actually On the Agenda

The AI & Cloud Summit agenda is built around real technical depth. This is not "intro to AI" content; it is actual architecture decisions, patterns that work, and lessons from things that didn't.
Here's what you can expect:

AI Agents and Agentic Systems
This is where the energy is right now, and ECS is leaning in. Expect sessions covering how to design agent workflows, chain reasoning steps, handle memory and state, and integrate with Azure AI services. Marco Casalaina, VP of Products for Azure AI at Microsoft, is speaking. If you want to understand the direction of the Azure AI platform from the people building it, this is a direct line.

Azure Architecture at Scale
Cloud-native patterns, microservices, containers, and the architectural decisions that determine whether your system holds up under real load. These sessions go beyond theory: you'll hear from engineers who've shipped these designs at enterprise scale.

Observability, DevOps, and Production AI
Getting AI to production is harder than the demos suggest. Sessions here cover monitoring AI systems, integrating LLMs into CI/CD pipelines, and building the operational practices that keep AI in production reliable and governable.

Cloud Security and Compliance
Security isn't optional when you're putting AI in front of users or connecting it to enterprise data. Tracks cover identity, access patterns, responsible AI governance, and how to design systems that satisfy compliance requirements without becoming unmaintainable.

Pre-Conference Deep Dives

One underrated part of ECS: the pre-conference workshops. These are extended, hands-on sessions, typically 3–6 hours, that let you go deep on a single topic with an expert. Think of them as intensive short courses where you can actually work through the material, not just watch slides. If you're newer to a particular area of Azure AI, or you want to build fluency in a specific pattern before the main conference sessions, these are worth the early travel.

The Speaker Quality Is Different Here

The ECS speaker roster includes Microsoft executives, Microsoft MVPs, and Regional Directors: people who have real accountability for the products and patterns they're presenting.
You'll hear from over 20 Microsoft speakers:

- Marco Casalaina, VP of Products, Azure AI at Microsoft
- Adam Harmetz, VP of Product at Microsoft, Enterprise Agent

And dozens of MVPs and Regional Directors who are in the field every day, solving the same problems you are. These aren't keynote-only speakers: they're in the session rooms, at the hallway track, available for real conversations.

The Hallway Track Is Not a Cliché

I know "networking" sounds like a corporate afterthought. At ECS it genuinely isn't. When you put 3,000 practitioners (engineers, architects, DevOps leads, security specialists) in one venue for three days, the conversations between sessions are often more valuable than the sessions themselves. You get candid answers to "how are you actually handling X in production?" that you won't find in documentation. The European Microsoft community is tight-knit and collaborative. ECS is where that community concentrates.

Why This Matters Right Now

We're in a period where AI development is moving fast but the engineering discipline around it is still maturing. Most teams are figuring out:

- How to move from AI prototype to production system
- How to instrument and observe AI behaviour reliably
- How to design agent systems that don't become unmaintainable
- How to satisfy security and compliance requirements in AI-integrated architectures

ECS 2026 is one of the few places where you can get direct answers to these questions from people who've solved them, not theoretically, but in production, on Azure, in the last 12 months. If you go, you'll come back with practical patterns you can apply immediately. That's the bar I hold events to. ECS consistently clears it.

Register and Explore the Agenda

Register for ECS 2026: ecs.events
Explore the AI & Cloud Summit agenda: cloudsummit.eu/en/agenda
Dates: May 5–7, 2026 | Location: Confex, Cologne, Germany

Early registration is worth it: the pre-conference workshops fill up.
And if you're coming, find me. I'll be the one talking too much about AI agents and Azure deployments. See you in Cologne.
Lee_Stott
Apr 22, 2026 Place Microsoft Developer Community Blog
301Views
2likes
0Comments