Building Agentic Systems on Azure: Microsoft Foundry Agents SDK vs Microsoft Agent Framework
In my recent experience as a Senior Consultant at Microsoft, I've been actively involved in designing and delivering AI-driven solutions, with a strong focus on building intelligent agents using modern frameworks. Along the way, I've built agents using both Microsoft Foundry Agents SDK (hereafter "Agents SDK") and Microsoft Agent Framework (MAF).

Both approaches are powerful and capable. However, once you move beyond simple proofs of concept, the developer experience and architectural patterns start to differ significantly. This article provides a practical comparison based on real implementation experience and aims to help developers choose the right approach.

Approach 1: Agents SDK

Agents SDK provides a straightforward way to create agents with integrated tools and models.

Example: Creating an Agent

    import os

    from azure.ai.projects import AIProjectClient
    from azure.ai.agents.models import AzureAISearchTool, AzureAISearchQueryType
    from azure.identity import DefaultAzureCredential

    client = AIProjectClient(
        credential=DefaultAzureCredential(),
        endpoint=os.getenv("AZURE_AI_PROJECT_ENDPOINT"),
    )

    # Configure tools (conn_id is the ID of an existing Azure AI Search connection)
    ai_search = AzureAISearchTool(
        index_connection_id=conn_id,
        index_name="my-index",
        query_type=AzureAISearchQueryType.SEMANTIC,
    )

    # Create agent (persisted in Foundry portal)
    agent = client.agents.create_agent(
        model=os.getenv("AZURE_AI_AGENT_DEPLOYMENT_NAME"),
        name="MyAgent",
        instructions="You are a helpful assistant.",
        tool_resources=ai_search.resources,
        tools=ai_search.definitions,
    )

    # Run conversation
    thread = client.agents.threads.create()
    client.agents.messages.create(thread_id=thread.id, role="user", content="Hello")
    run = client.agents.runs.create(thread_id=thread.id, agent_id=agent.id)

What this approach provides
- Native integration with Azure AI services (OpenAI, AI Search, MCP)
- Managed execution environment
- Simple and quick agent setup

Conceptually, this approach can be summarized as: Model + Tools + Execution

Strengths
✅ Rapid development and onboarding
✅ Strong integration within the Azure ecosystem
✅ Well-suited for single-agent or tool-driven use cases
✅ Minimal infrastructure overhead

Challenges observed in practice

As the complexity of scenarios increases, certain limitations become more visible:
- Multi-agent workflows require custom orchestration logic
- Agent handoffs must be implemented manually
- Context sharing across agents requires additional design effort

While this approach offers flexibility, it shifts orchestration complexity to the developer.

Approach 2: Microsoft Agent Framework (MAF)

Microsoft Agent Framework introduces a higher-level abstraction, focused on agent orchestration and system design.

Creating an Agent

    import asyncio
    import os

    from agent_framework import Agent, WorkflowBuilder, Message
    from agent_framework.foundry import FoundryChatClient
    from azure.identity import DefaultAzureCredential

    client = FoundryChatClient(
        project_endpoint=os.getenv("FOUNDRY_PROJECT_ENDPOINT"),
        model=os.getenv("FOUNDRY_MODEL_DEPLOYMENT_NAME"),
        credential=DefaultAzureCredential(),
    )

    # Create agents (in-process only, not persisted in portal)
    researcher = Agent(client, name="ResearcherAgent", instructions="Research topics thoroughly.")
    writer = Agent(client, name="WriterAgent", instructions="Write concise summaries.")

    # Build the multi-agent workflow: researcher output feeds the writer
    workflow = WorkflowBuilder(start_executor=researcher).add_edge(researcher, writer).build()

    async def main() -> None:
        # "async for" must run inside a coroutine, so the streaming loop lives in main()
        async for event in workflow.run(Message("user", "Summarize migration best practices"), stream=True):
            print(event.content)

    asyncio.run(main())

What this approach provides
- Built-in orchestration capabilities
- Native support for multi-agent workflows
- Structured agent lifecycle management
- Context and memory handling

Conceptually, this can be viewed as: Agents + Orchestration + System Design

Observations from implementation

When implementing similar use cases using MAF:
- Agent responsibilities became clearly defined
- Routing and delegation patterns were significantly simplified
- Overall system architecture became easier to maintain and scale

This approach encourages thinking in terms of agent ecosystems rather than isolated agents.

Architecture Comparison

[Diagrams: Agents SDK; Microsoft Agent Framework (MAF)]

Choosing the Right Approach

Use Agents SDK when:
- You need rapid development for a single-agent use case
- The workflow is relatively straightforward
- You prefer flexibility and lower-level control

Use Microsoft Agent Framework when:
- You are designing multi-agent systems
- Your solution requires routing, delegation, or handoffs
- Long-term scalability and maintainability are essential

Pros and Cons Summary

Agents SDK

Pros
- Easy to get started
- Strong Azure integration
- Flexible design

Cons
- Manual orchestration required
- Limited native multi-agent support
- Complexity increases as scenarios grow

Microsoft Agent Framework (MAF)

Pros
- Built-in orchestration
- Native multi-agent support
- Scalable and structured architecture

Cons
- Learning curve for new developers
- More opinionated framework design
- Reduced low-level control compared to the SDK-based approach

References and Repositories

🔗 Microsoft Agent Framework (MAF)
- Microsoft Agent Framework – GitHub Repository
- Microsoft Agent Framework Samples – Tutorials & Examples
- Workflow Samples (Multi-agent patterns)
- FoundryChatClient sample (Python)
- Agent Framework demos – GitHub Source

📘 Documentation
- Microsoft Agent Framework Overview (Microsoft Learn)
- Agent Framework + Microsoft Foundry provider docs

🔗 Azure AI Projects / Agents SDK
- Azure AI Projects SDK – Python (GitHub Source)
- Azure AI Projects Agents (.NET SDK repo)

📘 Documentation
- Azure AI Projects SDK (Python) – Microsoft Learn
- Azure AI Agents SDK – Microsoft Learn

Conclusion

Agents SDK and Microsoft Agent Framework both play important roles in the modern agent development landscape.
- Agents SDK enables quick and flexible agent development
- Microsoft Agent Framework enables structured, scalable agent systems

In practice, the choice depends on whether you are building a single-agent feature or a multi-agent system.

Final Thought

Agents SDK helps you get started quickly. Microsoft Agent Framework helps you scale with confidence.

In a follow-up blog, I'll dive into how the M365 Agents SDK compares with Microsoft Agent Framework, especially in the context of enterprise productivity and Copilot experiences.

Building AI Agents with Microsoft Foundry: A Progressive Lab from Hello World to Self-Hosted
AI agent development has a steep on-ramp. The combination of new SDKs, tool-calling patterns, model selection decisions, retrieval-augmented generation, and deployment concerns means most developers spend more time wiring things together than actually building anything useful. The Microsoft Foundry Agent Lab is a structured, open-source demo series designed to change that: nine self-contained demos, each adding exactly one new concept, all built on the same Microsoft Foundry SDK and a single model deployment.

This post walks through what the lab contains, how each demo works under the hood, and the architectural decisions that make it a useful reference for AI engineers building production agents.

Why a Progressive Lab?

Agent frameworks can be overwhelming. A developer who opens a rich example with RAG, tool-calling, streaming, and a custom UI all at once has no clear line of sight to which parts are essential and which are embellishments. The Foundry Agent Lab takes the opposite approach: start with the absolute minimum and introduce one new primitive per demo. By the time you reach Demo 8, you have seen every major capability, not in one monolithic sample but in a layered sequence where each addition is visible and understandable.
| # | Demo | New Concept | Tool Used | UX |
|---|------|-------------|-----------|----|
| 0 | hello-demo | Agent creation, Responses API, conversations | None | Terminal |
| 1 | tools-demo | Function calling, tool-calling loop, live API | FunctionTool | Terminal |
| 2 | desktop-demo | UI decoupling: same agent, different surface | None | Desktop (Tkinter) |
| 3 | websearch-demo | Server-side built-in tools, no client loop | WebSearchTool | Terminal |
| 4 | code-demo | Code execution in sandbox, Gradio web UI | CodeInterpreterTool | Web (Gradio) |
| 5 | rag-demo | Document upload, vector stores, RAG grounding | FileSearchTool | Terminal |
| 6 | mcp-demo | MCP servers, human-in-the-loop approval | MCPTool | Terminal |
| 7 | toolbox-demo | Centralized tool governance, Toolbox versioning | Toolbox | Terminal |
| 8 | hosted-demo | Self-hosted agent with Responses protocol | Custom server | Terminal + Agent Inspector |

The Model Router: One Deployment to Rule Them All

Before diving into the demos, it is worth understanding the one architectural decision that ties the entire lab together: every agent uses model-router as its model deployment.

    MODEL_DEPLOYMENT=model-router

Model Router is a Microsoft Foundry capability that inspects each request at inference time and routes it to the optimal available model, weighing task complexity, cost, and latency. A simple factual question goes to a fast, cheap model. A complex tool-calling chain with code generation gets routed to a frontier model. You write zero routing logic.

The lab's MODEL-ROUTER.md file contains empirical observations from running all nine demos. A sample of what the router selected:

| Demo | Query | Task Type | Model Selected |
|------|-------|-----------|----------------|
| hello | "What's the capital of WA state?" | Factual recall | grok-4-1-fast-reasoning |
| hello | "Summarize our conversation" | Summarization | gpt-5.2-chat-2025-12-11 |
| tools | "What's the weather in Seattle?" | Tool-using | gpt-5.4-mini-2026-03-17 |
| code | Data analysis with code generation | Code generation + execution | gpt-5.4-2026-03-05 |
| rag | HR policy document question | Retrieval + synthesis | gpt-5.3-chat-2026-03-03 |

This is the strongest signal in the lab: you do not need to reason about model selection. You declare what your agent needs to do and the router handles the rest; in the runs recorded above, it picked lighter models for simple turns and more capable models for code generation and retrieval.

Demo 0: The Minimum Viable Agent

The hello-demo establishes the baseline pattern used by every subsequent demo. Two files: one to register the agent, one to chat with it.

Registering the agent

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import PromptAgentDefinition

    credential = DefaultAzureCredential()
    project = AIProjectClient(endpoint=PROJECT_ENDPOINT, credential=credential)

    agent = project.agents.create_version(
        agent_name=AGENT_NAME,
        definition=PromptAgentDefinition(
            model=MODEL_DEPLOYMENT,
            instructions="You are a helpful, friendly assistant.",
        ),
    )

Authentication uses DefaultAzureCredential, which works with az login locally and with managed identity in production, so no API keys appear anywhere in the code.

Chatting with the agent

    # Assumption: the OpenAI-compatible client comes from the project client
    openai = project.get_openai_client()

    # Create a server-side conversation (persists history across turns)
    conversation = openai.conversations.create()

    # Each turn sends the user message; the agent sees full history
    response = openai.responses.create(
        input=user_input,
        conversation=conversation.id,
        extra_body={"agent_reference": {"name": AGENT_NAME, "type": "agent_reference"}},
    )
    print(response.output_text)

The conversation object is server-side. You pass its ID on every turn; the history lives in Foundry, not in a local list. This is the Responses API pattern, distinct from the older Completions or Chat Completions APIs.

Demo 1: Function Tools and the Tool-Calling Loop

Demo 1 adds function calling against a real weather API.
The key insight here is that the model does not execute the function. It requests the execution, and your code executes it locally, then feeds the result back.

Declaring a function tool

    from azure.ai.projects.models import FunctionTool, PromptAgentDefinition

    func_tool = FunctionTool(
        name="get_weather",
        description="Get the current weather for a given city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
            # strict mode expects the schema to forbid undeclared properties
            "additionalProperties": False,
        },
        strict=True,
    )

    agent = project.agents.create_version(
        agent_name=AGENT_NAME,
        definition=PromptAgentDefinition(
            model=MODEL_DEPLOYMENT,
            tools=[func_tool],
            instructions="You are a weather assistant...",
        ),
    )

The tool-calling loop

    import json

    from openai.types.responses.response_input_param import FunctionCallOutput

    # The trailing ... is elided in the original snippet
    response = openai.responses.create(input=user_input, conversation=conversation.id, ...)

    # Loop while the model is requesting tool calls
    while any(item.type == "function_call" for item in response.output):
        input_list = []
        for item in response.output:
            if item.type == "function_call":
                args = json.loads(item.arguments)
                result = get_weather(args["city"])  # execute locally
                input_list.append(
                    FunctionCallOutput(
                        type="function_call_output",
                        call_id=item.call_id,
                        output=result,
                    )
                )
        # Send results back to the agent
        response = openai.responses.create(input=input_list, conversation=conversation.id, ...)

    print(response.output_text)

The strict=True parameter on FunctionTool enforces structured outputs: the model must return arguments that match the declared JSON schema exactly. This removes a common class of argument-parsing errors in production.

Demo 2: UI Is Not Your Agent

Demo 2 runs the exact same agent as Demo 1 but surfaces it in a Tkinter desktop window. The point is pedagogical: your agent definition, conversation management, and tool-calling logic are entirely independent of your UI layer. Swapping from terminal to desktop requires changing only the presentation code; nothing in the agent or conversation path changes.
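The separation described for Demo 2 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the lab's code: AgentSession, echo_agent, render_terminal, and render_desktop are hypothetical names, and echo_agent is a stub standing in for the real responses.create() call so the sketch runs without Azure credentials.

```python
from typing import Callable, Dict, List


class AgentSession:
    """Owns the conversation path; knows nothing about how replies are displayed."""

    def __init__(self, send_turn: Callable[[str], str]) -> None:
        # send_turn is a hypothetical stand-in for the call that reaches the agent
        self._send_turn = send_turn
        self.turns: List[str] = []

    def ask(self, user_input: str) -> str:
        self.turns.append(user_input)
        return self._send_turn(user_input)


def echo_agent(user_input: str) -> str:
    """Stub backend (assumption) used instead of a real Foundry agent."""
    return f"agent reply to: {user_input}"


def render_terminal(session: AgentSession, user_input: str) -> str:
    """Terminal surface: formats the reply as a plain text line."""
    return f"Agent> {session.ask(user_input)}"


def render_desktop(session: AgentSession, user_input: str) -> Dict[str, str]:
    """Desktop surface: returns data a GUI widget could display."""
    return {"widget": "chat_log", "text": session.ask(user_input)}


session = AgentSession(echo_agent)
terminal_line = render_terminal(session, "Weather in Seattle?")
desktop_payload = render_desktop(session, "Weather in Seattle?")
print(terminal_line)
print(desktop_payload["text"])
```

Both surfaces call the same ask() method, so replacing the stub with a real agent call, or adding a third surface, would touch only one side of the boundary.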
This is a principle worth internalising early: agent logic and UI logic should never be entangled. The lab enforces this separation structurally.

Demo 3: Server-Side Built-In Tools

The web search demo introduces a sharp contrast with Demo 1. With WebSearchTool, the tool-calling loop disappears entirely from client code:

    from azure.ai.projects.models import PromptAgentDefinition, WebSearchTool

    agent = project.agents.create_version(
        agent_name="Search-Agent",
        definition=PromptAgentDefinition(
            model=MODEL_DEPLOYMENT,
            tools=[WebSearchTool()],
            instructions="You are a research assistant...",
        ),
    )

The agent decides when to search, executes the search server-side, and returns a grounded response with citations. Your client code looks identical to Demo 0: a simple responses.create() call with no tool loop.

The distinction matters architecturally:
- Function tools (Demo 1): tool execution happens on your client; you control the code, the API call, and the error handling.
- Built-in tools (Demo 3+): tool execution happens inside Foundry; you get results without managing execution.

Demo 4: Code Interpreter and the Gradio Web UI

Demo 4 attaches CodeInterpreterTool, which gives the agent a sandboxed Python execution environment inside Foundry. The agent can write code, run it, observe output, and iterate, all server-side. Combined with a Gradio web interface, this demo shows an agent that can perform data analysis, generate charts, and explain results through a browser UI.

Model Router is particularly interesting here: the empirical data shows it selects a more capable frontier model (gpt-5.4-2026-03-05) for code-generation tasks, while simpler conversational turns stay on lighter models.

Demo 5: Retrieval-Augmented Generation with FileSearchTool

Demo 5 introduces RAG.
The setup phase uploads a document, creates a vector store, and attaches it to the agent:

    from azure.ai.projects.models import FileSearchTool, PromptAgentDefinition

    # Upload document and create a vector store
    vector_store = openai.vector_stores.create(name="employee-handbook-store")
    with open("data/employee-handbook.md", "rb") as f:
        openai.vector_stores.files.upload_and_poll(
            vector_store_id=vector_store.id, file=f
        )

    # Attach the vector store to the agent
    agent = project.agents.create_version(
        agent_name="RAG-Agent",
        definition=PromptAgentDefinition(
            model=MODEL_DEPLOYMENT,
            tools=[FileSearchTool(vector_store_ids=[vector_store.id])],
            instructions="Answer questions using only the provided documents...",
        ),
    )

At query time, the agent embeds the question, searches the vector store semantically, retrieves matching chunks, and generates an answer grounded in the retrieved content, entirely server-side. The client code remains a plain responses.create() call.

An important detail: the .vector_store_id file is written to disk during setup and read back during the chat session, so the demo survives process restarts without re-uploading the document. The .gitignore excludes this file from source control.

Demo 6: Model Context Protocol

Demo 6 connects the agent to a GitHub MCP server, giving it access to repository and issue data via the open Model Context Protocol standard. MCP servers expose tools over a standardised wire protocol; the agent discovers and calls them without any client-side function declarations.

The demo also demonstrates human-in-the-loop approval: before executing any MCP tool call, the agent surfaces the proposed action and waits for the user to confirm. This is an important safety pattern for agents that can trigger side effects on external systems.

Demo 7: Toolbox — Centralised Tool Governance

Where Demo 6 connects to a single MCP server directly, Demo 7 uses a Toolbox: a managed Microsoft Foundry resource that bundles multiple tools into a single, versioned, MCP-compatible endpoint.
The Toolbox in this demo exposes both GitHub Issues and GitHub Repos tools, curated into an immutable versioned snapshot.

This pattern is significant for production multi-agent systems:
- Centralised governance: one team owns the tool definitions; all agents consume them via a single endpoint.
- Versioned snapshots: promoting a new Toolbox version is explicit; agents pin to a version and upgrade intentionally.
- MCP compatibility: any MCP-capable agent or framework can connect, not just Foundry SDK agents.

    from azure.ai.projects.models import McpTool

    toolbox_tool = McpTool(
        server_label="toolbox",
        server_url=TOOLBOX_ENDPOINT,
        allowed_tools=[],  # empty = all tools in the Toolbox version
        headers={"Authorization": f"Bearer {token}"},
    )

Demo 8: Self-Hosted Agent with the Responses Protocol

The final demo departs from the prompt-agent pattern. Instead of registering a declarative agent in Foundry, Demo 8 implements a custom agent server using the Responses protocol. The server exposes a streaming HTTP endpoint; Foundry's Agent Inspector can connect to it and route user turns to it just as it would to a hosted prompt agent.

This demo includes a Dockerfile and an agent.yaml, enabling deployment to Foundry's container hosting service. It uses gpt-4.1-mini directly rather than the model router, because the custom server owns the entire inference path.

When to consider this pattern:
- Your agent requires custom pre- or post-processing logic that cannot be expressed in a system prompt.
- You need to integrate with infrastructure that is not reachable through MCP or built-in tools.
- You want to own the inference call for cost control, A/B testing, or compliance reasons.
- You are building a multi-agent orchestrator that needs to expose itself as an agent to other orchestrators.

Getting Started

The lab requires Python 3.10 or higher, an Azure subscription with a Microsoft Foundry project, and the Azure CLI.

1. Clone and set up the virtual environment

    git clone https://github.com/microsoft-foundry/Foundry-Agent-Lab.git
    cd Foundry-Agent-Lab

    # Create and activate the virtual environment
    python -m venv .venv

    # Windows Command Prompt
    .venv\Scripts\activate.bat
    # Windows PowerShell
    .venv\Scripts\Activate.ps1
    # macOS / Linux
    source .venv/bin/activate

    pip install -r requirements.txt

2. Configure a demo

    copy hello-demo\.env.sample hello-demo\.env
    # Edit hello-demo\.env and set PROJECT_ENDPOINT

Your PROJECT_ENDPOINT is on the Overview page of your Foundry project in the Azure portal. It takes the form https://your-resource.ai.azure.com/api/projects/your-project.

3. Run the demo

    az login
    0-hello-demo

Each numbered batch file at the root activates the virtual environment, runs create_agent.py, and launches chat.py. Append log to capture the full session transcript:

    0-hello-demo log

Reset between runs

    hello-demo\reset.bat

Every demo includes a reset.bat that deletes the registered agent and any associated resources (vector stores, uploaded files). Demos are fully repeatable.

Architecture Principles Demonstrated

Across the nine demos, the lab illustrates a set of design principles that apply directly to production agent systems:

Keyless authentication throughout

Every demo uses DefaultAzureCredential. No API keys appear anywhere in the code. Locally, az login provides credentials. In production, managed identity takes over automatically: same code, no secrets to rotate.

Server-side conversation state

The Responses API stores conversation history server-side. Your application passes a conversation ID; Foundry maintains the thread. This eliminates the common bug of truncating history due to local list management and makes multi-process or multi-instance deployments straightforward.

Client-side vs server-side tool execution

The lab makes the distinction explicit. Function tools execute in your process: you control the code, the external call, and the error handling.
Built-in tools (WebSearch, CodeInterpreter, FileSearch) execute inside Foundry: you get results without managing execution infrastructure. MCP tools (Demos 6 and 7) fall between these: they execute in a separately deployed server, with the protocol mediating the call.

Progressive tool introduction

Each demo's create_agent.py registers the agent once. The chat.py file handles the conversation loop. These two responsibilities are always separate, making it easy to update agent definitions without modifying conversation logic, and vice versa.

Security Considerations

When building agents for production, keep the following in mind:
- Never commit .env files. The .gitignore excludes them, but verify this before pushing. Use Azure Key Vault or environment variable injection in CI/CD pipelines.
- Use managed identity in production. DefaultAzureCredential automatically picks up managed identity when deployed to Azure, eliminating the need for any stored credentials.
- Apply human-in-the-loop for side-effecting tools. Demo 6 demonstrates this pattern for MCP tool calls. Any agent that can modify external state (create issues, send emails, write files) should surface proposed actions for confirmation.
- Validate tool outputs before use. Treat data returned by external tools (weather APIs, search results, document retrieval) as untrusted input. Prompt injection through tool results is a real attack surface; grounding instructions in your system prompt reduce but do not eliminate this risk.
- Scope Toolbox permissions narrowly. When using a Toolbox (Demo 7), use allowed_tools to restrict which tools the agent can call, rather than granting access to all tools in a Toolbox version.

Key Takeaways

- Start with the minimum. A prompt agent with no tools requires fewer than 30 lines of code using the Foundry SDK. Add tools only when the use case demands them.
- Use model-router unless you have a specific reason not to. The empirical data in the lab shows the router selects appropriate models across all task types: factual, creative, tool-calling, RAG, and code generation.
- Understand the client/server tool boundary. Function tools give you control; built-in tools give you simplicity. MCP and Toolbox give you governance and interoperability. Choose based on where you need control and where you need scale.
- Conversation state belongs on the server. Do not maintain conversation history in application memory if you can avoid it. The Responses API conversation object is designed for this.
- The hosted-demo pattern is for when you need to own the inference path. For most use cases, a declarative prompt agent is sufficient and far simpler to operate.

Next Steps

- Explore the repo: github.com/microsoft-foundry/Foundry-Agent-Lab
- Microsoft Foundry SDK documentation: learn.microsoft.com/azure/ai-studio/
- Responses API quickstart: Prompt agent quickstart
- Model Router conceptual documentation: Model Router for Microsoft Foundry
- Model Context Protocol: modelcontextprotocol.io
- Azure Identity SDK (DefaultAzureCredential): azure-identity Python SDK

The Foundry Agent Lab is open source under the MIT licence. Contributions, bug reports, and feature requests are welcome through GitHub Issues. See CONTRIBUTING.md for guidelines.

OIDC vs SPN: Securing Azure Deployments with GitHub Actions & Terraform
From Secrets to Trust: Modernizing CI/CD Authentication

When building infrastructure pipelines on Microsoft Azure using GitHub Actions and Terraform, one design choice quietly determines your entire security posture: how does your pipeline authenticate to Azure?

For years, the answer was simple:
- Use a Service Principal (SPN)
- Store a client secret in GitHub
- Authenticate using credentials

It works, but it doesn't scale securely.

This article walks through a real, production-ready implementation comparing:
- SPN (Client Secret – legacy pattern)
- OIDC (Federated Identity – modern standard)

Backed by a working repo: WorkFlowBasedDeployment

Architecture Overview

This repository implements a workflow-driven Terraform deployment model with modular Azure infrastructure.

Repository Structure

    .github/workflows/
      deploy-infrastructure.yml        # OIDC deployment
      deploy-infrastructure-spn.yml    # SPN deployment
      destroy-infrastructure.yml       # OIDC destroy
      destroy-infrastructure-spn.yml   # SPN destroy
    Deployment/
      main.tf
      providers.tf
      variables.tf
      terraform.tfvars
      modules/

Azure Resources Provisioned

| Resource | Module | Resource Group |
|----------|--------|----------------|
| Virtual Network + NSGs | vnet | rg-network |
| Storage Account | sa | rg-data |
| Container Apps | containerapps | rg-compute |
| AI Foundry | aifoundry | rg-data |
| AI Search | aisearch | rg-data |
| Azure Container Registry | acr | rg-compute |
| Key Vault | azkeyvault | rg-data |
| Monitoring | azmonitor | rg-compute |
| Private Endpoints | private_endpoints | rg-network |

Authentication Models

Service Principal (SPN) – The Traditional Way

How it works
1. Create App Registration
2. Generate client secret
3. Store it in GitHub
4. Terraform authenticates using environment variables

    env:
      ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
      ARM_CLIENT_SECRET: ${{ secrets.AZURE_CLIENT_SECRET }}
      ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}

The problem

| Risk | Impact |
|------|--------|
| Long-lived secrets | Can be leaked |
| Manual rotation | Operational burden |
| Repo compromise | Full environment exposure |

This model is still supported, but it is increasingly considered legacy for secure pipelines.
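Because the SPN model depends on every secret being injected into the job, a missing or empty value tends to surface only when Terraform tries to authenticate. The following is a minimal sketch of a preflight check, assuming the three variable names from the workflow snippet above; missing_spn_vars and sample_env are hypothetical names for illustration and are not part of the repository.

```python
from typing import List, Mapping

# Names copied from the env block of the SPN workflow shown above.
SPN_REQUIRED_VARS = ("ARM_CLIENT_ID", "ARM_CLIENT_SECRET", "ARM_TENANT_ID")


def missing_spn_vars(env: Mapping[str, str]) -> List[str]:
    """Return the SPN variables that are unset or empty in env."""
    return [name for name in SPN_REQUIRED_VARS if not env.get(name)]


# Hypothetical environment in which the client secret was never injected.
sample_env = {
    "ARM_CLIENT_ID": "00000000-0000-0000-0000-000000000000",
    "ARM_TENANT_ID": "11111111-1111-1111-1111-111111111111",
}

missing = missing_spn_vars(sample_env)
print("Missing:", ", ".join(missing) if missing else "none")
```

In a real pipeline the function would be called with os.environ before terraform init, so the job stops early with a clear message instead of failing partway through authentication.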
OIDC (OpenID Connect) – The Modern Approach

How it works
1. GitHub Actions generates a short-lived identity token
2. Microsoft Entra ID validates it
3. Azure issues a temporary access token
4. Terraform executes using that token

No secrets. No storage. No rotation.

[Figure: Authentication Models Compared]

OIDC Flow (Mental Model)

Think of OIDC like this:
- GitHub → Identity Provider
- Azure → Trust Authority
- Workflow → Temporary Identity

OIDC Implementation (From the Repo)

Workflow Configuration

    permissions:
      id-token: write
      contents: read

    env:
      ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
      ARM_USE_OIDC: true

Azure Login

    - name: Azure Login (OIDC)
      uses: azure/login@v2
      with:
        client-id: ${{ secrets.AZURE_CLIENT_ID }}
        tenant-id: ${{ secrets.AZURE_TENANT_ID }}
        subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

Backend (Terraform State with OIDC)

    terraform init \
      -backend-config="use_oidc=true"

Even your state storage is secretless.

Azure Setup for OIDC

1. Create App Registration. No client secret is required.
2. Configure Federated Credential. Example:
   - Issuer: https://token.actions.githubusercontent.com
   - Subject: repo:<org>/<repo>:ref:refs/heads/master

   You can restrict by:
   - Branch
   - Environment
   - Repository
3. Assign RBAC. Grant roles like:
   - Contributor
   - Or scoped resource-level access

CI/CD Workflow Design

Both SPN and OIDC pipelines follow a 2-stage pattern:

Plan Stage
- terraform fmt
- terraform validate
- terraform plan
- Upload plan artifact

Apply Stage
- Triggered only on main
- Downloads plan
- Runs apply -auto-approve
- Protected via environment approvals

This ensures safe, auditable deployments.

OIDC vs SPN — Real Comparison

| Feature | SPN | OIDC |
|---------|-----|------|
| Secrets | Stored in GitHub | None |
| Token lifetime | Long-lived | Short-lived |
| Rotation | Manual | Not required |
| Security | Medium | High |
| Setup | Simple | Slightly complex |
| Recommended | No | Yes |

Common Pitfalls (Real-World Lessons)

- Missing id-token permission. Without this, the workflow cannot request an OIDC token and authentication fails.
- Federated credential mismatch. A wrong branch, an incorrect repo name, or a case-sensitivity issue in the subject means Azure rejects the token completely.
- RBAC delay. Role assignments can take time to propagate, which causes confusing failures.
- Backend misconfiguration. Forgetting use_oidc=true breaks Terraform state auth.

Debugging Tips

- Enable debug logs in GitHub Actions
- Check Sign-in logs in Microsoft Entra ID
- Validate federated credential subject format
- Always isolate: identity issue vs permission issue

Migration Strategy (SPN → OIDC)

A safe transition looks like this:
1. Keep SPN as fallback
2. Add OIDC alongside
3. Test in DEV environment
4. Remove client secret
5. Revoke old credentials

Because the SPN stays available until OIDC is verified, the switch can be made without downtime.

Where This Fits in Modern Azure Architecture

This pattern integrates naturally with:
- Azure Container Apps
- AI/ML workloads (AI Foundry, Search)
- Multi-environment deployments
- Zero-trust enterprise architectures

Authentication becomes identity-driven, not secret-driven.

When NOT to Use OIDC

- Legacy CI/CD systems without OIDC support
- Organisations with strict identity federation constraints
- Cross-tenant scenarios with limited trust setup

Note: These cases are becoming increasingly rare in modern cloud setups.

Security Perspective

| Threat | SPN Risk | OIDC Risk |
|--------|----------|-----------|
| Secret leak | High | None |
| Credential reuse | High | Low |
| Token replay | Possible | Limited |
| Repo compromise | Full access | Scoped |

Final Takeaway

This repository demonstrates a key shift in modern DevOps: secrets were a workaround for identity. OIDC replaces that workaround with trust.

By combining:
- GitHub Actions
- OIDC federation
- Azure RBAC

You get:
- Secure pipelines
- Scalable deployments
- Zero secret management

In enterprise environments, moving to OIDC can eliminate secret rotation pipelines entirely, reducing operational overhead and significantly lowering breach risk.

Reference Implementation

GitHub Repository: WorkFlowBasedDeployment

Closing Thought

OIDC doesn't just improve authentication, it fundamentally changes how trust is established in cloud systems. In a world moving toward zero-trust architectures, identity is the new perimeter and OIDC is how you enforce it.

How to Visualize Your Azure AI Workloads Usage for Observability
This article assumes you already have an Azure Foundry project and resource deployed in Microsoft Foundry. The options referenced here are documented in detail in the linked articles; this post serves as a consolidated step by step guide bringing them all together and explaining where each option is most useful. A Summary: Need Best Option Quick day-over-day visual, minimal setup Grafana Dashboard (Option 3) Custom growth % calculations App Insights + KQL in Log Analytics (Option 4) Shareable, interactive report Azure Workbooks (Option 5) Per-user/per-agent granularity APIM + App Insights (Option 6) Quick one-off chart, export to Excel Microsoft Foundry Monitor tab or App Insights Metrics Explorer (Option 1 and 2) Option 1. Within the Microsoft Foundry Portal (Quickest, No Setup) If you have models deployed in Microsoft Foundry and would like to monitor its usage, go to the New Foundry Portal → Build → Models → Monitor tab. View metrics such as: Estimated cost Total token usage Input vs. output tokens Number of requests This is the simplest way to monitor both model and agent usage. For PAYG plans: You can also view your total allocated quota (and figure out which Tier you are on) using the Quota Management Screen (New Foundry Portal → Operate → Quota tab). This screen shows how much your total allocated quota is, per model in a given subscription + region + Deployment Type (Global, Data Zones or Regional). For eg., in the image below, for gpt-4o, I am allocated 7M total TPM in my subscription. I am only using 150K TPM of the allocated 7M TPM amount. Which means, my requests will get throttled if I exceed the 150K TPM limit. To avoid throttling, I would need to increase my shared allocation limit. NOTE: you are charged for usage, so if you allow more capacity, you use more, so you pay more. Option 2: Azure Monitor Metrics Explorer This is already built into the Azure Portal and gives you time-series charts out of the box. 
1. Go to Azure Portal → your Azure OpenAI / Foundry resource → Monitoring → Metrics
2. Select a metric like AzureOpenAIRequests or TokenTransaction
3. Set Aggregation to Sum (total) or Max and Time granularity to 1 day
4. Split by ModelDeploymentName to see per-model trends
5. Adjust the time range (e.g., last 30 days) to see day-over-day bars/lines

Tip: You can pin these charts to an Azure Dashboard for a persistent view, or click Share → Download to Excel to get the raw data for your own analysis.

Option 3: Azure Managed Grafana (Best Pre-Built Dashboard)
This is the best option for a polished, real-time, day-over-day dashboard with no custom code. There's a pre-built AI Foundry dashboard ready to import. [grafana.com], [Create a M...ed Grafana]

How to set it up:
1. Create an Azure Managed Grafana workspace (if you don't have one)
2. In Grafana, go to Dashboards → New → Import → enter dashboard ID 24039 (for Foundry)
3. Select your Azure Monitor data source and point it to your Foundry resource

Tip: You can also import this directly from the Azure Portal: Monitor → Dashboards with Grafana → AI Foundry.

That's it. The dashboard gives you (per model deployment):
Token trends over time (inference, prompt, completion, day over day)
Request trends over time (AzureOpenAIRequests as a time series)
Latency trends (bonus)

NOTE: Default time range is 7 days; adjust to 30/60/90 days for growth trends.

Option 4: Application Insights + KQL Queries (Most Flexible, Custom Reports)
If you want fully custom day-over-day growth calculations (e.g., % change day-to-day), this is the way. [azurefeeds.com]

Setup:
Ensure your Foundry project is connected to an Application Insights resource (Foundry → Settings → Connected Resources). Open the App Insights resource → Logs → New Query or choose a sample query. In the images below, we simply ran 'requests' and set the time range to 24 hours.
There is also a Kusto Query Language (KQL) mode or Simple mode on the right-hand side: Simple mode lets you run out-of-the-box samples, while KQL mode opens a query window where you can enter custom queries.

Below are the results in grid view.
Same view but showing a chart:
Export options:

Another way to get the above graphs is via Log Analytics. Simply enable Diagnostic Settings on your Azure OpenAI resource → send to a Log Analytics workspace. Open Log Analytics → Logs and try out your sample queries.

Sample KQL for day-over-day token usage (adjust to your needs):

AzureMetrics
| where MetricName in ("TokenTransaction", "ProcessedPromptTokens", "GeneratedTokens")
| where TimeGenerated > ago(30d)
| summarize DailyTokens = sum(Total) by bin(TimeGenerated, 1d), MetricName
| order by TimeGenerated asc
| render timechart

Result:

Sample KQL for day-over-day growth % (adjust to your needs):

AzureMetrics
| where MetricName == "TokenTransaction"
| where TimeGenerated > ago(30d)
| summarize DailyTokens = sum(Total) by Day = bin(TimeGenerated, 1d)
| sort by Day asc
| extend PrevDay = prev(DailyTokens)
| extend GrowthPct = round((DailyTokens - PrevDay) / PrevDay * 100, 2)
| project Day, DailyTokens, GrowthPct

Option 5: Azure Monitor Workbooks (Custom Dashboards, Shareable)
Workbooks let you build interactive, parameterized dashboards that combine metrics and KQL logs. What's more, you can select resources from multiple subscriptions and visualize them all in one place using Workbooks!

1. Go to Azure Portal → Monitor → Workbooks → New
2. Add a Metrics query panel → select your Log Analytics, App Insights or Foundry resource → enter the same query you used in Option 4.
3. Do a test run and view the graphs (as charts or as a list in grid view):
4. Save and share with your team.

Option 6: APIM + Application Insights (Granular Per-Caller/Per-Agent Tracking)
If your app routes requests through Azure API Management, you can use the azure-openai-emit-token-metric policy to send per-request token metrics to Application Insights with custom dimensions (User ID, Subscription ID, Agent, etc.). [Azure API...osoft Docs]

This is ideal for scenarios like:
"Which agent consumed the most tokens last week?"
"What's the token usage per API consumer/team?"

NOTE: Microsoft Foundry resources do not track usage by user. Fronting your Foundry resource with APIM is one way to track users, provided you pass the username/ID in the request context. How you implement this is up to your app design.

Ref: AI-Gateway/labs/token-metrics-emitting/token-metrics-emitting.ipynb at main · Azure-Samples/AI-Gateway · GitHub

Bonus: Check out all other APIM + AI related policies here:
AI-Gateway/labs/semantic-caching at main · Azure-Samples/AI-Gateway
AI-Gateway/labs/token-rate-limiting at main · Azure-Samples/AI-Gateway
AI-Gateway/labs/token-metrics-emitting/token-metrics-emitting.ipynb at main · Azure-Samples/AI-Gateway · GitHub

Moving Beyond Prompts: A Practical Introduction to Spec-Driven Development
In the last year, many of us have started writing code differently. We describe what we want, let AI generate an answer, review it, tweak the prompt, and try again. This loop (prompt, retry, adjust) has quietly become part of our daily workflow. At first, it feels incredibly productive. But as the complexity of the task increases, something changes. The iteration cycle becomes longer, outputs become inconsistent, and the effort shifts from solving the problem to refining the prompt. This is where a subtle but important shift in approach can help: moving from prompt-driven development to spec-driven development.

The Problem: Prompt → Retry → Guess
Most AI-assisted workflows today look something like this:
Write a prompt describing the task
Review the generated output
Adjust the prompt
Repeat until it looks acceptable
In practice, this often simplifies to: Prompt → Retry → Guess

Figure: Prompt-driven vs spec-driven workflow comparison

For simple tasks, this works well. But for anything involving multiple inputs, constraints, or edge cases, the process can become unpredictable. In my experience, the challenge is not the model; it is the lack of structure in how we describe the problem.

A Shift in Thinking: From Prompts to Specifications
Instead of asking AI to “figure it out,” spec-driven development introduces a simple idea: Define the problem clearly before asking for a solution. A specification (spec) is not a long document. It is a structured way of describing:
Inputs
Outputs
Constraints
Edge cases
When this structure is provided upfront, the interaction changes significantly. Rather than iterating on vague prompts, you are guiding the system with a clear contract.

What This Looks Like in Practice
Let’s take a simple example: an order summary API (for example, a backend service hosted on Azure App Service).
Without a Spec (Typical Prompt)
“Write an API that returns order details for a user.”
A model can generate something reasonable, but in practice, the responses often vary:
Field names may be inconsistent
Pagination may be missing
Edge cases (no orders, large datasets) may not be handled
Structure may change across iterations

Example response (typical output):
{
  "userId": 123,
  "orders": [
    { "id": 1, "amount": 250 }
  ]
}

With a Spec (Structured Input)
Now consider providing a simple specification:

Specification:
Input:
  userId
  page
  pageSize
Output:
  userId
  orders[]
    orderId
    totalAmount
    orderDate
  pagination
    page
    pageSize
    totalRecords
Constraints:
  Default pageSize = 10
  Return empty list if no orders
  Handle large datasets efficiently

Example response (based on the spec):
{
  "userId": 123,
  "orders": [
    { "orderId": 1, "totalAmount": 250, "orderDate": "2024-01-10" }
  ],
  "pagination": { "page": 1, "pageSize": 10, "totalRecords": 50 }
}

Why This Tends to Work
The difference here is not just stylistic; it is structural. An unstructured prompt leaves room for interpretation. A spec reduces ambiguity by defining expectations explicitly. In practice, I have observed that providing structured inputs like this often leads to the following:
More consistent field naming
Better handling of edge cases
Reduced need for repeated prompt refinement
Rather than relying on trial-and-error, the interaction becomes more predictable and aligned with expectations.

Applying This to Existing Code (Refactor Scenario)
This approach becomes even more useful when applied to existing code. Instead of asking: “Fix the bug in the Auth controller”
You can define expected behavior:
Input validation rules
Response formats
Error handling
Authorization behavior
The task then becomes aligning the implementation with the defined spec. This shifts the interaction from guesswork to validation: comparing current behavior with intended behavior.
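To make "comparing current behavior with intended behavior" concrete, here is a minimal sketch that checks a response against the order summary spec above. The function name check_order_summary and the wording of the violation messages are illustrative assumptions, not part of any tool mentioned in this article; only the field names come from the spec.

```python
from typing import Any

# Field names taken from the spec in this article; everything else is illustrative.
ORDER_FIELDS = {"orderId", "totalAmount", "orderDate"}
PAGINATION_FIELDS = {"page", "pageSize", "totalRecords"}


def check_order_summary(response: dict[str, Any]) -> list[str]:
    """Return a list of spec violations; an empty list means the response conforms."""
    problems: list[str] = []

    # Top-level contract: userId, orders[] and pagination must all be present.
    for key in ("userId", "orders", "pagination"):
        if key not in response:
            problems.append(f"missing top-level field: {key}")

    # The spec says "return empty list if no orders", so orders must be a list.
    orders = response.get("orders", [])
    if not isinstance(orders, list):
        problems.append("orders must be a list")
        orders = []

    for index, order in enumerate(orders):
        if not isinstance(order, dict):
            problems.append(f"orders[{index}] must be an object")
            continue
        missing = ORDER_FIELDS - set(order)
        if missing:
            problems.append(f"orders[{index}] missing: {sorted(missing)}")

    pagination = response.get("pagination")
    if isinstance(pagination, dict):
        missing = PAGINATION_FIELDS - set(pagination)
        if missing:
            problems.append(f"pagination missing: {sorted(missing)}")
    elif "pagination" in response:
        problems.append("pagination must be an object")

    return problems


with_spec = {
    "userId": 123,
    "orders": [{"orderId": 1, "totalAmount": 250, "orderDate": "2024-01-10"}],
    "pagination": {"page": 1, "pageSize": 10, "totalRecords": 50},
}
without_spec = {"userId": 123, "orders": [{"id": 1, "amount": 250}]}

print(check_order_summary(with_spec))
print(check_order_summary(without_spec))
```

Running the two example responses from this article through the check shows the difference: the spec-based response produces no violations, while the typical unstructured output is flagged for the missing pagination block and the inconsistent order fields.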
Example Comparison (Auth Scenario)

Without Spec (Typical Prompt)
“Fix the login issue in Auth controller”
Possible outcomes include:
Partial validation added
Inconsistent error responses
No clear handling of repeated failed attempts

With Spec (Defined Behavior)
Spec defines:
Validate username and password
Return consistent error responses
Lock account after 5 failed attempts
Do not expose internal errors
Resulting behavior:
Input validation is consistently applied
Error responses follow a defined structure
Edge cases like account lockout are handled explicitly
This mirrors the same pattern seen in the API example, moving from ambiguity to clearly defined behavior.

A Practical Way to Start
You do not need new tools or frameworks to try this. A simple workflow that has worked well in practice:
Ask: Describe the problem (prompt, discussion, or notes)
Write a spec: Define inputs, outputs, constraints
Refine: Remove ambiguity
Generate: Use the spec as input
Validate: Compare output with the spec
This adds a small upfront step, but it often reduces back-and-forth iterations later.

The Practical Challenge
One important point to note: Writing a good spec requires understanding the problem. Spec-driven development does not eliminate complexity; it surfaces it earlier. In many cases, the hardest part is not writing code, but clearly defining:
What the system should do
What it should not do
How it should behave under edge conditions
This is also why specs evolve over time. They do not need to be perfect upfront. They improve as your understanding improves.

Where This Approach Helps
From what I have seen, this approach is most useful in scenarios where the problem involves multiple inputs, defined contracts, or structured outputs such as APIs, schema-driven systems, or refactoring existing code where consistency matters.
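For the auth scenario, the spec items ("validate username and password", "return consistent error responses", "lock account after 5 failed attempts", "do not expose internal errors") can be sketched as a small piece of code that an implementation is then validated against. This is only an illustration under stated assumptions: LoginService, the error codes, and the in-memory user store with plain-text passwords are hypothetical and exist purely to keep the example short.

```python
MAX_FAILED_ATTEMPTS = 5  # from the spec: "Lock account after 5 failed attempts"


class LoginService:
    """Hypothetical in-memory login service used only to illustrate the spec."""

    def __init__(self, users: dict[str, str]):
        # Plain-text passwords are an assumption for brevity; never do this in real code.
        self._users = users
        self._failed: dict[str, int] = {}

    def login(self, username: str, password: str) -> dict:
        # Spec: validate username and password.
        if not username or not password:
            return {"ok": False, "error": "invalid_input"}

        # Spec: lock account after 5 failed attempts.
        if self._failed.get(username, 0) >= MAX_FAILED_ATTEMPTS:
            return {"ok": False, "error": "account_locked"}

        # Spec: consistent error responses that do not expose internal details
        # (the same error whether the user is unknown or the password is wrong).
        if self._users.get(username) != password:
            self._failed[username] = self._failed.get(username, 0) + 1
            return {"ok": False, "error": "invalid_credentials"}

        self._failed.pop(username, None)
        return {"ok": True}


service = LoginService({"ada": "s3cret"})
for _ in range(MAX_FAILED_ATTEMPTS):
    service.login("ada", "wrong-password")
print(service.login("ada", "s3cret"))
```

Each spec line maps to one branch, which makes the "Validate" step of the workflow a matter of checking branches against the spec rather than guessing what "fix the login issue" was supposed to mean.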
Where It May Not Be Necessary
For simpler tasks such as small scripts, minor UI changes, or quick experiments, a detailed specification may not add much value. In those cases, a straightforward prompt is often sufficient.

A Note on Tools
Tools like GitHub Copilot, Azure AI Studio, and AI-assisted workflows in Visual Studio Code tend to be more effective when given clear, structured inputs. Spec-driven development is not tied to any specific tool. It is a way of thinking about how we interact with these systems more effectively.

References
https://github.com/features/copilot
https://platform.openai.com/docs/guides/prompt-engineering
https://github.com/github/spec-kit
Amplifier - Modular AI Agent Framework - Amplifier

Final Thoughts
Many discussions around AI-assisted development focus on what tools can do. This approach focuses on something slightly different: How developers can structure problems more effectively before implementation. In my experience, moving from prompts to specs does not eliminate iteration, but it makes that iteration more predictable and purposeful.

High Availability Testing for Azure Kubernetes Service in a Single Region with Availability Zones
Perform repeatable high availability testing for Azure Kubernetes Service workloads deployed in a single Azure region with Availability Zones. Validate pod recovery, node disruptions, and workload resiliency during infrastructure failures.

Stop Experimenting, Start Building: AI Apps & Agents Dev Days Has You Covered
The AI landscape has shifted. The question is no longer “Can we build AI applications?” it’s “Can we build AI applications that actually work in production?” Demos are easy. Reliable, scalable, resilient AI systems that handle real-world complexity? That’s where most teams struggle. If you’re an AI developer, software engineer, or solution architect who’s ready to move beyond prototypes and into production-grade AI, there’s a series built specifically for you. What Is AI Apps & Agents Dev Days? AI Apps & Agents Dev Days is a monthly technical series from Microsoft Reactor, delivered in partnership with Microsoft and NVIDIA. You can explore the full series at https://developer.microsoft.com/en-us/reactor/series/s-1590/ This isn’t a slide deck marathon. The series tagline says it best: “It’s not about slides, it’s about building.” Each session tackles real-world challenges, shares patterns that actually work, and digs into what’s next in AI-driven app and agent design. You bring your curiosity, your code, and your questions. You leave with something you can ship. The sessions are led by experienced engineers and advocates from both Microsoft and NVIDIA, people like Pamela Fox, Bruno Capuano, Anthony Shaw, Gwyneth Peña-Siguenza, and solutions architects from NVIDIA’s Cloud AI team. These aren’t theorists; they’re practitioners who build and ship the tools you use every day. What You’ll Learn The series covers the full spectrum of building AI applications and agent-based systems. Here are the key themes: Building AI Applications with Azure, GitHub, and Modern Tooling Sessions walk through how to wire up AI capabilities using Azure services, GitHub workflows, and the latest SDKs. The focus is always on code-first learning, you’ll see real implementations, not abstract architecture diagrams. Designing and Orchestrating AI Agents Agent development is one of the series’ strongest threads. 
Sessions cover how to build agents that orchestrate long-running workflows, persist state automatically, recover from failures, and pause for human-in-the-loop input, without losing progress. For example, the session “AI Agents That Don’t Break Under Pressure” demonstrates building durable, production-ready AI agents using the Microsoft Agent Framework, running on Azure Container Apps with NVIDIA serverless GPUs. Scaling LLM Inference and Deploying to Production Moving from a working prototype to a production deployment means grappling with inference performance, GPU infrastructure, and cost management. The series covers how to leverage NVIDIA GPU infrastructure alongside Azure services to scale inference effectively, including patterns for serverless GPU compute. Real-World Architecture Patterns Expect sessions on container-based deployments, distributed agent systems, and enterprise-grade architectures. You’ll learn how to use services like Azure Container Apps to host resilient AI workloads, how Foundry IQ fits into agent architectures as a trusted knowledge source, and how to make architectural decisions that balance performance, cost, and scalability. Why This Matters for Your Day Job There’s a critical gap between what most AI tutorials teach and what production systems actually require. This series bridges that gap: Production-ready patterns, not demos. Every session focuses on code and architecture you can take directly into your projects. You’ll learn patterns for state persistence, failure recovery, and durable execution — the things that break at 2 AM. Enterprise applicability. The scenarios covered — travel planning agents, multi-step workflows, GPU-accelerated inference — map directly to enterprise use cases. Whether you’re building internal tooling or customer-facing AI features, the patterns transfer. Honest trade-off discussions. The speakers don’t shy away from the hard questions: When do you need serverless GPUs versus dedicated compute? 
How do you handle agent failures gracefully? What does it actually cost to run these systems at scale?

Watch On-Demand, Build at Your Own Pace
Every session is available on-demand. You can watch, pause, and build along at your own pace, with no need to rearrange your schedule. The full playlist is available at

This is particularly valuable for technical content. Pause a session while you replicate the architecture in your own environment. Rewind when you need to catch a configuration detail. Build alongside the presenters rather than just watching passively.

What You’ll Walk Away With
After working through the series, you’ll have:
Practical agent development skills: how to design, orchestrate, and deploy AI agents that handle real-world complexity, including state management, failure recovery, and human-in-the-loop patterns
Production architecture patterns: battle-tested approaches for deploying AI workloads on Azure Container Apps, leveraging NVIDIA GPU infrastructure, and building resilient distributed systems
Infrastructure decision-making confidence: a clearer understanding of when to use serverless GPUs, how to optimise inference costs, and how to choose the right compute strategy for your workload
Working code and reference implementations: the sessions are built around live coding and sample applications (like the Travel Planner agent demo), giving you starting points you can adapt immediately
A framework for continuous learning: with new sessions each month, you’ll stay current as the AI platform evolves and new capabilities emerge

Start Building
The AI applications that will matter most aren’t the ones with the flashiest demos; they’re the ones that work reliably, scale gracefully, and solve real problems. That’s exactly what this series helps you build.
Whether you’re designing your first AI agent system or hardening an existing one for production, the AI Apps & Agents Dev Days sessions give you the patterns, tools, and practical knowledge to move forward with confidence. Explore the series at https://developer.microsoft.com/en-us/reactor/series/s-1590/ and start watching the on-demand sessions at the link above. The best time to level up your AI engineering skills was yesterday. The second-best time is right now, and these sessions make it easy to start.

If You're Building AI on Azure, ECS 2026 is Where You Need to Be
Let me be direct: there's a lot of noise in the conference calendar. Generic cloud events. Vendor showcases dressed up as technical content. Sessions that look great on paper but leave you with nothing you can actually ship on Monday. ECS 2026 isn't that. As someone who will be on stage at Cologne this May, I can tell you the European Collaboration Summit combined with the European AI & Cloud Summit and European Biz Apps Summit is one of the few events I've seen where engineers leave with real, production-applicable knowledge. Three days. Three summits. 3,000+ attendees. One of the largest Microsoft-focused events in Europe, and it keeps getting better. If you're building AI systems on Azure, designing cloud-native architectures, or trying to figure out how to take your AI experiments to production — this is where the conversation is happening. What ECS 2026 Actually Is ECS 2026 runs May 5–7 at Confex in Cologne, Germany. It brings together three co-located summits under one roof: European Collaboration Summit — Microsoft 365, Teams, Copilot, and governance European AI & Cloud Summit — Azure architecture, AI agents, cloud security, responsible AI European BizApps Summit — Power Platform, Microsoft Fabric, Dynamics For Azure engineers and AI developers, the European AI & Cloud Summit is your primary destination. But don't ignore the overlap, some of the most interesting AI conversations happen at the intersection of collaboration tooling and cloud infrastructure. The scale matters here: 3,000+ attendees, 100+ sessions, multiple deep-dive tracks, and a speaker lineup that includes Microsoft executives, Regional Directors, and MVPs who have built, broken, and rebuilt production systems. The Azure + AI Track - What's Actually On the Agenda The AI & Cloud Summit agenda is built around real technical depth. Not "intro to AI" content, actual architecture decisions, patterns that work, and lessons from things that didn't. 
Here's what you can expect:

AI Agents and Agentic Systems
This is where the energy is right now, and ECS is leaning in. Expect sessions covering how to design agent workflows, chain reasoning steps, handle memory and state, and integrate with Azure AI services. Marco Casalaina, VP of Products for Azure AI at Microsoft, is speaking; if you want to understand the direction of the Azure AI platform from the people building it, this is a direct line.

Azure Architecture at Scale
Cloud-native patterns, microservices, containers, and the architectural decisions that determine whether your system holds up under real load. These sessions go beyond theory: you'll hear from engineers who've shipped these designs at enterprise scale.

Observability, DevOps, and Production AI
Getting AI to production is harder than the demos suggest. Sessions here cover monitoring AI systems, integrating LLMs into CI/CD pipelines, and building the operational practices that keep AI in production reliable and governable.

Cloud Security and Compliance
Security isn't optional when you're putting AI in front of users or connecting it to enterprise data. Tracks cover identity, access patterns, responsible AI governance, and how to design systems that satisfy compliance requirements without becoming unmaintainable.

Pre-Conference Deep Dives
One underrated part of ECS: the pre-conference workshops. These are extended, hands-on sessions, typically 3–6 hours, that let you go deep on a single topic with an expert. Think of them as intensive short courses where you can actually work through the material, not just watch slides. If you're newer to a particular area of Azure AI, or you want to build fluency in a specific pattern before the main conference sessions, these are worth the early travel.

The Speaker Quality Is Different Here
The ECS speaker roster includes Microsoft executives, Microsoft MVPs, and Regional Directors, people who have real accountability for the products and patterns they're presenting.
You'll hear from over 20 Microsoft speakers:
Marco Casalaina: VP of Products, Azure AI at Microsoft
Adam Harmetz: VP of Product at Microsoft, Enterprise Agent
And dozens of MVPs and Regional Directors who are in the field every day, solving the same problems you are.
These aren't keynote-only speakers. They're in the session rooms and at the hallway track, available for real conversations.

The Hallway Track Is Not a Cliché
I know "networking" sounds like a corporate afterthought. At ECS it genuinely isn't. When you put 3,000 practitioners (engineers, architects, DevOps leads, security specialists) in one venue for three days, the conversations between sessions are often more valuable than the sessions themselves. You get candid answers to "how are you actually handling X in production?" that you won't find in documentation. The European Microsoft community is tight-knit and collaborative. ECS is where that community concentrates.

Why This Matters Right Now
We're in a period where AI development is moving fast but the engineering discipline around it is still maturing. Most teams are figuring out:
How to move from AI prototype to production system
How to instrument and observe AI behaviour reliably
How to design agent systems that don't become unmaintainable
How to satisfy security and compliance requirements in AI-integrated architectures
ECS 2026 is one of the few places where you can get direct answers to these questions from people who've solved them, not theoretically, but in production, on Azure, in the last 12 months. If you go, you'll come back with practical patterns you can apply immediately. That's the bar I hold events to. ECS consistently clears it.

Register and Explore the Agenda
Register for ECS 2026: ecs.events
Explore the AI & Cloud Summit agenda: cloudsummit.eu/en/agenda
Dates: May 5–7, 2026 | Location: Confex, Cologne, Germany
Early registration is worth it, because the pre-conference workshops fill up.
And if you're coming, find me: I'll be the one talking too much about AI agents and Azure deployments. See you in Cologne.

MCP Demystified: Tools vs Resources vs Prompts Explained Simply
Introduction When developers start working with Model Context Protocol (MCP), one of the most confusing parts is understanding the difference between MCP Tools, Resources, and Prompts. All three are important components in modern AI application development, but they serve completely different purposes. In real-world AI systems like chatbots, AI agents, and copilots, using these components correctly can make your application scalable, clean, and easy to maintain. If used incorrectly, it can lead to confusion, bugs, and poor system design. In this article, we will clearly explain the difference between MCP Tools, Resources, and Prompts in simple words, using real-world examples and practical explanations. This guide is helpful for both beginner and intermediate developers working with AI and MCP. What Are MCP Tools? MCP Tools are functions or services that an AI model can use to perform real-world actions. These actions usually involve doing something outside the AI system, such as calling an API, updating a database, or sending a message. In simple terms, Tools represent what the AI can do. Real-World Analogy Think of MCP Tools like service workers in a company. For example, a delivery person delivers packages, a support agent updates tickets, and a payment system processes transactions. Similarly, MCP Tools perform specific tasks when requested by the AI. Examples of MCP Tools A tool that fetches user details from a database A tool that sends emails or notifications A tool that creates or updates support tickets A tool that calls third-party APIs like payment gateways A tool that triggers workflows in enterprise systems Key Understanding Tools are action-based. They execute operations and return results. Whenever your AI needs to "do something," you should use a Tool. What Are MCP Resources? MCP Resources are data sources that the AI model can access to read information. These are typically read-only and provide context or knowledge to the AI. 
In simple terms, Resources represent what the AI can read or see. Real-World Analogy Think of MCP Resources like books in a library or documents in a company. You can read and learn from them, but you cannot directly change their content. Examples of MCP Resources A database table containing customer information A knowledge base with FAQs and documentation System logs that track user activity Configuration files or static datasets Company policy documents or guidelines Key Understanding Resources are data-based. They provide information but do not perform any action. Whenever your AI needs information to make a decision, you should use a Resource. What Are MCP Prompts? MCP Prompts are structured instructions or templates that guide how the AI model should think, behave, and respond. In simple terms, Prompts represent how you instruct the AI. Real-World Analogy Think of Prompts like instructions given to an employee. For example, “Write a professional email,” “Summarize this report,” or “Answer politely to the customer.” These instructions shape how the output is generated. Examples of MCP Prompts A prompt to summarize customer feedback A prompt to generate a support response in a polite tone A prompt to analyze data and provide insights A prompt to translate text into another language A prompt to generate code based on requirements Key Understanding Prompts are instruction-based. They define how the AI should process input and generate output. Key Differences Between MCP Tools, Resources, and Prompts Understanding the difference between MCP Tools, Resources, and Prompts is important for building scalable AI systems. 
Tools vs Resources vs Prompts
Tools are used for performing actions
Resources are used for reading data
Prompts are used for guiding AI behavior

Detailed Comparison
Tools interact with external systems and can change data or trigger operations
Resources only provide data and do not modify anything
Prompts control how the AI thinks, responds, and formats its output

Comparison Table
Aspect | MCP Tools | MCP Resources | MCP Prompts
Purpose | Perform actions | Provide data | Guide behavior
Nature | Active | Passive | Instructional
Usage | API calls, updates | Data reading | AI response generation
Output | Action result | Data | Generated content

How MCP Tools, Resources, and Prompts Work Together
In real-world AI systems, these three components are used together to create powerful workflows.

Step-by-Step Flow
1. The user sends a request to the AI system
2. The Prompt defines how the AI should understand and respond
3. The AI fetches required information from Resources
4. If an action is required, the AI uses a Tool
5. The AI combines everything and generates a final response

Practical Example
Consider an AI customer support system:
The Prompt ensures the response is polite and helpful
The Resource provides customer history and previous tickets
The Tool updates the ticket status or sends an email notification
This combination helps build intelligent, real-world AI applications.
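The customer support example can be sketched in a few lines of plain Python to show how the three roles stay separate. This is a conceptual illustration only: it does not use an MCP SDK, the model call is left out, and every name here (TICKETS, ticket_resource, update_ticket_tool, SUPPORT_PROMPT, handle_request) is a hypothetical stand-in.

```python
# In-memory stand-in for a ticketing system (assumption for illustration).
TICKETS = {
    "T-1": {"customer": "Ada", "status": "open", "history": ["Reported login failure"]},
}

# Prompt: an instruction template that guides how the response is produced.
SUPPORT_PROMPT = (
    "You are a polite and helpful support agent.\n"
    "Context: {context}\n"
    "Customer message: {message}"
)


def ticket_resource(ticket_id: str) -> str:
    """Resource: read-only access to ticket data; it never modifies anything."""
    ticket = TICKETS[ticket_id]
    history = "; ".join(ticket["history"])
    return f"customer {ticket['customer']}, status {ticket['status']}, history: {history}"


def update_ticket_tool(ticket_id: str, status: str) -> str:
    """Tool: performs an action that changes system state and returns a result."""
    TICKETS[ticket_id]["status"] = status
    return f"Ticket {ticket_id} set to {status}"


def handle_request(ticket_id: str, message: str, resolve: bool) -> dict:
    """Follow the step-by-step flow: read via Resource, build Prompt, act via Tool."""
    context = ticket_resource(ticket_id)
    prompt = SUPPORT_PROMPT.format(context=context, message=message)
    # A real system would send `prompt` to a model here; that call is omitted.
    action = update_ticket_tool(ticket_id, "resolved") if resolve else None
    return {"prompt": prompt, "action": action}


result = handle_request("T-1", "It works now, thanks!", resolve=True)
print(result["prompt"])
print(result["action"])
```

Keeping the read path (ticket_resource) free of side effects and routing every state change through update_ticket_tool is the separation the comparison table describes.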
Advantages of Understanding MCP Concepts
Helps developers design clean and scalable AI architecture
Improves clarity in system design and reduces confusion
Enhances performance by separating responsibilities
Makes debugging and maintenance easier
Supports faster development of AI-powered applications

Common Mistakes Developers Make
Using Tools when only data retrieval is needed
Treating Resources as editable systems
Writing vague or unclear Prompts
Mixing responsibilities between Tools, Resources, and Prompts
Not structuring MCP components properly in applications

Best Practices for Using MCP Tools, Resources, and Prompts
Clearly define the role of each component before implementation
Use Tools only for actions that change system state or trigger operations
Use Resources strictly for reading and retrieving data
Write clear, specific, and well-structured Prompts
Test Tools, Resources, and Prompts independently before integration
Keep your architecture modular and easy to scale

Summary
Understanding the difference between MCP Tools, Resources, and Prompts is essential for modern AI application development using Model Context Protocol. Tools allow AI systems to perform actions, Resources provide the necessary data, and Prompts guide how the AI behaves and generates responses. When these components are used correctly, developers can build scalable, efficient, and intelligent AI systems. Mastering these MCP concepts will help you design better architectures and create powerful AI-driven applications in today’s evolving technology landscape.

Architecting Secure and Trustworthy AI Agents with Microsoft Foundry
Co-Authored by Avneesh Kaushik

Why Trust Matters for AI Agents
Unlike static ML models, AI agents call tools and APIs, retrieve enterprise data, generate dynamic outputs, and can act autonomously based on their planning. This introduces expanded risk surfaces: prompt injection, data exfiltration, over-privileged tool access, hallucinations, and undetected model drift. A trustworthy agent must be designed with defense-in-depth controls spanning planning, development, deployment, and operations.

Key Principles for Trustworthy AI Agents

Trust Is Designed, Not Bolted On: Trust cannot be added after deployment. By the time an agent reaches production, its data flows, permissions, reasoning boundaries, and safety posture must already be structurally embedded. Trust is architecture, not configuration. Architecturally this means trust must exist across all layers:

Layer | Design-Time Consideration
Model | Safety-aligned model selection
Prompting | System prompt isolation & injection defenses
Retrieval | Data classification & access filtering
Tools | Explicit allowlists
Infrastructure | Network isolation
Identity | Strong authentication & RBAC
Logging | Full traceability

Implementing trustworthy AI agents in Microsoft Foundry requires embedding security and control mechanisms directly into the architecture. A secure-by-design approach includes using private connectivity where supported (for example, Private Link/private endpoints) to reduce public exposure of AI and data services, enforcing managed identities for tool and service calls, and applying strong security trimming for retrieval (for example, per-document ACL filtering and metadata filters), with optional separate indexes by tenant or data classification when required for isolation.
Sensitive credentials and configuration secrets should be stored in Azure Key Vault rather than embedded in code, and content filtering should be applied both pre-model (input) and post-model (output) to screen unsafe prompts, unsafe generations, and unsafe tool actions in real time.

Prompt hardening - This further reduces risk by clearly separating system instructions from user input, applying structured tool invocation schemas instead of free-form calls, rejecting malformed or unexpected tool requests, and enforcing strict output validation such as JSON schema checks.

Threat Modeling - Before development begins, structured threat modeling should define what data the agent can access, evaluate the blast radius of a compromised or manipulated prompt, identify tools capable of real-world impact, and assess any regulatory or compliance exposure. Together, these implementation patterns ensure the agent is resilient, controlled, and aligned with enterprise trust requirements from the outset.

Observability Is Mandatory - Observability converts AI from a black box into a managed system. AI agents are non-deterministic systems. You cannot secure or govern what you cannot see. Unlike traditional APIs, agents reason step-by-step, call multiple tools, adapt outputs dynamically, and generate unstructured content, which makes deep observability non-optional. When implementing observability in Microsoft Foundry, organizations must monitor the full behavioral footprint of the AI agent to ensure transparency, security, and reliability. This begins with reasoning transparency, which includes capturing prompt inputs, system instructions, tool selection decisions, and high-level execution traces (for example, tool call sequence, retrieved sources, and policy outcomes) to understand how the agent arrives at outcomes, without storing sensitive chain-of-thought verbatim.
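As a rough illustration of the reasoning-transparency idea above, a trace record might capture the tool call sequence, retrieved sources, and policy outcomes for a run while deliberately having no field for raw chain-of-thought. This is a minimal sketch, not a Microsoft Foundry API: the `AgentTrace` class, its field names, and the sample values are all hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentTrace:
    """High-level trace for one agent run (hypothetical schema, not a Foundry type)."""
    run_id: str
    model_version: str
    user_prompt: str
    system_prompt_sha256: str  # fingerprint that identifies the system prompt version
    tool_calls: list = field(default_factory=list)
    retrieved_sources: list = field(default_factory=list)
    policy_outcomes: dict = field(default_factory=dict)
    # Note: there is intentionally no field for verbatim chain-of-thought.

    def record_tool_call(self, name: str, status: str) -> None:
        # Append in call order so the sequence can be reconstructed later.
        self.tool_calls.append({"tool": name, "status": status})


def prompt_fingerprint(system_prompt: str) -> str:
    """Return a SHA-256 hex digest of the system prompt text."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()


trace = AgentTrace(
    run_id="run-001",                      # illustrative values only
    model_version="example-model-v1",
    user_prompt="What is our refund policy?",
    system_prompt_sha256=prompt_fingerprint("You are a helpful assistant."),
)
trace.record_tool_call("search_index", "ok")
trace.retrieved_sources.append("doc-123")
trace.policy_outcomes["content_filter"] = "passed"

# Serialize for export to whatever log sink is in use.
trace_json = json.dumps(asdict(trace), sort_keys=True)
```

Storing a fingerprint of the system prompt rather than its full text is one possible design choice here; teams that need the full instructions in their logs could store them directly.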
Security signals should also be continuously analyzed, including prompt injection attempts, suspicious usage patterns, repeated tool retries, and abnormal token consumption spikes that may indicate misuse or exploitation. From a performance and reliability standpoint, teams should measure latency at each reasoning step, monitor timeout frequency, and detect drift in output distribution over time. Core telemetry should include prompt and completion logs, detailed tool invocation traces, safety filter scores, and model version metadata to maintain traceability. Additionally, automated alerting should be enabled for anomaly detection, predefined drift thresholds, and safety score regressions, ensuring rapid response to emerging risks and maintaining continuous trust in production environments.

Least Privilege Everywhere - AI agents amplify the consequences of over-permissioned systems. Least privilege must be enforced across every layer of an AI agent’s architecture to reduce blast radius and prevent misuse. Identity controls should rely on managed identities instead of shared secrets, combined with role-based access control (RBAC) and conditional access policies to tightly scope who and what can access resources. At the tooling layer, agents should operate with an explicit tool allowlist, use scope-limited API endpoints, and avoid any wildcard or unrestricted backend access. Network protections should include VNet isolation, elimination of public endpoints, and routing all external access through API Management as a controlled gateway. Without these restrictions, prompt injection or agent manipulation could lead to serious consequences such as data exfiltration or unauthorized transactions, making least privilege a foundational requirement for trustworthy AI.
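The explicit tool allowlist and structured tool invocation described above can be sketched as a small validation step that runs before any tool is executed. This is an illustrative sketch only: the tool names, the `ALLOWED_TOOLS` schema format, and the `validate_tool_request` helper are assumptions, not part of any Microsoft SDK.

```python
# Hypothetical allowlist: tool name -> expected argument names and types.
ALLOWED_TOOLS = {
    "search_index": {"query": str, "top": int},
    "get_order_status": {"order_id": str},
}


class ToolRequestRejected(Exception):
    """Raised when a tool request is not on the allowlist or is malformed."""


def validate_tool_request(request: dict) -> dict:
    """Return a normalized request, or raise ToolRequestRejected."""
    if not isinstance(request, dict):
        raise ToolRequestRejected("request must be an object")

    name = request.get("name")
    args = request.get("arguments")

    # No wildcard access: anything not explicitly listed is refused.
    if name not in ALLOWED_TOOLS:
        raise ToolRequestRejected(f"tool not on allowlist: {name!r}")
    if not isinstance(args, dict):
        raise ToolRequestRejected("arguments must be an object")

    schema = ALLOWED_TOOLS[name]
    # Reject both missing and unexpected arguments.
    if set(args) != set(schema):
        raise ToolRequestRejected(f"unexpected argument set: {sorted(args)}")
    for key, expected_type in schema.items():
        # Exact type match, so that True is not accepted where an int is expected.
        if type(args[key]) is not expected_type:
            raise ToolRequestRejected(f"argument {key!r} has the wrong type")

    return {"name": name, "arguments": dict(args)}


validated = validate_tool_request(
    {"name": "search_index", "arguments": {"query": "refund policy", "top": 3}}
)
```

A real deployment would more likely express the argument schemas as JSON Schema and pair this check with identity-scoped credentials per tool; the point of the sketch is only that unlisted tools and malformed arguments are rejected before execution.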
Continuous Validation Beats One-Time Approval - Unlike traditional software that may pass QA testing and remain relatively stable, AI systems continuously evolve: models are updated, prompts are refined, and data distributions shift over time. Because of this dynamic nature, AI agents require ongoing validation rather than a single approval checkpoint. Continuous validation should include automated safety regression testing, such as bias evaluation and hallucination detection, to ensure outputs remain aligned with policy expectations. Drift monitoring is equally important, covering semantic drift, response distribution changes, and shifts in retrieval sources that could alter agent behavior. Red teaming should also be embedded into the lifecycle, leveraging injection attack libraries, adversarial test prompts, and edge-case simulations to proactively identify vulnerabilities. These evaluations should be integrated directly into CI/CD pipelines so that prompt updates automatically trigger evaluation runs, model upgrades initiate regression testing, and any failure to meet predefined safety thresholds blocks deployment. This approach ensures that trust is continuously enforced rather than assumed.

Humans Remain Accountable - AI agents can make recommendations, automate tasks, or execute actions, but they cannot bear accountability themselves. Organizations must retain legal responsibility, ethical oversight, and governance authority over every decision and action performed by the agent. To enforce accountability, mechanisms such as immutable audit logs, detailed decision trace storage, user interaction histories, and versioned policy documentation should be implemented. Every action taken by an agent must be fully traceable to a specific model version, prompt version, policy configuration, and ultimately a human owner.
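One way to picture the traceability requirement above is an audit entry that names the model version, prompt version, policy configuration, and human owner for each action. The sketch below chains entries by hash so that later edits become detectable. Hash chaining is a technique swapped in here for illustration; it makes an in-memory log tamper-evident, not immutable, and a production system would typically rely on write-once storage as well. The function names and field names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_audit_entry(log: list, *, action: str, model_version: str,
                       prompt_version: str, policy_version: str, owner: str) -> dict:
    """Append an entry linked to the previous one by hash."""
    body = {
        "action": action,
        "model_version": model_version,
        "prompt_version": prompt_version,
        "policy_version": policy_version,
        "owner": owner,  # the accountable human or team
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS_HASH,
    }
    entry = {**body, "entry_hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True only if every entry is unmodified and correctly linked."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev_hash or _digest(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


audit_log: list = []
append_audit_entry(audit_log, action="issue_refund", model_version="example-model-v1",
                   prompt_version="prompt-7", policy_version="policy-3", owner="finance-ops")
append_audit_entry(audit_log, action="send_email", model_version="example-model-v1",
                   prompt_version="prompt-7", policy_version="policy-3", owner="support-lead")
chain_ok = verify_chain(audit_log)
```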
Together, these five principles (trust by design, observability, least privilege, continuous validation, and human accountability) form a reinforcing framework. When applied within Microsoft Foundry, they elevate AI agents from experimental tools to enterprise-grade, governed digital actors capable of operating reliably and responsibly in production environments.

Principle | Without It | With It
Designed Trust | Retroactive patching | Embedded resilience
Observability | Blind production risk | Proactive detection
Least Privilege | High blast radius | Controlled exposure
Continuous Validation | Silent drift | Active governance
Human Accountability | Unclear liability | Clear ownership

The AI Agent Lifecycle - We can structure trust controls across five stages:
Design & Planning
Development
Pre-Deployment Validation
Deployment & Runtime
Operations & Continuous Governance

Design & Planning: Establishing Guardrails Early
Trustworthy AI agents are not created by adding controls at the end of development; they are architected deliberately from the very beginning. In platforms such as Microsoft Foundry, trust must be embedded during the design and planning phase, before a single line of code is written. This stage defines the security boundaries, governance structure, and responsible AI commitments that will shape the agent’s entire lifecycle. From a security perspective, planning begins with structured threat modeling of the agent’s capabilities. Teams should evaluate what the agent is allowed to access and what actions it can execute. This includes defining least-privilege access to tools and APIs, ensuring the agent can only perform explicitly authorized operations. Data classification is equally critical: identifying whether information is public, confidential, or regulated determines how it can be retrieved, stored, and processed.
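To make the link between data classification and retrieval concrete, the following sketch trims a result set by classification level and group membership. It is a simplified illustration under stated assumptions: the `Classification` levels mirror the public, confidential, and regulated categories named above, while the document shape (`classification`, `allowed_groups`) and the `trim_results` helper are hypothetical. In practice this kind of filter would normally be pushed into the search query itself rather than applied after retrieval.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    CONFIDENTIAL = 1
    REGULATED = 2


def trim_results(documents: list, user_groups: set,
                 max_classification: Classification) -> list:
    """Keep only documents the caller is cleared and authorized to see."""
    visible = []
    for doc in documents:
        level = doc["classification"]
        # Drop anything above the caller's clearance ceiling.
        if level > max_classification:
            continue
        # Non-public documents also require membership in an allowed group.
        if level != Classification.PUBLIC:
            if not set(doc.get("allowed_groups", ())) & user_groups:
                continue
        visible.append(doc)
    return visible


docs = [
    {"id": "a", "classification": Classification.PUBLIC},
    {"id": "b", "classification": Classification.CONFIDENTIAL, "allowed_groups": ["finance"]},
    {"id": "c", "classification": Classification.CONFIDENTIAL, "allowed_groups": ["hr"]},
    {"id": "d", "classification": Classification.REGULATED, "allowed_groups": ["finance"]},
]
visible_ids = [d["id"] for d in trim_results(docs, {"finance"}, Classification.CONFIDENTIAL)]
```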
Identity architecture should be designed using strong authentication and role-based access controls through Microsoft Entra ID, ensuring that both human users and system components are properly authenticated and scoped. Additionally, private networking strategies such as VNet integration and private endpoints should be defined early to prevent unintended public exposure of models, vector stores, or backend services. Governance checkpoints must also be formalized at this stage. Organizations should clearly define the intended use cases of the agent, as well as prohibited scenarios, to prevent misuse. A Responsible AI impact assessment should be conducted to evaluate potential societal, ethical, and operational risks before development proceeds. Responsible AI considerations further strengthen these guardrails. Finally, clear human-in-the-loop thresholds should be defined, specifying when automated outputs require review. By treating design and planning as a structured control phase rather than a preliminary formality, organizations create a strong foundation for trustworthy AI.

Development: Secure-by-Default Agent Engineering
During development in Microsoft Foundry, agents are designed to orchestrate foundation models, retrieval pipelines, external tools, and enterprise business APIs, making security a core architectural requirement rather than an afterthought. Secure-by-default engineering includes model and prompt hardening through system prompt isolation, structured tool invocation, and strict output validation schemas. Retrieval pipelines must enforce source allow-listing, metadata filtering, document sensitivity tagging, and tenant-level vector index isolation to prevent unauthorized data exposure. Observability must also be embedded from day one. Agents should log prompts and responses, trace tool invocations, track token usage, capture safety classifier scores, and measure latency and reasoning-step performance.
Telemetry can be exported to platforms such as Azure Monitor, Azure Application Insights, and enterprise SIEM systems to enable real-time monitoring, anomaly detection, and continuous trust validation.

Pre-Deployment: Red Teaming & Validation
Before moving to production, AI agents must undergo security, reliability, and governance validation. Security testing should include prompt injection simulations, data leakage assessments, tool misuse scenarios, and cross-tenant isolation verification to ensure containment boundaries are intact. Responsible AI validation should evaluate bias, measure toxicity and content safety scores, benchmark hallucination rates, and test robustness against edge cases and unexpected inputs. Governance controls at this stage formalize approval workflows, risk sign-off, audit trail documentation, and model version registration to ensure traceability and accountability. The outcome of this phase is a documented trustworthiness assessment that confirms the agent is ready for controlled production deployment.

Deployment: Zero-Trust Runtime Architecture
Deploying AI agents securely in Azure requires a layered, Zero Trust architecture that protects infrastructure, identities, and data at runtime. Infrastructure security should include private endpoints, Network Security Groups, Web Application Firewalls (WAF), API Management gateways, secure secret storage in Azure Key Vault, and the use of managed identities. Following Zero Trust principles (verify explicitly, enforce least privilege, and assume breach) ensures that every request, tool call, and data access is continuously validated. Runtime observability is equally critical. Organizations must monitor agent reasoning traces, tool execution outcomes, anomalous usage patterns, prompt irregularities, and output drift.
Key telemetry signals include safety indicators (toxicity scores, jailbreak attempts), security events (suspicious tool call frequency), reliability metrics (timeouts, retry spikes), and cost anomalies (unexpected token consumption). Automated alerts should be configured to detect spikes in unsafe outputs, tool abuse attempts, or excessive reasoning loops, enabling rapid response and containment.

Operations: Continuous Governance & Drift Management
Trust in AI systems is not static; it must be continuously monitored, validated, and enforced throughout production. Organizations should implement automated evaluation pipelines that perform regression testing on new model versions, apply safety scoring to production logs, detect behavioral or data drift, and benchmark performance over time. Governance in production requires immutable audit logs, a versioned model registry, controlled policy updates, periodic risk reassessments, and well-defined incident response playbooks. Strong human oversight remains essential, supported by escalation workflows, manual review queues for high-risk outputs, and kill-switch mechanisms to immediately suspend agent capabilities if abnormal or unsafe behavior is detected.

To conclude - AI agents unlock powerful automation, but those same capabilities can introduce risk if left unchecked. A well-architected trust framework transforms agents from experimental chatbots into enterprise-ready autonomous systems. By coupling Microsoft Foundry’s flexibility with layered security, observability, and continuous governance, organizations can confidently deliver AI agents that are:
Secure
Reliable
Compliant
Governed
Trustworthy