ai solutions
54 TopicsSecuring Azure AI Applications: A Deep Dive into Emerging Threats | Part 1
Why AI Security Can’t Be Ignored? Generative AI is rapidly reshaping how enterprises operate—accelerating decision-making, enhancing customer experiences, and powering intelligent automation across critical workflows. But as organizations adopt these capabilities at scale, a new challenge emerges: AI introduces security risks that traditional controls cannot fully address. AI models interpret natural language, rely on vast datasets, and behave dynamically. This flexibility enables innovation—but also creates unpredictable attack surfaces that adversaries are actively exploiting. As AI becomes embedded in business-critical operations, securing these systems is no longer optional—it is essential. The New Reality of AI Security The threat landscape surrounding AI is evolving faster than any previous technology wave. Attackers are no longer focused solely on exploiting infrastructure or APIs; they are targeting the intelligence itself—the model, its prompts, and its underlying data. These AI-specific attack vectors can: Expose sensitive or regulated data Trigger unintended or harmful actions Skew decisions made by AI-driven processes Undermine trust in automated systems As AI becomes deeply integrated into customer journeys, operations, and analytics, the impact of these attacks grows exponentially. Why These Threats Matter? Threats such as prompt manipulation and model tampering go beyond technical issues—they strike at the foundational principles of trustworthy AI. They affect: Confidentiality: Preventing accidental or malicious exposure of sensitive data through manipulated prompts. Integrity: Ensuring outputs remain accurate, unbiased, and free from tampering. Reliability: Maintaining consistent model behavior even when adversaries attempt to deceive or mislead the system. When these pillars are compromised, the consequences extend across the business: Incorrect or harmful AI recommendations Regulatory and compliance violations Damage to customer trust Operational and financial risk In regulated sectors, these threats can also impact audit readiness, risk posture, and long-term credibility. Understanding why these risks matter builds the foundation. In the upcoming blogs, we’ll explore how these threats work and practical steps to mitigate them using Azure AI’s security ecosystem. Why AI Security Remains an Evolving Discipline? Traditional security frameworks—built around identity, network boundaries, and application hardening—do not fully address how AI systems operate. Generative models introduce unique and constantly shifting challenges: Dynamic Model Behavior: Models adapt to context and data, creating a fluid and unpredictable attack surface. Natural Language Interfaces: Prompts are unstructured and expressive, making sanitization inherently difficult. Data-Driven Risks: Training and fine-tuning pipelines can be manipulated, poisoned, or misused. Rapidly Emerging Threats: Attack techniques evolve faster than most defensive mechanisms, requiring continuous learning and adaptation. Microsoft and other industry leaders are responding with robust tools—Azure AI Content Safety, Prompt Shields, Responsible AI Frameworks, encryption, isolation patterns—but technology alone cannot eliminate risk. True resilience requires a combination of tooling, governance, awareness, and proactive operational practices. Let's Build a Culture of Vigilance: AI security is not just a technical requirement—it is a strategic business necessity. Effective protection requires collaboration across: Developers Data and AI engineers Cybersecurity teams Cloud platform teams Leadership and governance functions Security for AI is a shared responsibility. Organizations must cultivate awareness, adopt secure design patterns, and continuously monitor for evolving attack techniques. Building this culture of vigilance is critical for long-term success. Key Takeaways: AI brings transformative value, but it also introduces risks that evolve as quickly as the technology itself. Strengthening your AI security posture requires more than robust tooling—it demands responsible AI practices, strong governance, and proactive monitoring. By combining Azure’s built-in security capabilities with disciplined operational practices, organizations can ensure their AI systems remain secure, compliant, and trustworthy, even as new threats emerge. What’s Next? In future blogs, we’ll explore two of the most important AI threats—Prompt Injection and Model Manipulation—and share actionable strategies to mitigate them using Azure AI’s security capabilities. Stay tuned for practical guidance, real-world scenarios, and Microsoft-backed best practices to keep your AI applications secure. Stay Tuned.!503Views3likes0CommentsHybrid AI Using Foundry Local, Microsoft Foundry and the Agent Framework - Part 2
Background In Part 1, we explored how a local LLM running entirely on your GPU can call out to the cloud for advanced capabilities The theme was: Keep your data local. Pull intelligence in only when necessary. That was local-first AI calling cloud agents as needed. This time, the cloud is in charge, and the user interacts with a Microsoft Foundry hosted agent — but whenever it needs private, sensitive, or user-specific information, it securely “calls back home” to a local agent running next to the user via MCP. Think of it as: The cloud agent = specialist doctor The local agent = your health coach who you trust and who knows your medical history The cloud never sees your raw medical history The local agent only shares the minimum amount of information needed to support the cloud agent reasoning This hybrid intelligence pattern respects privacy while still benefiting from hosted frontier-level reasoning. Disclaimer: The diagnostic results, symptom checker, and any medical guidance provided in this article are for illustrative and informational purposes only. They are not intended to provide medical advice, diagnosis, or treatment. Architecture Overview The diagram illustrates a hybrid AI workflow where a Microsoft Foundry–hosted agent in Azure works together with a local MCP server running on the user’s machine. The cloud agent receives user symptoms and uses a frontier model (GPT-4.1) for reasoning, but when it needs personal context—like medical history—it securely calls back into the local MCP Health Coach via a dev-tunnel. The local MCP server queries a local GPU-accelerated LLM (Phi-4-mini via Foundry Local) along with stored health-history JSON, returning only the minimal structured background the cloud model needs. The cloud agent then combines both pieces—its own reasoning plus the local context—to produce tailored recommendations, all while sensitive data stays fully on the user’s device. Hosting the agent in Microsoft Foundry on Azure provides a reliable, scalable orchestration layer that integrates directly with Azure identity, monitoring, and governance. It lets you keep your logic, policies, and reasoning engine in the cloud, while still delegating private or resource-intensive tasks to your local environment. This gives you the best of both worlds: enterprise-grade control and flexibility with edge-level privacy and efficiency. Demo Setup Create the Cloud Hosted Agent Using Microsoft Foundry, I created an agent in the UI and pick gpt-4.1 as model: I provided rigorous instructions as system prompt: You are a medical-specialist reasoning assistant for non-emergency triage. You do NOT have access to the patient’s identity or private medical history. A privacy firewall limits what you can see. A local “Personal Health Coach” LLM exists on the user’s device. It holds the patient’s full medical history privately. You may request information from this local model ONLY by calling the MCP tool: get_patient_background(symptoms) This tool returns a privacy-safe, PII-free medical summary, including: - chronic conditions - allergies - medications - relevant risk factors - relevant recent labs - family history relevance - age group Rules: 1. When symptoms are provided, ALWAYS call get_patient_background BEFORE reasoning. 2. NEVER guess or invent medical history — always retrieve it from the local tool. 3. NEVER ask the user for sensitive personal details. The local model handles that. 4. After the tool runs, combine: (a) the patient_background output (b) the user’s reported symptoms to deliver high-level triage guidance. 5. Do not diagnose or prescribe medication. 6. Always end with: “This is not medical advice.” You MUST display the section “Local Health Coach Summary:” containing the JSON returned from the tool before giving your reasoning. Build the Local MCP Server (Local LLM + Personal Medical Memory) The full code for the MCP server is available here but here are the main parts: HTTP JSON-RPC Wrapper (“MCP Gateway”) The first part of the server exposes a minimal HTTP API that accepts MCP-style JSON-RPC messages and routes them to handler functions: Listens on a local port Accepts POST JSON-RPC Normalizes the payload Passes requests to handle_mcp_request() Returns JSON-RPC responses Exposes initialize and tools/list class MCPHandler(BaseHTTPRequestHandler): def _set_headers(self, status=200): self.send_response(status) self.send_header("Content-Type", "application/json") self.end_headers() def do_GET(self): self._set_headers() self.wfile.write(b"OK") def do_POST(self): content_len = int(self.headers.get("Content-Length", 0)) raw = self.rfile.read(content_len) print("---- RAW BODY ----") print(raw) print("-------------------") try: req = json.loads(raw.decode("utf-8")) except: self._set_headers(400) self.wfile.write(b'{"error":"Invalid JSON"}') return resp = handle_mcp_request(req) self._set_headers() self.wfile.write(json.dumps(resp).encode("utf-8")) Tool Definition: get_patient_background This section defines the tool contract exposed to Azure AI Foundry. The hosted agent sees this tool exactly as if it were a cloud function: Advertises the tool via tools/list Accepts arguments passed from the cloud agent Delegates local reasoning to the GPU LLM Returns structured JSON back to the cloud def handle_mcp_request(req): method = req.get("method") req_id = req.get("id") if method == "tools/list": return { "jsonrpc": "2.0", "id": req_id, "result": { "tools": [ { "name": "get_patient_background", "description": "Returns anonymized personal medical context using your local LLM.", "inputSchema": { "type": "object", "properties": { "symptoms": {"type": "string"} }, "required": ["symptoms"] } } ] } } if method == "tools/call": tool = req["params"]["name"] args = req["params"]["arguments"] if tool == "get_patient_background": symptoms = args.get("symptoms", "") summary = summarize_patient_locally(symptoms) return { "jsonrpc": "2.0", "id": req_id, "result": { "content": [ { "type": "text", "text": json.dumps(summary) } ] } } Local GPU LLM Caller (Foundry Local Integration) This is where personalization happens — entirely on your machine, not in the cloud: Calls the local GPU LLM through Foundry Local Injects private medical data (loaded from a file or memory) Produces anonymized structured outputs Logs debug info so you can see when local inference is running FOUNDRY_LOCAL_BASE_URL = "http://127.0.0.1:52403" FOUNDRY_LOCAL_CHAT_URL = f"{FOUNDRY_LOCAL_BASE_URL}/v1/chat/completions" FOUNDRY_LOCAL_MODEL_ID = "Phi-4-mini-instruct-cuda-gpu:5" def summarize_patient_locally(symptoms: str): print("[LOCAL] Calling Foundry Local GPU model...") payload = { "model": FOUNDRY_LOCAL_MODEL_ID, "messages": [ {"role": "system", "content": PERSONAL_SYSTEM_PROMPT}, {"role": "user", "content": symptoms} ], "max_tokens": 300, "temperature": 0.1 } resp = requests.post( FOUNDRY_LOCAL_CHAT_URL, headers={"Content-Type": "application/json"}, data=json.dumps(payload), timeout=60 ) llm_content = resp.json()["choices"][0]["message"]["content"] print("[LOCAL] Raw content:\n", llm_content) cleaned = _strip_code_fences(llm_content) return json.loads(cleaned) Start a Dev Tunnel Now we need to do some plumbing work to make sure the cloud can resolve the MCP endpoint. I used Azure Dev Tunnels for this demo. The snippet below shows how to set that up in 4 PowerShell commands: PS C:\Windows\system32> winget install Microsoft.DevTunnel PS C:\Windows\system32> devtunnel create mcp-health PS C:\Windows\system32> devtunnel port create mcp-health -p 8081 --protocol http PS C:\Windows\system32> devtunnel host mcp-health I have now a public endpoint: https://xxxxxxxxx.devtunnels.ms:8081 Wire Everything Together in Azure AI Foundry Now let's us the UI to add a new custom tool as MCP for our agent: And point to the public endpoint created previously: Voila, we're done with the setup, let's test it Demo: The Cloud Agent Talks to Your Local Private LLM I am going to use a simple prompt in the agent: “Hi, I’ve been feeling feverish, fatigued, and a bit short of breath since yesterday. Should I be worried?” Disclaimer: The diagnostic results and any medical guidance provided in this article are for illustrative and informational purposes only. They are not intended to provide medical advice, diagnosis, or treatment. Below is the sequence shown in real time: Conclusion — Why This Hybrid Pattern Matters Hybrid AI lets you place intelligence exactly where it belongs: high-value reasoning in the cloud, sensitive or contextual data on the local machine. This protects privacy while reducing cloud compute costs—routine lookups, context gathering, and personal history retrieval can all run on lightweight local models instead of expensive frontier models. This pattern also unlocks powerful real-world applications: local financial data paired with cloud financial analysis, on-device coding knowledge combined with cloud-scale refactoring, or local corporate context augmenting cloud automation agents. In industrial and edge environments, local agents can sit directly next to the action—embedded in factory sensors, cameras, kiosks, or ambient IoT devices—providing instant, private intelligence while the cloud handles complex decision-making. Hybrid AI turns every device into an intelligent collaborator, and every cloud agent into a specialist that can safely leverage local expertise. References Get started with Foundry Local - Foundry Local: https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/get-started?view=foundry-classic Using MCP tools with Agents (Microsoft Agent Framework) — https://learn.microsoft.com/en-us/agent-framework/user-guide/model-context-protocol/using-mcp-tools Microsoft Learn Build Agents using Model Context Protocol on Azure — https://learn.microsoft.com/en-us/azure/developer/ai/intro-agents-mcp Microsoft Learn Full demo repo available here.382Views0likes0CommentsIntroducing Microsoft Agent Factory
Microsoft Agent Factory is a new program designed for organizations that want to move from experimentation to execution faster. With a single plan, organizations can build agents with Work IQ, Fabric IQ, and Foundry IQ using Microsoft Foundry and Copilot Studio. They can also deploy their agents anywhere, including Microsoft 365 Copilot, with no upfront licensing and provisioning required. Eligible organizations can also tap into hands-on engagement from top AI Forward Deployed Engineers (FDEs) and access tailored role-based training to boost AI fluency across teams.17KViews9likes0CommentsHybrid AI Using Foundry Local, Microsoft Foundry and the Agent Framework - Part 1
Hybrid AI is quickly becoming one of the most practical architectures for real-world applications—especially when privacy, compliance, or sensitive data handling matter. Today, it’s increasingly common for users to have capable GPUs in their laptops or desktops, and the ecosystem of small, efficient open-source language models has grown dramatically. That makes local inference not only possible, but easy. In this guide, we explore how a locally run agent built with the Agent Framework can combine the strengths of cloud models in Azure AI Foundry with a local LLM running on your own GPU through Foundry Local. This pattern allows you to use powerful cloud reasoning without ever sending raw sensitive data—like medical labs, legal documents, or financial statements—off the device. Part 1 focuses on the foundations of this architecture, using a simple illustrative example to show how local and cloud inference can work together seamlessly under a single agent. Disclaimer: The diagnostic results, symptom checker, and any medical guidance provided in this article are for illustrative and informational purposes only. They are not intended to provide medical advice, diagnosis, or treatment. Demonstrating the concept Problem Statement We’ve all done it: something feels off, we get a strange symptom, or a lab report pops into our inbox—and before thinking twice, we copy-paste way too much personal information into whatever website or chatbot seems helpful at the moment. Names, dates of birth, addresses, lab values, clinic details… all shared out of habit, usually because we just want answers quickly. This guide uses a simple, illustrative scenario—a symptom checker with lab report summarization—to show how hybrid AI can help reduce that oversharing. It’s not a medical product or a clinical solution, but it’s a great way to understand the pattern. With Microsoft Foundry, Foundry Local, and the Agent Framework, we can build workflows where sensitive data stays on the user’s machine and is processed locally, while the cloud handles the heavier reasoning. Only a safe, structured summary ever leaves the device. The Agent Framework handles when to use the local model vs. the cloud model, giving us a seamless and privacy-preserving hybrid experience. Demo scenario This demo uses a simple, illustrative symptom-checker to show how hybrid AI keeps sensitive data private while still benefiting from powerful cloud reasoning. It’s not a medical product—just an easy way to demonstrate the pattern: Here’s what happens: A Python agent (Agent Framework) runs locally and can call both cloud models and local tools. Azure AI Foundry (GPT-4o) handles reasoning and triage logic but never sees raw PHI. Foundry Local runs a small LLM (phi-4-mini) on your GPU and processes the raw lab report entirely on-device. A tool function (@ai_function) lets the agent call the local model automatically when it detects lab-like text. The flow is simple: user_message = symptoms + raw lab text agent → calls local tool → local LLM returns JSON cloud LLM → uses JSON to produce guidance Environment setup Foundry Local Service On the local machine with GPU, let's install Foundry local using: PS C: \Windows\system32> winget install Microsoft.FoundryLocal Then let's download our local model, in this case phi-4-mini and test it: PS C:\Windows\system32> foundry model download phi-4-mini Downloading Phi-4-mini-instruct-cuda-gpu:5... [################### ] 53.59 % [Time remaining: about 4m] 5.9 MB/s/s PS C:\Windows\system32> foundry model load phi-4-mini 🕗 Loading model... 🟢 Model phi-4-mini loaded successfully PS C:\Windows\system32> foundry model run phi-4-mini Model Phi-4-mini-instruct-cuda-gpu:5 was found in the local cache. Interactive Chat. Enter /? or /help for help. Press Ctrl+C to cancel generation. Type /exit to leave the chat. Interactive mode, please enter your prompt > Hello can you let me know who you are and which model you are using 🧠 Thinking... 🤖 Hello! I'm Phi, an AI developed by Microsoft. I'm here to help you with any questions or tasks you have. How can I assist you today? > PS C:\Windows\system32> foundry service status 🟢 Model management service is running on http://127.0.0.1:52403/openai/status Now we see the model is accessible with API on the localhost with port 52403. Foundry Local models don’t always use simple names like "phi-4-mini". Each installed model has a specific Model ID that Foundry Local assigns (for example: Phi-4-mini-instruct-cuda-gpu:5 in this case). We now can use the Model ID for a quick test: from openai import OpenAI client = OpenAI(base_url="http://127.0.0.1:52403/v1", api_key="ignored") resp = client.chat.completions.create( model="Phi-4-mini-instruct-cuda-gpu:5", messages=[{"role": "user", "content": "Say hello"}]) Returned 200 OK. Microsoft Foundry To handle the cloud part of the hybrid workflow, we start by creating a Microsoft AI Foundry project. This gives us an easy, managed way to use models like GPT-4o-mini —no deployment steps, no servers to configure. You simply point the Agent Framework at your project, authenticate, and you’re ready to call the model. A nice benefit is that Microsoft Foundry and Foundry Local share the same style of API. Whether you call a model in the cloud or on your own machine, the request looks almost identical. This consistency makes hybrid development much easier: the agent doesn’t need different logic for local vs. cloud models—it just switches between them when needed. Under the Hood of Our Hybrid AI Workflow Agent Framework For the agent code, I am using the Agent Framework libraries, and I am giving specific instructions to the agent as per below: from agent_framework import ChatAgent, ai_function from agent_framework.azure import AzureAIAgentClient from azure.identity.aio import AzureCliCredential # ========= Cloud Symptom Checker Instructions ========= SYMPTOM_CHECKER_INSTRUCTIONS = """ You are a careful symptom-checker assistant for non-emergency triage. General behavior: - You are NOT a clinician. Do NOT provide medical diagnosis or prescribe treatment. - First, check for red-flag symptoms (e.g., chest pain, trouble breathing, severe bleeding, stroke signs, one-sided weakness, confusion, fainting). If any are present, advise urgent/emergency care and STOP. - If no red-flags, summarize key factors (age group, duration, severity), then provide: 1) sensible next steps a layperson could take, 2) clear guidance on when to contact a clinician, 3) simple self-care advice if appropriate. - Use plain language, under 8 bullets total. - Always end with: "This is not medical advice." Tool usage: - When the user provides raw lab report text, or mentions “labs below” or “see labs”, you MUST call the `summarize_lab_report` tool to convert the labs into structured data before giving your triage guidance. - Use the tool result as context, but do NOT expose the raw JSON directly. Instead, summarize the key abnormal findings in plain language. """.strip() Referencing the local model Now I am providing a system prompt for the locally inferred model to transform the lab result text into a JSON object with lab results only: # ========= Local Lab Summarizer (Foundry Local + Phi-4-mini) ========= FOUNDRY_LOCAL_BASE = "http://127.0.0.1:52403" # from `foundry service status` FOUNDRY_LOCAL_CHAT_URL = FOUNDRY_LOCAL_BASE + "/v1/chat/completions" # This is the model id you confirmed works: FOUNDRY_LOCAL_MODEL_ID = "Phi-4-mini-instruct-cuda-gpu:5" LOCAL_LAB_SYSTEM_PROMPT = """ You are a medical lab report summarizer running locally on the user's machine. You MUST respond with ONLY one valid JSON object. Do not include any explanation, backticks, markdown, or text outside the JSON. The JSON must have this shape: { "overall_assessment": "<short plain English summary>", "notable_abnormal_results": [ { "test": "string", "value": "string", "unit": "string or null", "reference_range": "string or null", "severity": "mild|moderate|severe" } ] } If you are unsure about a field, use null. Do NOT invent values. """.strip() Agent Framework tool In this next step, we wrap the local Foundry inference inside an Agent Framework tool using the AI_function decorator. This abstraction is more than styler—it is the recommended best practice for hybrid architectures. By exposing local GPU inference as a tool, the cloud-hosted agent can decide when to call it, pass structured arguments, and consume the returned JSON seamlessly. It also ensures that the raw lab text (which may contain PII) stays strictly within the local function boundary, never entering the cloud conversation. Using a tool in this way provides a consistent, declarative interface, enables automatic reasoning and tool-routing by frontier models, and keeps the entire hybrid workflow maintainable, testable, and secure: @ai_function( name="summarize_lab_report", description=( "Summarize a raw lab report into structured abnormalities using a local model " "running on the user's GPU. Use this whenever the user provides lab results as text." ), ) def summarize_lab_report( lab_text: Annotated[str, Field(description="The raw text of the lab report to summarize.")], ) -> Dict[str, Any]: """ Tool: summarize a lab report using Foundry Local (Phi-4-mini) on the user's GPU. Returns a JSON-compatible dict with: - overall_assessment: short text summary - notable_abnormal_results: list of abnormal test objects """ payload = { "model": FOUNDRY_LOCAL_MODEL_ID, "messages": [ {"role": "system", "content": LOCAL_LAB_SYSTEM_PROMPT}, {"role": "user", "content": lab_text}, ], "max_tokens": 256, "temperature": 0.2, } headers = { "Content-Type": "application/json", } print(f"[LOCAL TOOL] POST {FOUNDRY_LOCAL_CHAT_URL}") resp = requests.post( FOUNDRY_LOCAL_CHAT_URL, headers=headers, data=json.dumps(payload), timeout=120, ) resp.raise_for_status() data = resp.json() # OpenAI-compatible shape: choices[0].message.content content = data["choices"][0]["message"]["content"] # Handle string vs list-of-parts if isinstance(content, list): content_text = "".join( part.get("text", "") for part in content if isinstance(part, dict) ) else: content_text = content print("[LOCAL TOOL] Raw content from model:") print(content_text) # Strip ```json fences if present, then parse JSON cleaned = _strip_code_fences(content_text) lab_summary = json.loads(cleaned) print("[LOCAL TOOL] Parsed lab summary JSON:") print(json.dumps(lab_summary, indent=2)) # Return dict – Agent Framework will serialize this as the tool result return lab_summary The case, labs and prompt All patient and provider information in below example is entirely fictitious and used for illustrative purposes only. To illustrate the pattern, this sample prepares the “case” in code: it combines a symptom description with a lab report string and then submits that prompt to the agent. In production, these inputs would be captured from a UI or API. # Example free-text case + raw lab text that the agent can decide to send to the tool case = ( "Teenager with bad headache and throwing up. Fever of 40C and no other symptoms." ) lab_report_text = """ ------------------------------------------- AI Land FAMILY LABORATORY SERVICES 4420 Camino Del Foundry, Suite 210 Gpuville, CA 92108 Phone: (123) 555-4821 | Fax: (123) 555-4822 ------------------------------------------- PATIENT INFORMATION Name: Frontier Model DOB: 04/12/2007 (17 yrs) Sex: Male Patient ID: AXT-442871 Address: 1921 MCP Court, CA 01100 ORDERING PROVIDER Dr. Bot, MD NPI: 1780952216 Clinic: Phi Pediatrics Group REPORT DETAILS Accession #: 24-SDFLS-118392 Collected: 11/14/2025 14:32 Received: 11/14/2025 16:06 Reported: 11/14/2025 20:54 Specimen: Whole Blood (EDTA), Serum Separator Tube ------------------------------------------------------ COMPLETE BLOOD COUNT (CBC) ------------------------------------------------------ WBC ................. 14.5 x10^3/µL (4.0 – 10.0) HIGH RBC ................. 4.61 x10^6/µL (4.50 – 5.90) Hemoglobin .......... 13.2 g/dL (13.0 – 17.5) LOW-NORMAL Hematocrit .......... 39.8 % (40.0 – 52.0) LOW MCV ................. 86.4 fL (80 – 100) Platelets ........... 210 x10^3/µL (150 – 400) ------------------------------------------------------ INFLAMMATORY MARKERS ------------------------------------------------------ C-Reactive Protein (CRP) ......... 60 mg/L (< 5 mg/L) HIGH Erythrocyte Sedimentation Rate ... 32 mm/hr (0 – 15 mm/hr) HIGH ------------------------------------------------------ BASIC METABOLIC PANEL (BMP) ------------------------------------------------------ Sodium (Na) .............. 138 mmol/L (135 – 145) Potassium (K) ............ 3.9 mmol/L (3.5 – 5.1) Chloride (Cl) ............ 102 mmol/L (98 – 107) CO2 (Bicarbonate) ........ 23 mmol/L (22 – 29) Blood Urea Nitrogen (BUN) 11 mg/dL (7 – 20) Creatinine ................ 0.74 mg/dL (0.50 – 1.00) Glucose (fasting) ......... 109 mg/dL (70 – 99) HIGH ------------------------------------------------------ LIVER FUNCTION TESTS ------------------------------------------------------ AST ....................... 28 U/L (0 – 40) ALT ....................... 22 U/L (0 – 44) Alkaline Phosphatase ...... 144 U/L (65 – 260) Total Bilirubin ........... 0.6 mg/dL (0.1 – 1.2) ------------------------------------------------------ NOTES ------------------------------------------------------ Mild leukocytosis and elevated inflammatory markers (CRP, ESR) may indicate an acute infectious or inflammatory process. Glucose slightly elevated; could be non-fasting. ------------------------------------------------------ END OF REPORT SDFLS-CLIA ID: 05D5554973 This report is for informational purposes only and not a diagnosis. ------------------------------------------------------ """ # Single user message that gives both the case and labs. # The agent will see that there are labs and call summarize_lab_report() as a tool. user_message = ( "Patient case:\n" f"{case}\n\n" "Here are the lab results as raw text. If helpful, you can summarize them first:\n" f"{lab_report_text}\n\n" "Please provide non-emergency triage guidance." ) The Hybrid Agent code Here’s where the hybrid behavior actually comes together. By this point, we’ve defined a local tool that talks to Foundry Local and configured access to a cloud model in Azure AI Foundry. In the main() function, the Agent Framework ties these pieces into a single workflow. The agent runs locally, receives a message containing both symptoms and a raw lab report, and decides when to call the local tool. The lab report is summarized on your GPU, and only the structured JSON is passed to the cloud model for reasoning. The snippet below shows how we attach the tool to the agent and trigger both local inference and cloud guidance within one natural-language prompt # ========= Hybrid Main (Agent uses the local tool) ========= async def main(): ... async with ( AzureCliCredential() as credential, ChatAgent( chat_client=AzureAIAgentClient(async_credential=credential), instructions=SYMPTOM_CHECKER_INSTRUCTIONS, # 👇 Tool is now attached to the agent tools=[summarize_lab_report], name="hybrid-symptom-checker", ) as agent, ): result = await agent.run(user_message) print("\n=== Symptom Checker (Hybrid: Local Tool + Cloud Agent) ===\n") print(result.text) if __name__ == "__main__": asyncio.run(main()) Testing the Hybrid Agent Now I am running the agent code from VSCode and can see the local inference happening when lab was submitted. Then results are formatted, PII omitted and the GPT-40 model can process the symptom along the results What's next In this example, the agent runs locally and pulls in both cloud and local inference. In Part 2, we’ll explore the opposite architecture: a cloud-hosted agent that can safely call back into a local LLM through a secure gateway. This opens the door to more advanced hybrid patterns where tools running on edge devices, desktops, or on-prem systems can participate in cloud-driven workflows without exposing sensitive data. References Agent Framework: https://github.com/microsoft/agent-framework Repo for the code available here:763Views2likes0CommentsThe Future of AI: The Model is Key, but the App is the Doorway
This post explores the real-world impact of GPT-5 beyond benchmark scores, focusing on how application design shapes user experience. It highlights early developer feedback, common integration challenges, and practical strategies for adapting apps to leverage the advanced capabilities of GPT-5 in Foundry Models. From prompt refinement to fine-tuning to new API controls, learn how to make the most of this powerful model.565Views3likes0CommentsDeployment Guide-Copilot Studio agent with MCP Server exposed by API Management using OAuth 2.0
Introduction In today’s enterprise landscape, enabling AI agents to interact with backend systems securely and at scale is critical. By exposing MCP servers through Azure API Management (APIM), organizations can provide controlled access to these services. When combined with OAuth 2.0 authorization code flow, this setup ensures robust, enterprise-grade security for AI agents built in Copilot Studio—empowering intelligent automation while maintaining strict access governance. Disclaimer & Caveats This article explores how to configure a MCP tool—exposed as a MCP server via APIM—for secure consumption by AI agents built in Copilot Studio. Leveraging the OAuth 2.0 Authorization Code Flow, this setup ensures enterprise-grade security by enabling delegated access without exposing user credentials. With Azure API Management now supporting MCP server capabilities in public preview, developers can expose REST APIs as MCP tools using a standardized JSON-RPC interface. This allows AI agents to invoke backend services securely and scalable, without the need to rebuild existing APIs. Copilot Studio, also in preview for MCP integration, empowers organizations to orchestrate intelligent agents that interact with these tools in real time. While this guide provides a foundational approach, every environment is unique. You can enhance security further by implementing app roles, conditional access policies, and extending your integration logic with custom Python code for advanced scenarios. ⚠️ Note: Both MCP server support in APIM and MCP tool integration in Copilot Studio are currently in public preview. As these platforms evolve rapidly, expect changes and improvements over time. Always refer to the https://learn.microsoft.com/en-us/azure/api-management/export-rest-mcp-server for the latest updates. This article is about consuming remote MCP servers. In Azure, managed identity can also be leveraged for APIM integration. What is Authorization Code Flow? The Authorization Code Flow is designed for applications that can securely store a client secret (like server-side apps). It allows the app to obtain an access token on behalf of the user without exposing their credentials. This flow uses an intermediate authorization code to exchange for tokens, adding an extra layer of security. Steps in the Flow User Authentication The user is redirected to the Authorization Server (In this case: Azure AD) to log in and grant consent. Authorization Code Issued After successful login, the Authorization Server sends an authorization code to the app via the redirect URI. Token Exchange The app sends the authorization code (plus client credentials) to the Token Endpoint to get: Access Token (for API calls) and Refresh Token (to renew access without user interaction) API Access The app uses the Access Token to call protected resources. Below diagram shows the Authorization code flow in detail. Press enter or click to view image in full size Microsoft identity platform and OAuth 2.0 authorization code flow — Microsoft identity platform | Microsoft Learn High Level Architecture Press enter or click to view image in full size This architecture can also be implemented with APIM backend app registration only. However, stay cautious in configuring redirect URIs appropriately. Remote MCP Servers using APIM Architecture APIM exposing Remote MCP servers, enabling AI agents—such as those built in Copilot Studio—to securely access backend services using standardized JSON-RPC interfaces. This integration offers a robust, scalable, and secure way to connect AI tools with enterprise APIs. Key Capabilities: Secure Gateway: APIM acts as an intelligent gateway, handling OAuth 2.0 Authorization Code Flow, authentication, and request routing. Monitoring & Observability: Integration with Azure Log Analytics and Application Insights enables deep visibility into API usage, performance, and errors. Policy Enforcement: APIM’s policy engine allows for custom rules, including token validation, header manipulation, and response transformation. Rate Limiting & Throttling: Built-in support for rate limits, quotas, and IP filtering helps protect backend services from abuse and ensures fair usage. Managed Identity & Entra ID: Secure service-to-service communication is enabled via system-assigned and user-assigned managed identities, with Entra ID handling identity and access management. Flexible Deployment: MCP servers can be hosted in Azure Functions, App Services, or Container Apps, and exposed via APIM with minimal changes to existing APIs. To learn more, visit https://learn.microsoft.com/en-us/samples/azure-samples/remote-mcp-apim-functions-python/remote-mcp-apim-functions-python/ Develop MCP server in VS Code This deployment guide provides sample MCP code written in python for ease of use. It is available on the following GitHub repo. However, you can also use your own MCP server. Clone the following repository and open in VS Code. git clone https://github.com/mafzal786/mcp-server.git Run the following to execute it locally. cd mcp-server uv venv uv sync uv run mcpserver.py Deploy MCP Server as Azure Container App In this deployment guide, MCP server is deployed in Azure Container App. It can also be deployed as Azure App service. Deploy the MCP server in Azure container App by running the following command. It can be deployed by many other various ways such as via VS Code or CI/CD pipeline. AZ Cli is used for simplicity. az containerapp up \ --resource-group <RESOURCE_GROUP_NAME> \ --name streamable-mcp-server2 \ --environment mcp \ --location <REGION> \ --source . Configure Authentication for Azure Container App 1. Sign in Azure portal. Visit the container App in Azure and Click “Authentication” as shown below. Press enter or click to view image in full size For more details, visit the following link: Enable authentication and authorization in Azure Container Apps with Microsoft Entra ID | Microsoft Learn Click Add Identity Provider as shown. 2. Select Microsoft from the drop down and leave everything as is as shown below. 3. This will create a new app registration for the container App. After it is all setup, it will look like as below. As soon as authentication is configured. it will make container app inaccessible except for OAuth. Note: If you have app registration for Azure Container App already configured, use that by selecting "pick an existing app registration in this directory" option. Review App Registration of Container App — Backend Visit App registration and click streamable-mcp-server2 as in this case. Click on Authentication tab. Verify the Redirect URIs. you should see a redirect URL for container app. URI will end with /.auth/login/aad/callback as shown in the green box in the below screenshot. Now click on “Expose an API”. Confirm Application ID URI is configured with scope as shown below. its format is api://<client id> Scope "user_impersonation" is created. Verify API Permission. Make sure you Grant admin consent for your tenant as shown below. More scope can be created depending on the requirement of data access. Note: Make sure to "Grant admin consent" before proceeding to next step. Create App registration for representing APIM API Lauch Azure Portal. Visit App registration. Click New registration. Create a new App registration as shown below. For example, "apim-mcp-backend-api" in this case. Click "Expose an API", configure Application ID URI, and add a scope as shown in the below diagram such as user_impersonation. Click "App roles" and create the following role as shown below. More roles can be created depending on the requirements and case by case basis. Here app roles are created to get the concept around it and how it will be used in APIM inbound policies in the coming sections. Create App Registration for Client App — Copilot Studio In these steps, we will be configuring app registration for the client app, such as copilot studio in this case acting as a client app. This is also mentioned in the “high level architecture” diagram in the earlier section of this article. Lauch Azure Portal. Visit App registration. Click New registration. Create a new App registration. leave the Redirect URL as of now, we will configure it later as it is provided by copilot studio when configuring custom MCP connector. 3. Click on API permission and click “Add a permission”. Click Microsoft Graph and then click “Delegated permissions”. Select email, openid, profile as shown below. 4. Make sure to Grant admin consent and it should look like as below. 5. Create a secret. click “Certificates & secrets”. Create a new client secret by clicking “New client secret”. store the value as it will be masked after some time. if that happens, you can always delete and re-create a new secret. 6. Capture the following as you would need it in configuring MCP tool in Copilot Studio. Client ID from the Overview Tab of app registration. Client secret from “Certificates & secrets” tab. 7. Configure API permissions for APIM API i.e. "apim-mcp-backend-api" in this case. Click “API permissions” tab. Click “Add a permission”. Click on “My APIs” tab as shown below and select "apim-mcp-backend-api". Note: If you don't see the app registration in "My APIs". Go to App registration. Click "Owners". Add your AD account as Owners. 8. Select "Delegated permissions". Then select the permission as shown below. 9. Select the Application permission. Select the App roles created in the apim-mcp-backend-api registration. Such as mcp.read in this case. You MUST “Grant admin consent” as final step. It is very important!!! I can’t emphasize more on that. without it, nothing will work!!! 10. End result of this client app registration should look like as mentioned in the below figure. Configure permissions for Container App registration Lauch Azure Portal. Visit app registration. Select app registration of Azure container app such as streamable-mcp-server2 in this case. Select API permissions. Add the following delegated and application permissions as shown in the below diagram. Note: Don't forget to Grant admin consent. Configure allowed token audience for Container App It defines which audience values (aud claim) in a token are considered valid for your app. When a client app requests an access token from Microsoft Entra ID (Azure AD), the token includes an aud claim that identifies the intended recipient. Your container app will only accept tokens where the aud claim matches one of the values in the Allowed Token Audiences list. This is important as it ensures that only tokens issued for your API or app are accepted and prevents misuse of tokens intended for other resources. This adds extra layer of security. In the Azure Portal, visit Azure Container App. i.e. streamable-mcp-server2. Click on "Authentication" Click "Edit" under identity provider Under "Allowed token audiences", add the application ID URI of "apim-mcp-backend-api". As this will be included as an audience in the access token. Best Practices Only include trusted client app IDs. Avoid using overly broad values like “allow all” (not recommended). Validate tokens using Microsoft libraries (MSAL) or built-in auth features. Configure MCP server in API Management Note: Provisioning an API Management resource is outside the scope of this document. If you do not already have an API Management instance, follow this QuickStart: https://learn.microsoft.com/en-us/azure/api-management/get-started-create-service-instance The following service tiers are available for preview: Classic Basic, Standard, Premium, and Basic v2, Standard v2, Premium v2. For the Classic Basic, Standard, or Premium tiers, you must join the AI Gateway Early Update group to enable MCP server features. Please allow up to 2 hours for the update to take effect. Expose an existing MCP server Follow these steps to expose an existing MCP server is API Management: In the Azure portal, navigate to your API Management instance. In the left-hand menu, under APIs, select MCP servers > + Create MCP server. Select Expose an existing MCP server. In Backend MCP server: Enter the existing MCP server base URL. Example: https://streamable-mcp-serverv2.kdhg489457dslkjgn,.eastus2.azurecontainerapps.io/mcpfor the Microsoft Azure Container App hosting MCP server. In Transport type, Streamable HTTP is selected by default. In New MCP server: Enter a Name the MCP server in API Management. In Base path, enter a route prefix for tools. Example: mcptools Optionally, enter a Description for the MCP server. Select Create. Below diagram shows the MCP servers configured in APIM for reference. Configure policies for MCP server Configure one or more API Management policies to help manage the MCP server. The policies are applied to all API operations exposed as tools in the MCP server and can be used to control access, authentication, and other aspects of the tools. To configure policies for the MCP server: In the Azure portal, navigate to your API Management instance. In the left-hand menu, under APIs, select MCP Servers. Select an MCP server from the list. In the left menu, under MCP, select Policies. In the policy editor, add or edit the policies you want to apply to the MCP server's tools. The policies are defined in XML format. <!-- - Policies are applied in the order they appear. - Position <base/> inside a section to inherit policies from the outer scope. - Comments within policies are not preserved. --> <!-- Add policies as children to the <inbound>, <outbound>, <backend>, and <on-error> elements --> <policies> <!-- Throttle, authorize, validate, cache, or transform the requests --> <inbound> <base /> <set-variable name="accessToken" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "").Replace("Bearer ", ""))" /> <!-- Log the captured access token to the trace logs --> <trace source="Access Token Debug" severity="information"> <message>@("Access Token: " + (string)context.Variables["accessToken"])</message> </trace> <set-variable name="userId" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "Bearer ").Split(' ')[1].AsJwt().Claims["oid"].FirstOrDefault())" /> <set-variable name="userName" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "Bearer ").Split(' ')[1].AsJwt().Claims["name"].FirstOrDefault())" /> <trace source="User Name Debug" severity="information"> <message>@("username: " + (string)context.Variables["userName"])</message> </trace> <set-variable name="scp" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "Bearer ").Split(' ')[1].AsJwt().Claims["scp"].FirstOrDefault())" /> <trace source="Scope Debug" severity="information"> <message>@("scope: " + (string)context.Variables["scp"])</message> </trace> <set-variable name="roles" value="@(context.Request.Headers.GetValueOrDefault("Authorization", "Bearer ").Split(' ')[1].AsJwt().Claims["roles"].FirstOrDefault())" /> <trace source="Role Debug" severity="information"> <message>@("Roles: " + (string)context.Variables["roles"])</message> </trace> <!-- <set-variable name="requestBody" value="@{ return context.Request.Body.As<string>(preserveContent:true); }" /> <trace source="Request Body information" severity="information"> <message>@("Request body: " + (string)context.Variables["requestBody"])</message> </trace> --> <validate-azure-ad-token tenant-id="{{tenant-id}}" header-name="Authorization" failed-validation-httpcode="401" failed-validation-error-message="Unauthorized. Access token is missing or invalid."> <client-application-ids> <application-id>{{client-application-id}}</application-id> </client-application-ids> <audiences> <audience>{{audience}}</audience> </audiences> <required-claims> <claim name="roles" match="any"> <value>mcp.read</value> </claim> </required-claims> </validate-azure-ad-token> </inbound> <!-- Control if and how the requests are forwarded to services --> <backend> <base /> </backend> <!-- Customize the responses --> <outbound> <base /> </outbound> <!-- Handle exceptions and customize error responses --> <on-error> <base /> <trace source="Role Debug" severity="error"> <message>@("username: " + (string)context.Variables["userName"] + " has error in accessing the MCP server, could be auth or role related...")</message> </trace> <return-response> <set-status code="403" reason="Forbidden" /> <set-body> {"error":"Missing required scope or role"} </set-body> </return-response> </on-error> </policies> Note: Update the above inbound policy with the tenant Id, client application id, and audience as per your environment. It is recommended to use APIM "Named values" instead of hard coding inside the policy. To learn more, visit Use named values in Azure API Management policies Configure Diagnostics for APIM In this solution, APIM diagnostics are configured to forward log data to Log Analytics. Testing and validation will be carried out using insights from Log Analytics. Note: Setting up diagnostics is outside the scope of this article. However, you can visit the following link for more information. https://learn.microsoft.com/en-us/azure/api-management/api-management-howto-use-azure-monitor Below diagram shows what Logs are being sent to Log Analytics workspace. MCP Tool configuration in Copilot Studio Lauch copilot studio at https://copilotstudio.microsoft.com/. Configuration of environment and agent is beyond the scope of this article. It is assumed, you already have environment setup and agent has been created. Following link will help you, how to create an agent in copilot studio. Quickstart: Create and deploy an agent — Microsoft Copilot Studio | Microsoft Learn Inside agent configuration, click "Add tool". 3. Click on New tool. 4. Select Model Context Protocol. 5. Provide all relevant information for MCP server. Make sure your server URL ends with your mcp setup. In this case, it is APIM MCP server URL, with base path configured in APIM in the end. Provide server name and server description. Select OAuth 2.0 radio button. 6. Provide the following in the OAuth 2.0 section Client ID of client app registration. In this case, copilot-studio-client as configured earlier. Client secret of copilot-studio-client app registration. Authorization URL: https://login.microsoftonline.com/common/oauth2/v2.0/authorize Token URL template & Refresh URL: https://login.microsoftonline.com/oauth2/v2.0/token Scopes: openid, profile, email — which we selected earlier for Microsoft Azure Graph permissions. Click “Create”. This will provide you Redirect URL. you need to configure the redirect URL in client app registration. In this case, it is copilot-agent-client. Configure Redirect URI in Client App Registration Visit client app registration. i.e. copilot-studio-client. Click Authentication Tab and provide the Web Redirect URIs as shown below. Note: Configure Redirect URIs MUST be configured in app registration. Otherwise, authorization will not complete and sign on will fail. Configure redirect URI in APIM API app registration Also configure apim-mcp-backend-api app registration with the same redirect URI as shown below. Modify MCP connector in PowerApps Now visit the https://make.powerapps.com and open the newly created connector as shown below. Select the security tab and modify the Resource URL with application ID URI of apim-mcp-backend-api configured earlier in app registration for expose an API. Add .default in the scope. Provide the secret of client app registration as it will not let you update the connector. This is extra security measure for updating the connector in Powerapps. Click Update connector. CORS Configuration CORS configuration is a MUST!!! Since our Azure Container App is a remote MCP server with totally different domain or origin. Power Apps and CORS for External Domains — Brief Overview When embedding or integrating Power Apps with external web applications or APIs, Cross-Origin Resource Sharing (CORS) becomes a critical consideration. CORS is a browser security feature that restricts web pages from making requests to a different domain than the one that served the page, unless explicitly allowed. Key Points: Power Apps hosted on *.powerapps.com or within Microsoft 365 domains will block calls to external APIs unless those APIs include the proper CORS headers. The external API must return: Access-Control-Allow-Origin: https://apps.powerapps.com (or * for all origins, though not recommended for production) Access-Control-Allow-Methods: GET, POST, OPTIONS (or as needed) Access-Control-Allow-Headers: Content-Type, Authorization (and any custom headers) If the API requires authentication (e.g., OAuth 2.0), ensure preflight OPTIONS requests are handled correctly. For scenarios where you cannot modify the external API, consider using: Power Automate flows as a proxy Azure API Management or Azure Functions to inject CORS headers Always validate security implications before enabling wide-open CORS. If the CORS are not setup. You will encounter following error in copilot studio after pressing F12 (Browser Developer) CORS policy — blocking the container app Azure container app provides very efficient way of configuring CORS in the Azure portal. Lauch Azure Portal. Visit Azure container app i.e. streamable-mcp-server2 in this case. Click on CORS under Networking section. Configure the following in Allowed Origin Section as shown below. localhost is added to make it work from local laptop, although it is not required for Copilot Studio. 4. Click on “Allowed Method” tab and provide the following. 5. Provide wild card “*” in “Allowed Headers”tab. Although, it is not recommended for production system. it is done for the sake for simplicity. Configure that for added security 6. Click “Apply”. This will configure CORS for remote application. Test the MCP custom connector We are in the final stages of configuring the connector. It is time to test it, if everything is configured correctly and works. Launch the http://make.powerapps.com and click on “Custom connectors”, select your configured connector and click “5. Test” tab as shown below. You will see Selected Connection as blank if you are running it first time. Click “+ New connection” 2. New connection will launch the Authorization flow and browser dialog will pop up for making a request for authorization code. 3. Click “Create”. 4. Complete the login process. This will create a successful connection. 5. Click “Test operation”. If the response is 406 means everything is configured correctly as shown below. Solution validation Add user in Enterprise Application for App roles Roles have been defined under the required claims in the APIM inbound policy and also configured in the apim-mcp-backend-api app registration. As a result, any request from Copilot Studio will be denied if this role is not properly assigned. This role is included in the JWT access token, which we will validate in the following sections. To assign role, perform the following steps. Visit Azure Portal. Visit Enterprise Application. Select APIM backend app registration. In this case for example, apim-mcp-backend-api Click "Users and groups" Select "Add user/group" 5. Select User or Group who should have access to the role. 6. Click "Assign". It will look like as below. Note: Role assignment for users or groups is an important step. If it is not configured, MCP server tests will fail in Copilot studio. Test MCP server in Copilot Studio Lauch copilot studio and click on the Agent you created in earlier steps and click on “Tools tab”. Select your MCP tool as shown the following figure. Make sure it is “Enabled” if you have other tools attached to the same agent, disable them for now for testing. Make sure you have connection available which we created during the testing of custom connector in earlier step. You can also initiate a fresh connection by clicking on the drop down under “Connection” as shown below. Refreshing the tools will show all the tools available in this MCP server. Provide the sample prompt such as “Give me the stock price of tesla”. This will trigger the MCP server and call the respective method to bring the stock price of Tesla. Now try a weather-related question to see more. Now invoking weather forecast tool in the MCP server. APIM Monitoring with Log Analytics We previously configured APIM diagnostic settings to forward log data to Log Analytics. In this section, we’ll review that data, as the inbound policy in APIM sends valuable information to Log Analytics. Run the Kusto query to retrieve data from the last 30 minutes. As shown, the logs capture the APIM API endpoint URL and the backend URL, which corresponds to the Azure Container App endpoint. Scrolling further, we find the TraceRecords section. This contains the information captured by APIM inbound policies and sent to Log Analytics. The figure below illustrates the TraceRecords data. In the inbound policy, we configured it to extract details from the access token—such as the token itself, username, scope, and roles—and forward them to Log Analytics. Now let's capture the access token in the clip board, launch the http://jwt.io which is JSON Web Token (JWT) debugger, and paste the access token in the ENCODED VALUE box as show below. Note the following information. aud: This shows the Application URI ID of apim-mcp-backend-api. which shows access token is requested for that audience. appid: This shows the client Id for copilot-studio-client app registration. You can also see roles and scope. These roles are specified in APIM inbound policy. Note: As you can see, roles are included in access token and if it is not assigned in the enterprise application for "apim-mcp-backend-api", all requests will be denied by APIM inbound policy configured earlier. Perform a test using another Azure AD account that does not have the app role assigned Now, let's try the copilot studio agent by logging in with another account which is not assigned for the "mcp.read" role. Let's, review the below diagram. Logged in as demo and tried to access the MCP tool in copilot studio agent. Request failed with the error "Missing required scope or roles". If you look at it, this is coming from the APIM policy configured earlier in <on-error> Let's review log analytics. As you can see request failed due to inbound APIM policy with 403 error and there is no backend URL. Error is also reported under TraceRecords as we configured it in APIM policy. Now copy the Access token from log analytics and paste it into jwt.io. You can notice in the below diagram, there is no "roles" in the access token, resulting access denied from APIM inbound policy definition to the APIM backend i.e. azure container app. Assign the app role to the demo account Let's assign the "mcp.read" role to the demo account and test if it accesses the tool. Visit Azure Portal, Lauch Enterprise application, and select "apim-mcp-backend-api" as in this example. Click "Users and groups" Click "+ Add user/group" Select demo Click "Select" Click "Assign" End result would look like as shown below. Now, login again as demo. Make sure a new access token is generated. Access token refresh happens after one hours. As you can see in the image below, this time the request is successful after assigning the "mcp.read" app roles. Now let's review the log analytics entries. Let's review the access token in JWT.io. As you can see, roles are included in the access token. Conclusion Exposing the MCP server through Azure API Management (APIM) and integrating it with Copilot Studio agents provides a secure and scalable way to extend enterprise capabilities. By implementing OAuth 2.0, you ensure robust authentication and authorization, protecting sensitive data and maintaining compliance with industry standards. Beyond security, APIM adds significant operational value. With APIM policies, you can monitor traffic, enforce rate limits, and apply fine-grained controls to manage access and performance effectively. This combination of security and governance empowers organizations to innovate confidently while maintaining control and visibility over API usage. In today’s enterprise landscape, leveraging APIM with OAuth 2.0 for MCP integration is not just best practice—it’s a strategic move toward building resilient, secure, and well-governed solutions.2.2KViews2likes2CommentsThe Future of AI: Building Weird, Warm, and Wildly Effective AI Agents
Discover how humor and heart can transform AI experiences. From the playful Emotional Support Goose to the productivity-driven Penultimate Penguin, this post explores why designing with personality matters—and how Azure AI Foundry empowers creators to build tools that are not just efficient, but engaging.1.6KViews0likes0CommentsImplementing MCP Remote Servers with Azure Function App and GitHub Copilot Integration
Introduction In the evolving landscape of AI-driven applications, the ability to seamlessly connect large language models (LLMs) with external tools and data sources is becoming a cornerstone of intelligent system design. Model Context Protocol (MCP) — a specification that enables AI agents to discover and invoke tools dynamically, based on context. While MCP is powerful, implementing it from scratch can be daunting !!! That’s where Azure Functions comes in handy. With its event-driven, serverless architecture, Azure Functions now supports a preview extension for MCP, allowing developers to build remote MCP servers that are scalable, secure, and cloud-native. Further, In VS Code, GitHub Copilot Chat in Agent Mode can connect to your deployed Azure Function App acting as an MCP server. This connection allows Copilot to leverage the tools and services exposed by your function app. Why Use Azure Functions for MCP? Serverless Simplicity: Deploy MCP endpoints without managing infrastructure. Secure by Design: Leverage HTTPS, system keys, and OAuth via EasyAuth or API Management. Language Flexibility: Build in .NET, Python, or Node.js using QuickStart templates. AI Integration: Enable GitHub Copilot, VS Code, or other AI agents to invoke your tools via SSE endpoints. Prerequisites Python version 3.11 or higher Azure Functions Core Tools >= 4.0.7030 Azure Developer CLI To use Visual Studio Code to run and debug locally: Visual Studio Code Azure Functions extension An storage emulator is needed when developing azure function app in VScode. you can deploy Azurite extension in VScode to meet this requirement. Press enter or click to view image in full size You can run the Azurite in VS Code as shown below. C:\Program Files\Microsoft Visual Studio\2022\Enterprise\Common7\IDE\Extensions\Microsoft\Azure Storage Emulator> .\azurite.exe Press enter or click to view image in full size alternatively, you can also run Azurite in docker container as shown below. docker run -p 10000:10000 -p 10001:10001 -p 10002:10002 \ mcr.microsoft.com/azure-storage/azurite For more information about setting up Azurite, visit Use Azurite emulator for local Azure Storage development | Microsoft Learn Github Repositories Following Github repos are needed to setup this PoC. Repository for MCP server using Azure Function App https://github.com/mafzal786/mcp-azure-functions-python.git Repository for AI Foundry agent as MCP Client https://github.com/mafzal786/ai-foundry-agent-with-remote-mcp-using-azure-functionapp.git Clone the repository Run the following command to clone the repository to start building your MCP server using Azure function app. git clone https://github.com/mafzal786/mcp-azure-functions-python.git Run the MCP server in VS Code Once cloned. Open the folder in VS Code. Create a virtual environment in VS Code. Change directory to “src” in a new terminal window, install the python dependencies and start the function host locally as shown below. cd src pip install -r requirements.txt func start Note: by default this will use the webhooks route: /runtime/webhooks/mcp/sse. Later we will use this in Azure to set the key on client/host calls: /runtime/webhooks/mcp/sse?code=<system_key> Press enter or click to view image in full size MCP Inspector In a new terminal window, install and run MCP Inspector. npx @modelcontextprotocol/inspector Click to load the MCP inspector. Also provide the generated proxy session token. http://127.0.0.1:6274/#resources In the URL type and click “Connect”: http://localhost:7071/runtime/webhooks/mcp/sse Once connected, click List Tools under Tools and select “hello_mcp” tool and click “Run Tool” for testing as shown below. Press enter or click to view image in full size Select another tool such as get_stockprice and run it as shown below. Press enter or click to view image in full size Deploy Function App to Azure from VS Code For deploying function app to azure from vs code, make sure you have Azure Tools extension enabled in VS Code. To learn more about Azure Tools extension, visit the following Azure Extensions if your VS code environment is not setup for Azure development, follow Configure Visual Studio Code for Azure development with .NET — .NET | Microsoft Learn Once Azure Tools are setup, sign in to Azure account with Azure Tools Press enter or click to view image in full size Once Sign-in is completed, you should be able to see all of your existing resources in the Resources view. These resources can be managed directly in VS Code. Look for Function App in Resource, right click and click “Deploy to Function App”. Press enter or click to view image in full size If you already have it deployed, you will get the following pop-up. Click “Deploy” Press enter or click to view image in full size This will start deploying your function app to Azure. In VS Code, Azure tab will display the following. Press enter or click to view image in full size Once the deployment is completed, you can view the function app and all the tools in Azure portal under function app as shown below. Press enter or click to view image in full size Get the mcp_extension key from Functions → App Keys in Function App. Press enter or click to view image in full size This mcp_extension key would be needed in mcp.json file in VS code, if you would like to test the MCP server using Github Copilot in VS Code. Your entries in mcp.json file will look like as below for example. { "inputs": [ { "type": "promptString", "id": "functions-mcp-extension-system-key", "description": "Azure Functions MCP Extension System Key", "password": true }, { "type": "promptString", "id": "functionapp-name", "description": "Azure Functions App Name" } ], "servers": { "remote-mcp-function": { "type": "sse", "url": "https://${input:functionapp-name}.azurewebsites.net/runtime/webhooks/mcp/sse", "headers": { "x-functions-key": "${input:functions-mcp-extension-system-key}" } }, "local-mcp-function": { "type": "sse", "url": "http://0.0.0.0:7071/runtime/webhooks/mcp/sse" } } } Test Azure Function MCP Server in MCP Inspector Launch MCP Inspector and provide the Azure Function in MCP inspector URL. Provide authentication as shown below. Bearer token is mcp_extension key. Testing an MCP server with GitHub Copilot Testing an MCP server with GitHub Copilot involves configuring and utilizing the server within your development environment to provide enhanced context and capabilities to Copilot Chat. Steps to Test an MCP Server with GitHub Copilot: Ensure Agent Mode is Enabled: Open Copilot Chat in Visual Studio Code and select “Agent” mode. This mode allows Copilot to interact with external tools and services, including MCP servers. Add the MCP Server: Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P) and run the command MCP: Add Server. Press enter or click to view image in full size Follow the prompts to configure the server. You can choose to add it to your workspace settings (creating a .vscode/mcp.json file) . Select HTTP or Server-Sent events Press enter or click to view image in full size Specify the URL and click Enter Press enter or click to view image in full size Provide a name of your choice Press enter or click to view image in full size Select scope as Global or workspace. I selected Workspace Press enter or click to view image in full size This will generate mcp.json file in .vscode or create a new entry if mcp.json already exists as shown below. Click Start to “start” the server. Also make sure your Azure function app is locally running with func start command. Press enter or click to view image in full size Now Type the prompt as shown below. Press enter or click to view image in full size Try another tool as below. Press enter or click to view image in full size VS code terminal output for reference. Press enter or click to view image in full size Testing an MCP server with Claude Desktop Claude Desktop is a standalone AI application that allows users to interact with Claude AI models directly from their desktop, providing a seamless and efficient experience. you can download Claude desktop at Download Claude In this article, I have added another tool to utilize to test your MCP server running in Azure Function app. Modify claude_desktop_config.json with the following. you can find this file in window environment at C:\Users\<username>\AppData\Roaming\Claude { "mcpServers": { "my mcp": { "command": "npx", "args": [ "mcp-remote", "http://localhost:7071/runtime/webhooks/mcp/sse" ] } } } Note: If claude_desktop_config.json does not exists, click on setting in Claude desktop under user and visit developer tab. You will see you MCP server in Claude Desktop as shown below. Press enter or click to view image in full size Type the prompt such as “What is the stock price of Tesla” . After submitting, you will notice that it is invoking the tool “get_stockprice” from the MCP server running locally and configured in the .json earlier. Click Allow once or Allow always as shown below. Following output will be displayed. Press enter or click to view image in full size Now lets try weather related prompt. As you can see, it has invoked “get_weatheralerts” tool from MCP server. Press enter or click to view image in full size Azure AI Foundry agent as MCP Client Use the following Github repo to set up Azure AI Foundry agent as MCP client. git clone https://github.com/mafzal786/ai-foundry-agent-with-remote-mcp-using-azure-functionapp.git Open the code in VS code and follow the instructions mentioned in README.md file at Github repo. Once you execute the code, following output will show up in VS code. Press enter or click to view image in full size In this code, message is hard coded. Change the content to “what is weather advisory for Florida” and rerun the program. It will call get_weatheralerts tool and output will look like as below. Press enter or click to view image in full size Conclusion The integration of Model Context Protocol (MCP) with Azure Functions marks a pivotal step in democratizing AI agent development. By leveraging Azure’s serverless architecture, developers can now build remote MCP servers that scale automatically, integrate seamlessly with other Azure services, and expose modular tools to intelligent agents like GitHub Copilot. This setup not only simplifies the deployment and management of MCP servers but also enhances the developer experience — allowing tools to be invoked contextually by AI agents in environments like VS Code, GitHub Codespaces, or Copilot Studio[2]. Whether you’re building a tool to query logs, calculate metrics, or manage data, Azure Functions provides the flexibility, security, and scalability needed to bring your AI-powered workflows to life. As the MCP spec continues to evolve, and GitHub Copilot expands its agentic capabilities, this architecture positions you to stay ahead — offering a robust foundation for cloud-native AI tooling that’s both powerful and future-proof.1.1KViews1like1CommentContext-Aware RAG System with Azure AI Search to Cut Token Costs and Boost Accuracy
🚀 Introduction As AI copilots and assistants become integral to enterprises, one question dominates architecture discussions: “How can we make large language models (LLMs) provide accurate, source-grounded answers — without blowing up token costs?” Retrieval-Augmented Generation (RAG) is the industry’s go-to strategy for this challenge. But traditional RAG pipelines often use static document chunking, which breaks semantic context and drives inefficiencies. To address this, we built a context-aware, cost-optimized RAG pipeline using Azure AI Search and Azure OpenAI, leveraging AI-driven semantic chunking and intelligent retrieval. The result: accurate answers with up to 85% lower token consumption. Majorly in this blog we are considering: Tokenization Chunking The Problem with Naive Chunking Most RAG systems split documents by token or character count (e.g., every 1,000 tokens). This is easy to implement but introduces real-world problems: 🧩 Loss of context — sentences or concepts get split mid-idea. ⚙️ Retrieval noise — irrelevant fragments appear in top results. 💸 Higher cost — you often send 5× more text than necessary. These issues degrade both accuracy and cost efficiency. 🧠 Context-Aware Chunking: Smarter Document Segmentation Instead of breaking text arbitrarily, our system uses an LLM-powered preprocessor to identify semantic boundaries — meaning each chunk represents a complete and coherent concept. Example Naive chunking: “Azure OpenAI Service offers… [cut] …integrates with Azure AI Search for intelligent retrieval.” Context-aware chunking: “Azure OpenAI Service provides access to models like GPT-4o, enabling developers to integrate advanced natural language understanding and generation into their applications. It can be paired with Azure AI Search for efficient, context-aware information retrieval.” ✅ The chunk is self-contained and semantically meaningful. This allows the retriever to match queries with conceptually complete information rather than partial sentences — leading to precision and fewer chunks needed per query. Architecture Diagram Chunking Service: Purpose: Transforms messy enterprise data (wikis, PDFs, transcripts, repos, images) into structured, model-friendly chunks for Retrieval-Augmented Generation (RAG). ChallengeChunking FixLLM context limitsBreaks docs into smaller piecesEmbedding sizeKeeps within token boundsRetrieval accuracyGranular, relevant sections onlyNoiseRemoves irrelevant blocksTraceabilityChunk IDs for auditabilityCost/latencyRe-embed only changed chunks The Chunking Flow (End-to-End) The Chunking Service sits in the ingestion pipeline and follows this sequence: Ingestion: Raw text arrives from sources (wiki, repo, transcript, PDF, image description). Token-aware splitting: Large text is cut into manageable pre-chunks with a 100-token overlap, ensuring no semantic drift across boundaries. Semantic segmentation: Each pre-chunk is passed to an Azure OpenAI Chat model with a structured prompt. Output = JSON array of semantic chunks (sectiontitle, speaker, content). Optional overlap injection: Character-level overlap can be applied across chunks for discourse-heavy text like meeting transcripts. Embedding generation: Each chunk is passed to Azure OpenAI Embeddings API (text-embedding-3-small), producing a 1536-dimension vector. Indexing: Chunks (text + vectors) are uploaded to Azure AI Search. Retrieval: During question answering or document generation, the system pulls top-k chunks, concatenates them, and enriches the prompt for the LLM. Resilience & Traceability The service is built to handle real-world pipeline issues. It retries once on rate limits, validates JSON outputs, and fails fast on malformed data instead of silently dropping chunks. Each chunk is assigned a unique ID (chunk_<sequence>_<sourceTag>), making retrieval auditable and enabling selective re-embedding when only parts of a document change. ☁️ Why Azure AI Search Matters Here Azure AI Search (formerly Cognitive Search) is the heart of the retrieval pipeline. Key Roles: Vector Search Engine: Stores embeddings of chunks and performs semantic similarity search. Hybrid Search (Keyword + Vector): Combines lexical and semantic matching for high precision and recall. Scalability: Supports millions of chunks with blazing-fast search latency. Metadata Filtering: Enables fine-grained retrieval (e.g., by document type, author, section). Native Integration with Azure OpenAI: Allows a seamless, end-to-end RAG pipeline without third-party dependencies. In short, Azure AI Search provides the speed, scalability, and semantic intelligence to make your RAG pipeline enterprise-grade. 💡 Importance of Azure OpenAI Azure OpenAI complements Azure AI Search by providing: High-quality embeddings (text-embedding-3-large) for accurate vector search. Powerful generative reasoning (GPT-4o or GPT-4.1) to craft contextually relevant answers. Security and compliance within your organization’s Azure boundary — critical for regulated environments. Together, these two services form the retrieval (Azure AI Search) and generation (Azure OpenAI) halves of your RAG system. 💰 Token Efficiency By limiting the model’s input to only the most relevant, semantically meaningful chunks, you drastically reduce prompt size and cost. Approach Tokens per Query Typical Cost Accuracy Full-document prompt ~15,000–20,000 Very high Medium Fixed-size RAG chunks ~5,000–8,000 Moderate Medium-high Context-aware RAG (this approach) ~2,000–3,000 Low High 💰 Token Cost Reduction Analysis Let’s quantify it: Step Naive Approach (no RAG) Your Approach (Context-Aware RAG) Prompt context size Entire document (e.g., 15,000 tokens) Top 3 chunks (e.g., 2,000 tokens) Tokens per query ~16,000 (incl. user + system) ~2,500 Cost reduction — ~84% reduction in token usage Accuracy Often low (hallucinations) Higher (targeted retrieval) That’s roughly an 80–85% reduction in token usage while improving both accuracy and response speed. 🧱 Tech Stack Overview Component Service Purpose Chunking Engine Azure OpenAI (GPT models) Generate context-aware chunks Embedding Model Azure OpenAI Embedding API Create high-dimensional vectors Retriever Azure AI Search Perform hybrid and vector search Generator Azure OpenAI GPT-4o Produce final answer Orchestration Layer Python / FastAPI / .NET c# Handle RAG pipeline 🔍 The Bottom Line By adopting context-aware chunking and Azure AI Search-powered RAG, you achieve: ✅ Higher accuracy (contextually complete retrievals) 💸 Lower cost (token-efficient prompts) ⚡ Faster latency (smaller context per call) 🧩 Scalable and secure architecture (fully Azure-native) This is the same design philosophy powering Microsoft Copilot and other enterprise AI assistants today. 🧪 Real-Life Example: Context-Aware RAG in Action To bring this architecture to life, let’s walk through a simple example of how documents can be chunked, embedded, stored in Azure AI Search, and then queried to generate accurate, cost-efficient answers. Imagine you want to build an internal knowledge assistant that answers developer questions from your company’s Azure documentation. ⚙️ Step 1: Intelligent Document Chunking We’ll use a small LLM call to segment text into context-aware chunks — rather than fixed token counts //Context Aware Chunking //text can be your retrieved text from any page/ document private async Task<List<SemanticChunk>> AzureOpenAIChunk(string text) { try { string prompt = $@" Divide the following text into logical, meaningful chunks. Each chunk should represent a coherent section, topic, or idea. Return the result as a JSON array, where each object contains: - sectiontitle - speaker (if applicable, otherwise leave empty) - content Do not add any extra commentary or explanation. Only output the JSON array. Do not give content an array, try to keep all in string. TEXT: {text}" var client = GetAzureOpenAIClient(); var chatCompletionsOptions = new ChatCompletionOptions { Temperature = 0, FrequencyPenalty = 0, PresencePenalty = 0 }; var Messages = new List<OpenAI.Chat.ChatMessage> { new SystemChatMessage("You are a text processing assistant."), new UserChatMessage(prompt) }; var chatClient = client.GetChatClient( deploymentName: _appSettings.Agent.Model); var response = await chatClient.CompleteChatAsync(Messages, chatCompletionsOptions); string responseText = response.Value.Content[0].Text.ToString(); string cleaned = Regex.Replace(responseText, @"```[\s\S]*?```", match => { var match1 = match.Value.Replace("```json", "").Trim(); return match1.Replace("```", "").Trim(); }); // Try to parse the response as JSON array of chunks return CreateChunkArray(cleaned); } catch (JsonException ex) { _logger.LogError("Failed to parse GPT response: " + ex.Message); throw; } catch (Exception ex) { _logger.LogError("Error in AzureOpenAIChunk: " + ex.Message); throw; } } 🧠 Step 2: Adding Overlaps for better result We are adding overlapping between chunks for better and accurate answers. Overlapping window can be modified based on the documents. public List<SemanticChunk> AddOverlap(List<SemanticChunk> chunks, string IDText, int overlapChars = 0) { var overlappedChunks = new List<SemanticChunk>(); for (int i = 0; i < chunks.Count; i++) { var current = chunks[i]; string previousOverlap = i > 0 ? chunks[i - 1].Content[^Math.Min(overlapChars, chunks[i - 1].Content.Length)..] : ""; string combinedText = previousOverlap + "\n" + current.Content; var Id = $"chunk_{i + '_' + IDText}"; overlappedChunks.Add(new SemanticChunk { Id = Regex.Replace(Id, @"[^A-Za-z0-9_\-=]", "_"), Content = combinedText, SectionTitle = current.SectionTitle }); } return overlappedChunks; } 🧠 Step 3: Generate and Store Embeddings in Azure AI Search We convert each chunk into an embedding vector and push it to an Azure AI Search index. public async Task<List<SemanticChunk>> AddEmbeddings(List<SemanticChunk> chunks) { var client = GetAzureOpenAIClient(); var embeddingClient = client.GetEmbeddingClient("text-embedding-3-small"); foreach (var chunk in chunks) { // Generate embedding using the EmbeddingClient var embeddingResult = await embeddingClient.GenerateEmbeddingAsync(chunk.Content).ConfigureAwait(false); chunk.Embedding = embeddingResult.Value.ToFloats(); } return chunks; } public async Task UploadDocsAsync(List<SemanticChunk> chunks) { try { var indexClient = GetSearchindexClient(); var searchClient = indexClient.GetSearchClient(_indexName); var result = await searchClient.UploadDocumentsAsync(chunks); } catch (Exception ex) { _logger.LogError("Failed to upload documents: " + ex); throw; } } 🤖 Step 4: Generate the Final Answer with Azure OpenAI Now we combine the top chunks with the user query to create a cost-efficient, context-rich prompt. P.S. : Here in this example we have used semantic kernel agent , in real time any agent can be used and any prompt can be updated. var context = await _aiSearchService.GetSemanticSearchresultsAsync(UserQuery); // Gets chunks from Azure AI Search //here UserQuery is query asked by user/any question prompt which need to be answered. string questionWithContext = $@"Answer the question briefly in short relevant words based on the context provided. Context : {context}. \n\n Question : {UserQuery}?"; var _agentModel = new AgentModel() { Model = _appSettings.Agent.Model, AgentName = "Answering_Agent", Temperature = _appSettings.Agent.Temperature, TopP = _appSettings.Agent.TopP, AgentInstructions = $@"You are a cloud Migration Architect. " + "Analyze all the details from top to bottom in context based on the details provided for the Migration of APP app using Azure Services. Do not assume anything." + "There can be conflicting details for a question , please verify all details of the context. If there are any conflict please start your answer with word - **Conflict**." + "There might not be answers for all the questions, please verify all details of the context. If there are no answer for question just mention - **No Information**" }; _agentModel = await _agentService.CreateAgentAsync(_agentModel); _agentModel.QuestionWithContext = questionWithContext; var modelWithResponse = await _agentService.GetAnswerAsync(_agentModel); 🧠 Final Thoughts Context-aware RAG isn’t just a performance optimization — it’s an architectural evolution. It shifts the focus from feeding LLMs more data to feeding them the right data. By letting Azure AI Search handle intelligent retrieval and Azure OpenAI handle reasoning, you create an efficient, explainable, and scalable AI assistant. The outcome: Smarter answers, lower costs, and a pipeline that scales with your enterprise. Wiki Link: Tokenization and Chunking IP Link: AI Migration Accelerator1.2KViews4likes0CommentsIntegrate Custom Azure AI Agents with CoPilot Studio and M365 CoPilot
Integrating Custom Agents with Copilot Studio and M365 Copilot In today's fast-paced digital world, integrating custom agents with Copilot Studio and M365 Copilot can significantly enhance your company's digital presence and extend your CoPilot platform to your enterprise applications and data. This blog will guide you through the integration steps of bringing your custom Azure AI Agent Service within an Azure Function App, into a Copilot Studio solution and publishing it to M365 and Teams Applications. When Might This Be Necessary: Integrating custom agents with Copilot Studio and M365 Copilot is necessary when you want to extend customization to automate tasks, streamline processes, and provide better user experience for your end-users. This integration is particularly useful for organizations looking to streamline their AI Platform, extend out-of-the-box functionality, and leverage existing enterprise data and applications to optimize their operations. Custom agents built on Azure allow you to achieve greater customization and flexibility than using Copilot Studio agents alone. What You Will Need: To get started, you will need the following: Azure AI Foundry Azure OpenAI Service Copilot Studio Developer License Microsoft Teams Enterprise License M365 Copilot License Steps to Integrate Custom Agents: Create a Project in Azure AI Foundry: Navigate to Azure AI Foundry and create a project. Select 'Agents' from the 'Build and Customize' menu pane on the left side of the screen and click the blue button to create a new agent. Customize Your Agent: Your agent will automatically be assigned an Agent ID. Give your agent a name and assign the model your agent will use. Customize your agent with instructions: Add your knowledge source: You can connect to Azure AI Search, load files directly to your agent, link to Microsoft Fabric, or connect to third-party sources like Tripadvisor. In our example, we are only testing the CoPilot integration steps of the AI Agent, so we did not build out additional options of providing grounding knowledge or function calling here. Test Your Agent: Once you have created your agent, test it in the playground. If you are happy with it, you are ready to call the agent in an Azure Function. Create and Publish an Azure Function: Use the sample function code from the GitHub repository to call the Azure AI Project and Agent. Publish your Azure Function to make it available for integration. azure-ai-foundry-agent/function_app.py at main · azure-data-ai-hub/azure-ai-foundry-agent Connect your AI Agent to your Function: update the "AIProjectConnString" value to include your Project connection string from the project overview page of in the AI Foundry. Role Based Access Controls: We have to add a role for the function app on OpenAI service. Role-based access control for Azure OpenAI - Azure AI services | Microsoft Learn Enable Managed Identity on the Function App Grant "Cognitive Services OpenAI Contributor" role to the System-assigned managed identity to the Function App in the Azure OpenAI resource Grant "Azure AI Developer" role to the System-assigned managed identity for your Function App in the Azure AI Project resource from the AI Foundry Build a Flow in Power Platform: Before you begin, make sure you are working in the same environment you will use to create your CoPilot Studio agent. To get started, navigate to the Power Platform (https://make.powerapps.com) to build out a flow that connects your Copilot Studio solution to your Azure Function App. When creating a new flow, select 'Build an instant cloud flow' and trigger the flow using 'Run a flow from Copilot'. Add an HTTP action to call the Function using the URL and pass the message prompt from the end user with your URL. The output of your function is plain text, so you can pass the response from your Azure AI Agent directly to your Copilot Studio solution. Create Your Copilot Studio Agent: Navigate to Microsoft Copilot Studio and select 'Agents', then 'New Agent'. Make sure you are in the same environment you used to create your cloud flow. Now select ‘Create’ button at the top of the screen From the top menu, navigate to ‘Topics’ and ‘System’. We will open up the ‘Conversation boosting’ topic. When you first open the Conversation boosting topic, you will see a template of connected nodes. Delete all but the initial ‘Trigger’ node. Now we will rebuild the conversation boosting agent to call the Flow you built in the previous step. Select 'Add an Action' and then select the option for existing Power Automate flow. Pass the response from your Custom Agent to the end user and end the current topic. My existing Cloud Flow: Add action to connect to existing Cloud Flow: When this menu pops up, you should see the option to Run the flow you created. Here, mine does not have a very unique name, but you see my flow 'Run a flow from Copilot' as a Basic action menu item. If you do not see your cloud flow here add the flow to the default solution in the environment. Go to Solutions > select the All pill > Default Solution > then add the Cloud Flow you created to the solution. Then go back to Copilot Studio, refresh and the flow will be listed there. Now complete building out the conversation boosting topic: Make Agent Available in M365 Copilot: Navigate to the 'Channels' menu and select 'Teams + Microsoft 365'. Be sure to select the box to 'Make agent available in M365 Copilot'. Save and re-publish your Copilot Agent. It may take up to 24 hours for the Copilot Agent to appear in M365 Teams agents list. Once it has loaded, select the 'Get Agents' option from the side menu of Copilot and pin your Copilot Studio Agent to your featured agent list Now, you can chat with your custom Azure AI Agent, directly from M365 Copilot! Conclusion: By following these steps, you can successfully integrate custom Azure AI Agents with Copilot Studio and M365 Copilot, enhancing you’re the utility of your existing platform and improving operational efficiency. This integration allows you to automate tasks, streamline processes, and provide better user experience for your end-users. Give it a try! Curious of how to bring custom models from your AI Foundry to your CoPilot Studio solutions? Check out this blog19KViews3likes11Comments