azure openai service

258 Topics

Foundry Agent Service at Ignite 2025: Simple to Build. Powerful to Deploy. Trusted to Operate.
The upgraded Foundry Agent Service delivers a unified, simplified platform with managed hosting, built-in memory, tool catalogs, and seamless integration with Microsoft Agent Framework. Developers can now deploy agents faster and more securely, leveraging one-click publishing to Microsoft 365 and advanced governance features for streamlined enterprise AI operations.
Yina Arenas
Nov 20, 2025 Place Microsoft Foundry Blog
6.8KViews
3likes
1Comment
GPT‑5.1 in Foundry: A Workhorse for Reasoning, Coding, and Chat
The pace of AI innovation is accelerating, and developers—across startups and global enterprises—are at the heart of this transformation. Today marks a significant moment for enterprise AI innovation: Azure AI Foundry is unveiling OpenAI’s GPT-5.1 series, the next generation of reasoning, analytics, and conversational intelligence. The following models will be rolling out in Foundry today: GPT-5.1: adaptive, more efficient reasoning GPT-5.1-chat: chat with new chain-of-thought for end-users GPT-5.1-codex: optimized for long-running conversations with enhanced tools and agentic workflows GPT-5.1-codex-mini: a compact variant for resource-constrained environments What’s new with GPT-5.1 series The GPT-5.1 series is built to respond faster to users in a variety of situations with adaptive reasoning, improving latency and cost efficiency across the series by varying thinking time more significantly. This, combined with other tooling improvements, enhanced stepwise reasoning visibility, multimodal intelligence, and enterprise-grade compliance. GPT-5.1: Adaptive and Efficient Reasoning GPT-5.1 is the mainline model engineered to deliver adaptive, stepwise reasoning that adjusts its approach based on the complexity of each task. Core capabilities included: Adaptive reasoning for nuanced, context-aware thinking time Multimodal intelligence: supporting text, image, and audio inputs/outputs Enterprise-grade performance, security, and compliance This model’s flexibility empowers developers to tackle a wide spectrum of tasks—from simple queries to deep, multi-step workflows for enterprise-grade solutions. With its ability to intelligently balance speed, cost, and intelligence, GPT-5.1 sets a new standard for both performance and efficiency in AI-powered development. GPT-5.1-chat: Elevating Interactive Experiences with Smart, Safe Conversations GPT-5.1-chat powers fast, context-aware chat experiences with adaptive reasoning and robust safety guardrails. With chain-of-thought added in the chat for the first time, it brings an interactive experience to the next level. It’s tuned for safety and instruction-following, making it ideal for customer support, IT helpdesk, HR, and sales enablement. Multimodal chat (text, image, and audio) improves long-turn consistency for real problem solving, delivering brand-aligned, safe conversations, and supporting next-best-action recommendations. GPT-5.1-codex and GPT-5.1-codex-mini: Frontier Models for Agentic Coding GPT-5.1-codex builds on the foundation set by GPT-5-codex, advancing developer tooling with: Enhanced reasoning frameworks for stepwise, context-aware code analysis and generation; plus Enhanced tool handling for certain development scenario's Multimodal intelligence for richer developer experiences when coding With Foundry’s enterprise-grade security and governance, GPT-5.1-codex is ideal for automated code generation and review, accelerating development cycles with intelligent code suggestions, refactoring, and bug detection. GPT-5.1-codex-mini is a compact, efficient variant optimized for resource-constrained environments. It maintains near state-of-the-art performance, multimodal intelligence, and the same safety stack and tool access as GPT-5.1-codex, making it best for cost-effective, scalable solutions in education, startups, and cost-conscience settings. Together, these Codex models empower teams to innovate faster and with greater confidence. Selecting Your AI Engine: Match Model Strengths to Your Business Goals One of the advantages of the GPT-5.1 series is unified access to deep reasoning, adaptive chat, and advanced coding—all in one place. Here’s how to match model strengths to your needs: Opt for GPT-5.1 for general ai application use—tasks like analytics, research, legal/financial review, or consolidating large documents and codebases. It’s the model of choice for reliability and high-impact outputs. Go with GPT-5.1-chat for interactive assistants and product UX, especially when adaptive reasoning is required for complex cases. Reasoning hints and adaptive reasoning help with customer latency perception. Leverage GPT-5.1-codex for deep, stepwise reasoning in complex code generation, refactoring, or multi-step analysis—ideal for demanding agentic workflows and enterprise automation. Utilize GPT-5.1-codex-mini for efficient, cost-effective coding intelligence in broad-scale deployment, education, or resource-constrained environments—delivering near-mainline performance in a compact model. Deployment and Pricing Model Deployment Available Regions Pricing ($/million tokens) Input Cached Input Output GPT-5.1 Standard Global Global $1.25 $0.125 $10.00 Standard Data Zone Data Zone (US & EU) $1.38 $0.14 $11.00 GPT-5.1-chat Standard Global Global $1.25 $0.125 $10.00 GPT-5.1-codex Standard Global Global $1.25 $0.125 $10.00 GPT-5.1-codex-mini Standard Global Global $0.25 $0.025 $2.00 Start Building Today The GPT-5.1 series is now available in Foundry Models. Whether you’re building for enterprise, small and medium-sized business, or launching the next digital-native app, these models and the Foundry platform are designed to help you innovate faster, safer, and at scale.
Naomi Moneypenny
Nov 18, 2025 Place Microsoft Foundry Blog
19KViews
1like
22Comments
Observability for Multi-Agent Systems with Microsoft Agent Framework and Azure AI Foundry
Agentic applications are revolutionizing enterprise automation, but their dynamic toolchains and latent reasoning make them notoriously hard to operate. In this post, you'll learn how to instrument a Microsoft Agent Framework–based service with OpenTelemetry, ship traces to Azure AI Foundry observability, and adopt a practical workflow to debug, evaluate, and improve multi-agent behavior in production. We'll show how to wire spans around reasoning steps and tool calls (OpenAPI / MCP), enabling deep visibility into your agentic workflows. Who Should Read This? Developers building agents with Microsoft Agent Framework (MAF) in .NET or Python Architects/SREs seeking enterprise-grade visibility, governance, and reliability for deployments on Azure AI Foundry Why Observability Is Non-Negotiable for Agents Traditional logs fall short for agentic systems: Reasoning and routing (which tool? which doc?) are opaque without explicit spans/events Failures often occur between components (e.g., retrieval mismatch, tool schema drift) Without traces across agents ⇄ tools ⇄ data stores, you can't reproduce or evaluate behavior Microsoft has introduced multi-agent observability patterns and OpenTelemetry (OTel) conventions that unify traces across Agent Framework, Foundry, and popular stacks—so you can see one coherent timeline for each task. Reference Architecture Key Capabilities Agent orchestration & deployment via Microsoft Agent Framework Model access using Foundry’s OpenAI-compatible endpoint OpenTelemetry for traces/spans + attributes (agent, tool, retrieval, latency, tokens) Step-by-Step Implementation Assumption: This article uses Azure Monitor (via Application Insights) as the OpenTelemetry exporter, but you can configure other supported exporters in the same way. Prerequisites .NET 8 SDK or later Azure OpenAI service (endpoint, API key, deployed model) Application Insights and Grafana Create an Agent with OpenTelemetry (ASP.NET Core or Console App) Install required packages: dotnet add package Azure.AI.OpenAI dotnet add package Azure.Monitor.OpenTelemetry.Exporter dotnet add package Microsoft.Agents.AI.OpenAI dotnet add package Microsoft.Extensions.Logging dotnet add package OpenTelemetry dotnet add package OpenTelemetry.Trace dotnet add package OpenTelemetry.Metrics dotnet add package OpenTelemetry.Extensions.Hosting dotnet add package OpenTelemetry.Instrumentation.Http Setup environment variables: AZURE_OPENAI_ENDPOINT: https://<your_service_name>.openai.azure.com/ AZURE_OPENAI_API_KEY: <your_azure_openai_apikey> APPLICATIONINSIGHTS_CONNECTION_STRING: <your_application_insights_connectionstring_for_azuremonitor_exporter> Configure tracing once at startup: var applicationInsightsConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING"); // Create a resource describing the service var resource = ResourceBuilder.CreateDefault() .AddService(serviceName: ServiceName) .AddAttributes(new Dictionary<string, object> { ["deployment.environment"] = "development", ["service.instance.id"] = Environment.MachineName }) .Build(); // Setup OpenTelemetry TracerProvider var traceProvider = Sdk.CreateTracerProviderBuilder() .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(ServiceName)) .AddSource(SourceName) .AddSource("Microsoft.Agents.AI") .AddHttpClientInstrumentation() .AddAzureMonitorTraceExporter(options => { options.ConnectionString = applicationInsightsConnectionString; }) .Build(); // Setup OpenTelemetry MeterProvider var meterProvider = Sdk.CreateMeterProviderBuilder() .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(ServiceName)) .AddMeter(SourceName) .AddAzureMonitorMetricExporter(options => { options.ConnectionString = applicationInsightsConnectionString; }) .Build(); // Configure DI and OpenTelemetry var serviceCollection = new ServiceCollection(); // Setup Logging with OpenTelemetry and Application Insights serviceCollection.AddLogging(loggingBuilder => { loggingBuilder.SetMinimumLevel(LogLevel.Debug); loggingBuilder.AddOpenTelemetry(options => { options.SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(ServiceName)); options.IncludeScopes = true; options.IncludeFormattedMessage = true; options.AddAzureMonitorLogExporter(exporterOptions => { exporterOptions.ConnectionString = applicationInsightsConnectionString; }); }); loggingBuilder.AddApplicationInsights( configureTelemetryConfiguration: (config) => { config.ConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING"); }, configureApplicationInsightsLoggerOptions: options => { options.TrackExceptionsAsExceptionTelemetry = true; options.IncludeScopes = true; }); }); Configure custom metrics and activity source for tracing: using var activitySource = new ActivitySource(SourceName); using var meter = new Meter(SourceName); // Create custom metrics var interactionCounter = meter.CreateCounter<long>("chat_interactions_total", description: "Total number of chat interactions"); var responseTimeHistogram = meter.CreateHistogram<double>("chat_response_time_ms", description: "Chat response time in milliseconds"); 2. Wire-up the AI Agent: // Create OpenAI client var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT"); var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY"); var deploymentName = "gpt-4o-mini"; using var client = new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey)) .GetChatClient(deploymentName) .AsIChatClient() .AsBuilder() .UseOpenTelemetry(sourceName: SourceName, configure: (cfg) => cfg.EnableSensitiveData = true) .Build(); logger.LogInformation("Creating Agent with OpenTelemetry instrumentation"); // Create AI Agent var agent = new ChatClientAgent( client, name: "AgentObservabilityDemo", instructions: "You are a helpful assistant that provides concise and informative responses.") .AsBuilder() .UseOpenTelemetry(SourceName, configure: (cfg) => cfg.EnableSensitiveData = true) .Build(); var thread = agent.GetNewThread(); logger.LogInformation("Agent created successfully with ID: {AgentId}", agent.Id); 3. Instrument Agent logic with semantic attributes and call OpenAI-compatible API: // Create a parent span for the entire agent session using var sessionActivity = activitySource.StartActivity("Agent Session"); Console.WriteLine($"Trace ID: {sessionActivity?.TraceId} "); var sessionId = Guid.NewGuid().ToString("N"); sessionActivity? .SetTag("agent.name", "AgentObservabilityDemo") .SetTag("session.id", sessionId) .SetTag("session.start_time", DateTimeOffset.UtcNow.ToString("O")); logger.LogInformation("Starting agent session with ID: {SessionId}", sessionId); using (logger.BeginScope(new Dictionary<string, object> { ["SessionId"] = sessionId, ["AgentName"] = "AgentObservabilityDemo" })) { var interactionCount = 0; while (true) { Console.Write("You (or 'exit' to quit): "); var input = Console.ReadLine(); if (string.IsNullOrWhiteSpace(input) || input.Equals("exit", StringComparison.OrdinalIgnoreCase)) { logger.LogInformation("User requested to exit the session"); break; } interactionCount++; logger.LogInformation("Processing interaction #{InteractionCount}", interactionCount); // Create a child span for each individual interaction using var activity = activitySource.StartActivity("Agent Interaction"); activity? .SetTag("user.input", input) .SetTag("agent.name", "AgentObservabilityDemo") .SetTag("interaction.number", interactionCount); var stopwatch = Stopwatch.StartNew(); try { logger.LogInformation("Starting agent execution for interaction #{InteractionCount}", interactionCount); var response = await agent.RunAsync(input); Console.WriteLine($"Agent: {response}"); Console.WriteLine(); stopwatch.Stop(); var responseTimeMs = stopwatch.Elapsed.TotalMilliseconds; // Record metrics interactionCounter.Add(1, new KeyValuePair<string, object?>("status", "success")); responseTimeHistogram.Record(responseTimeMs, new KeyValuePair<string, object?>("status", "success")); activity?.SetTag("interaction.status", "success"); logger.LogInformation("Agent interaction #{InteractionNumber} completed successfully in {ResponseTime:F2} seconds", interactionCount, responseTimeMs); } catch (Exception ex) { Console.WriteLine($"Error: {ex.Message}"); Console.WriteLine(); stopwatch.Stop(); var responseTimeMs = stopwatch.Elapsed.TotalSeconds; // Record error metrics interactionCounter.Add(1, new KeyValuePair<string, object?>("status", "error")); responseTimeHistogram.Record(responseTimeMs, new KeyValuePair<string, object?>("status", "error")); activity? .SetTag("response.success", false) .SetTag("error.message", ex.Message) .SetStatus(ActivityStatusCode.Error, ex.Message); logger.LogError(ex, "Agent interaction #{InteractionNumber} failed after {ResponseTime:F2} seconds: {ErrorMessage}", interactionCount, responseTimeMs, ex.Message); } } // Add session summary to the parent span sessionActivity? .SetTag("session.total_interactions", interactionCount) .SetTag("session.end_time", DateTimeOffset.UtcNow.ToString("O")); logger.LogInformation("Agent session completed. Total interactions: {TotalInteractions}", interactionCount); Azure Monitor dashboard Once you run the agent and generate some traffic, your dashboard in Azure Monitor will be populated as shown below: You can drill down to specific service / activity source / spans by applying relevant filters: Key Features Demonstrated OpenTelemetry instrumentation with Microsoft Agent framework Custom metrics for user interactions End-to-end Telemetry correlation Real time telemetry visualization along with metrics and logging interactions Further reading Introducing Microsoft Agent Framework Azure AI Foundry docs OpenTelemetry Aspire Demo with Azure OpenAI
Shah_Viral
Nov 13, 2025 Place Microsoft Foundry Blog
1.2KViews
3likes
0Comments
Real-Time Speech Intelligence for Global Scale: gpt-4o-transcribe-diarize in Azure AI Foundry
Voice is a natural interface for communication. Now, with the general availability of gpt-4o-transcribe-diarize, the new automatic speech recognition (ASR) model in Azure AI Foundry, transforming speech into actionable text is faster, smarter, and more accurate than ever. This launch marks a significant milestone in our mission to empower organizations with AI that delivers speed, accuracy, and enterprise-grade reliability. With gpt-4o-transcribe-diarize seamlessly integrated, businesses can unlock critical insights from conversations, instantly converting audio into text with ultra-low latency and outstanding accuracy across 100+ languages. Whether you're enhancing live event accessibility, analyzing customer interactions, or enabling intelligent voice-driven applications, gpt-4o-transcribe-diarize helps capture spoken word and leverages it for real-time decision-making. Experience how Azure AI’s innovation in speech technology is helping to redefine productivity and global reach, setting a new standard for audio intelligence in the enterprise landscape. Why gpt-4o-transcribe-diarize Matters Businesses today operate in a world where conversations drive decisions. From customer support calls to virtual meetings, audio data holds critical insights. Gpt-4o-transcribe-diarize unlocks these insights, converting speech to text with ultra-low latency and high accuracy across 100+ languages. Whether you’re captioning live events, analyzing call center interactions, or building voice-driven applications, gpt-4o-transcribe-diarize offers the opportunity to help your workflows be powered by real-time intelligence. Key Features Lightning-Fast Transcription: Convert 10 minutes of audio in ~15 seconds with our new Fast Transcription API. Global Language Coverage: Support for 100+ languages and dialects for inclusive, global experiences. Seamless Integration: Available in Azure AI Foundry with managed endpoints for easy deployment and scale. Real-World Impact Imagine a reporter summarizing interviews in real time, a financial institution transcribing calls instantly, or a global retailer powering multilingual voice assistants; all with the speed and security of Azure AI Foundry. gpt-4o-transcribe-diarize can make these scenarios possible today. Pricing and regional availability for gpt-4o-transcribe-diarize Model Deployment Regions Price $/1m tokens gpt-4o-transcribe-diarize Global Standard (Paygo) East US 2, Sweden Central Text input: $2.50 Audio input: $6.00 Output: $10.00 gpt-4o-transcribe-diarize in audio AI innovation context gpt-4o-transcribe-diarize is part of a broader wave of audio AI innovation on Azure, joining new models like OpenAI gpt-realtime and gpt-audio that are purpose-built for expressive, low-latency voice experiences. While gpt-4o-transcribe-diarize delivers ultra-fast transcription with enterprise-grade accuracy, gpt-realtime enables natural, emotionally rich voice interactions with millisecond responsiveness—ideal for live conversations, voice agents, and multimodal applications. Meanwhile, audio models like gpt-4o-transcribe mini, and mini-tts extend the platform’s capabilities with customizable speech synthesis and real-time captioning, making Azure AI a comprehensive solution for building intelligent, production-ready voice systems. gpt-realtime Features OpenAI claims the gpt-realtime model introduces a new standard for voice-first applications, combining expressive audio generation with ultra-low latency and natural conversational flow. It’s designed to power real-time interactions that feel like natural, responsive speech. Key Features: Millisecond Latency: Enables live responsiveness suitable for real-time conversations, kiosks, and voice agents. Emotionally Expressive Voices: Supports nuanced speech delivery with voices like Marin and Cedar, capable of conveying tone, emotion, and intent. Natural Turn-Taking: Built-in mechanisms for detecting pauses and transitions, allowing fluid back-and-forth dialogue. Function Calling Support: Seamlessly integrates with backend systems to trigger actions based on voice input. Multimodal Readiness: Designed to work with text, audio, and visual inputs for rich, interactive experiences. Stable APIs for Production: Enterprise-grade reliability with consistent behavior across sessions and deployments. These features make gpt-realtime a foundational model for building intelligent voice interfaces that go beyond transcription—delivering conversational intelligence in real time. gpt-realtime Use Cases With its expressive audio capabilities and real-time responsiveness, gpt-realtime unlocks new possibilities across industries. Whether enhancing customer engagement or streamlining operations, it brings voice AI into the heart of enterprise workflows. Examples include: Customer Service Agents: Power virtual agents that respond instantly with natural, tones for rich expressiveness, improving customer satisfaction and reducing wait times. Retail Kiosks & Smart Devices: Enable voice-driven product discovery, troubleshooting, and checkout experiences with real-time feedback. Multilingual Voice Assistants: Deliver localized, expressive voice experiences across global markets with support for multiple languages and dialects. Live Captioning & Accessibility: Combine gpt-4o-transcribe-diarize gpt-realtime to provide real-time captions and voice synthesis for inclusive experiences. These use cases demonstrate how gpt-realtime transforms voice into a strategic interface—bridging human communication and intelligent systems with speed and accuracy. Ready to transform voice into value? Learn more and start building with gpt-4o-transcribe-diarize
Naomi Moneypenny
Nov 05, 2025 Place Microsoft Foundry Blog
4KViews
0likes
1Comment
Implementing MCP Remote Servers with Azure Function App and GitHub Copilot Integration
Introduction In the evolving landscape of AI-driven applications, the ability to seamlessly connect large language models (LLMs) with external tools and data sources is becoming a cornerstone of intelligent system design. Model Context Protocol (MCP) — a specification that enables AI agents to discover and invoke tools dynamically, based on context. While MCP is powerful, implementing it from scratch can be daunting !!! That’s where Azure Functions comes in handy. With its event-driven, serverless architecture, Azure Functions now supports a preview extension for MCP, allowing developers to build remote MCP servers that are scalable, secure, and cloud-native. Further, In VS Code, GitHub Copilot Chat in Agent Mode can connect to your deployed Azure Function App acting as an MCP server. This connection allows Copilot to leverage the tools and services exposed by your function app. Why Use Azure Functions for MCP? Serverless Simplicity: Deploy MCP endpoints without managing infrastructure. Secure by Design: Leverage HTTPS, system keys, and OAuth via EasyAuth or API Management. Language Flexibility: Build in .NET, Python, or Node.js using QuickStart templates. AI Integration: Enable GitHub Copilot, VS Code, or other AI agents to invoke your tools via SSE endpoints. Prerequisites Python version 3.11 or higher Azure Functions Core Tools >= 4.0.7030 Azure Developer CLI To use Visual Studio Code to run and debug locally: Visual Studio Code Azure Functions extension An storage emulator is needed when developing azure function app in VScode. you can deploy Azurite extension in VScode to meet this requirement. Press enter or click to view image in full size You can run the Azurite in VS Code as shown below. C:\Program Files\Microsoft Visual Studio\2022\Enterprise\Common7\IDE\Extensions\Microsoft\Azure Storage Emulator> .\azurite.exe Press enter or click to view image in full size alternatively, you can also run Azurite in docker container as shown below. docker run -p 10000:10000 -p 10001:10001 -p 10002:10002 \ mcr.microsoft.com/azure-storage/azurite For more information about setting up Azurite, visit Use Azurite emulator for local Azure Storage development | Microsoft Learn Github Repositories Following Github repos are needed to setup this PoC. Repository for MCP server using Azure Function App https://github.com/mafzal786/mcp-azure-functions-python.git Repository for AI Foundry agent as MCP Client https://github.com/mafzal786/ai-foundry-agent-with-remote-mcp-using-azure-functionapp.git Clone the repository Run the following command to clone the repository to start building your MCP server using Azure function app. git clone https://github.com/mafzal786/mcp-azure-functions-python.git Run the MCP server in VS Code Once cloned. Open the folder in VS Code. Create a virtual environment in VS Code. Change directory to “src” in a new terminal window, install the python dependencies and start the function host locally as shown below. cd src pip install -r requirements.txt func start Note: by default this will use the webhooks route: /runtime/webhooks/mcp/sse. Later we will use this in Azure to set the key on client/host calls: /runtime/webhooks/mcp/sse?code=<system_key> Press enter or click to view image in full size MCP Inspector In a new terminal window, install and run MCP Inspector. npx @modelcontextprotocol/inspector Click to load the MCP inspector. Also provide the generated proxy session token. http://127.0.0.1:6274/#resources In the URL type and click “Connect”: http://localhost:7071/runtime/webhooks/mcp/sse Once connected, click List Tools under Tools and select “hello_mcp” tool and click “Run Tool” for testing as shown below. Press enter or click to view image in full size Select another tool such as get_stockprice and run it as shown below. Press enter or click to view image in full size Deploy Function App to Azure from VS Code For deploying function app to azure from vs code, make sure you have Azure Tools extension enabled in VS Code. To learn more about Azure Tools extension, visit the following Azure Extensions if your VS code environment is not setup for Azure development, follow Configure Visual Studio Code for Azure development with .NET — .NET | Microsoft Learn Once Azure Tools are setup, sign in to Azure account with Azure Tools Press enter or click to view image in full size Once Sign-in is completed, you should be able to see all of your existing resources in the Resources view. These resources can be managed directly in VS Code. Look for Function App in Resource, right click and click “Deploy to Function App”. Press enter or click to view image in full size If you already have it deployed, you will get the following pop-up. Click “Deploy” Press enter or click to view image in full size This will start deploying your function app to Azure. In VS Code, Azure tab will display the following. Press enter or click to view image in full size Once the deployment is completed, you can view the function app and all the tools in Azure portal under function app as shown below. Press enter or click to view image in full size Get the mcp_extension key from Functions → App Keys in Function App. Press enter or click to view image in full size This mcp_extension key would be needed in mcp.json file in VS code, if you would like to test the MCP server using Github Copilot in VS Code. Your entries in mcp.json file will look like as below for example. { "inputs": [ { "type": "promptString", "id": "functions-mcp-extension-system-key", "description": "Azure Functions MCP Extension System Key", "password": true }, { "type": "promptString", "id": "functionapp-name", "description": "Azure Functions App Name" } ], "servers": { "remote-mcp-function": { "type": "sse", "url": "https://${input:functionapp-name}.azurewebsites.net/runtime/webhooks/mcp/sse", "headers": { "x-functions-key": "${input:functions-mcp-extension-system-key}" } }, "local-mcp-function": { "type": "sse", "url": "http://0.0.0.0:7071/runtime/webhooks/mcp/sse" } } } Test Azure Function MCP Server in MCP Inspector Launch MCP Inspector and provide the Azure Function in MCP inspector URL. Provide authentication as shown below. Bearer token is mcp_extension key. Testing an MCP server with GitHub Copilot Testing an MCP server with GitHub Copilot involves configuring and utilizing the server within your development environment to provide enhanced context and capabilities to Copilot Chat. Steps to Test an MCP Server with GitHub Copilot: Ensure Agent Mode is Enabled: Open Copilot Chat in Visual Studio Code and select “Agent” mode. This mode allows Copilot to interact with external tools and services, including MCP servers. Add the MCP Server: Open the Command Palette (Ctrl+Shift+P or Cmd+Shift+P) and run the command MCP: Add Server. Press enter or click to view image in full size Follow the prompts to configure the server. You can choose to add it to your workspace settings (creating a .vscode/mcp.json file) . Select HTTP or Server-Sent events Press enter or click to view image in full size Specify the URL and click Enter Press enter or click to view image in full size Provide a name of your choice Press enter or click to view image in full size Select scope as Global or workspace. I selected Workspace Press enter or click to view image in full size This will generate mcp.json file in .vscode or create a new entry if mcp.json already exists as shown below. Click Start to “start” the server. Also make sure your Azure function app is locally running with func start command. Press enter or click to view image in full size Now Type the prompt as shown below. Press enter or click to view image in full size Try another tool as below. Press enter or click to view image in full size VS code terminal output for reference. Press enter or click to view image in full size Testing an MCP server with Claude Desktop Claude Desktop is a standalone AI application that allows users to interact with Claude AI models directly from their desktop, providing a seamless and efficient experience. you can download Claude desktop at Download Claude In this article, I have added another tool to utilize to test your MCP server running in Azure Function app. Modify claude_desktop_config.json with the following. you can find this file in window environment at C:\Users\<username>\AppData\Roaming\Claude { "mcpServers": { "my mcp": { "command": "npx", "args": [ "mcp-remote", "http://localhost:7071/runtime/webhooks/mcp/sse" ] } } } Note: If claude_desktop_config.json does not exists, click on setting in Claude desktop under user and visit developer tab. You will see you MCP server in Claude Desktop as shown below. Press enter or click to view image in full size Type the prompt such as “What is the stock price of Tesla” . After submitting, you will notice that it is invoking the tool “get_stockprice” from the MCP server running locally and configured in the .json earlier. Click Allow once or Allow always as shown below. Following output will be displayed. Press enter or click to view image in full size Now lets try weather related prompt. As you can see, it has invoked “get_weatheralerts” tool from MCP server. Press enter or click to view image in full size Azure AI Foundry agent as MCP Client Use the following Github repo to set up Azure AI Foundry agent as MCP client. git clone https://github.com/mafzal786/ai-foundry-agent-with-remote-mcp-using-azure-functionapp.git Open the code in VS code and follow the instructions mentioned in README.md file at Github repo. Once you execute the code, following output will show up in VS code. Press enter or click to view image in full size In this code, message is hard coded. Change the content to “what is weather advisory for Florida” and rerun the program. It will call get_weatheralerts tool and output will look like as below. Press enter or click to view image in full size Conclusion The integration of Model Context Protocol (MCP) with Azure Functions marks a pivotal step in democratizing AI agent development. By leveraging Azure’s serverless architecture, developers can now build remote MCP servers that scale automatically, integrate seamlessly with other Azure services, and expose modular tools to intelligent agents like GitHub Copilot. This setup not only simplifies the deployment and management of MCP servers but also enhances the developer experience — allowing tools to be invoked contextually by AI agents in environments like VS Code, GitHub Codespaces, or Copilot Studio[2]. Whether you’re building a tool to query logs, calculate metrics, or manage data, Azure Functions provides the flexibility, security, and scalability needed to bring your AI-powered workflows to life. As the MCP spec continues to evolve, and GitHub Copilot expands its agentic capabilities, this architecture positions you to stay ahead — offering a robust foundation for cloud-native AI tooling that’s both powerful and future-proof.
muafzal
Oct 27, 2025 Place Microsoft Foundry Blog
1.2KViews
1like
1Comment
Interactive AI Avatars: Building Voice Agents with Azure Voice Live API
Azure Voice Live API recently reached General Availability, marking a significant milestone in conversational AI technology. This unified API surface doesn't just enable speech-to-speech capabilities for AI agents—it revolutionizes the entire experience by streaming interactions through lifelike avatars. Built on the powerful speech-to-speech capabilities of the GPT-4 Realtime model, Azure Voice Live API offers developers unprecedented flexibility: - Out-of-the-box or custom avatars from Azure AI Services - Wide range of neural voices, including Indic languages like the one featured in this demo - Single API interface that handles both audio processing and avatar streaming - Real-time responsiveness with sub-second latency In this post, I'll walk you through building a retail e-commerce voice agent that demonstrates this technology. While this implementation focuses on retail apparel, the architecture is entirely generic and can be adapted to any domain—healthcare, banking, education, or customer support—by simply changing the system prompt and implementing domain-specific tools integration. The Challenge: Navigating Uncharted Territory At the time of writing, documentation for implementing avatar features with Azure Voice Live API is minimal. The protocol-specific intricacies around avatar video streaming and the complex sequence of steps required to establish a live avatar connection were quite overwhelming. This is where Agent mode in GitHub Copilot in Visual Studio Code proved extremely useful. Through iterative conversations with the AI agent, I successfully discovered the approach to implement avatar streaming without getting lost in low-level protocol details. Here's how different AI models contributed to this solution: - Claude Sonnet 4.5: Rapidly architected the application structure, designing the hybrid WebSocket + WebRTC architecture with TypeScript/Vite frontend and FastAPI backend - GPT-5-Codex (Preview): Instrumental in implementing the complex avatar streaming components, handling WebRTC peer connections, and managing the bidirectional audio flow Architecture Overview: A Hybrid Approach The architecture comprises of these components 🐳 Container Application Architecture Vite Server: Node.js-based development server that serves the React application. In development, it provides hot module replacement and proxies API calls to `FastAPI`. In production, the React app is built into static files served by FastAPI. FastAPI with ASGI: Python web framework running on `uvicorn ASGI server`. ASGI (Asynchronous Server Gateway Interface) enables handling multiple concurrent connections efficiently, crucial for WebSocket connections and real-time audio processing. 🤖 AI & Voice Services Integration Azure Voice Live API: Primary service that manages the connection to GPT-4 Realtime Model, provides avatar video generation, neural text-to-speech, and WebSocket gateway functionality GPT-4 Realtime Model: Accessed through Azure Voice Live API for real-time audio processing, function calling, and intelligent conversation management 🔄 Communication Flows Audio Flow: Browser → WebSocket → FastAPI → WebSocket → Azure Voice Live API → GPT-4 Realtime Model Video Flow: Browser ↔ WebRTC Direct Connection ↔ Azure Voice Live API (bypasses backend for performance) Function Calls: GPT-4 Realtime (via Voice Live) → FastAPI Tools → Business APIs → Response → GPT-4 Realtime (via Voice Live) 🤖 Business process automation Workflows / RAG Shipment Logic App Agent: Analyzes orders, validates data, creates shipping labels, and updates tracking information Conversation Analysis Agent: Azure Logic App Reviews complete conversations, performs sentiment analysis, generates quality scores with justification, and stores insights for continuous improvement Knowledge Retrieval: Azure AI Search is used to reason over manuals and help respond to Customer queries on policies, products The solution implements a hybrid architecture that leverages both WebSocket proxying and direct WebRTC connections for optimal performance. This design ensures the conversational audio flow remains manageable and secure through the backend, while the bandwidth-intensive avatar video streams directly to the browser for optimal performance. The flow used in the Avatar communication: ``` Frontend FastAPI Backend Azure Voice Live API │ │ │ │ 1. Request Session │ │ │─────────────────────────►│ │ │ │ 2. Create Session │ │ │─────────────────────────►│ │ │ │ │ │ 3. Session Config │ │ │ (with avatar settings)│ │ │─────────────────────────►│ │ │ │ │ │ 4. session.updated │ │ │ (ICE servers) │ │ 5. ICE servers │◄─────────────────────────│ │◄─────────────────────────│ │ │ │ │ │ 6. Click "Start Avatar" │ │ │ │ │ │ 7. Create RTCPeerConn │ │ │ with ICE servers │ │ │ │ │ │ 8. Generate SDP Offer │ │ │ │ │ │ 9. POST /avatar-offer │ │ │─────────────────────────►│ │ │ │ 10. Encode & Send SDP │ │ │─────────────────────────►│ │ │ │ │ │ 11. session.avatar. │ │ │ connecting │ │ │ (SDP answer) │ │ 12. SDP Answer │◄─────────────────────────│ │◄─────────────────────────│ │ │ │ │ │ 13. setRemoteDescription │ │ │ │ │ │ 14. WebRTC Handshake │ │ │◄─────────────────────────┼─────────────────────────►│ │ (Direct Connection) │ │ │ │ │ │ 15. Video/Audio Stream │ │ │◄────────────────────────────────────────────────────│ │ (Bypasses Backend) │ │ ``` For more technical details, refer to the technical details behind the implementation, refer to the GitHub Repo shared in this post. Here is a video of the demo of the application in action.
srikantan
Oct 21, 2025 Place Microsoft Foundry Blog
1KViews
3likes
0Comments
GPT-5 Model Family Now Powers Azure AI Foundry Agent Service
The GPT-5 model family is now available in Azure AI Foundry Agent Service, which is generally available for enterprise customers. This means developers and enterprises can move beyond “just models” to build production-ready AI agents with: GPT-5’s advanced reasoning, coding, and multimodal intelligence Enterprise-grade trust, governance, and AgentOps built in Open standards and multi-agent orchestration for real-world workflows From insurance claims to supply chain optimization, Foundry enterprise agents are ready to power mission-critical AI at scale.
pavanli
Oct 09, 2025 Place Microsoft Foundry Blog
1.4KViews
0likes
0Comments
The Future of AI: Exploring Multi-Agent AI Systems
Image generated by Microsoft Designer
Marco_Casalaina
Oct 03, 2025 Place Microsoft Foundry Blog
25KViews
4likes
2Comments
The Future of AI: Unlocking the Power of Azure AI with the Book of AI
Image generated by Microsoft Designer
robchambers
Oct 03, 2025 Place Microsoft Foundry Blog
16KViews
10likes
1Comment
The Future of AI: The paradigm shifts in Generative AI Operations
Dive into the transformative world of Generative AI Operations (GenAIOps) with Microsoft Azure. Discover how businesses are overcoming the challenges of deploying and scaling generative AI applications. Learn about the innovative tools and services Azure AI offers, and how they empower developers to create high-quality, scalable AI solutions. Explore the paradigm shift from MLOps to GenAIOps and see how continuous improvement practices ensure your AI applications remain cutting-edge. Join us on this journey to harness the full potential of generative AI and drive operational excellence.
Yina Arenas
Oct 03, 2025 Place Microsoft Foundry Blog
7.4KViews
1like
1Comment