Azure OpenAI Service
Introducing the GPT-4o-Audio-Preview: A New Era of Audio-Enhanced AI Interaction
We are thrilled to announce the release of audio support accessible via the Chat Completions API, featuring the new GPT-4o-Audio-Preview model, now available in preview. Building on our recent launch of GPT-4o-Realtime-Preview, this groundbreaking addition to the GPT-4o family introduces support for audio prompts and the ability to generate spoken audio responses. This expansion enhances the potential for AI applications in text and voice-based interactions and audio analysis. Starting today, developers can unlock immersive, voice-driven experiences by harnessing the advanced capabilities of GPT-4o-Audio-Preview, now in public preview.

Key Benefits of GPT-4o-Audio-Preview
The Chat Completions API with the GPT-4o-Audio-Preview model is designed to transform the way users interact with AI by incorporating natural audio elements, adding depth to applications that require nuanced understanding and response generation.
Engaging Spoken Summaries: GPT-4o-Audio-Preview can generate spoken summaries from text content, offering a dynamic, engaging way to present information. This feature is ideal for applications that benefit from audio-based delivery, such as digital assistants, interactive training modules, and accessibility solutions.
Sentiment Analysis from Audio: With the ability to detect sentiment in audio recordings, this model can analyze vocal nuances and translate them into meaningful, text-based insights. This is particularly valuable for customer service and support applications, where understanding tone and mood can enhance user satisfaction and personalize responses.
Asynchronous Speech-In, Speech-Out Interactions: GPT-4o-Audio-Preview enables seamless asynchronous voice interactions, supporting applications where users can submit spoken queries or commands and receive spoken responses at a later time. This capability enhances user convenience and opens up possibilities for hands-free, voice-enabled applications in diverse environments.

Exploring Real-World Applications of GPT-4o-Audio-Preview
1. Create Immersive Stories from Existing Text
With the GPT-4o-Audio-Preview model, businesses can revolutionize content delivery by converting text articles into engaging spoken summaries. This feature caters to users who prefer listening over reading, creating a more immersive storytelling experience. For example, news websites can offer audio summaries of their articles, allowing users to stay informed while driving, exercising, or multitasking.
2. Improve Customer Support via Audio Analysis
Understanding customer sentiment is crucial for enhancing service quality and user satisfaction. GPT-4o-Audio-Preview can analyze recorded customer conversations to detect sentiment and emotional nuances. This capability helps businesses identify areas of improvement, personalize responses, and develop more effective customer support strategies. For instance, a call center can use this technology to assess the mood of customers during interactions and adjust their approach accordingly.
3. Enhance Interactive Education and Training Modules
Educational institutions and corporations can leverage GPT-4o-Audio-Preview to create interactive and dynamic training modules. This model can generate spoken explanations, quizzes, and feedback, making learning more engaging and accessible. For example, an online course platform can offer audio-based lessons and assessments that cater to auditory learners, enhancing the overall educational experience.
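To make the asynchronous speech-in, speech-out flow described above concrete, here is a minimal sketch of a Chat Completions call that sends a recorded question and asks for a spoken answer plus its transcript, using the OpenAI Python SDK against an Azure OpenAI resource. The endpoint, API version, and deployment name are placeholders to replace with your own values, and the exact response shape may vary slightly across preview versions.

import base64
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder endpoint
    api_key="<your-api-key>",
    api_version="2025-01-01-preview",  # placeholder: any API version exposing audio chat completions
)

# Read a short WAV recording of the user's spoken question and base64-encode it
with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # placeholder: your deployment name
    modalities=["text", "audio"],  # ask for both a transcript and spoken audio back
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Summarize this recording in two sentences."},
                {"type": "input_audio", "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }
    ],
)

# The spoken reply comes back base64-encoded, alongside a text transcript
message = response.choices[0].message
with open("answer.wav", "wb") as f:
    f.write(base64.b64decode(message.audio.data))
print(message.audio.transcript)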
Comparing Realtime API to Chat Completions API
The GPT-4o models associated with the Realtime API and the Chat Completions API both support audio and speech capabilities, each offering unique functionalities for AI-driven user experiences. However, they serve distinct purposes:
Realtime API with model GPT-4o-Realtime-Preview: Optimized for real-time, low-latency conversations, focusing on enabling natural back-and-forth interactions with minimal delay, ideal for chatbots and conversational AI systems.
Chat Completions API with model GPT-4o-Audio-Preview: Tailored for processing and generating audio content, supporting advanced features like speech recognition and audio synthesis, making it ideal for asynchronous speech-in, speech-out interactions and audio sentiment analysis.
Ready to get started? Learn more about Azure OpenAI Service. Try it out with Azure AI Foundry.

Working with the Realtime API of gpt-4o in python
This post is organized into sections that cover how to:
Connect to the Realtime API
Handle audio conversations
Handle text conversations
Handle tool calling
The sample web application is built using Chainlit.

Connecting to the Realtime API
Refer to the code snippet below to establish a WebSocket connection to the Server (API). After establishing that:
1. Implement the receive function to accept responses from the Server. It is used to handle the response content from the server, be it audio or text. More details on this function are provided later in the post, under each section.

url = f"{base_url}openai/realtime?api-version={api_version}&deployment={model_name}&api-key={api_key}"

async def connect(self):
    """Connects the client using a WS Connection to the Realtime API."""
    if self.is_connected():
        # raise Exception("Already connected")
        self.log("Already connected")
    self.ws = await websockets.connect(
        url,
        additional_headers={
            "Authorization": f"Bearer {api_key}",
            "OpenAI-Beta": "realtime=v1",
        },
    )
    print(f"Connected to realtime API....")
    asyncio.create_task(self.receive())
    await self.update_session()

2. Send a client event, session update, to set session-level configurations like the system prompt the model should use, the choice of using text or speech or both during the conversation, the neural voice to use in the response, and so forth.

self.system_prompt = system_prompt
self.event_handlers = defaultdict(list)
self.session_config = {
    "modalities": ["text", "audio"],
    "instructions": self.system_prompt,
    "voice": "shimmer",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {"model": "whisper-1"},
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500,
        # "create_response": True,  ## do not enable this attribute, since it prevents function calls from being detected
    },
    "tools": tools_list,
    "tool_choice": "auto",
    "temperature": 0.8,
    "max_response_output_tokens": 4096,
}

Handling audio conversation
1. Capture user voice input
Chainlit provides events to capture the user voice input from the microphone.

@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.InputAudioChunk):
    openai_realtime: RTWSClient = cl.user_session.get("openai_realtime")
    if openai_realtime:
        if openai_realtime.is_connected():
            await openai_realtime.append_input_audio(chunk.data)
        else:
            print("RealtimeClient is not connected")

2. Process the user voice input
a) Convert the audio input received in the previous step to a base64 encoded string. Send the Client event input_audio_buffer.append to the Server, with this audio payload.

async def append_input_audio(self, array_buffer):
    # Check if the array buffer is not empty and send the audio data to the input buffer
    if len(array_buffer) > 0:
        await self.send(
            "input_audio_buffer.append",
            {
                "audio": array_buffer_to_base64(np.array(array_buffer)),
            },
        )

b) Once the Server is done receiving the audio chunks, it sends an input_audio_buffer.committed event. Once this event is picked up in the receive function,
c) send a Client Event response.create to the Server to elicit a response.

async def receive(self):
    async for message in self.ws:
        event = json.loads(message)
        ................................
        elif event["type"] == "input_audio_buffer.committed":
            # user has stopped speaking. The audio delta input from the user captured till now should now be processed by the server.
            # Hence we need to send a 'response.create' event to signal the server to respond
            await self.send("response.create", {"response": self.response_config})
        .................................

3. Receiving the response audio
Once the response audio events start flowing in from the server:
Handle the Server event response.audio.delta by converting the audio chunks from a base64 encoded string to bytes. Relay this to the UI to play the audio chunks over the speaker. The dispatch function is used to raise this event (see snippet below).

async def receive(self):
    async for message in self.ws:
        event = json.loads(message)
        ............................
        if event["type"] == "response.audio.delta":
            # response audio delta events received from server that need to be relayed
            # to the UI for playback
            delta = event["delta"]
            array_buffer = base64_to_array_buffer(delta)
            append_values = array_buffer.tobytes()
            _event = {"audio": append_values}
            # send event to chainlit UI to play this audio
            self.dispatch("conversation.updated", _event)
        elif event["type"] == "response.audio.done":
            # server has finished sending back the audio response to the user query
            # let the chainlit UI know that the response audio has been completely received
            self.dispatch("conversation.updated", event)
        ..........................

Play the received audio chunks
The Chainlit UI then plays this audio out over the speaker.

async def handle_conversation_updated(event):
    """Used to play the response audio chunks as they are received from the server."""
    _audio = event.get("audio")
    if _audio:
        await cl.context.emitter.send_audio_chunk(
            cl.OutputAudioChunk(
                mimeType="pcm16", data=_audio, track=cl.user_session.get("track_id")
            )
        )

Handling text conversation
1. Capture user text input
Apart from handling audio conversations, we can handle the associated transcripts from the audio response, so that the user has a multimodal way of interacting with the AI Assistant. Chainlit provides events to capture the user input from the chat interface.

@cl.on_message
async def on_message(message: cl.Message):
    openai_realtime: RTWSClient = cl.user_session.get("openai_realtime")
    if openai_realtime and openai_realtime.is_connected():
        await openai_realtime.send_user_message_content(
            [{"type": "input_text", "text": message.content}]
        )
    else:
        await cl.Message(
            content="Please activate voice mode before sending messages!"
        ).send()

2. Process the user text input
With the user text input received above:
1. Send a Client Event conversation.item.create to the Server with the user text input in the payload.
2. Follow that up with a Client Event response.create to the Server to elicit a response.
3. Raise a custom event 'conversation.interrupted' to the UI so that it can stop playing any audio response from the previous user query.

async def send_user_message_content(self, content=[]):
    if content:
        await self.send(
            "conversation.item.create",
            {
                "item": {
                    "type": "message",
                    "role": "user",
                    "content": content,
                }
            },
        )
        # this is the trigger to the server to start responding to the user query
        await self.send("response.create", {"response": self.response_config})
        # raise this event to the UI to pause the audio playback, in case it is doing so already,
        # when the user submits a query in the chat interface
        _event = {"type": "conversation_interrupted"}
        # signal the UI to stop playing audio
        self.dispatch("conversation.interrupted", _event)

3. Receiving the text response
Use the Server Event response.audio_transcript.delta to get the stream of the text data response.
This is a transcription of what is already playing as audio on the UI. Relay this data to the UI through a custom event, to populate the chat conversation. The response text gets streamed and displayed in the Chainlit UI.

async def receive(self):
    async for message in self.ws:
        ..................................
        elif event["type"] == "response.audio_transcript.delta":
            # this event is received when the transcript of the server's audio response to the user has started to come in.
            # send this to the UI to display the transcript in the chat window, even as the audio of the response gets played
            delta = event["delta"]
            item_id = event["item_id"]
            _event = {"transcript": delta, "item_id": item_id}
            # signal the UI to display the transcript of the response audio in the chat window
            self.dispatch("conversation.text.delta", _event)
        elif (
            event["type"] == "conversation.item.input_audio_transcription.completed"
        ):
            ...............................

Handling Tool calling
As a part of the Session Update event discussed earlier, we pass a payload of the tools (functions) that this Assistant has access to. In this application, I am using a search function implemented using Tavily.

self.session_config = {
    "modalities": ["text", "audio"],
    "instructions": self.system_prompt,
    "voice": "shimmer",
    .....................
    "tools": tools_list,
    "tool_choice": "auto",
    "temperature": 0.8,
    "max_response_output_tokens": 4096,
}

The function definition and implementation used in this sample application:

tools_list = [
    {
        "type": "function",
        "name": "search_function",
        "description": "call this function to bring up-to-date information on the user's query when it pertains to current affairs",
        "parameters": {
            "type": "object",
            "properties": {"search_term": {"type": "string"}},
            "required": ["search_term"],
        },
    }
]

# Function to perform search using Tavily
def search_function(search_term: str):
    print("performing search for the user query > ", search_term)
    return TavilySearchResults().invoke(search_term)

available_functions = {"search_function": search_function}

Handling the response from tool calling
When a user request entails a function call, the Server Event response.done does not return audio. It instead returns the functions that match the intent, along with the arguments to invoke them. In the receive function:
Check for function call hints in the response
Get the function name and arguments from the response
Invoke the function and get the response
Send Client Event conversation.item.create to the server with the function call output
Follow that up with Client Event response.create to elicit a response from the Server that will then be played out as audio and text.

async def receive(self):
    async for message in self.ws:
        ...........................................................
        elif event["type"] == "response.done":
            ...........................................
            if "function_call" == output_type:
                function_name = (
                    event.get("response", {})
                    .get("output", [{}])[0]
                    .get("name", None)
                )
                arguments = json.loads(
                    event.get("response", {})
                    .get("output", [{}])[0]
                    .get("arguments", None)
                )
                tool_call_id = (
                    event.get("response", {})
                    .get("output", [{}])[0]
                    .get("call_id", None)
                )
                function_to_call = available_functions[function_name]
                # invoke the function with the arguments and get the response
                response = function_to_call(**arguments)
                print(
                    f"called function {function_name}, and the response is:",
                    response,
                )
                # send the function call response to the server(model)
                await self.send(
                    "conversation.item.create",
                    {
                        "item": {
                            "type": "function_call_output",
                            "call_id": tool_call_id,
                            "output": json.dumps(response),
                        }
                    },
                )
                # signal the model(server) to generate a response based on the function call output sent to it
                await self.send(
                    "response.create", {"response": self.response_config}
                )
        ...............................................

Reference links:
Watch a short video of this sample application here
The Documentation on the Realtime API is available here
The GitHub Repo for the application in this post is available here
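One helper that the snippets above call repeatedly but never show is send(). As a rough sketch of how such a helper could look (an assumption for illustration, not the author's actual implementation): Realtime API client events are JSON messages whose "type" field names the event, with the remaining payload fields merged alongside it. The method below would live on the same client class, with json imported at module level.

async def send(self, event_type, payload=None):
    # Assumed helper: wrap the client event as JSON ({"type": ..., **payload})
    # and forward it over the open WebSocket connection.
    event = {"type": event_type}
    if payload:
        event.update(payload)
    await self.ws.send(json.dumps(event))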
Exciting Update: Abstractive Summarization in Azure AI Language Now Powered by Phi-3.5-mini! 🎉
We're thrilled to announce that the summarization capability in the Azure AI Language service has started transitioning to widely accepted Small Language Models (SLMs) and Large Language Models (LLMs), starting with moving document and text abstractive summarization to Phi-3.5-mini!

Why It Matters
This upgrade marks a significant advancement in fully embracing rapidly evolving Gen AI technologies. It is aligned with our commitment to enabling customers to concentrate on their core business needs while delegating the complex tasks of model maintenance, fine-tuning, and engineering to our services, and to interact with our service through strongly typed APIs. We remain committed to updating the base models and ensuring our customers always receive the best performance for summarization tasks, including GPT-4o models and OpenAI's o3 models. We are confident that this transition will offer significant advantages, and we eagerly anticipate your feedback as we continue to enhance our services.

What Does This Mean for You
Enhanced Performance: A fine-tuned Phi-3.5-mini model boosted production performance by 9%. Key highlights include enhanced understanding capability with improved common-sense reasoning, as well as smoother and more reliable summary generation.
Resource Optimization: By leveraging the compact yet powerful Phi-3.5-mini, we ensure better utilization of resources while maintaining quality, along with availability in containers.
More Enterprise and Compliance Features: lower hallucinations, stronger Responsible AI, enhanced scaling support, etc. Notably, the new production model reduced hallucinations by 78%.
Stay tuned for more updates as we continue this transition to all summarization tasks and bring additional enhancements to our services!

How To Utilize This Advancement
In the cloud, you don't need to do anything special to benefit from this advancement. Your document and text abstractive summarization requests will be served with the fine-tuned Phi-3.5-mini model (a minimal SDK sketch appears at the end of this section). If you are using the container, please download the latest version from the mcr.microsoft.com container registry. The fully qualified container image name is mcr.microsoft.com/azure-cognitive-services/textanalytics/summarization.
Thank you for your continued trust in our products, and we welcome your feedback as we strive to continuously improve our services. For more details and resources, please explore the following links:
- Learn more about our summarization solution in Documentation
- Get started with the summarization container by visiting Documentation
- Try it out with AI Foundry for a code-free experience
- Explore Azure AI Language and its various capabilities in Documentation
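To illustrate the cloud usage mentioned above, here is a minimal sketch that requests abstractive summarization through the azure-ai-textanalytics SDK (version 5.3.0 or later). The endpoint and key are placeholders for your Azure AI Language resource, and the result handling is kept deliberately simple.

from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-language-resource>.cognitiveservices.azure.com/",  # placeholder
    credential=AzureKeyCredential("<your-key>"),  # placeholder
)

documents = [
    "Azure AI Language abstractive summarization now runs on a fine-tuned Phi-3.5-mini model, "
    "with improved common-sense reasoning and fewer hallucinations in generated summaries."
]

# Abstractive summarization is a long-running operation; poll until it completes
poller = client.begin_abstract_summary(documents)
for result in poller.result():
    if not result.is_error:
        for summary in result.summaries:
            print(summary.text)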
Dify work with Microsoft AI Search
Please refer to my repo to get more AI resources; welcome to star it: https://github.com/xinyuwei-david/david-share.git This article is from one of my repos: https://github.com/xinyuwei-david/david-share/tree/master/LLMs/Dify-With-AI-Search

Dify work with Microsoft AI Search
Dify is an open-source platform for developing large language model (LLM) applications. It combines the concepts of Backend as a Service (BaaS) and LLMOps, enabling developers to quickly build production-grade generative AI applications. Dify offers various types of tools, including first-party and custom tools. These tools can extend the capabilities of LLMs, such as web search, scientific calculations, image generation, and more. On Dify, you can create more powerful AI applications, like intelligent assistant-type applications, which can complete complex tasks through task reasoning, step decomposition, and tool invocation.

Dify works with AI Search Demo
As of now, Dify cannot integrate with Microsoft AI Search directly via the default Dify web portal. Let me show how to achieve it. Please click the picture below to see my demo video on YouTube: https://www.youtube.com/watch?v=20GjS6AtjTo

Dify works with AI Search: Configuration steps
Configure on AI Search
Create an index and make sure you can get results from the AI Search index (a small verification sketch appears at the end of this section):

Run Dify on a VM via Docker:
root@a100vm:~# docker ps |grep -i dify
5d6c32a94313   langgenius/dify-api:0.8.3       "/bin/bash /entrypoi…"   3 months ago   Up 3 minutes             5001/tcp   docker-worker-1
264e477883ee   langgenius/dify-api:0.8.3       "/bin/bash /entrypoi…"   3 months ago   Up 3 minutes             5001/tcp   docker-api-1
2eb90cd5280a   langgenius/dify-sandbox:0.2.9   "/main"                  3 months ago   Up 3 minutes (healthy)              docker-sandbox-1
708937964fbb   langgenius/dify-web:0.8.3       "/bin/sh ./entrypoin…"   3 months ago   Up 3 minutes             3000/tcp   docker-web-1

Create a custom tool in the Dify portal and set its schema:
Schema details:
{
  "openapi": "3.0.0",
  "info": {
    "title": "Azure Cognitive Search Integration",
    "version": "1.0.0"
  },
  "servers": [
    { "url": "https://ai-search-eastus-xinyuwei.search.windows.net" }
  ],
  "paths": {
    "/indexes/wukong-doc1/docs": {
      "get": {
        "operationId": "getSearchResults",
        "parameters": [
          {
            "name": "api-version",
            "in": "query",
            "required": true,
            "schema": { "type": "string", "example": "2024-11-01-preview" }
          },
          {
            "name": "search",
            "in": "query",
            "required": true,
            "schema": { "type": "string" }
          }
        ],
        "responses": {
          "200": {
            "description": "Successful response",
            "content": {
              "application/json": {
                "schema": {
                  "type": "object",
                  "properties": {
                    "@odata.context": { "type": "string" },
                    "value": {
                      "type": "array",
                      "items": {
                        "type": "object",
                        "properties": {
                          "@search.score": { "type": "number" },
                          "chunk_id": { "type": "string" },
                          "parent_id": { "type": "string" },
                          "title": { "type": "string" },
                          "chunk": { "type": "string" },
                          "text_vector": { "type": "SingleCollection" }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Set the AI Search API key:
Do a search test. Input words:
Create a workflow on Dify:
Check the AI Search stage:
Check the LLM stage:
Run the workflow:
Get the workflow result:
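Before wiring the tool into a workflow, it can help to sanity-check the index outside of Dify. The sketch below calls the same Azure AI Search REST endpoint that the custom tool schema above targets; the service URL, index name, and API version mirror the schema, while the query key and search text are placeholders.

import requests

service = "https://ai-search-eastus-xinyuwei.search.windows.net"
index = "wukong-doc1"
params = {
    "api-version": "2024-11-01-preview",
    "search": "<your test query>",  # placeholder query text
    "$top": 3,
}
headers = {"api-key": "<your-query-key>"}  # placeholder query key

# Same GET /indexes/{index}/docs call that the Dify custom tool performs
resp = requests.get(f"{service}/indexes/{index}/docs", params=params, headers=headers)
resp.raise_for_status()
for doc in resp.json()["value"]:
    print(doc["@search.score"], doc.get("title"))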
Introducing Azure AI Agent Service
Introducing Azure AI Agent Service at Microsoft Ignite 2024
Discover how Azure AI Agent Service is revolutionizing the development and deployment of AI agents. This service empowers developers to build, deploy, and scale high-quality AI agents tailored to business needs within hours. With features like rapid development, extensive data connections, flexible model selection, and enterprise-grade security, Azure AI Agent Service sets a new standard in AI automation.

Announcing Data Zones for Azure OpenAI Service Batch
In Nov 2024, we announced Data Zones on Azure OpenAI Service. Today, we're excited to expand that support to Azure OpenAI Service Batch with Data Zone Batch deployments. They enable you to utilize Azure's global infrastructure to dynamically route traffic to data centers within the Microsoft-defined data zones, ensuring optimal availability for each request.
Azure OpenAI Data Zones is a new deployment option that provides enterprises with even more flexibility and control over their data privacy and residency needs. Tailored for organizations in the United States and European Union, Data Zones allow customers to process and store their data within specific geographic boundaries, ensuring compliance with regional data residency requirements while maintaining optimal performance. By spanning multiple regions within these areas, Data Zones offer a balance between the cost-efficiency of global deployments and the control of regional deployments, making it easier for enterprises to manage their AI applications without sacrificing security or speed.

Models supported at launch
Model          Version
gpt-4o         2024-08-06
gpt-4o-mini    2024-07-18
Support for newer models will be continuously added.

Pricing
Data Zone Batch has a 50% discount on Data Zone Standard pricing.

Get started
Ready to try Data Zone support in the Azure OpenAI Service Batch API? Take it for a spin here (a rough submission sketch follows at the end of this section).

Learn more
Deployment types in Azure OpenAI Service
Azure OpenAI Service Batch
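As a rough illustration of what a batch submission could look like against a Data Zone Batch deployment, here is a sketch using the OpenAI Python SDK. The endpoint, API version, deployment name, and input file are placeholders, and the exact endpoint path string expected by the Batch API is an assumption to verify against the Azure OpenAI Batch documentation.

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-10-21",                                   # placeholder
)

# requests.jsonl holds one chat-completions request per line, e.g.:
# {"custom_id": "task-1", "method": "POST", "url": "/chat/completions",
#  "body": {"model": "<your-data-zone-batch-deployment>",
#           "messages": [{"role": "user", "content": "Hello"}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/chat/completions",   # assumption: path string used by the Batch API
    completion_window="24h",
)
print(batch.id, batch.status)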
Unlock Multimodal Data Insights with Azure AI Content Understanding: New Code Samples Available
We are excited to share code samples that leverage the Azure AI Content Understanding service to help you extract insights from your images, documents, videos, and audio content. These code samples are available on GitHub and cover the following:

Azure AI integrations
Visual Document Search: Leverage Azure Document Intelligence, Content Understanding, Azure Search, and Azure OpenAI to unlock natural language search of document contents for a complex document with pictures of charts and diagrams.
Video Chapter Generation: Generate video chapters using Azure Content Understanding and Azure OpenAI. This allows you to break long videos into smaller, labeled parts with key details, making it easier to find, share, and access the most relevant content.
Video Content Discovery: Learn how to use Content Understanding, Azure Search, and Azure OpenAI models to process videos and create a searchable index for AI-driven content discovery.

Content Understanding Operations
Analyzer Templates: An Analyzer enables you to tailor Content Understanding to extract valuable insights from your content based on your specific needs. Start quickly with these ready-made templates.
Content Extraction: Learn how the Content Understanding API can extract semantic information from various files, including performing OCR to recognize tables in documents, transcribing audio files, and analyzing faces in videos.
Field Extraction: This example demonstrates how to extract specific fields from your content. For instance, you can identify the invoice amount in a document, capture names mentioned in an audio file, or generate a summary of a video.
Analyzer Training: For document scenarios, you can further enhance field extraction performance by providing a few labeled samples.
Analyzer Management: Create a minimal analyzer, list all analyzers in your resource, and delete any analyzers you no longer need.

Azure AI Content Understanding: Turn Multimodal Content into Structured Data
Azure AI Content Understanding is a cutting-edge Azure AI offering designed to help businesses seamlessly extract insights from various content types. Built with and for Generative AI, it empowers organizations to seamlessly develop GenAI solutions using the latest models, without needing advanced AI expertise. Content Understanding simplifies the processing of unstructured data stores of documents, images, videos, and audio, transforming them into structured, actionable insights. It is versatile and adaptable across numerous industries and use-case scenarios, offering customization and support for input from multiple data types. Here are a few example use cases:
Retrieval Augmented Generation (RAG): Enhance and integrate content from any format to power effective content searches or provide answers to frequent questions in scenarios like customer service or enterprise-wide data retrieval.
Post-call analytics: Organizations use Content Understanding to analyze call center or meeting recordings, extracting insights like sentiment, speaker details, and topics discussed, including names, companies, and other relevant data.
Insurance claims processing: Automate time-consuming processes like analyzing and handling insurance claims or other low-latency batch processing tasks.
Media asset management and content creation: Extract essential features from images and videos to streamline media asset organization and enable entity-based searches for brands, settings, key products, and people.
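For a feel of what calling the service looks like, here is a heavily hedged sketch of invoking an analyzer over REST and polling for the result. The endpoint, analyzer id, request path, and api-version below are assumptions based on the preview REST API and may differ from the current release; treat the linked GitHub samples as the authoritative reference.

import time
import requests

endpoint = "https://<your-ai-services-resource>.services.ai.azure.com"  # placeholder
key = "<your-key>"                                                      # placeholder
analyzer_id = "prebuilt-documentAnalyzer"   # assumption: a prebuilt analyzer id
api_version = "2024-12-01-preview"          # assumption: preview API version

# Submit a publicly reachable file URL for analysis
resp = requests.post(
    f"{endpoint}/contentunderstanding/analyzers/{analyzer_id}:analyze",
    params={"api-version": api_version},
    headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
    json={"url": "https://example.com/sample-invoice.pdf"},
)
resp.raise_for_status()
operation_url = resp.headers["Operation-Location"]

# Analysis is a long-running operation; poll until it finishes
while True:
    result = requests.get(operation_url, headers={"Ocp-Apim-Subscription-Key": key}).json()
    if result.get("status") in ("Succeeded", "Failed"):
        break
    time.sleep(2)
print(result)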
Resources & Documentation
To begin extracting valuable insights from your multimodal content, explore the following resources:
Azure Content Understanding Overview
Azure Content Understanding in Azure AI Foundry
FAQs
Want to get in touch? We'd love to hear from you! Send us an email at cu_contact@microsoft.com

Enhancing Workplace Safety and Efficiency with Azure AI Foundry's Content Understanding
Discover how Azure AI Foundry's Content Understanding service, featuring the Video Shot Analysis template, revolutionizes workplace safety and efficiency. By leveraging Generative AI to analyze video data, businesses can gain actionable insights into worker actions, posture, safety risks, and environmental conditions. Learn how this cutting-edge tool transforms operations across industries like manufacturing, logistics, and healthcare.

From Vector Databases to Integrated Vector Databases: Revolutionizing AI-Powered Search
Semantic Search and Vector Search have been pivotal capabilities powering AI Assistants driven by Generative AI. They excel when dealing with unstructured data, such as PDF documents, text files, or Word documents, where embeddings can unlock contextually rich and meaningful search results. But what happens when the data ecosystem is more complex? Imagine structured data like customer feedback ratings for timeliness, cleanliness, and professionalism intertwined with unstructured textual comments. To extract actionable insights, such as identifying service quality improvements across centers, traditional vector search alone won't suffice. Enter Integrated Vector Databases.

What Makes Integrated Vector Databases a Game-Changer?
Unlike traditional vector databases that require frequent incremental updates of indexes stored separately from the original data, integrated vector databases seamlessly combine structured and unstructured data within the same environment. This integration eliminates the need for periodic indexing runs, enabling real-time search and analytics with reduced overhead. Furthermore, data and embeddings co-reside, streamlining workflows and improving query performance. Major cloud providers, including Azure, now offer managed Integrated Vector Databases such as Azure SQL Database, Azure PostgreSQL Database, and Azure Cosmos DB. This evolution is critical for scenarios that require hybrid search capabilities across both structured and unstructured data.

A Real-World Scenario: Hybrid Feedback Analysis
To showcase the power of Integrated Vector Databases, let's dive into a practical application: customer feedback analysis for a service business. Here's what this entails:
Structured Data: Ratings on aspects like overall work quality, timeliness, politeness, and cleanliness.
Unstructured Data: Free-flowing textual feedback from customers.
Using Python, the feedback is stored in an Azure SQL Database, with embeddings generated for the textual comments via Azure OpenAI's embedding model. The data is then inserted into the database using a stored procedure, combining the structured ratings with vectorized embeddings for efficient retrieval and analysis.

Key Code Highlights
1. Generating Embeddings: The get_embedding function interfaces with Azure OpenAI to convert the customer's textual feedback into vector embeddings:

def get_embedding(text):
    url = f"{az_openai_endpoint}openai/deployments/{az_openai_embedding_deployment_name}/embeddings?api-version=2023-05-15"
    response = requests.post(
        url,
        headers={"Content-Type": "application/json", "api-key": az_openai_key},
        json={"input": text},
    )
    if response.status_code != 200:
        raise Exception("Embedding failed")
    return response.json()["data"][0]["embedding"]

2. Storing Feedback: A stored procedure inserts both structured ratings and text embeddings into the database:

# Call the stored procedure
stored_procedure = """
    EXEC InsertServiceFeedback ?, ?, ?, ?, ?, ?, ?, ?, ?, ?
"""
cursor.execute(
    stored_procedure,
    (
        schedule_id,
        customer_id,
        feedback_text,
        json.dumps(json.loads(str(get_embedding(feedback_text)))),
        rating_quality_of_work,
        rating_timeliness,
        rating_politeness,
        rating_cleanliness,
        rating_overall_experience,
        feedback_date,
    ),
)
connection.commit()
print("Feedback inserted successfully.")
response_message = (
    "Service feedback captured successfully for the schedule_id: "
    + str(schedule_id)
)

Building an Autonomous Agent with LangGraph
The next step is building an intelligent system that automates operations based on customer input. Here's where LangGraph, a framework for Agentic Systems, shines. The application we're discussing empowers customers to:
View available service appointment slots.
Book service appointments.
Submit feedback post-service.
Search for information using an AI-powered search index over product manuals.

What Makes This Agent Special?
This agent exhibits autonomy through:
Tool Calling: Based on customer input and context, it decides which tools to invoke without manual intervention.
State Awareness: The agent uses a centralized state object to maintain context (e.g., customer details, past service records, current datetime) for dynamic tool execution.
Natural Interactions: Customer interactions are processed naturally, with no custom logic required to integrate data or format inputs.
For example, when a customer provides feedback, the agent autonomously:
Prompts for all necessary details.
Generates embeddings for textual feedback.
Inserts the data into the Integrated Vector Database after confirming the input.

Code Walkthrough: Creating the Agent
1. Define Tools: Tools are the building blocks of the agent, enabling operations like fetching service slots or storing feedback:

tools = [
    store_service_feedback,
    fetch_customer_information,
    get_available_service_slots,
    create_service_appointment_slot,
    perform_search_based_qna,
]

2. Define State: State ensures the agent remembers user context across interactions:

class State(TypedDict):
    messages: list[AnyMessage]
    customer_info: str
    current_datetime: str

# fetch the customer information from the database and load that into the context in the State
def customer_info(state: State):
    if state.get("customer_info"):
        return {"customer_info": state.get("customer_info")}
    else:
        state["customer_info"] = fetch_customer_information.invoke({})
        return {"customer_info": state.get("customer_info")}

3. Build the Graph: LangGraph's state graph defines how tools, states, and prompts interact:

builder = StateGraph(State)
builder.add_node("chatbot", Assistant(service_scheduling_runnable))
builder.add_node("fetch_customer_info", customer_info)
builder.add_edge("fetch_customer_info", "chatbot")
builder.add_node("tools", tool_node)
builder.add_edge(START, "fetch_customer_info")
builder.add_edge("tools", "chatbot")
graph = builder.compile()

There is no custom code required to invoke the tools. It is done automatically based on the intent in the customer input.
4. Converse with the Agent: The application seamlessly transitions between tools based on user input and state:

def stream_graph_updates(user_input: str):
    events = graph.stream(
        {"messages": [("user", user_input)]},
        config,
        subgraphs=True,
        stream_mode="values",
    )
    l_events = list(events)
    msg = list(l_events[-1])
    r1 = msg[-1]["messages"]
    # response_to_user = msg[-1].messages[-1].content
    print(r1[-1].content)


while True:
    try:
        user_input = input("User: ")
        if user_input.lower() in ["quit", "exit", "q"]:
            print("Goodbye!")
            break
        stream_graph_updates(user_input)
    except Exception as e:
        print("An error occurred:", e)
        traceback.print_exc()
        # stream_graph_updates(user_input)
        break

Agent Demo
See a demo of this app in action here:
The source code of this Agent App is available in this GitHub Repo

Conclusion
The fusion of Integrated Vector Databases with LangGraph's agentic capabilities unlocks a new era of AI-powered applications. By unifying structured and unstructured data in a single system and empowering agents to act autonomously, organizations can streamline workflows and gain deeper insights from their data. This approach demonstrates the power of evolving from simple vector search to hybrid, integrated systems, paving the way for smarter, more autonomous AI solutions.
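To complement the insertion flow shown above with the retrieval side, here is a minimal sketch of a vector similarity query. It assumes the feedback table stores its embeddings in Azure SQL Database's native vector type (currently in preview), that a table and columns named as below exist (they are hypothetical; adapt them to your schema), and it reuses the get_embedding helper from earlier in the post.

import json
import pyodbc

connection = pyodbc.connect("<your-azure-sql-connection-string>")  # placeholder
cursor = connection.cursor()

question = "Were customers unhappy with cleanliness at any service center?"
question_vector = json.dumps(get_embedding(question))  # embed the question with the same model

# Rank stored feedback rows by cosine distance to the question embedding
cursor.execute(
    """
    SELECT TOP 5 feedback_text, rating_cleanliness,
           VECTOR_DISTANCE('cosine', feedback_embedding, CAST(? AS VECTOR(1536))) AS distance
    FROM ServiceFeedback
    ORDER BY distance
    """,
    question_vector,
)
for feedback_text, rating_cleanliness, distance in cursor.fetchall():
    print(round(distance, 4), rating_cleanliness, feedback_text)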