Azure OpenAI Service
Introducing Azure AI Agent Service
Introducing Azure AI Agent Service at Microsoft Ignite 2024. Discover how Azure AI Agent Service is revolutionizing the development and deployment of AI agents. This service empowers developers to build, deploy, and scale high-quality AI agents tailored to business needs within hours. With features like rapid development, extensive data connections, flexible model selection, and enterprise-grade security, Azure AI Agent Service sets a new standard in AI automation.

GenAI Search for Retail
Integrating generative AI into e-commerce search systems can significantly enhance user experience by refining query understanding and delivering more relevant results. A practical implementation involves deploying a query expansion mechanism that uses AI to interpret and broaden user search inputs.

Implementation Overview
The GenAISearchQueryExpander repository provides a .NET 8.0 application designed for this purpose. It leverages Azure Functions and Azure OpenAI to expand search queries, thereby improving the retrieval of pertinent products.

Key Features
Azure Functions Integration: Utilizes serverless computing to handle search query processing efficiently.
AI-Powered Query Expansion: Employs Azure OpenAI to generate expanded versions of user queries, capturing a broader range of relevant search terms.
HTTP Trigger Support: Allows the function to be invoked via HTTP requests, facilitating seamless integration with existing e-commerce platforms.

Setup and Deployment
Prerequisites:
.NET 8.0 SDK
Azure Account
Azure Functions Core Tools (for local development)

Example Usage
Once deployed, the function can be invoked via an HTTP POST request with a JSON payload containing the user's search query. The function processes this input, uses Azure OpenAI to generate expanded search terms, and returns the enhanced query for use in the e-commerce platform's search index. You simply call the REST API and supply the user's search query and the model name you want to use. The API then passes the query and a system prompt to the LLM, which expands the query into additional keywords that better capture the user's intent.

Use Case Examples
Fashion Retail: A customer searches for "red dress," but the AI expands this to include terms like "maroon gown," "crimson evening wear," and "burgundy cocktail dress."
Electronics Store: A search for "gaming laptop" expands to "high-performance laptop," "RTX 4060 laptop," and "16GB RAM laptop."
Home Improvement: A query for "LED light bulbs" expands to include "energy-efficient bulbs," "smart LED bulbs," and "dimmable LED lights."
Grocery Delivery: A search for "organic apples" is expanded to include "fresh Fuji apples," "organic Granny Smith apples," and "pesticide-free apples."

This can also enable brand-new experiences for users. Instead of searching for keywords, they can type in the problem they are trying to solve. For example: "I am painting my kid's bedroom" could return paints, brushes, rollers, handles, tape, and drop cloths. "What do I need for a football party?" could return all kinds of snacks, decorations, electronics, or clothing. By understanding the intent behind what the user is looking for, this solution can provide more relevant results and suggest products the user hadn't even thought about yet. By implementing this AI-driven query expansion, e-commerce platforms can significantly improve search accuracy, leading to enhanced user satisfaction and potentially increased sales.
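Although the reference implementation is a .NET 8.0 Azure Function, any client can call it over HTTP once deployed. Below is a minimal Python sketch of that call; the route name ("ExpandQuery"), payload field names, and model name are invented for illustration only, so check the repository for the actual request contract.

```
import requests

# Hypothetical endpoint and payload shape for the deployed query-expansion
# function -- adjust the route and field names to match the real contract.
FUNCTION_URL = "https://<your-function-app>.azurewebsites.net/api/ExpandQuery"

payload = {
    "query": "red dress",     # the raw search text entered by the user
    "model": "gpt-4o-mini",   # the Azure OpenAI deployment to use for expansion
}

response = requests.post(FUNCTION_URL, json=payload, timeout=30)
response.raise_for_status()

# The function returns the expanded query terms to feed into the search index.
print(response.json())
```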
Prompt Engineering for OpenAI's O1 and O3-mini Reasoning Models

This section explores how O1 and O3-mini differ from GPT-4o in input handling, reasoning capabilities, and response behavior, and outlines prompt engineering best practices to maximize their performance. Finally, we apply these best practices to a legal case analysis scenario.

Differences Between O1/O3-mini and GPT-4o

Input Structure and Context Handling
Built-in Reasoning vs. Prompted Reasoning: O1-series models have built-in chain-of-thought reasoning, meaning they internally reason through steps without needing explicit coaxing from the prompt. In contrast, GPT-4o often benefits from external instructions like "Let's think step by step" to solve complex problems, since it doesn't automatically engage in multi-step reasoning to the same extent. With O1/O3, you can present the problem directly; the model will analyze it deeply on its own.
Need for External Information: GPT-4o has a broad knowledge base and access to tools (e.g. browsing, plugins, vision) in certain deployments, which helps it handle a wide range of topics. By comparison, the O1 models have a narrower knowledge base outside their training focus. For example, O1-preview excelled at reasoning tasks but couldn't answer questions about itself due to limited knowledge context. This means when using O1/O3-mini, important background information or context should be included in the prompt if the task is outside common knowledge – do not assume the model knows niche facts. GPT-4o might already know a legal precedent or obscure detail, whereas O1 might require you to provide that text or data.
Context Length: The reasoning models come with very large context windows. O1 supports up to 128k tokens of input, and O3-mini accepts up to 200k tokens (with up to 100k tokens output), exceeding GPT-4o's context length. This allows you to feed extensive case files or datasets directly into O1/O3. For prompt engineering, structure large inputs clearly (use sections, bullet points, or headings) so the model can navigate the information. Both GPT-4o and O1 can handle long prompts, but O1/O3's higher capacity means you can include more detailed context in one go, which is useful in complex analyses.

Reasoning Capabilities and Logical Deduction
Depth of Reasoning: O1 and O3-mini are optimized for methodical, multi-step reasoning. They literally "think longer" before answering, which yields more accurate solutions on complex tasks. For instance, O1-preview solved 83% of problems on a challenging math exam (AIME), compared to GPT-4o's 13% – a testament to its superior logical deduction in specialized domains. These models internally perform chain-of-thought and even self-check their work. GPT-4o is also strong but tends to produce answers more directly; without explicit prompting, it might not analyze as exhaustively, leading to errors in very complex cases that O1 could catch.
Handling of Complex vs. Simple Tasks: Because O1-series models default to heavy reasoning, they truly shine on complex problems that have many reasoning steps (e.g. multi-faceted analyses, long proofs). In fact, on tasks requiring five or more reasoning steps, a reasoning model like O1-mini or O3 outperforms GPT-4 by a significant margin (16%+ higher accuracy). However, this also means that for very simple queries, O1 may "overthink." Research found that on straightforward tasks (fewer than 3 reasoning steps), O1's extra analytical process can become a disadvantage – it underperformed GPT-4 in a significant portion of such cases due to excessive reasoning.
GPT-4o might answer a simple question more directly and swiftly, whereas O1 might generate unnecessary analysis. The key difference is O1 is calibrated for complexity, so it may be less efficient for trivial Q&A. Logical Deduction Style: When it comes to puzzles, deductive reasoning, or step-by-step problems, GPT-4o usually requires prompt engineering to go stepwise (otherwise it might jump to an answer). O1/O3 handle logical deduction differently: they simulate an internal dialogue or scratchpad. For the user, this means O1’s final answers tend to be well-justified and less prone to logical gaps. It will have effectively done a “chain-of-thought” internally to double-check consistency. From a prompt perspective, you generally don’t need to tell O1 to explain or check its logic – it does so automatically before presenting the answer. With GPT-4o, you might include instructions like “first list the assumptions, then conclude” to ensure rigorous logic; with O1, such instructions are often redundant or even counterproductive. Response Characteristics and Output Optimization Detail and Verbosity: Because of their intensive reasoning, O1 and O3-mini often produce detailed, structured answers for complex queries. For example, O1 might break down a math solution into multiple steps or provide a rationale for each part of a strategy plan. GPT-4o, on the other hand, may give a more concise answer by default or a high-level summary, unless prompted to elaborate. In terms of prompt engineering, this means O1’s responses might be longer or more technical. You have more control over this verbosity through instructions. If you want O1 to be concise, you must explicitly tell it (just as you would GPT-4) – otherwise, it might err on the side of thoroughness. Conversely, if you want a step-by-step explanation in the output, GPT-4o might need to be told to include one, whereas O1 will happily provide one if asked (and has likely done the reasoning internally regardless). Accuracy and Self-Checking: The reasoning models exhibit a form of self-fact-checking. OpenAI notes that O1 is better at catching its mistakes during the response generation, leading to improved factual accuracy in complex responses. GPT-4o is generally accurate, but it can occasionally be confidently wrong or hallucinate facts if not guided. O1’s architecture reduces this risk by verifying details as it “thinks.” In practice, users have observed that O1 produces fewer incorrect or nonsensical answers on tricky problems, whereas GPT-4o might require prompt techniques (like asking it to critique or verify its answer) to reach the same level of confidence. This means you can often trust O1/O3 to get complex questions right with a straightforward prompt, whereas with GPT-4 you might add instructions like “check your answer for consistency with the facts above.” Still, neither model is infallible, so critical factual outputs should always be reviewed. Speed and Cost: A notable difference is that O1 models are slower and more expensive in exchange for their deeper reasoning. O1 Pro even includes a progress bar for long queries. GPT-4o tends to respond faster for typical queries. O3-mini was introduced to offer a faster, cost-efficient reasoning model – it’s much cheaper per token than O1 or GPT-4o and has lower latency. However, O3-mini is a smaller model, so while it’s strong in STEM reasoning, it might not match full O1 or GPT-4 in general knowledge or extremely complex reasoning. 
When prompt engineering for optimal response performance, you need to balance depth vs. speed: O1 might take longer to answer thoroughly. If latency is a concern and the task isn’t maximal complexity, O3-mini (or even GPT-4o) could be a better choice. OpenAI’s guidance is that GPT-4o “is still the best option for most prompts,” using O1 primarily for truly hard problems in domains like strategy, math, and coding. In short, use the right tool for the job – and if you use O1, anticipate longer responses and plan for its slower output (possibly by informing the user or adjusting system timeouts). Prompt Engineering Techniques to Maximize Performance Leveraging O1 and O3-mini effectively requires a slightly different prompting approach than GPT-4o. Below are key prompt engineering techniques and best practices to get the best results from these reasoning models: Keep Prompts Clear and Minimal Be concise and direct with your ask. Because O1 and O3 perform intensive internal reasoning, they respond best to focused questions or instructions without extraneous text. OpenAI and recent research suggest avoiding overly complex or leading prompts for these models. In practice, this means you should state the problem or task plainly and provide only necessary details. There is no need to add “fluff” or multiple rephrasing of the query. For example, instead of writing: “In this challenging puzzle, I’d like you to carefully reason through each step to reach the correct solution. Let’s break it down step by step...”, simply ask: “Solve the following puzzle [include puzzle details]. Explain your reasoning.” The model will naturally do the step-by-step thinking internally and give an explanation. Excess instructions can actually overcomplicate things – one study found that adding too much prompt context or too many examples worsened O1’s performance, essentially overwhelming its reasoning process. Tip: For complex tasks, start with a zero-shot prompt (just the task description) and only add more instruction if you find the output isn’t meeting your needs. Often, minimal prompts yield the best results with these reasoning models. Avoid Unnecessary Few-Shot Examples Traditional prompt engineering for GPT-3/4 often uses few-shot examples or demonstrations to guide the model. With O1/O3, however, less is more. The O1 series was explicitly trained to not require example-laden prompts. In fact, using multiple examples can hurt performance. Research on O1-preview and O1-mini showed that few-shot prompting consistently degraded their performance – even carefully chosen examples made them do worse than a simple prompt in many cases. The internal reasoning seems to get distracted or constrained by the examples. OpenAI’s own guidance aligns with this: they recommend limiting additional context or examples for reasoning models to avoid confusing their internal logic. Best practice: use zero-shot or at most one example if absolutely needed. If you include an example, make it highly relevant and simple. For instance, in a legal analysis prompt, you generally would not prepend a full example case analysis; instead, just ask directly about the new case. The only time you might use a demonstration is if the task format is very specific and the model isn’t following instructions – then show one brief example of the desired format. Otherwise, trust the model to figure it out from a direct query. 
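To make the zero-shot advice concrete, here is a minimal sketch of calling an O1-series deployment through Azure OpenAI with nothing but a direct task statement. The endpoint, deployment name, and API version are placeholders, and the puzzle is an arbitrary example.

```
import os
from openai import AzureOpenAI

# Placeholders: assumes an Azure OpenAI resource with an o1 (or o3-mini) deployment.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-12-01-preview",
)

# Zero-shot: state the task once, include the needed details, and stop there.
completion = client.chat.completions.create(
    model="o1",  # your deployment name
    messages=[
        {
            "role": "user",
            "content": (
                "Solve the following puzzle and explain your reasoning: a farmer must "
                "ferry a wolf, a goat, and a cabbage across a river in a boat that "
                "holds only one item at a time. List the crossings in order."
            ),
        }
    ],
)
print(completion.choices[0].message.content)
```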
Leverage System/Developer Instructions for Role and Format
Setting a clear instructional context can help steer the model's responses. With the API (or within a conversation's system message), define the model's role or style succinctly. For example, a system message might say: "You are an expert scientific researcher who explains solutions step-by-step". O1 and O3-mini respond well to such role instructions and will incorporate them in their reasoning. However, remember that they already excel at understanding complex tasks, so your instructions should focus on what kind of output you want, not how to think. Good uses of system/developer instructions include:
Defining the task scope or persona: e.g. "Act as a legal analyst" or "Solve the problem as a math teacher explaining to a student." This can influence tone and the level of detail.
Specifying the output format: If you need the answer in a structured form (bullet points, a table, JSON, etc.), explicitly say so. O1 and especially O3-mini support structured output modes and will adhere to format requests. For instance: "Provide your findings as a list of key bullet points." Given their logical nature, they tend to follow format instructions accurately, which helps maintain consistency in responses.
Setting boundaries: If you want to control verbosity or focus, you can include something like "Provide a brief conclusion after the detailed analysis" or "Only use the information given without outside assumptions." The reasoning models will respect these boundaries, and it can prevent them from going on tangents or hallucinating facts. This is important since O1 might otherwise produce a very exhaustive analysis – which is often great, but not if you explicitly need just a summary.
Ensure any guidance around tone, role, and format is included each time.

Control Verbosity and Depth Through Instructions
While O1 and O3-mini will naturally engage in deep reasoning, you have control over how much of that reasoning is reflected in the output. If you want a detailed explanation, prompt for it (e.g. "Show your step-by-step reasoning in the answer"). They won't need the nudge to do the reasoning, but they do need to be told if you want to see it. Conversely, if you find the model's answers too verbose or technical for your purposes, instruct it to be more concise or to focus only on certain aspects. For example: "In 2-3 paragraphs, summarize the analysis with only the most critical points." The models are generally obedient to such instructions about length or focus. Keep in mind that O1's default behavior is to be thorough – it's optimized for correctness over brevity – so it may err on the side of giving more details. A direct request for brevity will override this tendency in most cases. For O3-mini, OpenAI provides an additional tool to manage depth: the "reasoning effort" parameter (low, medium, high). This setting lets the model know how hard to "think." In prompt terms, if using the API or a system that exposes this feature, you can dial it up for very complex tasks (ensuring maximum reasoning, at the cost of longer answers and latency) or dial it down for simpler tasks (faster, more streamlined answers). This is essentially another way to control verbosity and thoroughness. If you don't have direct access to that parameter, you can mimic a low effort mode by explicitly saying "Give a quick answer without deep analysis" for cases where speed matters more than perfect accuracy.
Conversely, to mimic high effort, you might say "Take all necessary steps to arrive at a correct answer, even if the explanation is long." These cues align with how the model's internal setting would operate.

Ensure Accuracy in Complex Tasks
To get the most accurate responses on difficult problems, take advantage of the reasoning model's strengths in your prompt. Since O1 can self-check and even catch contradictions, you can ask it to utilize that: e.g. "Analyze all the facts and double-check your conclusion for consistency." Often it will do so unprompted, but reinforcing that instruction can signal the model to be extra careful. Interestingly, because O1 already self-fact-checks, you rarely need to prompt it with something like "verify each step" (that's more helpful for GPT-4o). Instead, focus on providing complete and unambiguous information. If the question or task has potential ambiguities, clarify them in the prompt or instruct the model to list any assumptions. This prevents the model from guessing wrongly.
Handling sources and data: If your task involves analyzing given data (like summarizing a document or computing an answer from provided numbers), make sure that data is clearly presented. O1/O3 will diligently use it. You can even break data into bullet points or a table for clarity. If the model must not hallucinate (say, in a legal context it shouldn't make up laws), explicitly state "base your answer only on the information provided and common knowledge; do not fabricate any details." The reasoning models are generally good at sticking to known facts, and such an instruction further reduces the chance of hallucination.
Iterate and verify: If the task is critical (for example, complex legal reasoning or a high-stakes engineering calculation), a prompt engineering technique is to ensemble the model's responses. This isn't a single prompt, but a strategy: you could run the query multiple times (or ask the model to consider alternative solutions) and then compare answers. O1's stochastic nature means it might explore different reasoning paths each time. By comparing outputs or asking the model to "reflect if there are alternative interpretations" in a follow-up prompt, you can increase confidence in the result. While GPT-4o also benefits from this approach, it's especially useful for O1 when absolute accuracy is paramount – essentially leveraging the model's own depth by cross-verifying.
Finally, remember that model selection is part of prompt engineering: If a question doesn't actually require O1-level reasoning, using GPT-4o might be more efficient and just as accurate. OpenAI recommends saving O1 for the hard cases and using GPT-4o for the rest. So a meta-tip: assess task complexity first. If it's simple, either prompt O1 very straightforwardly to avoid overthinking, or switch to GPT-4o. If it's complex, lean into O1's abilities with the techniques above.

How O1/O3 Handle Logical Deduction vs. GPT-4o
The way these reasoning models approach logical problems differs fundamentally from GPT-4o, and your prompt strategy should adapt accordingly:
Internal Chain-of-Thought: O1 and O3-mini effectively perform an internal dialogue or step-by-step solution as they deduce answers. GPT-4o, unless explicitly guided, might not rigorously go through each step. For example, in a logic puzzle or a math word problem, GPT-4o might give a quick answer that sounds plausible but skips some reasoning, increasing the risk of error.
O1 will automatically break the problem down, consider various angles, and only then give an answer, which is why it achieved dramatically higher scores on logic-heavy evaluations. Prompting difference: Do not prompt O1 to “show the reasoning” unless you actually want to see it. With GPT-4o, you’d use a CoT prompt (“First, think about... then ...”) to improve deduction, but with O1 this is built-in and telling it to do so externally can be redundant or even confusing. Instead, just ensure the problem is clearly stated and let O1 deductively reason it out. Handling Ambiguities: In logical deduction tasks, if there’s missing info or ambiguity, GPT-4o might make an assumption on the fly. O1 is more likely to flag the ambiguity or consider multiple possibilities because of its reflective approach. To leverage this, your prompt to O1 can directly ask: “If there are any uncertainties, state your assumptions before solving.” GPT-4 might need that nudge more. O1 might do it naturally or at least is less prone to assuming facts not given. So in comparing the two, O1’s deduction is cautious and thorough, whereas GPT-4o’s is swift and broad. Tailor your prompt accordingly – with GPT-4o, guide it to be careful; with O1, you mainly need to supply the information and let it do its thing. Step-by-Step Outputs: Sometimes you actually want the logical steps in the output (for teaching or transparency). With GPT-4o, you must explicitly request this (“please show your work”). O1 might include a structured rationale by default if the question is complex enough, but often it will present a well-reasoned answer without explicitly enumerating every step unless asked. If you want O1 to output the chain of logic, simply instruct it to — it will have no trouble doing so. In fact, O1-mini was noted to be capable of providing stepwise breakdowns (e.g., in coding problems) when prompted. Meanwhile, if you don’t want a long logical exposition from O1 (maybe you just want the final answer), you should say “Give the final answer directly” to skip the verbose explanation. Logical Rigor vs. Creativity: One more difference: GPT-4 (and 4o) has a streak of creativity and generative strength. Sometimes in logic problems, this can lead it to “imagine” scenarios or analogies, which isn’t always desired. O1 is more rigor-focused and will stick to logical analysis. If your prompt involves a scenario requiring both deduction and a bit of creativity (say, solving a mystery by piecing clues and adding a narrative), GPT-4 might handle the narrative better, while O1 will strictly focus on deduction. In prompt engineering, you might combine their strengths: use O1 to get the logical solution, then use GPT-4 to polish the presentation. If sticking to O1/O3 only, be aware that you might need to explicitly ask it for creative flourishes or more imaginative responses – they will prioritize logic and correctness by design. Key adjustment: In summary, to leverage O1/O3’s logical strengths, give them the toughest reasoning tasks as a single well-defined prompt. Let them internally grind through the logic (they’re built for it) without micromanaging their thought process. For GPT-4o, continue using classic prompt engineering (decompose the problem, ask for step-by-step reasoning, etc.) to coax out the same level of deduction. And always match the prompt style to the model – what confuses GPT-4o might be just right for O1, and vice versa, due to their different reasoning approaches. 
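As an illustration of that key adjustment, the two message lists below show the same made-up problem phrased for each model family: scaffolded for GPT-4o, direct for O1. The problem itself is arbitrary; only the prompting style matters.

```
# Classic prompt engineering for GPT-4o: decompose the task and request the steps.
gpt4o_messages = [
    {"role": "system", "content": "You are a careful analyst."},
    {
        "role": "user",
        "content": (
            "A warehouse ships 120 orders on Monday and 15% more each following day. "
            "First list your assumptions, then work through the calculation step by step, "
            "then state how many orders ship on Friday."
        ),
    },
]

# For O1/O3-mini: the same problem, asked directly -- the model reasons internally.
o1_messages = [
    {
        "role": "user",
        "content": (
            "A warehouse ships 120 orders on Monday and 15% more each following day. "
            "How many orders ship on Friday?"
        ),
    }
]
```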
Crafting Effective Prompts: Best Practices Summary
To consolidate the above into actionable guidelines, here's a checklist of best practices when prompting O1 or O3-mini:
Use Clear, Specific Instructions: Clearly state what you want the model to do or answer. Avoid irrelevant details. For complex questions, a straightforward ask often suffices (no need for elaborate role-play or multi-question prompts).
Provide Necessary Context, Omit the Rest: Include any domain information the model will need (facts of a case, data for a math problem, etc.), since the model might not have up-to-date or niche knowledge. But don't overload the prompt with unrelated text or too many examples – extra fluff can dilute the model's focus.
Minimal or No Few-Shot Examples: By default, start with zero-shot prompts. If the model misinterprets the task or format, you can add one simple example as guidance, but never add long chains of examples for O1/O3. They don't need it, and it can even degrade performance.
Set the Role or Tone if Needed: Use a system message or a brief prefix to put the model in the right mindset (e.g. "You are a senior law clerk analyzing a case."). This helps especially with tone (formal vs. casual) and ensures domain-appropriate language.
Specify Output Format: If you expect the answer in a particular structure (list, outline, JSON, etc.), tell the model explicitly. The reasoning models will follow format instructions reliably. For instance: "Give your answer as an ordered list of steps."
Control Length and Detail via Instructions: If you want a brief answer, say so ("answer in one paragraph" or "just give a yes/no with one sentence explanation"). If you want an in-depth analysis, encourage it ("provide a detailed explanation"). Don't assume the model knows your desired level of detail by default – instruct it.
Leverage O3-mini's Reasoning Effort Setting: When using O3-mini via API, choose the appropriate reasoning_effort (low/medium/high) for the task, as shown in the sketch after this list. High gives more thorough answers (good for complex legal reasoning or tough math), low gives faster, shorter answers (good for quick checks or simpler queries). This is a unique way to tune the prompt behavior for O3-mini.
Avoid Redundant "Think Step-by-Step" Prompts: Do not add phrases like "let's think this through" or chain-of-thought directives for O1/O3; the model already does this internally. Save those tokens and only use such prompts on GPT-4o, where they have impact. An exception might be if you explicitly want the model to output each step for transparency – then you can ask for that in the output, but you still don't need to tell it to actually perform reasoning.
Test and Iterate: Because these models can be sensitive to phrasing, if you don't get a good answer, try rephrasing the question or tightening the instructions. You might find that a slight change (e.g. asking a direct question vs. an open-ended prompt) yields a significantly better response. Fortunately, O1/O3's need for iteration is less than older models (they usually get complex tasks right in one go), but prompt tweaking can still help optimize clarity or format.
Validate Important Outputs: For critical use-cases, don't rely on a single prompt-answer cycle. Use follow-up prompts to ask the model to verify or justify its answer ("Are you confident in that conclusion? Explain why."), or run the prompt again to see if you get consistent results. Consistency and well-justified answers indicate the model's reasoning is solid.
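Here is a minimal sketch of the reasoning-effort setting mentioned in the checklist, using the Azure OpenAI Python client against an O3-mini deployment. The deployment name, API version, and prompt text are placeholders; note that O-series models take max_completion_tokens rather than max_tokens.

```
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-12-01-preview",  # placeholder; use a version that supports o3-mini
)

completion = client.chat.completions.create(
    model="o3-mini",               # your deployment name
    reasoning_effort="high",       # "low" for quick checks, "high" for hard problems
    max_completion_tokens=2000,    # o-series models use max_completion_tokens
    messages=[
        {
            "role": "user",
            "content": "Given the facts below, determine whether clause 4.2 was breached: ...",
        }
    ],
)
print(completion.choices[0].message.content)
```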
By following these techniques, you can harness O1 and O3-mini's full capabilities and get highly optimized responses that play to their strengths.

Applying Best Practices to a Legal Case Analysis
Finally, let's consider how these prompt engineering guidelines translate to a legal case analysis scenario (as mentioned earlier). Legal analysis is a perfect example of a complex reasoning task where O1 can be very effective, provided we craft the prompt well:
Structure the Input: Start by clearly outlining the key facts of the case and the legal questions to be answered. For example, list the background facts as bullet points or a brief paragraph, then explicitly ask the legal question: "Given the above facts, determine whether Party A is liable for breach of contract under U.S. law." Structuring the prompt this way makes it easier for the model to parse the scenario. It also ensures no crucial detail is buried or overlooked.
Provide Relevant Context or Law: If specific statutes, case precedents, or definitions are relevant, include them (or summaries of them) in the prompt. O1 doesn't have browsing and might not recall a niche law from memory, so if your analysis hinges on, say, the text of a particular law, give it to the model. For instance: "According to [Statute X excerpt], [provide text]… Apply this statute to the case." This way, the model has the necessary tools to reason accurately.
Set the Role in the System Message: A system instruction like "You are a legal analyst who explains the application of law to facts in a clear, step-by-step manner." will cue the model to produce a formal, reasoned analysis. While O1 will already attempt careful reasoning, this instruction aligns its tone and structure with what we expect in legal discourse (e.g. citing facts, applying law, drawing conclusions).
No Need for Multiple Examples: Don't supply a full example case analysis as a prompt (which you might consider doing with GPT-4o). O1 doesn't need an example to follow – it can perform the analysis from scratch. You might, however, briefly mention the desired format: "Provide your answer in an IRAC format (Issue, Rule, Analysis, Conclusion)." This format instruction gives a template without having to show a lengthy sample, and O1 will organize the output accordingly.
Control Verbosity as Needed: If you want a thorough analysis of the case, let O1 output its comprehensive reasoning. The result may be several paragraphs covering each issue in depth. If you find the output too verbose or if you specifically need a succinct brief (for example, a quick advisory opinion), instruct the model: "Keep the analysis to a few key paragraphs focusing on the core issue." This ensures you get just the main points. On the other hand, if the initial answer seems too brief or superficial, you can prompt again: "Explain in more detail, especially how you applied the law to the facts." O1 will gladly elaborate because it has already done the heavy reasoning internally.
Accuracy and Logical Consistency: Legal analysis demands accuracy in applying rules to facts. With O1, you can trust it to logically work through the problem, but it's wise to double-check any legal citations or specific claims it makes (since its training data might not have every detail). You can even add a prompt at the end like, "Double-check that all facts have been addressed and that the conclusion follows the law." Given O1's self-checking tendency, it may itself point out if something doesn't add up or if additional assumptions were needed.
This is a useful safety net in a domain where subtle distinctions matter.
Use Follow-Up Queries: In a legal scenario, it's common to have follow-up questions. For instance, if O1 gives an analysis, you might ask, "What if the contract had a different clause about termination? How would that change the analysis?" O1 can handle these iterative questions well, carrying over its reasoning. Just remember that if the interface you are working with doesn't have long-term memory beyond the current conversation context (and no browsing), each follow-up should either rely on the context provided or include any new information needed. Keep the conversation focused on the case facts at hand to prevent confusion.
By applying these best practices, your prompts will guide O1 or O3-mini to deliver high-quality legal analysis. In summary, clearly present the case, specify the task, and let the reasoning model do the heavy lifting. The result should be a well-reasoned, step-by-step legal discussion that leverages O1's logical prowess, all optimized through effective prompt construction. Using OpenAI's reasoning models in this way allows you to tap into their strength in complex problem-solving while maintaining control over the style and clarity of the output. As OpenAI's own documentation notes, the O1 series excels at deep reasoning tasks in domains like research and strategy – legal analysis similarly benefits from this capability. By understanding the differences from GPT-4o and adjusting your prompt approach accordingly, you can maximize the performance of O1 and O3-mini and obtain accurate, well-structured answers even for the most challenging reasoning tasks.
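To make the legal-analysis guidance above concrete, here is a sketch that assembles such a prompt in Python. The facts, statute excerpt, and question are invented placeholders; the resulting messages list can be sent with a chat-completions call like the earlier sketches.

```
# Illustrative only -- the facts, rule text, and question are placeholders.
case_facts = [
    "Party A agreed in writing to deliver 500 units by 1 March.",
    "Party A delivered 300 units on 10 March; Party B refused payment.",
    "The contract allows delay for documented supply shortages.",
]

statute_excerpt = (
    "Quote the governing rule here verbatim, since the model should not be "
    "asked to recall it from memory."
)

system_message = (
    "You are a legal analyst who explains the application of law to facts in a "
    "clear, step-by-step manner. Use IRAC format (Issue, Rule, Analysis, Conclusion) "
    "and rely only on the material provided."
)

user_message = (
    "Facts:\n- " + "\n- ".join(case_facts) + "\n\n"
    "Relevant rule:\n" + statute_excerpt + "\n\n"
    "Question: Given the above facts, is Party A liable for breach of contract? "
    "If any information is missing, state your assumptions before analysing."
)

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]
# Send `messages` to an o1 deployment with client.chat.completions.create(...).
```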
Use generative AI to extract structured data out of emails

One thing we regularly hear from clients is that they receive information that is key to their business, such as order requests, via email in an unstructured format, and sometimes there is structured information within the body of those emails in a variety of table formats. In today's fast-paced digital world, businesses need a way to automatically extract, structure, and integrate this information into their existing applications. Whether it's leveraging AI-powered document processing, natural language processing (NLP), or intelligent automation, the right approach can transform email-based orders into structured, actionable data. In this blog, we'll explore one such scenario where AI can be leveraged to extract information in tabular format that has been provided within an email. The emails contextually belong to a specific domain, but the tables do not have consistent headers or shapes. Sometimes the body of one email contains multiple tables.

The Problem Statement
Extract tabular information with varying table formats from emails. The typical approach to this problem involves rule-based processing, where individual tables are extracted and merged based on predefined logic. However, given the variety of email formats from hundreds or even thousands of different senders, maintaining such rule-based logic becomes increasingly complex and difficult to manage. A more optimal solution is leveraging the cognitive capabilities of generative AI, which can dynamically adapt to different table structures, column names, and formatting variations—eliminating the need for constant rule updates while improving accuracy and scalability.

To create this sample code, I used the email below with test data, containing two tables with inconsistent column names. It provides information about some upcoming trainings. Please note the difference between the column headers:

Hi there,

Regarding the upcoming trainings, this is the list:

| Event Date | Description of Event | Length | Grade |
| --- | --- | --- | --- |
| 2025-01-21 | Digital environments | 20 hours | 5 |
| 2025-03-01 | AI for Industry A | 10 hours | 3 |

and some further events in the below list:

| Date | Subject | Duration | Grade |
| --- | --- | --- | --- |
| 2025-01-21 | Digital environments 2 | 2 days | 1 |
| 2025-03-01 | AI for Industry B | 2 weeks | 4 |

These sessions are designed to be interactive and informative, so your timely participation is crucial. Please make sure to log in or arrive on time to avoid missing key insights. If you have any questions or need assistance, feel free to reach out. Looking forward to seeing you there!

Thanks,
Azadeh

These are the two tables within the email, and we need to extract one consistent table format with all the rows from these two tables.

Table 1
| Event Date | Description of Event | Length | Grade |
| --- | --- | --- | --- |
| 2025-01-21 | Digital environments | 20 hours | 5 |
| 2025-03-01 | AI for Industry A | 10 hours | 3 |

Table 2
| Date | Subject | Duration | Grade |
| --- | --- | --- | --- |
| 2025-01-21 | Digital environments 2 | 2 days | 1 |
| 2025-03-01 | AI for Industry B | 2 weeks | 4 |

To extract the tabular data into one single table in JSON format, I am using Python with the below libraries installed in my environment:
pandas
beautifulsoup4
openai
lxml

The Code
I use Azure OpenAI Service with a GPT-4o deployment. The code below is just one way of solving this type of problem and can be customized or improved to fit other similar problems. I have provided some guidelines in the user prompt about merging the tables and column-name similarity.
This sample code uses an email message saved in 'eml' format at a local path, but the email library has other capabilities to help you connect to a mailbox and retrieve emails.

```
import email
import os

import pandas as pd
from bs4 import BeautifulSoup
from openai import AzureOpenAI

endpoint = os.getenv("ENDPOINT_URL", "https://....myendpointurl....openai.azure.com/")
deployment = os.getenv("DEPLOYMENT_NAME", "gpt-4o")
subscription_key = os.getenv("AZURE_OPENAI_API_KEY", "myapikey")

# Initialize Azure OpenAI Service client with key-based authentication
client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=subscription_key,
    api_version="2024-05-01-preview",
)


# Process email content with GPT-4o
def extract_information(email_body, client):
    # Strip any HTML so the model receives plain text
    soup = BeautifulSoup(email_body, "html.parser")
    body = soup.get_text()
    print(body)

    # Prepare the chat prompt
    chat_prompt = [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "You are an AI assistant that is expert in extracting structured data from emails.",
                }
            ],
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": f"Extract the required information from the following email and format it as JSON and consolidate the tables using the common column names. For example the columns length and duration are the same and the columns Event and Subject are the same:\n\n{body}",
                }
            ],
        },
    ]

    messages = chat_prompt

    # Generate the completion
    completion = client.chat.completions.create(
        model=deployment,
        messages=messages,
        max_tokens=800,
        temperature=0.1,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False,
    )

    return completion.choices[0].message.content


email_file_name = r'...path to your file....\Test Email with Tables.eml'

with open(email_file_name, "r") as f:
    msg = email.message_from_file(f)

# Walk the MIME parts and keep the text/plain or text/html body
email_body = ""
for part in msg.walk():
    if part.get_content_type() == "text/plain":
        email_body = part.get_payload(decode=True).decode()
    elif part.get_content_type() == "text/html":
        email_body = part.get_payload(decode=True).decode()

extracted_info = extract_information(email_body, client)
print(extracted_info)
```

The output is:

```
[
    {
        "Event": "Digital environments",
        "Date": "2025-01-21",
        "Length": "20 hours",
        "Grade": 5
    },
    {
        "Event": "AI for Industry A",
        "Date": "2025-03-01",
        "Length": "10 hours",
        "Grade": 3
    },
    {
        "Event": "Digital environments 2",
        "Date": "2025-01-21",
        "Length": "2 days",
        "Grade": 1
    },
    {
        "Event": "AI for Industry B",
        "Date": "2025-03-01",
        "Length": "2 weeks",
        "Grade": 4
    }
]
```

Key points in the code:
Read an email and extract the body.
Use a gen AI model with the right instruction prompt to complete the task.
Gen AI will follow the instructions and create a combined, consistent table.
Get the output in the right format, e.g. JSON.

I hope you find this blog post helpful, and you can apply it to your use case/domain. Or you can simply take away the idea of how to use generative AI to solve a problem, instead of building layers of custom logic.

Building an OpenAI powered Recommendation Engine
Introduction
Recommendation engines play a vital role in enhancing user experiences by providing personalized suggestions and have proven to be an effective strategy for turning engagement into business value. The technical objective of a recommendation engine is to filter and present the most relevant items from vast datasets while considering business constraints. This process includes steps like data collection, preprocessing, model training, and deployment. Advanced techniques such as embeddings and cosine similarity are used to determine the most relevant results for recommendations. This blog explores the design and implementation of a recommendation engine. It addresses the challenges faced by traditional systems and how modern approaches can overcome them, aiming to build a robust, scalable recommendation engine suitable for various domains.

Background / Problem Scenario
Traditional recommendation systems often fall short due to their reliance on basic filtering techniques and limited understanding of user behaviour, resulting in poor recommendations and user dissatisfaction. The main issue is that traditional recommendation engines struggle to analyse large datasets and understand the relationships between items, leading to a mismatch between user preferences and recommendations. Additionally, the need for real-time, personalized suggestions adds complexity. To address this, we need a recommendation engine that leverages advanced AI techniques like embeddings and cosine similarity to accurately filter relevant results. This engine should be scalable, capable of handling vast amounts of data, and able to provide quick, relevant recommendations. We have implemented a similar solution on our Microsoft Career Site, which has been scaled to provide job recommendations to internal users in over 100 countries across the globe. We have noticed a significant increase in conversion rates of 1.6 times in job applications through recommendations vs job search. This solution is not limited to a career site but can be adopted for a variety of recommendation scenarios such as e-commerce, social media, e-learning platforms, media streaming platforms, travel and hospitality, healthcare, retail and much more.

Key Features
Semantic Understanding: By using embeddings, the engine captures the semantic meaning of items, leading to more relevant recommendations.
Agility and Customizability: Customization options are available by modifying the weights.
Scalability: Azure AI Search provides scalable storage and efficient retrieval of embeddings, making the system suitable for large datasets.
Real-time Recommendations: The use of cosine similarity allows for quick computation of similarity scores, enabling real-time recommendations.
Flexibility: The system can be adapted to various domains, such as e-commerce, content streaming, and social media, by training domain-specific embedding models.

Working Principle
Raw Data Conversion: The recommendation engine converts raw data into Named Entity Recognition (NER) output using OpenAI. The NER output here is simply JSON in a pre-defined schema.
Vector Embeddings: The NER output is then converted into vector embeddings using OpenAI.
Vector Database: This is used to store embeddings and query them efficiently. We preferred to use Azure AI Search as our vector database.
User Interaction: When a user interacts with the system, their preferences are also converted into embeddings.
Cosine Similarity: In technical terms, cosine similarity measures the angle between the user's embedding and item embeddings. In simple terms, it's a technique used to generate a score that indicates how closely an item matches the given sample.
Recommendation: This process identifies the most similar items, ensuring recommendations are based on the semantic similarity of items rather than just surface-level features. This process also applies additional filters on the result based on what the user has shared via the feedback loop.

Data Flow
NER Generation: With existing structured or unstructured data, NER (Named Entity Recognition) output is generated using OpenAI with a prompt engineering approach.
Embeddings Generation: The NER output is further processed with OpenAI to generate embeddings.
Azure AI Search: Generated embeddings are then stored in Azure AI Search.
Recommendation Generation: Using vector queries and cosine similarity calculations, a set of matching results is generated. On top of that, additional filtering is applied based on the user feedback collected via the feedback loop, and the result is then served as recommendations.
Feedback Loop: Used to enhance the recommendation results based on user feedback. The feedback collected here is used to further refine the final calculated results.
Azure Premium Storage: Used for caching results to improve performance. When considering caching solutions for our recommendation engine, several factors come into play:
Redis Cache Limitations: Redis can struggle with larger response sizes, around 1.5 MB.
Cost Efficiency: Blob-based caching is often more cost-effective compared to Redis.
Document DB Constraints: The maximum response size is usually capped at a few MB, which may not be scalable for larger result datasets. Also, scaling up a document database can be costly.
Response Time Goals: Our aim is to significantly reduce response times without incurring high costs for ultra-fast API responses.
Performance Metrics: For 25 job recommendations in our pre-production environment, the response time was around 600 ms, which meets our SLA.

Considerations for Engineering Standards

Security
Disable Secrets / Local Auth: Disable local authentication and secrets/connection strings for all Azure services to enhance security and prevent unauthorized access. Use Managed Identity wherever applicable and possible.
Firewall: Consider limiting the IP range accessibility of the databases to reduce the risk of unauthorized access. You can also use a Virtual Network to restrict access.
Rate Limiting: Implement rate limiting to prevent throttling and ensure fair usage of OpenAI resources.
Encryption: Ensure all data at rest and in transit is encrypted to protect sensitive information.
Identity and Access Management (IAM): Implement strict IAM policies to control who can access what resources.
Security Audits: Regularly conduct security audits to identify and mitigate vulnerabilities.
Incident Response Plan: Develop and maintain an incident response plan to quickly address security breaches.

Quality
Comprehensive Testing for Quality NER: Set up an extensive testing environment to guarantee high-quality Named Entity Recognition (NER) outputs. With high-quality NER, the overall quality and reliability of the entire system improves significantly. In our scenario, we developed an automated tool to feed bulk datasets to generate and test the quality of NER. Manual quality testing is also required to some extent to ensure results do not exhibit any bias based on language, colour, ethnicity, etc.
Unit Testing: Make use of a unit testing framework to ensure consistent and thorough testing of all code changes.
Build Verification Testing (BVT): Perform automated BVT to ensure that the build is stable and meets the basic requirements before proceeding to more rigorous testing.

Performance
Result Caching: Implement caching mechanisms to store frequently accessed data and improve response times.
Multi-Region Load Balancing: Distribute traffic across multiple regions to enhance performance and ensure high availability.
Load Testing: Conduct load testing to evaluate system performance under high traffic conditions and identify potential bottlenecks. We used JMeter for load testing in our scenario.
Database Optimization: Optimize database queries and indexing to improve performance. Also ensure the database is appropriately scaled to cater for the required load.
Content Delivery Network (CDN): Use CDNs to reduce latency and improve load times for users globally.
Scalability Testing: Test the system's ability to scale up or down based on demand.
Resource / SKU Allocation: Efficiently allocate resources to ensure optimal performance under varying loads.

Prompt Engineering in OpenAI
OpenAI Model Selection: Extensive rounds of testing may be required to identify the optimal model for your use case. New, higher-performing models are emerging almost every quarter. Ensure a thorough validation is done before you plan to switch to a new model.
Context Awareness: Ensure your prompts consider user preferences, history, and current context for personalized recommendations if applicable. In our case, there was no such context requirement.
Clarity and Brevity: Keep prompts clear and concise to avoid user confusion and encourage quick responses.
Dynamic Adjustments: Adapt your prompts based on user feedback and changing preferences to keep recommendations relevant.
Avoid Bias: Enrich your prompts to avoid any kind of bias in results.
Feedback Loops: Implement prompts that actively seek user feedback to continually refine and improve the recommendation system.

Deployment & Release
Feature Flighting: Gradually roll out new features to a subset of users to test and gather feedback before full deployment.
Blue-Green Deployment: Use blue-green deployment strategies to minimize downtime and reduce the risk during updates.
CI/CD Pipelines: Implement Continuous Integration and Continuous Deployment pipelines to automate testing and deployment processes, ensuring faster and more reliable releases.
Rollback Strategies: Develop rollback strategies to quickly revert to a previous version in case of issues during deployment.
Infra as Code: It is recommended to use Bicep or similar infrastructure-as-code approaches for infrastructure setup.

Challenges Anticipated
AI Hallucinations: Preventing AI-generated hallucinations. These can be mitigated with appropriate prompts and rigorous testing with malicious prompts.
Quality Assurance: Maintaining rigorous quality testing protocols.
NER Extraction Accuracy: Enhancing the precision of Named Entity Recognition (NER) by improving prompts.
Data Privacy and Compliance: Upholding data privacy standards and conducting thorough reviews.

Conclusion: Why Should You Consider This Approach?
Easy Integration with Azure AI Search: One of the biggest advantages of using Azure AI Search is how easy it is to integrate. You don't need to spend a lot of time setting up complex infrastructure. Instead, you can focus on fine-tuning your recommendation algorithms.
Azure AI Search comes with built-in support for vector search, making it simpler to implement advanced recommendation systems.
Scalability: Azure AI Search is designed to handle large datasets efficiently. This means your recommendation engine can grow alongside your user base without losing performance. The platform can manage high query volumes and large-scale data indexing, ensuring your system stays responsive and reliable as it scales.
Vector-Based Search Benefits: Traditional filtering techniques often fall short in capturing the true meaning behind user preferences. Vector-based search, on the other hand, understands the semantic relationships between items, leading to more accurate and relevant recommendations. This results in a better user experience, as the suggestions are more aligned with what users are actually looking for.
Cost Efficiency: Choosing the right caching strategies, like Azure Premium Storage blob-based caching over Redis, can help you save costs while maintaining performance. This is especially important for large-scale deployments where budget management is crucial. Blob storage is a cost-effective solution for storing large amounts of data.
Real-World Impact: Implementing a recommendation engine like this can have a significant impact on user engagement and business outcomes. For instance, personalized job recommendations on the Microsoft Global Career Site have led to improved candidate engagement and a 1.6x increase in conversion rates. Delivering relevant content quickly enhances user experience and drives important business metrics like retention and conversion.

References
Introduction to Vector Embeddings
OpenAI cookbook on Vector databases
Introducing text and code embeddings

Contributors: Ashish Mathur, Jayesh Kudukken Thazhath, Ashudeep Reshi, Bipul Raman, Swadhin Nayak, Sivakamy Lakshminarayanan, Prachi Nautiyal, Priyanka Kumari, Abhishek Mishra, Satya Vamsi Gadikoyila
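As a small, self-contained illustration of the embedding-plus-cosine-similarity scoring described in this article, here is a sketch using the Azure OpenAI Python client and NumPy. The endpoint, embedding deployment name, and sample texts are placeholders; in the real system the scoring happens at scale inside Azure AI Search rather than in client code.

```
import os
import numpy as np
from openai import AzureOpenAI

# Placeholders: assumes an Azure OpenAI resource with a text-embedding deployment.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-05-01-preview",
)

def embed(text):
    # Return the embedding vector for a piece of text.
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a, b):
    # A score close to 1 means the two items are semantically similar.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

user_profile = embed("Data engineer interested in streaming pipelines and Spark")
items = {
    "Data Engineer - real-time analytics": embed("Build streaming pipelines with Spark and Kafka"),
    "Front-end Developer": embed("Build React components for the marketing site"),
}

# Rank candidate items by similarity to the user's profile.
ranked = sorted(items.items(), key=lambda kv: cosine_similarity(user_profile, kv[1]), reverse=True)
for name, vector in ranked:
    print(f"{cosine_similarity(user_profile, vector):.3f}  {name}")
```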
Harnessing the Power of Azure AI Foundry with AI agents, Azure AI and OpenAI: SmartWeather AI Agent

Introduction: AI is transforming how we interact with data, and one great example of this is the SmartWeather AI Agent. This AI-powered weather reporting system integrates Azure AI and Azure OpenAI's GPT-4o-mini to provide real-time weather data, sentiment analysis, and health alerts based on weather conditions. It combines weather data from OpenWeatherMap with Azure AI's natural language processing and sentiment analysis to create a seamless, personalized weather experience. Explore the code and get started with this innovative AI solution on GitHub!

What is SmartWeather AI Agent?
SmartWeather AI Agent is an AI solution that integrates multiple agents for fetching real-time weather data, analyzing sentiment, and generating health and safety alerts. It uses Azure AI and OpenAI models to provide weather insights and actionable alerts, all delivered through an interactive web interface and email notifications.

Key Technologies Behind the Project:
Azure AI & Azure OpenAI: At the core of the project, Azure AI powers sentiment analysis, while OpenAI's GPT-4o-mini handles understanding weather descriptions and generating responses.
Azure AI Foundry: Azure AI Foundry helps integrate multiple AI agents efficiently, allowing for easy deployment and management of models for different tasks like weather fetching and health advisories.
OpenWeatherMap API: Used to pull live weather data for accurate and real-time weather information.

How SmartWeather AI Agent Works:
Weather Fetching Agent: Retrieves live weather data from OpenWeatherMap.
Sentiment Analysis Agent: Analyzes the mood of weather conditions using Azure AI.
Health & Safety Alerts: Generates alerts based on weather conditions to ensure user safety.
Forecast Agent: Provides a 5-day forecast to help users plan ahead.

Benefits of AI Agents in Weather Forecasting:
Engaging User Experience: Sentiment analysis and personalized insights enhance the app's interactivity.
Actionable Insights: AI-driven alerts for health and safety keep users informed.
Real-Time Data: Accurate, up-to-date weather information powered by Azure AI and OpenAI models.

Why Use Azure AI and OpenAI?
Azure AI: Offers powerful tools for data processing and analysis, enabling seamless AI integrations.
Azure OpenAI: GPT-4o-mini enhances natural language understanding and generation for a more interactive experience.
Azure AI Foundry: Simplifies the process of deploying and managing multiple AI agents in one solution.

Conclusion: The SmartWeather AI Agent is a great example of how combining Azure AI, OpenAI, and AI Foundry can create intelligent solutions that provide deeper insights and improve user experience. By integrating real-time weather data with AI-driven sentiment analysis and health alerts, the project demonstrates the potential of AI agents in improving everyday life. Explore the code and get started with this innovative AI solution on GitHub!

Introducing the GPT-4o-Mini Audio Models: Adding More Choice to Audio-Enhanced AI Interaction
We are thrilled to announce the release of the new GPT-4o-Mini-Realtime-Preview and GPT-4o-Mini-Audio-Preview models, both now available in preview. These new models introduce advanced audio capabilities at just 25% of the cost of the GPT-4o audio models. Adding to the existing GPT-4o audio models, this expansion enhances the potential for AI applications in text- and voice-based interactions. Starting today, developers can unlock immersive, voice-driven experiences by harnessing the capabilities of all Azure OpenAI Service advanced audio models, now in public preview.

Key Benefits
Advanced Audio Capabilities: Enjoy high-quality audio interactions at a fraction of the cost of GPT-4o audio models.
Seamless Compatibility: Our new models are compatible with the existing Realtime API and Chat Completions API, ensuring smooth integration and consistent functionality across model families.
Innovative Interactions: Experience natural and intuitive voice-based capabilities, making your interactions more engaging and effective.

Detailed Features
GPT-4o-Mini-Realtime-Preview:
Real-Time Voice Interaction: Enables real-time, natural voice-based interactions for a more engaging user experience.
When to Use: Ideal for applications requiring immediate, real-time responses, such as customer service chatbots and virtual assistants.
GPT-4o-Mini-Audio-Preview:
Advanced Audio Capabilities: Provides high-quality audio interactions at a reduced cost.
When to Use: Perfect for applications requiring asynchronous audio capabilities, such as sentiment analysis of recordings and text-to-audio content creation.

Real-World Applications
The potential of our new products spans various industries, transforming how businesses operate and how users interact with technology:
Customer Service: Voice-based chatbots and virtual assistants can now handle customer inquiries more naturally and efficiently, reducing wait times and improving overall satisfaction.
Content Creation: Media producers can revolutionize their workflows by leveraging speech generation for use in video games, podcasts, and film studios.
Real-Time Translation: Industries such as healthcare and legal services can benefit from real-time audio translation, breaking down language barriers and fostering better communication in critical contexts.

Ready to get started?
Learn more about the GPT-4o-Audio-Preview model: Introducing the GPT-4o-Audio-Preview: A New Era of Audio-Enhanced AI Interaction | Microsoft Community Hub
Learn more about Azure OpenAI Service
Try it out with Azure AI Foundry

From Foundry to Fine-Tuning: Topics you Need to Know in Azure AI Services
With so many new Azure features and new ways of building applications, especially in generative AI, it can be hard to know what you need to learn and where to start with Azure AI. Whether you're a developer or an IT professional, this guide will help you understand the key features, use cases, and documentation links for each service. Let's explore how Azure AI can transform your projects and drive innovation in your organization. Stay tuned for more details!

Term | Description | Use Case | Azure Resource
Azure AI Foundry | A comprehensive platform for building, deploying, and managing AI-driven applications. | Customizing, hosting, running, and managing AI applications. | Azure AI Foundry
AI Agent | Within Azure AI Foundry, an AI agent acts as a "smart" microservice that can answer questions (RAG), perform actions, or fully automate workflows. | Automating tasks, improving efficiency, and enhancing user experiences across applications. | Link
AutoGen | An open-source framework designed for building and managing AI agents, supporting workflows with multiple agents. | Developing complex AI applications with multiple agents. | AutoGen
Multi-Agent AI | Systems where multiple AI agents collaborate to solve complex tasks. | Managing energy in smart grids, coordinating drones. | Link
Model as a Platform | A business model leveraging digital infrastructure to facilitate interactions between user groups. | Social media channels, online marketplaces, crowdsourcing websites. | Link
Azure OpenAI Service | Provides access to OpenAI's powerful language models integrated into the Azure platform. | Text generation, summarization, translation, conversational AI. | Azure OpenAI Service
Azure AI Services | A suite of APIs and services designed to add AI capabilities like image analysis, speech-to-text, and language understanding to applications. | Image analysis, speech-to-text, language understanding. | Link
Azure Machine Learning (Azure ML) | A cloud-based service for building, training, and deploying machine learning models. | Creating models to predict sales, detect fraud. | Azure Machine Learning
Azure AI Search | An AI-powered search service that enhances information retrieval and exploration. | Enterprise search, e-commerce search, knowledge mining. | Azure AI Search
Azure Bot Service | A platform for developing intelligent, enterprise-grade bots. | Creating chatbots for customer service, virtual assistants. | Azure Bot Service
Deep Learning | A subset of ML using neural networks with many layers to analyze complex data. | Image and speech recognition, natural language processing. | Link
Multimodal AI | AI that integrates and processes multiple types of data, such as text and images (covering both input and output). | Describing images, answering questions about pictures. | Azure OpenAI Service, Azure AI Services
Unimodal AI | AI that processes a single type of data, such as text or images (covering both input and output). | Writing text, recognizing objects in photos. | Azure OpenAI Service, Azure AI Services
Fine-Tuning Models | Adapting pre-trained models to specific tasks or datasets for improved performance. | Customizing models for specific industries like healthcare. | Azure AI Foundry
Model Catalog | A repository of pre-trained models available for use in AI projects. | Discovering, evaluating, fine-tuning, and deploying models. | Model Catalog
Capacity & Quotas | Limits and quotas for using Azure AI services, ensuring optimal resource allocation. | Managing resource usage and scaling AI applications. | Link
Tokens | Units of text processed by language models, affecting cost and performance. | Managing and optimizing text processing tasks. | Link
TPM (Tokens per Minute) | A measure of the rate at which tokens are processed, impacting throughput and performance. | Allocating and managing processing capacity for AI models. | Link
PTU (Provisioned Throughput Units) | Provisioned throughput lets you reserve a specific amount of throughput for a deployment. | Ensuring predictable performance for AI applications. | Link

Customize AOAI Embeddings with contrastive learning
Introduction
Embeddings generate a representation of unstructured data in a dense vector space. An embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space correlates with the semantic similarity between the two inputs in their original format (e.g., text or image). When text is embedded, the meaning of each word is encoded so that words closer together in the vector space are expected to have similar meanings. A large number of embedding models support such text representations, and benchmarks like MTEB help compare their performance.
One of the pitfalls of embedding models is that they may not adequately represent the underlying data. This can happen in the following scenarios:
Out-of-Domain Text: If the text is highly technical or niche, and the model hasn't been trained on similar data, the resulting embeddings might not accurately capture the specialized context or jargon. For example, this is likely when new terminology is coined in science or technology.
Ambiguity: Text with ambiguous meanings can lead to embeddings that don't clearly represent any of the possible interpretations. (e.g., "getting a match for him was too difficult" could refer to sports or marriage depending on context; "getting a match for him to play was too difficult" and "getting a match for him to marry was too difficult" still give a cosine similarity of 0.77 with ada embeddings.)
Sarcasm or Irony: Detecting sarcasm or irony requires a deep understanding of context and tone, which can be challenging for AI models, leading to embeddings that take the text at face value. ("Oh, I just love getting stuck in traffic" and "traffic is really bad" may show a lower similarity score.)
Cultural Nuances: Subtle cultural references or idioms might not be well represented if the model lacks sufficient exposure to the culture in question.
Short Texts / Concepts: Very short texts, like one-word inputs, might not provide enough information for the model to generate meaningful embeddings. In some cases, short texts express concepts that the model does not pick up (e.g., "time value of money" and "money value of time" generate embeddings with cosine similarity of 0.73, whereas their expanded descriptions "increase in value of money over time due to interest earned" and "charging money for time spent on a task" have cosine similarity of 0.37).
Non-Standard Language: Text containing a lot of slang, misspellings, or grammatical errors might result in less accurate embeddings.
Rare Words: If the text contains rare words or neologisms, the embeddings might not capture their meaning accurately if those words were not present in the training data.
Context with RAG: If the text contains organization- or entity-specific content, embeddings may not be truly representative, since that content will not be part of the model's training data. Examples include abbreviations and organization-specific definitions of generic terms.

Custom Embeddings
Retrieval Augmented Generation (RAG) is an architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system that provides grounding data. Because embeddings are a key part of RAG for retrieving relevant content, the pitfalls above matter in practice. One way to improve these representations is to modify the embedding vector values to overcome the challenges discussed.
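Before fine-tuning anything, it can help to quantify the problem on your own phrase pairs. The snippet below is a minimal sketch rather than part of the original implementation: the endpoint, API key, API version and deployment name are placeholders, and it assumes a text-embedding-3-small deployment is available on the Azure OpenAI resource.

import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<your-api-key>",
    api_version="2024-02-01",
    azure_endpoint="https://<your-resource>.openai.azure.com/",
)

def embed(text: str, model: str = "text-embedding-3-small") -> np.ndarray:
    # return the embedding vector for a single piece of text
    return np.array(client.embeddings.create(input=[text], model=model).data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# two look-alike phrases that describe different concepts
print(cosine(embed("time value of money"), embed("money value of time")))
# a high score here suggests the base embeddings are not capturing the concept-level distinction

If pairs like these score high while their expanded descriptions score low, that is a signal the base embeddings need the kind of customization described next.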
The steps below explain an implementation of contrastive learning for fine-tuning the embeddings.
Step 1: Take sample training documents and generate chunks from them. Generate positive and negative sentences with reference to the chunks. These samples can be produced for a given corpus by sampling chunks and using an LLM to create positive and negative examples.
Step 2: Generate the embedding representations of the positive and negative examples with models that support embedding generation, such as "text-embedding-ada-002" or "text-embedding-3-small". As an alternative to generating the examples, a labelled corpus such as MS MARCO can be used.
Step 3: Train a shallow neural network that takes the embeddings of the sentence pairs as input, with a loss function based on the difference between the known similarity from the labelled data set (positive / negative sentences) and the similarity predicted by the model. Since the data is labelled, similarity should be high for positive pairs and low between positive and negative labels.
Step 4: As the sentence pairs with positive and negative examples are used to train the model, the trainable weights are nudged to reduce the loss. This yields more context-aware embeddings for a custom corpus.
Step 5: To embed an unseen chunk, generate the embedding with the embedding model and pass it through the trained model to obtain the fine-tuned, more context-aware representation (a short sketch of this query-time step appears after the comparison table further below).

Generating training data:
For labelled data, existing corpora can be used, e.g. the MS MARCO passage ranking dataset or the Stanford Natural Language Inference (SNLI) dataset. An alternative approach is to generate similar and dissimilar labelled data using GPT-4: iteratively provide a user query (along with a chunk of a document as reference if needed) and prompt the model to generate a positive and a hard-negative document in relation to that query. Prompt templates are available for this; a minimal sketch of the generation step follows the example tables below.

Example sentences with positive labels:
sentence1 | sentence2
A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse.
Children smiling and waving at camera | There are children present
A boy is jumping on skateboard in the middle of a red bridge. | The boy does a skateboarding trick.
Two blond women are hugging one another. | There are women showing affection.
A few people in a restaurant setting, one of them is drinking orange juice. | The diners are at a restaurant.

Example sentences with hard negative labels:
sentence1 | sentence2
A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette.
Children smiling and waving at camera | The kids are frowning
A boy is jumping on skateboard in the middle of a red bridge. | The boy skates down the sidewalk.
An older man sits with his orange juice at a small table in a coffee shop while employees in bright colored shirts smile in the background. | A boy flips a burger.
Two blond women are hugging one another. | The women are sleeping.
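As referenced above, here is a minimal sketch of generating a positive / hard-negative pair for a query with an Azure OpenAI chat deployment. The deployment name, prompt wording and JSON contract are illustrative assumptions rather than a prescribed template; production use would add retries and validation of the model output.

import json
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<your-api-key>",
    api_version="2024-02-01",
    azure_endpoint="https://<your-resource>.openai.azure.com/",
)

def make_pair(query: str, chunk: str, deployment: str = "gpt-4o") -> dict:
    # ask the chat model for one supporting sentence and one topically similar distractor
    prompt = (
        "Given the user query and the reference passage, return JSON with two keys:\n"
        '"positive": one sentence that answers or restates the query using the passage;\n'
        '"hard_negative": one sentence that is topically similar but does NOT answer the query.\n\n'
        f"Query: {query}\nPassage: {chunk}"
    )
    response = client.chat.completions.create(
        model=deployment,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return json.loads(response.choices[0].message.content)  # assumes the model returns bare JSON

pair = make_pair("leave policy for new parents", "Employees are entitled to twelve weeks of paid parental leave...")
# store (query, pair["positive"], label=1) and (query, pair["hard_negative"], label=-1) as training rows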
Implementation
Generating embedding representations for labels:

import pickle
import random
from typing import List, Tuple

import numpy as np
import pandas as pd
import torch
from openai import AzureOpenAI

default_embedding_engine = "text-embedding-3-small"  # embedding deployment name (placeholder)
embedding_cache_path = "embedding_cache.pkl"  # placeholder path for the on-disk cache
try:
    with open(embedding_cache_path, "rb") as f:
        embedding_cache = pickle.load(f)
except FileNotFoundError:
    embedding_cache = {}

def get_embedding(text: str, model="text-embedding-3-small", **kwargs) -> List[float]:
    # replace newlines, which can negatively affect performance.
    text = text.replace("\n", " ")
    client = AzureOpenAI(
        api_key="*****",
        api_version="2023-05-15",
        azure_endpoint="https://***.openai.azure.com/",
    )
    response = client.embeddings.create(input=[text], model=model, **kwargs)
    return response.data[0].embedding

# this function will get embeddings from the cache and save them there afterward
def get_embedding_with_cache(
    text: str,
    engine: str = default_embedding_engine,
    embedding_cache: dict = embedding_cache,
    embedding_cache_path: str = embedding_cache_path,
) -> list:
    if (text, engine) not in embedding_cache.keys():
        # if not in cache, call API to get embedding
        embedding_cache[(text, engine)] = get_embedding(text, engine)
        # save embeddings cache to disk after each update
        with open(embedding_cache_path, "wb") as embedding_cache_file:
            pickle.dump(embedding_cache, embedding_cache_file)
    return embedding_cache[(text, engine)]

# create column of embeddings
# (df is the labelled-pairs frame prepared in the snippet further below; load it before running this loop)
for column in ["text_1", "text_2"]:
    df[f"{column}_embedding"] = df[column].apply(get_embedding_with_cache)

We generate a trainable matrix that can be used to customize the embeddings.

def embedding_multiplied_by_matrix(
    embedding: List[float], matrix: torch.tensor
) -> np.array:
    embedding_tensor = torch.tensor(embedding).float()
    modified_embedding = embedding_tensor @ matrix
    modified_embedding = modified_embedding.detach().numpy()
    return modified_embedding

# compute custom embeddings and new cosine similarities
def apply_matrix_to_embeddings_dataframe(matrix: torch.tensor, df: pd.DataFrame):
    for column in ["text_1_embedding", "text_2_embedding"]:
        df[f"{column}_custom"] = df[column].apply(
            lambda x: embedding_multiplied_by_matrix(x, matrix)
        )
    df["cosine_similarity_custom"] = df.apply(
        lambda row: cosine_similarity(
            row["text_1_embedding_custom"], row["text_2_embedding_custom"]
        ),
        axis=1,
    )
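The snippets above and the training function below rely on three pieces that are not shown: a dataframe df of labelled pairs carrying a dataset (train/test) marker and a numeric label column, a cosine_similarity helper, and the accuracy_and_se helper called inside the training loop. A minimal sketch of all three follows; the CSV path, the 70/30 split and the threshold sweep are assumptions, not the original notebook's exact code.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def cosine_similarity(a, b) -> float:
    # plain cosine similarity between two vectors (lists or numpy arrays)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def accuracy_and_se(similarity_scores: pd.Series, labeled_similarity: pd.Series):
    # accuracy of the best similarity threshold for separating positive (1) from
    # negative (-1) pairs, plus the standard error of that accuracy
    best_accuracy = 0.0
    for threshold in np.arange(-1.0, 1.0, 0.01):
        predictions = np.where(similarity_scores > threshold, 1, -1)
        accuracy = float((predictions == labeled_similarity.values).mean())
        best_accuracy = max(best_accuracy, accuracy)
    n = len(labeled_similarity)
    standard_error = (best_accuracy * (1 - best_accuracy) / n) ** 0.5
    return best_accuracy, standard_error

# labelled pairs: columns text_1, text_2, label (1 = similar, -1 = dissimilar); path is a placeholder
df = pd.read_csv("labelled_pairs.csv")
train_df, test_df = train_test_split(df, test_size=0.3, stratify=df["label"], random_state=42)
df = pd.concat(
    [train_df.assign(dataset="train"), test_df.assign(dataset="test")], ignore_index=True
)

This preparation has to run before the "create column of embeddings" loop above, so that the embedding columns exist by the time training starts.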
def optimize_matrix(
    modified_embedding_length: int = 2048,  # bigger was better in brief experimentation (2048 is the length of the babbage encoding)
    batch_size: int = 100,
    max_epochs: int = 100,
    learning_rate: float = 100.0,  # seemed to work best when similar to batch size - feel free to try a range of values
    dropout_fraction: float = 0.0,  # dropout helped by a couple of percentage points in testing (not strictly necessary)
    df: pd.DataFrame = df,
    print_progress: bool = True,
    save_results: bool = True,
) -> torch.tensor:
    """Return matrix optimized to minimize loss on training data."""
    run_id = random.randint(0, 2**31 - 1)  # (range is arbitrary)

    # convert from dataframe to torch tensors
    # e is for embedding, s for similarity label
    def tensors_from_dataframe(
        df: pd.DataFrame,
        embedding_column_1: str,
        embedding_column_2: str,
        similarity_label_column: str,
    ) -> Tuple[torch.tensor]:
        e1 = np.stack(np.array(df[embedding_column_1].values))
        e2 = np.stack(np.array(df[embedding_column_2].values))
        s = np.stack(np.array(df[similarity_label_column].astype("float").values))

        e1 = torch.from_numpy(e1).float()
        e2 = torch.from_numpy(e2).float()
        s = torch.from_numpy(s).float()

        return e1, e2, s

    e1_train, e2_train, s_train = tensors_from_dataframe(
        df[df["dataset"] == "train"], "text_1_embedding", "text_2_embedding", "label"
    )
    e1_test, e2_test, s_test = tensors_from_dataframe(
        df[df["dataset"] == "test"], "text_1_embedding", "text_2_embedding", "label"
    )

    # create dataset and loader
    dataset = torch.utils.data.TensorDataset(e1_train, e2_train, s_train)
    train_loader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, shuffle=True
    )

    # define model (similarity of projected embeddings)
    def model(embedding_1, embedding_2, matrix, dropout_fraction=dropout_fraction):
        e1 = torch.nn.functional.dropout(embedding_1, p=dropout_fraction)
        e2 = torch.nn.functional.dropout(embedding_2, p=dropout_fraction)
        modified_embedding_1 = e1 @ matrix  # @ is matrix multiplication
        modified_embedding_2 = e2 @ matrix
        similarity = torch.nn.functional.cosine_similarity(
            modified_embedding_1, modified_embedding_2
        )
        return similarity

    # define loss function to minimize
    def mse_loss(predictions, targets):
        difference = predictions - targets
        return torch.sum(difference * difference) / difference.numel()

    # initialize projection matrix
    embedding_length = len(df["text_1_embedding"].values[0])
    matrix = torch.randn(
        embedding_length, modified_embedding_length, requires_grad=True
    )

    epochs, types, losses, accuracies, matrices = [], [], [], [], []
    for epoch in range(1, 1 + max_epochs):
        # iterate through the training dataloader
        for a, b, actual_similarity in train_loader:
            # generate prediction
            predicted_similarity = model(a, b, matrix)
            # get loss and perform backpropagation
            loss = mse_loss(predicted_similarity, actual_similarity)
            loss.backward()
            # update the weights
            with torch.no_grad():
                matrix -= matrix.grad * learning_rate
                # set gradients to zero
                matrix.grad.zero_()
        # calculate test loss
        test_predictions = model(e1_test, e2_test, matrix)
        test_loss = mse_loss(test_predictions, s_test)

        # compute custom embeddings and new cosine similarities
        apply_matrix_to_embeddings_dataframe(matrix, df)

        # calculate accuracy on the train and test splits
        for dataset in ["train", "test"]:
            data = df[df["dataset"] == dataset]
            a, se = accuracy_and_se(data["cosine_similarity_custom"], data["label"])

            # record results of each epoch
            epochs.append(epoch)
            types.append(dataset)
            losses.append(loss.item() if dataset == "train" else test_loss.item())
            accuracies.append(a)
            matrices.append(matrix.detach().numpy())

            # optionally print accuracies
            if print_progress is True:
                print(
                    f"Epoch {epoch}/{max_epochs}: {dataset} accuracy: {a:0.1%} ± {1.96 * se:0.1%}"
                )

    data = pd.DataFrame(
        {"epoch": epochs, "type": types, "loss": losses, "accuracy": accuracies}
    )
    data["run_id"] = run_id
    data["modified_embedding_length"] = modified_embedding_length
    data["batch_size"] = batch_size
    data["max_epochs"] = max_epochs
    data["learning_rate"] = learning_rate
    data["dropout_fraction"] = dropout_fraction
    data["matrix"] = matrices  # saving every single matrix can get big; feel free to delete/change
    if save_results is True:
        data.to_csv(f"{run_id}_optimization_results.csv", index=False)

    return data

Comparing the distributions of cosine similarity computed from the base embeddings and from the customized embeddings on a sample dataset shows that the customized embeddings separate the pairs better. The sample rows below show reduced similarity for dissimilar pairs (label -1) relative to the default embeddings, indicating that the custom embeddings differentiate better.

text_1 | text_2 | label | cosine_similarity | cosine_similarity_custom
The man plays guitar | someone is playing an instrument | -1 | 0.58 | 0.52
Lady wearing a yellow top is sitting on a chair | a woman on a yellow shirt is on the floor. | -1 | 0.52 | 0.486
Children playing a game. | The guys are playing a game. | -1 | 0.52 | 0.45
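Step 5 of the approach applies the learned projection to unseen text at query time. The sketch below is illustrative rather than prescriptive: the hyperparameter values are placeholders, and it simply picks the matrix from the epoch with the best test accuracy out of optimize_matrix's results.

# train and keep the projection from the best-performing epoch on the test split
results = optimize_matrix(
    batch_size=100, learning_rate=100.0, max_epochs=30, dropout_fraction=0.2, df=df, save_results=False
)
best_matrix = torch.tensor(
    results[results["type"] == "test"].sort_values("accuracy", ascending=False)["matrix"].iloc[0]
).float()

def customized_embedding(text: str):
    raw = get_embedding_with_cache(text)                     # base model embedding
    return embedding_multiplied_by_matrix(raw, best_matrix)  # project into the tuned space

# score an unseen query against a chunk in the customized space
print(cosine_similarity(
    customized_embedding("time value of money"),
    customized_embedding("increase in value of money over time due to interest earned"),
))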
LlamaIndex Implementation:
LlamaIndex offers a simplified pipeline with four steps: corpus generation, synthetic query generation, embedding fine-tuning, and evaluation of the results.

# Generate synthetic queries
import os

from llama_index.finetuning import generate_qa_embedding_pairs
from llama_index.core.evaluation import EmbeddingQAFinetuneDataset
from llama_index.llms.openai import OpenAI

OPENAI_API_TOKEN = "sk-"
os.environ["OPENAI_API_KEY"] = OPENAI_API_TOKEN

# train_nodes / val_nodes: node lists parsed from your training and validation documents
train_dataset = generate_qa_embedding_pairs(
    llm=OpenAI(model="gpt-3.5-turbo"), nodes=train_nodes
)
val_dataset = generate_qa_embedding_pairs(
    llm=OpenAI(model="gpt-3.5-turbo"), nodes=val_nodes
)

train_dataset.save_json("train_dataset.json")
val_dataset.save_json("val_dataset.json")

# Run embedding fine-tuning
from llama_index.finetuning import SentenceTransformersFinetuneEngine

finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="BAAI/bge-small-en",
    model_output_path="test_model",
    val_dataset=val_dataset,
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()

Sentence Transformers (SBERT):
Augmented SBERT provides the following approach to extend sentence-transformer models to custom datasets that have no positive / negative pair annotations (a sketch of this recipe follows the references below):
1. Train a cross-encoder (BERT) from scratch over a source dataset, for example the STS benchmark dataset.
2. Use this cross-encoder to label your target dataset, i.e. the unlabeled sentence pairs.
3. Finally, train a bi-encoder (SBERT) on the labeled target dataset.

References:
openai-cookbook/examples/fine-tuned_qa/ft_retrieval_augmented_generation_qdrant.ipynb at main · openai/openai-cookbook (github.com)
openai-cookbook/examples/Customizing_embeddings.ipynb at main · openai/openai-cookbook (github.com)
Custom Embeddings - LlamaIndex 🦙 0.9.24
run-llama/finetune-embedding: Fine-Tuning Embedding for RAG with Synthetic Data (github.com)
Fine-Tuning Embeddings for RAG with Synthetic Data | by Jerry Liu | LlamaIndex Blog
Augmented SBERT — Sentence Transformers documentation
FlagEmbedding/examples/finetune at master · FlagOpen/FlagEmbedding (github.com)
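As referenced in the Sentence Transformers section above, here is a minimal sketch of the Augmented SBERT recipe using the sentence-transformers library. The model names, example pairs, batch size and epoch count are placeholder choices, and step 1 is shown with a pretrained cross-encoder standing in for one trained from scratch; see the Augmented SBERT documentation in the references for the full procedure.

from sentence_transformers import InputExample, SentenceTransformer, losses
from sentence_transformers.cross_encoder import CrossEncoder
from torch.utils.data import DataLoader

# 1) cross-encoder trained on a labelled source dataset such as the STS benchmark
cross_encoder = CrossEncoder("cross-encoder/stsb-roberta-base")

# 2) silver-label unlabelled in-domain sentence pairs with the cross-encoder
unlabelled_pairs = [
    ("time value of money", "increase in value of money over time due to interest earned"),
    ("time value of money", "charging money for time spent on a task"),
]
scores = cross_encoder.predict(unlabelled_pairs)

# 3) train a bi-encoder (SBERT) on the silver-labelled pairs
train_examples = [
    InputExample(texts=[a, b], label=float(score))
    for (a, b), score in zip(unlabelled_pairs, scores)
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
train_loss = losses.CosineSimilarityLoss(bi_encoder)
bi_encoder.fit(train_objectives=[(train_loader, train_loss)], epochs=1, warmup_steps=10)
bi_encoder.save("custom-domain-sbert")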