Azure AI Foundry
Announcing GPT‑5‑Codex: Redefining Developer Experience in Azure AI Foundry
Today, we’re excited to announce that OpenAI’s GPT‑5‑Codex is generally available in Azure AI Foundry, and in public preview for GitHub Copilot in Visual Studio Code. This release is the next step in our continuous commitment to empower developers with the latest model innovation, building on the proven strengths of the earlier Codex generation along with the speed and CLI fluency many teams have adopted with the latest codex‑mini.

Next-level features for developers

- Multimodal coding in a single flow: GPT‑5‑Codex accepts multimodal inputs, including text and images. With this multimodal intelligence, developers can tackle complex tasks and deliver context-aware, repository-scale solutions in a single workflow.
- Advanced tool use across various experiences: GPT‑5‑Codex is built for real-world developer experiences. Developers in Azure AI Foundry get seamless automation and deep integration via the Responses API, improving productivity and reducing development time.
- Code review expertise: GPT‑5‑Codex is specially trained to conduct code reviews and surface critical flaws, helping developers catch issues early and improve code quality with AI-powered insights. It transforms code review from a manual bottleneck into an intelligent, adaptive, and integrated process, empowering developers to deliver high-quality code.

How GPT‑5‑Codex makes your life easier

- Stay in flow, not in friction: With GPT‑5‑Codex, move smoothly from reading issues to writing code and checking UI, all in one place. It keeps context, so developers stay focused and productive, with no more jumping between tools or losing track of what they were doing.
- Refactor and migrate with confidence: Whether cleaning up code or moving to a new framework, GPT‑5‑Codex helps stage updates, run tests, and fix issues as you go. It’s like having a digital colleague for those tricky transitions.

Hero use cases: real impact for developers

- Repo‑aware refactoring assistant: Feed the repo and architecture diagrams to GPT‑5‑Codex, and get cohesive refactors, automated builds, and visual verification via screenshots.
- Flaky test hunter: Target failing test matrices. The model executes runs, polls status, inspects logs, and recommends fixes, looping until stability.
- Cloud migration copilot: Edit IaC scripts, kick off CLI commands, and iterate on errors in a controlled loop, reducing manual toil.

Pricing and deployment available at GA

| Deployment | Available Regions | Input | Cached Input | Output |
| --- | --- | --- | --- | --- |
| Standard Global | East US 2, Sweden Central | $1.25 | $0.125 | $10.00 |

Pricing is in $/million tokens.

GPT‑5‑Codex brings developers’ coding experience to a new level. Don’t just write code. Let’s redefine what’s possible. Start building with GPT‑5‑Codex today and turn your bold ideas into reality, powered by the latest innovation in Azure AI Foundry.
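As a quick illustration of the Responses API integration mentioned above, here is a minimal sketch. It assumes a Foundry deployment named gpt-5-codex, an OpenAI-compatible endpoint, and credentials in environment variables; check the exact endpoint shape against your resource.

```python
# Sketch: asking GPT-5-Codex to review a diff via the Responses API.
# Deployment name, endpoint shape, and env var names are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["AZURE_AI_FOUNDRY_ENDPOINT"],  # e.g. https://<resource>.openai.azure.com/openai/v1
    api_key=os.environ["AZURE_AI_FOUNDRY_API_KEY"],
)

diff = """\
--- a/cart.py
+++ b/cart.py
+def apply_discount(total, pct):
+    return total - total * pct / 100
"""

response = client.responses.create(
    model="gpt-5-codex",  # your deployment name
    input=[
        {"role": "system", "content": "You are a meticulous code reviewer."},
        {"role": "user", "content": f"Review this diff and flag issues:\n{diff}"},
    ],
)
print(response.output_text)
```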
Trigger can't read Fabric data agent

I made an agent in Azure AI Foundry and use a Fabric data agent as a knowledge source. Everything runs well until I try to use a trigger to orchestrate my agent. I have added my trigger's identity to the Fabric workspace where my Fabric data agent and my lakehouse are located. The trigger runs without errors, but my agent does not respond the way it does when I send a prompt via the playground. Why?

Announcing the Grok 4 Fast Models from xAI: Now Available in Azure AI Foundry
These models, grok-4-fast-reasoning and grok-4-fast-non-reasoning, empower developers with two distinct approaches to suit their application needs. Each model brings advanced capabilities such as structured outputs, long-context processing, and seamless integration with enterprise-grade security and governance. This release marks a significant step toward scalable, agentic AI systems that orchestrate tools, APIs, and domain data with low latency. Leveraging the Grok 4 Fast models within Azure AI Foundry Models accelerates the development of intelligent applications that combine speed, flexibility, and compliance. The unified model experience, paired with Azure’s enterprise controls, positions the Grok 4 Fast models as foundational technologies for next-generation AI-powered workflows.

Why use the Grok 4 Fast models on Azure

Modern AI applications are increasingly agentic—capable of orchestrating tools, APIs, and domain data at low latency. The Grok 4 Fast models were designed for these patterns: fast, intelligent, and agent-ready, enabling parallel tool use, JSON-structured outputs, and image input for multimodal understanding. Azure AI Foundry enhances these models with enterprise controls (RBAC, private networking, customer-managed keys), observability and evaluations, and first-party hosting through Foundry Models, helping teams move confidently from prototype to production. Beyond that, using the Grok 4 Fast models on Azure offers the following:

- Global scalability and reliability – Azure’s worldwide infrastructure supports resilient, high-availability deployments across multiple regions.
- Integrated security and compliance – Enterprise-grade identity management, network isolation, encryption at rest and in transit, and compliance certifications help safeguard sensitive data and meet regulatory requirements.
- Unified management experience – Centralized monitoring, governance, and cost controls through the Azure Portal and Azure Resource Manager simplify operations and oversight.
- Native integration across Azure services – Easily connect to data sources, analytics, and other services like Azure Synapse, Cosmos DB, and Logic Apps for end-to-end solutions.
- Enterprise support and SLAs – Azure delivers 24/7 support, service-level agreements, and best-in-class reliability for mission-critical workloads.

By deploying Grok 4 Fast models on Azure, organizations can build robust, secure, and scalable AI applications with confidence and agility.

Key capabilities

The Grok 4 Fast models introduce a suite of advanced features designed to enhance agentic workflows and multimodal integration. With flexible model choices and powerful context handling, the Grok 4 Fast models are engineered for efficiency, scalability, and seamless deployment.

- Choose the reasoning level by selecting which Grok 4 Fast model to use: grok-4-fast-reasoning is optimized for fast reasoning in agentic workflows, while grok-4-fast-non-reasoning uses the same underlying weights but is constrained by a non-reasoning system prompt, offering a streamlined approach for specific tasks.
- Multimodal: Provides image understanding when deployed with the Grok image tokenizer.
- Tool use & structured outputs: Enables parallel function calling and supports JSON schemas for predictable integration (see the sketch after this list).
- Long context: Supports approximately 131K tokens for deep, comprehensive understanding.
- Efficient H100 performance: Designed to run efficiently on H100 GPUs for agentic search and real-time orchestration.
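As a sketch of the structured-output path: the snippet below assumes a Foundry Models deployment named grok-4-fast-reasoning exposed through an OpenAI-compatible chat completions endpoint, with credentials in environment variables. The endpoint shape and json_schema support are assumptions to confirm against the model card.

```python
# Sketch: JSON-structured output from a Grok 4 Fast deployment on Azure AI Foundry.
# Deployment name, endpoint shape, and env var names are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["AZURE_AI_FOUNDRY_ENDPOINT"],
    api_key=os.environ["AZURE_AI_FOUNDRY_API_KEY"],
)

response = client.chat.completions.create(
    model="grok-4-fast-reasoning",  # your deployment name
    messages=[
        {"role": "system", "content": "Extract the order details as JSON."},
        {"role": "user", "content": "Ship 3 units of SKU-1142 to Oslo by Friday."},
    ],
    # Constrain the output to a schema for predictable downstream integration.
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "order",
            "schema": {
                "type": "object",
                "properties": {
                    "sku": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "destination": {"type": "string"},
                },
                "required": ["sku", "quantity", "destination"],
            },
        },
    },
)
print(response.choices[0].message.content)
```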
Collectively, these features make the Grok 4 Fast models a robust and versatile solution for developers and enterprises looking to push the boundaries of AI-powered workflows.

What you can do with the Grok 4 Fast models

Building on the advanced capabilities of the Grok 4 Fast models, developers can unlock innovative solutions across a wide variety of applications. The following use cases highlight how these models streamline complex workflows, maximize efficiency, and accelerate intelligent automation with robust, scalable AI.

- Real-time agentic task orchestration: Automate and coordinate multi-step processes across systems with fast, flexible reasoning for dynamic business operations.
- Multimodal document analysis: Extract insights and process information from both text and images for comprehensive, context-aware understanding.
- Enterprise search and knowledge retrieval: Leverage long-context support for enhanced semantic search, surfacing relevant information from massive data repositories.
- Parallel tool integration: Invoke multiple APIs and functions simultaneously, enabling sophisticated workflows with structured, predictable outputs.
- Scalable conversational AI: Deploy high-capacity virtual agents capable of handling extended dialogues and nuanced queries with low latency.
- Customizable decision support: Empower users with AI-driven recommendations and scenario analysis tailored to organizational needs and governance requirements.

With the Grok 4 Fast models, developers are equipped to build and iterate on next-generation AI solutions, leveraging powerful tools and streamlined deployment workflows to support complex workflows and innovative solutions across a range of use cases.

Pricing for Grok 4 Fast models on Azure AI Foundry

| Model | Deployment | Price ($/1M tokens) |
| --- | --- | --- |
| grok-4-fast-reasoning | Global Standard (PayGo) | Input: $0.43, Output: $1.73 |
| grok-4-fast-non-reasoning | Global Standard (PayGo) | Input: $0.43, Output: $1.73 |

Get started in minutes

With the Grok 4 Fast models, developers gain access to cutting-edge AI with a massive context window, efficient GPU performance, and enterprise-grade governance. Start building the future of AI today: visit the Model Catalog in Azure AI Foundry and deploy grok-4-fast-reasoning and grok-4-fast-non-reasoning to accelerate your innovation.

The Future of AI: An Intern’s Adventure Turning Hours of Video into Minutes of Meaning
This blog post, part of The Future of AI series by Microsoft’s AI Futures team, follows an intern’s journey in developing AutoHighlight—a tool that transforms long-form video into concise, narrative-driven highlight reels. By combining Azure AI Content Understanding with OpenAI reasoning models, AutoHighlight bridges the gap between machine-detected moments and human storytelling. The post explores the challenges of video summarization, the technical architecture of the solution, and the lessons learned along the way.

Building AI Apps with the Foundry Local C# SDK
What Is Foundry Local?

Foundry Local is a lightweight runtime designed to run AI models directly on user devices. It supports a wide range of hardware (CPU, GPU, NPU) and provides a consistent developer experience across platforms. The SDKs are available in multiple languages, including Python, JavaScript, Rust, and now C#.

Why a C# SDK?

The C# SDK brings Foundry Local into the heart of the .NET ecosystem. It allows developers to:

- Download and manage models locally.
- Run inference using OpenAI-compatible APIs.
- Integrate seamlessly with existing .NET applications.

This means you can build intelligent apps that run offline, reduce latency, and maintain data privacy—all without sacrificing developer productivity.

Bootstrap Process: How the SDK Gets You Started

One of the most developer-friendly aspects of the C# SDK is its automatic bootstrap process. Here's what happens under the hood when you initialise the SDK:

1. Service discovery and startup: The SDK automatically locates the Foundry Local installation on the device and starts the inference service if it's not already running.
2. Model download and caching: If the specified model isn't already cached locally, the SDK downloads the most performant model variant for the end user's hardware (e.g. GPU, CPU, NPU) from the Foundry model catalog. This ensures you're always working with the latest optimised version.
3. Model loading into the inference service: Once downloaded (or retrieved from cache), the model is loaded into the Foundry Local inference engine, ready to serve requests.

This streamlined process means developers can go from zero to inference with just a few lines of code—no manual setup or configuration required.

Leverage Your Existing AI Stack

One of the most exciting aspects of the Foundry Local C# SDK is its compatibility with popular AI tools:

- OpenAI SDK – Foundry Local provides an OpenAI-compliant chat completions (and embeddings) API, meaning that if you're already using the OpenAI chat completions API, you can reuse your existing code with minimal changes.
- Semantic Kernel – Foundry Local also integrates well with Semantic Kernel, Microsoft's open-source framework for building AI agents. You can use Foundry Local models as plugins or endpoints within Semantic Kernel workflows, enabling advanced capabilities like memory, planning, and tool calling.

Quick Start Example

Follow these three steps:

1. Create a new project

Create a new C# project and navigate to it:

```bash
dotnet new console -n hello-foundry-local
cd hello-foundry-local
```

2. Install NuGet packages

Install the following NuGet packages into your project:

```bash
dotnet add package Microsoft.AI.Foundry.Local --version 0.1.0
dotnet add package OpenAI --version 2.2.0-beta.4
```

3. Use the OpenAI SDK with Foundry Local

The following example demonstrates how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.
Copy-and-paste the following code into a C# file named Program.cs:

```csharp
using Microsoft.AI.Foundry.Local;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;

var alias = "phi-3.5-mini";

// Bootstrap: start the service, download/cache the model if needed, and load it.
var manager = await FoundryLocalManager.StartModelAsync(aliasOrModelId: alias);
var model = await manager.GetModelInfoAsync(aliasOrModelId: alias);

// Point the OpenAI client at the local Foundry endpoint.
ApiKeyCredential key = new ApiKeyCredential(manager.ApiKey);
OpenAIClient client = new OpenAIClient(key, new OpenAIClientOptions
{
    Endpoint = manager.Endpoint
});

var chatClient = client.GetChatClient(model?.ModelId);

// Stream completion tokens to the console as they arrive.
var completionUpdates = chatClient.CompleteChatStreaming("Why is the sky blue?");

Console.Write("[ASSISTANT]: ");
foreach (var completionUpdate in completionUpdates)
{
    if (completionUpdate.ContentUpdate.Count > 0)
    {
        Console.Write(completionUpdate.ContentUpdate[0].Text);
    }
}
```

Run the code using the following command:

```bash
dotnet run
```

Final thoughts

The Foundry Local C# SDK empowers developers to build intelligent, privacy-preserving applications that run anywhere. Whether you're working on desktop, mobile, or embedded systems, this SDK offers a robust and flexible way to bring AI closer to your users.

Ready to get started? Dive into the official documentation:

- Getting started guide
- C# Reference documentation

You can also make contributions to the C# SDK by creating a PR on GitHub: Foundry Local on GitHub
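For teams on the Python side, the same bootstrap flow is available through the Python SDK mentioned earlier. A minimal sketch, assuming the foundry-local-sdk package exposes the same manager pattern as the C# SDK (worth verifying against the Python reference docs):

```python
# Sketch: the Foundry Local bootstrap from Python, paired with the OpenAI SDK.
# Assumes: pip install foundry-local-sdk openai
from foundry_local import FoundryLocalManager
from openai import OpenAI

alias = "phi-3.5-mini"

# Starts the local service if needed, downloads/caches the best variant, loads it.
manager = FoundryLocalManager(alias)

client = OpenAI(base_url=manager.endpoint, api_key=manager.api_key)
model_id = manager.get_model_info(alias).id

stream = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```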
The Future of AI: Evaluating and optimizing custom RAG agents using Azure AI Foundry

This blog post explores best practices for evaluating and optimizing Retrieval-Augmented Generation (RAG) agents using Azure AI Foundry. It introduces the RAG triad metrics—Retrieval, Groundedness, and Relevance—and demonstrates how to apply them using Azure AI Search and agentic retrieval for custom agents. Readers will learn how to fine-tune search parameters, use end-to-end evaluation metrics and golden retrieval metrics like XDCG and Max Relevance, and leverage Azure AI Foundry tools to build trustworthy, high-performing AI agents.
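As a sketch of how the RAG triad can be scored in code, the snippet below uses the azure-ai-evaluation package with a model_config pointing at a judge deployment; the endpoint, key, and deployment name are placeholders, and the field names are worth verifying against the Foundry evaluation docs.

```python
# Sketch: scoring one RAG interaction on the triad metrics described above.
from azure.ai.evaluation import (
    GroundednessEvaluator,
    RelevanceEvaluator,
    RetrievalEvaluator,
)

# Judge model configuration (all values are placeholders/assumptions).
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<key>",
    "azure_deployment": "gpt-4o",
}

query = "What is the return window for online orders?"
context = "Online purchases can be returned within 30 days of delivery."
response = "You have 30 days from delivery to return online orders."

# Retrieval: did the search surface the right context for the query?
print(RetrievalEvaluator(model_config)(query=query, context=context))
# Groundedness: is the answer supported by the retrieved context?
print(GroundednessEvaluator(model_config)(query=query, response=response, context=context))
# Relevance: does the answer actually address the question?
print(RelevanceEvaluator(model_config)(query=query, response=response))
```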
Announcing Live Interpreter API - Now in Public Preview

Today, we’re excited to introduce Live Interpreter, a breakthrough new capability in Azure Speech Translation that makes real-time, multilingual communication effortless. Live Interpreter continuously identifies the language being spoken without requiring you to set an input language, and delivers low-latency speech-to-speech translation in a natural voice that preserves the speaker’s style and tone.
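Live Interpreter itself is new; as a rough illustration of the speech-translation surface it builds on, here is a sketch using the existing Azure Speech SDK with automatic source-language identification. The candidate-language list and configuration details are assumptions to check against the Live Interpreter documentation, which removes the need to enumerate input languages at all.

```python
# Sketch: speech translation with automatic source-language identification.
# Assumes azure-cognitiveservices-speech is installed and key/region env vars are set.
import os
import azure.cognitiveservices.speech as speechsdk

config = speechsdk.translation.SpeechTranslationConfig(
    subscription=os.environ["SPEECH_KEY"], region=os.environ["SPEECH_REGION"]
)
config.add_target_language("en")

# Candidate source languages; Live Interpreter's continuous identification
# is designed so you no longer have to pin these down up front.
auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["es-ES", "fr-FR", "hi-IN", "de-DE"]
)

recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=config, auto_detect_source_language_config=auto_detect
)

result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.TranslatedSpeech:
    print("Recognized:", result.text)
    print("English:", result.translations["en"])
```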
Building Enterprise Voice-Enabled AI Agents with Azure Voice Live API

The sample application covered in this post demonstrates two approaches in an end-to-end solution that includes product search, order management, automated shipment creation, intelligent analytics, and comprehensive business intelligence through Microsoft Fabric integration.

Use Case Scenario: Retail Fashion Agent

Core Business Capabilities:

- Product Discovery and Ordering: Natural language product search across fashion categories (winter wear, active wear, etc.) and order placement. REST APIs hosted in Azure Function Apps provide this functionality, and a Swagger definition is configured in the application for tool actions.
- Automated Fulfillment: Integration with Azure Logic Apps for shipment creation in an Azure SQL Database.
- Policy Support: Vector-powered QnA for returns, payment issues, and customer policies. Azure AI Search and File Search capabilities are used for this requirement.
- Conversation Analytics: AI-powered analysis using GPT-4o for sentiment scoring and performance evaluation. The application captures the entire conversation between the customer and the agent and sends it to an agent running in Azure Logic Apps to perform call quality assessment before storing the results in Azure Cosmos DB. When the customer indicates during the voice call that the conversation can be concluded, the agent autonomously sends the conversation history to the Azure Logic App for quality assessment.

Advanced Analytics Pipeline:

- Real-time Data Mirroring: Automatic synchronization from Azure Cosmos DB to Microsoft Fabric OneLake
- Business Intelligence: Custom Data Agents in Fabric for trend analysis and insights
- Executive Dashboards: Power BI reports for comprehensive performance monitoring

Technical Architecture Overview

The solution presents two approaches, each optimized for different enterprise scenarios:

🎯 Approach 1: Direct Model Integration with GPT-Realtime

Architecture Components

This approach provides direct integration with the Azure Voice Live API using the GPT-Realtime model for immediate speech-to-speech conversational experiences without intermediate text processing. The application connects to the Voice Live API over a WebSocket connection; the semantics of this API are similar to those used when connecting to the GPT-Realtime API directly. The Voice Live API provides additional configurability, such as the choice of a custom voice from Azure Speech Services, options for echo cancellation and noise reduction, and the option to plug in an avatar integration.
Core Technical Stack:

- GPT-Realtime Model: Direct audio-to-audio processing
- Azure Speech Voice: High-quality TTS synthesis (en-IN-AartiIndicNeural)
- WebSocket Communication: Real-time bidirectional audio streaming
- Voice Activity Detection: Server-side VAD for natural conversation flow
- Client-Side Function Calling: Full control over tool execution logic

Key Session Configuration

The Direct Model Integration uses the session configuration below:

```python
session_config = {
    "input_audio_sampling_rate": 24000,
    "instructions": system_instructions,
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,
        "prefix_padding_ms": 300,
        "silence_duration_ms": 500,
    },
    "tools": tools_list,
    "tool_choice": "auto",
    "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
    "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
    "voice": {
        "name": "en-IN-AartiIndicNeural",
        "type": "azure-standard",
        "temperature": 0.8,
    },
    "input_audio_transcription": {"model": "whisper-1"},
}
```

Configuration Highlights:

- 24kHz Audio Sampling: High-quality audio processing for natural speech
- Server VAD: Optimized threshold (0.5) with 300ms padding for natural conversation flow
- Azure Deep Noise Suppression: Advanced noise reduction for clear audio
- Indic Voice Support: en-IN-AartiIndicNeural for a localized customer experience
- Whisper-1 Transcription: Accurate speech recognition for conversation logging

Connecting to the Azure Voice Live API

The voicelive_modelclient.py demonstrates advanced WebSocket handling for real-time audio streaming:

```python
def get_websocket_url(self, access_token: str) -> str:
    """Generate the WebSocket URL for the Voice Live API."""
    azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://")
    return (
        f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}"
        f"&model={model_name}"
        f"&agent-access-token={access_token}"
    )


async def connect(self):
    if self.is_connected():
        self.log("Already connected")

    # Get access token
    access_token = self.get_azure_token()

    # Build WebSocket URL and headers
    ws_url = self.get_websocket_url(access_token)
    self.ws = await websockets.connect(
        ws_url,
        additional_headers={
            "Authorization": f"Bearer {self.get_azure_token()}",
            "x-ms-client-request-id": str(uuid.uuid4()),
        },
    )
    print("Connected to Azure Voice Live API....")

    asyncio.create_task(self.receive())
    await self.update_session()
```

Function Calling Implementation

The Direct Model Integration provides client-side function execution with complete control. The tools are declared to the session as shown below; the dispatch side is sketched after the listing.

```python
tools_list = [
    {
        "type": "function",
        "name": "perform_search_based_qna",
        "description": "call this function to respond to the user query on Contoso retail policies, procedures and general QnA",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "type": "function",
        "name": "create_delivery_order",
        "description": "call this function to create a delivery order based on order id and destination location",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "destination": {"type": "string"},
            },
            "required": ["order_id", "destination"],
        },
    },
    {
        "type": "function",
        "name": "perform_call_log_analysis",
        "description": "call this function to analyze call log based on input call log conversation text",
        "parameters": {
            "type": "object",
            "properties": {"call_log": {"type": "string"}},
            "required": ["call_log"],
        },
    },
    {
        "type": "function",
        "name": "search_products_by_category",
        "description": "call this function to search for products by category",
        "parameters": {
            "type": "object",
            "properties": {"category": {"type": "string"}},
            "required": ["category"],
        },
    },
    {
        "type": "function",
        "name": "order_products",
        "description": "call this function to order products by product id and quantity",
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"},
                "quantity": {"type": "integer"},
            },
            "required": ["product_id", "quantity"],
        },
    },
]
```
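Since tool execution stays on the client in this approach, here is a rough sketch of what that dispatch can look like. It follows the GPT-Realtime event shapes that the post says the Voice Live API mirrors; the event names, item fields, and LOCAL_TOOLS helpers are assumptions for illustration, not the repo's actual code.

```python
# Hypothetical sketch of the client-side tool dispatch loop.
import json

# Hypothetical local implementations backing two of the declared tools.
LOCAL_TOOLS = {
    "search_products_by_category": lambda args: {"products": ["winter jacket", "thermal top"]},
    "order_products": lambda args: {"status": "ordered", **args},
}

async def handle_server_event(self, event: dict):
    # A completed output item of type "function_call" carries the tool name,
    # call id, and JSON-encoded arguments chosen by the model.
    if event.get("type") == "response.output_item.done":
        item = event.get("item", {})
        if item.get("type") == "function_call":
            result = LOCAL_TOOLS[item["name"]](json.loads(item["arguments"]))
            # Return the tool output to the session, then ask the model to continue.
            await self.ws.send(json.dumps({
                "type": "conversation.item.create",
                "item": {
                    "type": "function_call_output",
                    "call_id": item["call_id"],
                    "output": json.dumps(result),
                },
            }))
            await self.ws.send(json.dumps({"type": "response.create"}))
```

🤖 Approach 2: Azure AI Foundry Agent Integration

Architecture Components

This approach leverages existing Azure AI Foundry Service Agents, providing enterprise-grade voice capabilities as a clean wrapper over pre-configured agents. It does not require any code changes to the agent itself to voice-enable it.

Core Technical Stack:

- Azure Fast Transcript: Advanced multi-language speech-to-text processing
- Azure AI Foundry Agent: Pre-configured agent with autonomous capabilities
- GPT-4o-mini Model: Agent-configured model for text processing
- Neural Voice Synthesis: Indic-language-optimized TTS
- Semantic VAD: Azure semantic voice activity detection

Session Configuration

The Agent Integration approach uses advanced semantic voice activity detection:

```python
session_config = {
    "input_audio_sampling_rate": 24000,
    "turn_detection": {
        "type": "azure_semantic_vad",
        "threshold": 0.3,
        "prefix_padding_ms": 200,
        "silence_duration_ms": 200,
        "remove_filler_words": False,
        "end_of_utterance_detection": {
            "model": "semantic_detection_v1",
            "threshold": 0.01,
            "timeout": 2,
        },
    },
    "input_audio_noise_reduction": {"type": "azure_deep_noise_suppression"},
    "input_audio_echo_cancellation": {"type": "server_echo_cancellation"},
    "voice": {
        "name": "en-IN-AartiIndicNeural",
        "type": "azure-standard",
        "temperature": 0.8,
    },
    "input_audio_transcription": {"model": "azure-speech", "language": "en-IN, hi-IN"},
}
```

Key Differentiators:

- Semantic VAD: Intelligent voice activity detection with utterance prediction
- Multi-language Support: Azure Speech with en-IN and hi-IN language support
- End-of-Utterance Detection: AI-powered conversation turn management
- Filler Word Handling: Configurable processing of conversational fillers

Agent Integration Code

The voicelive_client.py demonstrates seamless integration with Azure AI Foundry Agents. Notice that we need to provide the Azure AI Foundry project name and the ID of the agent in it. We do not need to pass the model's name here, since the agent is already configured with one.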
```python
def get_websocket_url(self, access_token: str) -> str:
    """Generate the WebSocket URL for the Voice Live API."""
    azure_ws_endpoint = endpoint.rstrip("/").replace("https://", "wss://")
    return (
        f"{azure_ws_endpoint}/voice-live/realtime?api-version={api_version}"
        f"&agent-project-name={project_name}&agent-id={agent_id}"
        f"&agent-access-token={access_token}"
    )


async def connect(self):
    """Connects the client using a WS connection to the Realtime API."""
    if self.is_connected():
        self.log("Already connected")

    # Get access token
    access_token = self.get_azure_token()

    # Build WebSocket URL and headers
    ws_url = self.get_websocket_url(access_token)
    self.ws = await websockets.connect(
        ws_url,
        additional_headers={
            "Authorization": f"Bearer {self.get_azure_token()}",
            "x-ms-client-request-id": str(uuid.uuid4()),
        },
    )
    print("Connected to Azure Voice Live API....")

    asyncio.create_task(self.receive())
    await self.update_session()
```

Advanced Analytics Pipeline

GPT-4o Powered Call Analysis

The solution implements conversation analytics using Azure Logic Apps with GPT-4o:

```json
{
  "functions": [
    {
      "name": "evaluate_call_log",
      "description": "Evaluate call log for Contoso Retail customer service call",
      "parameters": {
        "properties": {
          "call_reason": {
            "description": "Categorized call reason from 50+ predefined scenarios",
            "type": "string"
          },
          "customer_satisfaction": {
            "description": "Overall satisfaction assessment",
            "type": "string"
          },
          "customer_sentiment": {
            "description": "Emotional tone analysis",
            "type": "string"
          },
          "call_rating": {
            "description": "Numerical rating (1-5 scale)",
            "type": "number"
          },
          "call_rating_justification": {
            "description": "Detailed reasoning for rating",
            "type": "string"
          }
        }
      }
    }
  ]
}
```

Microsoft Fabric Integration

The analytics pipeline extends into Microsoft Fabric for enterprise business intelligence. Fabric integration features:

- Real-time Data Mirroring: Cosmos DB to OneLake synchronization
- Custom Data Agents: Business-specific analytics agents in Fabric
- Copilot Integration: Natural language business intelligence queries
- Power BI Dashboards: Interactive reports and executive summaries

Artefacts for reference

- The source code of the solution is available in the GitHub repo here.
- An article on this topic is published on LinkedIn here.
- A video recording of the demonstration of this app is available below:
  - Part 1 – walkthrough of the agent configuration in Azure AI Foundry – here
  - Part 2 – demonstration of the application that integrates with the Azure Voice Live API – here
  - Part 3 – demonstration of the Microsoft Fabric integration, Data Agents, Copilot in Fabric, and Power BI for insights and analysis – here

Conclusion

The Azure Voice Live API enables enterprises to build sophisticated voice-enabled AI assistants using two distinct architectural approaches. The Direct Model Integration provides ultra-low latency for real-time applications, while the Azure AI Foundry Agent Integration offers enterprise-grade governance and autonomous operation.
Both approaches deliver the same comprehensive business capabilities:

- Natural voice interactions with advanced VAD and noise suppression
- Complete retail workflow automation from inquiry to fulfillment
- AI-powered conversation analytics with sentiment scoring
- Enterprise business intelligence through Microsoft Fabric integration

The choice between approaches depends on your specific requirements:

- Choose Direct Model Integration for custom function calling and minimal latency
- Choose Azure AI Foundry Agent Integration for enterprise governance and existing investments

The Future of AI: Optimize Your Site for Agents - It's Cool to be a Tool
Learn how to optimize your website for AI agents like Manus using NLWeb, MCP, structured data, and agent-responsive design. Discover best practices to improve discoverability, usability, and natural language access for autonomous assistants in the evolving agentic web.