azure ai
284 TopicsVoice Live API now supports WebRTC (Preview)
We are thrilled to announce that Voice Live API now supports WebRTC (Web Real-Time Communication) connection, enabling low‑latency, real‑time voice interactions directly from web and mobile clients. Why WebRTC is required in real-time voice agent building WebRTC enables real‑time, bi‑directional audio and video streaming directly in the browser without plugins or native installs. Unlike WebSocket, which treat audio as generic data and require custom buffering and timing logic, WebRTC is purpose‑built for media‑aware streaming needed for responsive conversational experiences. Specifically: Lower latency: WebRTC is designed to minimize delay, making it more suitable for audio and video communication where low latency is critical for maintaining quality and synchronization. Built-in media handling: WebRTC has built-in support for audio and video codecs, providing optimized handling of media streams. Network resilience: WebRTC includes mechanisms for handling packet loss and jitter, which are essential for maintaining the quality of audio streams over unpredictable networks. How to set up WebRTC in Voice Live In a typical setup, the client establishes a WebSocket‑based control channel with the Voice Live API to exchange SDP offer and answer messages required for WebRTC session negotiation. Once negotiation completes, audio is transmitted over WebRTC RTP media tracks. Non‑audio events, such as voice activity and response lifecycle signals, are exchanged over WebRTC data channels alongside the media streams. Session configuration, control‑plane messages, and error notifications are delivered through the WebSocket control channel. When initiating a WebRTC call session, simply use the voice-live/realtime/calls endpoint instead of voice-live/realtime. For example: wss://<your-ai-foundry-resource-name>.services.ai.azure.com/voice-live/realtime/calls?api-version=2026-01-01-preview&model=gpt-realtime For more information, see the step-by-step instruction. Learn more Voice Live API is transforming how developers build voice-enabled agent systems by providing an integrated, scalable, and efficient solution. By combining speech recognition, generative AI, and text-to-speech functionalities into a unified interface, it addresses the challenges of traditional implementations, enabling faster development and superior user experiences. From streamlining customer service to enhancing education and public services, the opportunities are endless. The future of voice-first solutions is here—let’s build it together! Voice Live API introduction (video) Try Voice Live in Azure AI Foundry Voice Live API documents Voice Live quickstart Voice Live Agent code sample in GitHub248Views0likes0CommentsBuilding Intelligent Apps With Azure
Technology is changing fast, and apps today can do far more than they used to. With artificial intelligence (AI) becoming part of everyday tools, organizations of all sizes are looking for ways to build smarter, more helpful apps. Microsoft Azure makes this easier by giving teams the tools and training they need to learn, experiment, and create with confidence. What Makes an App “Intelligent”? An intelligent app uses AI to make experiences feel more natural and responsive — like understanding language, recognizing images, or offering helpful suggestions. You can build these apps from scratch or modernize the ones you already have using Microsoft Azure. Cloud technology plays a big role because it keeps everything fast, secure, and easy to scale. Helping Teams Build With Confidence Working with AI can feel overwhelming at first. Teams often face challenges like: Not having much experience with AI Feeling unsure about which tools to use Wanting to make sure AI is used responsibly Azure supports teams with hands‑on learning, expert guidance, and built‑in responsible AI tools through frameworks like Microsoft Responsible AI, helping teams build safely and confidently. Start With the Basics Before building intelligent apps, it helps to understand where your team is today and what skills they may need. Azure and Microsoft Learn offer simple, practical resources to get started: AI fundamentals – Learn the basics of how AI works https://learn.microsoft.com/ai Generative AI – Explore tools like AI copilots and how to write effective prompts Craft effective prompts for Microsoft 365 Copilot - Training | Microsoft Learn Cloud‑native development – Build apps designed to run smoothly in the cloud Create cloud native apps with Azure and open-source software - Training | Microsoft Learn Modernizing older apps: Azure application modernization overview - Assess, plan, and modernize existing workloads Microsoft App Modernization Guidance for Azure - App Modernization Guidance | Microsoft Learn Power Platform modernization - Rebuild or extend legacy apps using low-code tools. Modernize applications with Power Platform - Power Platform | Microsoft Learn Azure & .NET modernization – Upgrade ASP.NET apps to modern .NET and deploy to Azure Modernize ASP.NET and ASP.NET Core web applications - App Modernization Guidance | Microsoft Learn These resources help teams learn at their own pace and build confidence as they go. Deploying Apps the Right Way Once your app is ready, you need a smooth way to launch it. Azure provides tools and best practices to help teams: Set up the right environment Build and test apps quickly Use containers and DevOps to speed up delivery Azure DevOps | Microsoft Azure Containers on Azure | Microsoft Azure Deploy AI features safely using Azure’s governance and security tools As many developers have noted, “What used to take weeks now takes hours.” Keep Improving Over Time Building an intelligent app isn’t a one‑time project. Teams need to monitor performance, keep apps secure, and make improvements as technology evolves. Azure offers resources to help with: Scaling apps as usage grows — Automatically adjust resources to meet demand while maintaining performance and reliability. Protecting apps from security threats — Use built‑in security, identity, and compliance tools to safeguard data and reduce risk. Improving accuracy and performance — Monitor models and applications to fine‑tune quality, responsiveness, and user experience. Managing costs — Track usage, optimize resources, and control spending as apps grow and evolve. Create a Culture of Continuous Learning The most successful organizations treat learning as an ongoing investment. As Forbes notes, helping people understand new technologies builds trust and prepares teams for the future. Microsoft offers a wide range of learning paths, tutorials, and hands‑on experiences to support your team as they explore AI and intelligent app development: https://learn.microsoft.com/ https://www.microsoft.com/nonprofits/offers-for-nonprofits https://learn.microsoft.com/industry/nonprofit/microsoft-for-nonprofits/72Views0likes0CommentsIntroducing OpenAI's GPT-image-2 in Microsoft Foundry
Take a small design team running a global social campaign. They have the creative vision to produce localized imagery for every market, but not the resources to reshoot, reformat, or outsource that scale. Every asset needs to fit a different platform, a different dimension, a different cultural context, and they all need to ship at the same time. This is where flexible image generation comes in handy. OpenAI's GPT-image-2 is now generally available and rolling out today to Microsoft Foundry, introducing a step change in image generation. Developers and designers now get more control over image output, so a small team can execute with the reach and flexibility of a much larger one. What is new in GPT-image-2? GPT-image-2 brings real world intelligence, multilingual understanding, improved instruction following, increased resolution support, and an intelligent routing layer giving developers the tools to scale image generation for production workflows. Real world intelligence GPT-image-2 has a knowledge cut off of December 2025, meaning that it is able to give you more contextually relevant and accurate outputs. The model also comes with enhanced thinking capabilities that allow it to search the web, check its own outputs, and create multiple images from just one prompt. These enhancements shift image generation models away from being simple tools and runs them into creative sidekicks. Multilingual understanding GPT-image-2 includes increased language support across Japanese, Korean, Chinese, Hindi, and Bengali, as well as new thinking capabilities. This means the model can create images and render text that feels localized. Increased resolution support GPT-image-2 introduces 4K resolution support, giving developers the ability to generate rich, detailed, and photorealistic images at custom dimensions. Resolution guidelines to keep in mind: Constraint Detail Total pixel budget Maximum pixels in final image cannot exceed 8,294,400 Minimum pixels in final image cannot be less than 655,360 Requests exceeding this are automatically resized to fit. Resolutions 4K, 1024x1024, 1536x1024, and 1024x1536 Dimension alignment Each dimension must be a multiple of 16 Note: If your requested resolution exceeds the pixel budget, the service will automatically resize it down. Intelligent routing layer GPT-image-2 also includes an expanded routing layer with two distinct modes, allowing the service to intelligently select the right generation configuration for a request without requiring an explicitly set size value. Mode 1 — Legacy size selection In Mode 1, the routing layer selects one of the three legacy size tiers to use for generation: Size tier Description smimage Small image output image Standard image output xlimage Large image output This mode is useful for teams already familiar with the legacy size tiers who want to benefit from automatic selection without making any manual changes. Mode 2 — Token size bucket selection In Mode 2, the routing layer selects from six token size buckets — 16, 24, 36, 48, 64, 96 — which map roughly to the legacy size tiers: Token bucket Approximate legacy size 16, 24 smimage 36, 48 image 64, 96 xlimage This approach can allow for more flexibility in the number of tokens generated, which in turn helps to better optimize output quality and efficiency for a given prompt. See it in action GPT-image-2 shows improved image fidelity across visual styles, generating more detailed and refined images. But, don’t just take our word for it, let's see the model in action with a few prompts and edits. Here is the example we used: Prompt: Interior of an empty subway car (no people). Wide-angle view looking down the aisle. Clean, modern subway car with seats, poles, route map strip, and ad frames above the windows. Realistic lighting with a slight cool fluorescent tone, realistic materials (metal poles, vinyl seats, textured floor). As you can see, when using the same base prompt, the image quality and realism improved with each model. Now let’s take a look at adding incremental changes to the same image: Prompt: Populate the ad frames with a cohesive ad campaign for “Zava Flower Delivery” and use an array of flower types. And our subway is now full of ads for the new ZAVA flower delivery service. Let's ask for another small change: Prompt: In all Zava Flower Delivery advertisements, change the flowers shown to roses (red and pink roses). And in three simple prompts, we've created a mockup of a flower delivery ad. From marketing material to website creation to UX design, GPT-image-2 now allows developers to deliver production-grade assets for real business use cases. Image generation across industries These new capabilities open the door to richer, more production-ready image generation workflows across a range of enterprise scenarios: Retail & e-commerce: Generate product imagery at exact platform-required dimensions, from square thumbnails to wide banners, without post-processing. Marketing: Produce crisp, rich in color campaign visuals and social assets localized to different markets. Media & entertainment: Generate storyboard panels and scene at resolutions suited to production pipelines. Education & training: Create visual learning aids and course materials formatted to exact display requirements across devices. UI/UX design: Accelerate mockup and prototype workflows by generating interface assets at the precise dimensions your design system requires. Trust and safety At Microsoft, our mission to empower people and organizations remains constant. As part of this commitment, models made available through Foundry undergo internal reviews and are deployed with safeguards designed to support responsible use at scale. Learn more about responsible AI at Microsoft. For GPT-image-2, Microsoft applied an in-depth safety approach that addresses disallowed content and misuse while maintaining human oversight. The deployment combines OpenAI’s image generation safety mitigations with Azure AI Content Safety, including filters and classifiers for sensitive content. Pricing Model Offer type Pricing - Image Pricing - Text GPT-image-2 Standard Global Input Tokens: $8 Cached Input Tokens: $2 Output Tokens: $30 Input Tokens: $5 Cached Input Tokens: $1.25 Output Tokens: $10 Note: All prices are per 1M token. Getting started Whether you’re building a personalized retail experience, automating visual content pipelines or accelerating design workflows. GPT-image-2 gives your team the resolution control and intelligent routing to generate images that fit your exact needs. Try the GPT-image-2 in Microsoft Foundry today! Deploy the model in Microsoft Foundry Experiment with the model in the Image playground Read the documentation to learn more12KViews3likes3CommentsAutomate Prior Authorization with AI Agents - Now Available as a Foundry Template
By Amit Mukherjee · Principal Solutions Engineer, Microsoft Health & Life Sciences Lindsey Craft-Goins · Technology Leader - Cloud & AI Platforms, Health & Life Sciences Joel Borellis · Director Solutions Engineering - Cloud & AI Platforms, Health & Life Sciences Prior authorization (PA) is one of the most expensive bottlenecks in U.S. healthcare. Physicians complete an average of 39 PA requests per week, spending roughly 13 hours of physician-and-staff time on PA-related work (AMA 2024 Prior Authorization Physician Survey). Turnaround averages 5–14 business days, and PA alone accounts for an estimated $35 billion in annual administrative spending (Sahni et al., Health Affairs Scholar, 2024). The regulatory clock is now ticking. CMS-0057-F mandates electronic PA with 72-hour urgent response starting in 2026. Forty-nine states plus DC already have PA laws on the books, and at least half of all U.S. state legislatures introduced new PA reform bills this year, including laws specifically targeting AI use in PA decisions (KFF Health News, April 2026). Today we’re making the Prior Authorization Multi-Agent Solution Accelerator available as a Microsoft Foundry template. Health plan payers can deploy a working, four-agent PA review pipeline to Azure using the Azure Developer CLI (“azd”) with a single command in supported environments, then customize it to their policies, workflows, and EHR environment. Try it now: Find the template in the Foundry template gallery, or clone directly from github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator What the template delivers The accelerator deploys four specialist Foundry hosted agents (Compliance, Clinical Reviewer, Coverage, and Synthesis), each independently containerized and managed by Foundry. In internal testing with synthetic demo cases, the pipeline reduced review workflow, from beginning to completion in under 5 minutes per case. Agent Role Key capability Compliance Documentation check 10-item checklist with blocking/non-blocking flags Clinical Reviewer Clinical evidence ICD-10 validation, PubMed + ClinicalTrials.gov search Coverage Policy matching CMS NCD/LCD lookup, per-criterion MET/NOT_MET mapping Synthesis Decision rubric 3-gate APPROVE/PEND with weighted confidence scoring Compliance and Clinical run in parallel. Coverage runs after clinical findings are ready. Synthesis evaluates all three outputs through a three-gate rubric. The result is a structured recommendation with per-criterion confidence scores and a full audit trail, not a black-box answer. Solution architecture The accelerator runs entirely on Azure. The frontend and backend deploy as Azure Container Apps. The four specialist agents are hosted by Microsoft Foundry. Real-time healthcare data flows through third-party MCP servers. Figure 1: Azure solution architecture How the pipeline works The four agents execute in a structured parallel-then-sequential pipeline. Compliance and Clinical run simultaneously in Phase 1. Coverage runs after clinical findings are ready. The Synthesis agent applies a three-gate decision rubric over all prior outputs. Figure 2: Agentic architecture, hosted agent pipeline Compliance and Clinical run in parallel via asyncio.gather, since neither depends on the other. Coverage runs sequentially after Clinical because it needs the structured clinical profile for criterion mapping. Synthesis evaluates all three outputs through a three-gate rubric (Provider, Codes, Medical Necessity) with weighted confidence scoring: 40% coverage criteria + 30% clinical extraction + 20% compliance + 10% policy match. The total pipeline time is bound by the slowest parallel agent plus the sequential agents, not the sum. In internal testing with synthetic demo cases, this architecture indicated materially reduced processing time compared to sequential manual workflows. Under the hood For the architect in the room, here are four design decisions worth knowing about: Foundry hosted agents: Each agent is independently containerized, versioned, and managed by Foundry’s runtime. The FastAPI backend is a pure HTTP dispatcher. All reasoning happens inside the agent containers, and there are no code changes between local (Docker Compose) and production (Foundry); the environment variable is the only switch. Structured output: Every agent uses MAF’s response_format enforcement to produce typed Pydantic schemas at the token level. No JSON parsing, no malformed fences, no free-form text. The orchestrator receives typed Python objects; the frontend receives a stable API contract. Keyless security: DefaultAzureCredential throughout, so no API keys are stored anywhere. Managed Identity handles production; azd tokens handle local development. Role assignments are provisioned automatically by Bicep at deploy time. Observability: All agents emit OpenTelemetry traces to Azure Application Insights. The Foundry portal shows per-agent spans correlated by case ID. End-to-end latency, per-agent contribution, and error rates are visible from day one with no additional configuration. For the full architecture documentation, agent specifications, Pydantic schemas, and extension guides, see the GitHub repository. Why this matters now Human-in-the-loop by design The system runs in LENIENT mode by default: it produces only APPROVE or PEND and is not designed to produce automated DENY outcomes in its default configuration. Every recommendation requires a clinician to Accept or Override with documented rationale before the decision is finalized. Override records flow to the audit PDF, notification letters, and downstream systems. This directly addresses the emerging wave of state legislation governing AI use in PA decisions. Domain experts own the rules Agent behavior is defined in markdown skill files, not Python code. When CMS updates a coverage determination or a plan changes its commercial policy, a clinician or compliance officer edits a text file and redeploys. No engineering PR required. Real-time healthcare data via MCP Agents connect to five MCP servers for real-time data: ICD-10 codes, NPI Registry, CMS Coverage policies, PubMed, and ClinicalTrials.gov. This incorporates real‑time clinical reference data sources to inform agent recommendations. Third-party MCP servers are included for demonstration with synthetic data only. Their inclusion does not constitute an endorsement by Microsoft. See the GitHub repository for production migration guidance. Audit-ready from day one Every case generates an 8-section audit justification PDF with per-criterion evidence, data source attribution, timestamps, and confidence breakdowns. Clinician overrides are recorded in Section 9. Notification letters (approval and pend) are generated automatically. These artifacts are designed to support CMS-0057-F documentation requirements. Deploy in under 15 minutes From the Foundry template gallery or from the command line: git clone https://github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator cd Prior-Authorization-Multi-Agent-Solution-Accelerator azd up That single command provisions Foundry, Azure Container Registry, Container Apps, builds all Docker images, registers the four agents, and runs health checks. The demo is live with a synthetic sample case as soon as deployment completes. What’s included What you customize 4 Foundry hosted agents Payer-specific coverage policies FastAPI orchestrator + Next.js frontend EHR/FHIR integration for clinical notes 5 MCP healthcare data connections Self-hosted MCP servers for production PHI Audit PDF + notification letter generation Authentication (Microsoft Entra ID) Full Bicep infrastructure-as-code Persistent storage (Cosmos DB / PostgreSQL) OpenTelemetry + App Insights observability Additional agents (Pharmacy, Financial) Built on Microsoft Foundry + Foundry hosted agents · Microsoft Agent Framework (MAF) · Azure OpenAI gpt-5.4 · Azure Container Apps · Azure Developer CLI + Bicep · OpenTelemetry + Azure Application Insights · DefaultAzureCredential (keyless, no secrets) Full architecture documentation, agent specifications, and extension guides are in the GitHub repository. Get started Foundry template gallery: Search “AI-Powered Prior Authorization for Healthcare” in the Foundry template section GitHub: github.com/microsoft/Prior-Authorization-Multi-Agent-Solution-Accelerator Disclaimers Not a medical device. This solution accelerator is not a medical device, is not FDA-cleared, and is not intended for autonomous clinical decision-making. All AI recommendations require qualified clinical review before any authorization decision is finalized. Not production-ready software. This is an open-source reference architecture (MIT License), not a supported Microsoft product. Customers are solely responsible for testing, validation, regulatory compliance, security hardening, and production deployment. Performance figures are illustrative. Metrics cited (including processing time reductions) are based on internal testing with synthetic demo data. Actual results will vary based on case complexity, infrastructure, and configuration. Third-party services included for demonstration only; not endorsed by Microsoft. Customers should evaluate providers against their compliance and data residency requirements. The demo uses synthetic data only. Customers deploying real patient data are responsible for HIPAA compliance and establishing appropriate Business Associate Agreements. This accelerator is intended to help customers align documentation workflows with CMS‑0057‑F requirements but has not been independently validated or certified for regulatory compliance.1.4KViews0likes0CommentsResource Guide: Making Physical AI Practical for Real‑World Industrial Operations
Microsoft’s adaptive cloud approach enables organizations to turn operational technology (OT) data into intelligent actions, autonomously, without requiring everything to live in the cloud by unifying cloud-to-edge management plane, data plane, and intelligence platform. At the center of this approach are key foundational technologies: Key Purpose Offering Direct-to-cloud device management + telemetry ingestion Azure IoT Hub Industrial connectivity + edge data plane Azure IoT Operations Unified analytics + real-time intelligence Microsoft Fabric On-device AI inferencing runtime Foundry Local Microsoft Azure IoT Gartner winner: Microsoft named a Leader in the 2025 Gartner® Magic Quadrant™ for Global Industrial IoT Platforms This blog walks through where to get started with each: 1. Manage Cloud-Connected Devices and Telemetry with Azure IoT Hub Azure IoT Hub is a fully managed cloud service that enables secure bidirectional communication, device-to-cloud telemetry ingestion, cloud-to-device command execution, per-device authentication, remote management and more. Telemetry from IoT Hub can also be routed downstream into analytics platforms like Microsoft Fabric for visualization or AI modeling. Recommended Usage: Devices that utilize IoT Hub are distributed, stand-alone devices with fixed-functions. These devices typically do not require cloud-managed containerized workloads or cloud-managed proximal industrial protocol connectivity. Examples of appropriate device-to-cloud IoT Hub endpoint devices include water monitoring stations, vehicle telematics, distributed fluid level sensors, etc. Resources Current in-market services overview: IoT Hub: What is Azure IoT Hub? - Azure IoT Hub DPS: Overview of Azure IoT Hub Device Provisioning Service - Azure IoT Hub Device Provisioning Service ADU: Introduction to Device Update for Azure IoT Hub Building scalable solutions with Azure IoT platform: Best practices for large-scale IoT deployments - Azure IoT Hub Device Provisioning Service Scale Out an Azure IoT Hub-based Solution to Support Millions of Devices - Azure Architecture Center Azure IoT Hub scaling Try out our preview of new IoT Hub capabilities (integration with Azure Device Registry and Certificate Management) Learn more about these capabilities on our blog post: Azure IoT Hub + Azure Device Registry (Preview Refresh): Device Trust and Management at Fleet Scale… Integration with Azure Device Registry (preview): Integration with Azure Device Registry (preview) - Azure IoT Hub Microsoft-backed X.509 certificate management (preview): What is Microsoft-backed X.509 Certificate Management (Preview)? - Azure IoT Hub How to start with the preview: Deploy IoT Hub with ADR integration and certificate management (Preview) - Azure IoT Hub 2. Connect Industrial Assets with Azure IoT Operations Azure IoT Operations provides a unified data plane for the edge that runs on Azure Arc–enabled Kubernetes clusters and supports open industrial standards. It allows organizations to connect and capture equipment telemetry, normalize OT data locally, route hot-path signals to real-time analytics, securely manage layered industrial networks, and more. Edge‑processed data can then be sent upstream to Microsoft Fabric for AI‑driven analysis. Recommended Usage: Azure IoT Operations is intended to be the data plane for an adaptive cloud deployment extending the management, data, and AI capabilities of the Microsoft cloud to an on-prem device. This device binds to these cloud planes providing a platform for local data processing and intermittent connectivity. The target for these devices range from a small-gateway-style PC to a full data center. Azure IoT Operations endpoints enable cloud-managed containerized workloads and cloud-managed proximal industrial protocol connectivity. Examples of appropriate adaptive cloud and Azure IoT Operations endpoints include, on-robot computers, industrial machine controllers, retail store sensor/vision processing, and top-of-factory site infrastructure for line of business applications. Resources Azure IoT Operations Overview Azure IoT Operations Documentation Hub Quickstart: explore-iot-operations/quickstart at main · Azure-Samples/explore-iot-operations Open-source framework for scaling robotics from simulation to production on Azure + NVIDIA: microsoft/physical-ai-toolchain How we built the demo: explore-iot-operations/quickstart at main · Azure-Samples/explore-iot-operations Edge-AI: microsoft/edge-ai: Production-ready Infrastructure as Code, applications, pluggable components, and… Latest Announcements & Blogs Making Physical AI Practical for Real-World Industrial Operations: Part 1 | Microsoft Community Hub Making Physical AI Practical for Real-World Industrial Operations: Part 2 | Microsoft Community Hub Unlock Industrial Intelligence | Microsoft Hannover Messe 2026 From pilots to production: How Microsoft and partners are accelerating intelligent operations 3. Advanced Analytics with Microsoft Fabric Microsoft Fabric delivers a unified, end‑to‑end analytics platform that transforms streaming OT telemetry into real‑time insights and live dashboards. Fabric Operations Agents monitor industrial signals to recommend targeted actions, while Fabric IQ provides a shared semantic foundation that enables AI agents to reason over enterprise data with business context. Together, Fabric turns live industrial data into AI‑powered operational intelligence. Get Started Get Started with Microsoft Fabric Learning Path Fabric Real-Time Intelligence documentation - Microsoft Fabric | Microsoft Learn Create and Configure Operations Agents - Microsoft Fabric | Microsoft Learn Fabric IQ documentation - Microsoft Fabric | Microsoft Learn 4.Run AI Models On‑Device with Foundry Local Foundry Local extends on‑device AI to Arc‑enabled Kubernetes edge clusters, providing a Microsoft‑validated inferencing layer for running AI models in industrial, disconnected or sovereign environments. Get Started Foundry Local on Azure Local Documentation - link Participate in Foundry Local on Azure Local preview form - link Foundry Local on Azure Local: HELM deployment Demo - link Customer Stories Chevron: Chevron plans facilities of the future with Azure IoT Operations Husqvarna: Husqvarna Group Boosts Operational Efficiency with Azure Adaptive Cloud Ecopetrol: Azure IoT Operations and Azure IoT for energy help Ecopetrol optimize energy distribution while lowering operational costs P&G: Procter & Gamble cuts model deployment time up to 90% with Azure IoT Operations Toyota: Toyota Industries innovates its paint shop processes with Azure industrial AI and Azure IoT Hub459Views0likes0CommentsVector Drift in Azure AI Search: Three Hidden Reasons Your RAG Accuracy Degrades After Deployment
What Is Vector Drift? Vector drift occurs when embeddings stored in a vector index no longer accurately represent the semantic intent of incoming queries. Because vector similarity search depends on relative semantic positioning, even small changes in models, data distribution, or preprocessing logic can significantly affect retrieval quality over time. Unlike schema drift or data corruption, vector drift is subtle: The system continues to function Queries return results But relevance steadily declines Cause 1: Embedding Model Version Mismatch What Happens Documents are indexed using one embedding model, while query embeddings are generated using another. This typically happens due to: Model upgrades Shared Azure OpenAI resources across teams Inconsistent configuration between environments Why This Matters Embeddings generated by different models: Exist in different vector spaces Are not mathematically comparable Produce misleading similarity scores As a result, documents that were previously relevant may no longer rank correctly. Recommended Practice A single vector index should be bound to one embedding model and one dimension size for its entire lifecycle. If the embedding model changes, the index must be fully re-embedded and rebuilt. Cause 2: Incremental Content Updates Without Re-Embedding What Happens New documents are continuously added to the index, while existing embeddings remain unchanged. Over time, new content introduces: Updated terminology Policy changes New product or domain concepts Because semantic meaning is relative, the vector space shifts—but older vectors do not. Observable Impact Recently indexed documents dominate retrieval results Older but still valid content becomes harder to retrieve Recall degrades without obvious system errors Practical Guidance Treat embeddings as living assets, not static artifacts: Schedule periodic re-embedding for stable corpora Re-embed high-impact or frequently accessed documents Trigger re-embedding when domain vocabulary changes meaningfully Declining similarity scores or reduced citation coverage are often early signals of drift. Cause 3: Inconsistent Chunking Strategies What Happens Chunk size, overlap, or parsing logic is adjusted over time, but previously indexed content is not updated. The index ends up containing chunks created using different strategies. Why This Causes Drift Different chunking strategies produce: Different semantic density Different contextual boundaries Different retrieval behavior This inconsistency reduces ranking stability and makes retrieval outcomes unpredictable. Governance Recommendation Chunking strategy should be treated as part of the index contract: Use one chunking strategy per index Store chunk metadata (for example, chunk_version) Rebuild the index when chunking logic changes Design Principles Versioned embedding deployments Scheduled or event-driven re-embedding pipelines Standardized chunking strategy Retrieval quality observability Prompt and response evaluation Key Takeaways Vector drift is an architectural concern, not a service defect It emerges from model changes, evolving data, and preprocessing inconsistencies Long-lived RAG systems require embedding lifecycle management Azure AI Search provides the controls needed to mitigate drift effectively Conclusion Vector drift is an expected characteristic of production RAG systems. Teams that proactively manage embedding models, chunking strategies, and retrieval observability can maintain reliable relevance as their data and usage evolve. Recognizing and addressing vector drift is essential to building and operating robust AI solutions on Azure. Further Reading The following Microsoft resources provide additional guidance on vector search, embeddings, and production-grade RAG architectures on Azure. Azure AI Search – Vector Search Overview - https://learn.microsoft.com/azure/search/vector-search-overview Azure OpenAI – Embeddings Concepts - https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/embeddings?view=foundry-classic&tabs=csharp Retrieval-Augmented Generation (RAG) Pattern on Azure - https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview?tabs=videos Azure Monitor – Observability Overview - https://learn.microsoft.com/azure/azure-monitor/overview619Views3likes2CommentsGemma 4 now available in Microsoft Foundry
Experimenting with open-source models has become a core part of how innovative AI teams stay competitive: experimenting with the latest architectures and often fine-tuning on proprietary data to achieve lower latencies and cost. Today, we’re happy to announce that the Gemma 4 family, Google DeepMind’s newest model family, is now available in Microsoft Foundry via the Hugging Face collection. Azure customers can now discover, evaluate, and deploy Gemma 4 inside their Azure environment with the same policies they rely on for every other workload. Foundry is the only hyperscaler platform where developers can access OpenAI, Anthropic, Gemma, and over 11,000+ models under a single control plane. Through our close collaboration with Hugging Face, Gemma 4 joining that collection continues Microsoft’s push to bring customers the widest selection of models from any cloud – and fits in line with our enhanced investments in open-source development. Frontier Intelligence, open-source weights Released by Google DeepMind on April 2, 2026, Gemma 4 is built from the same research foundation as Gemini 3 and packaged as open weights under an Apache 2.0 license. Key capabilities across the Gemma 4 family: Native multimodal: Text + image + video inputs across all sizes; analyze video by processing sequences of frames; audio input on edge models (E2B, E4B) Enhanced reasoning & coding capabilities: Multi-step planning, deep logic, and improvements in math and instruction-following enabling autonomous agents Trained for global deployment: Pretrained on 140+ languages with support for 35+ languages out of the box Long context: Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B) allow developers to reason across extensive codebases, lengthy documents, or multi-session histories Why choose Foundry? Foundry is built to give developers breadth -- access to models from major model providers, open and proprietary, under one roof. Stay within Azure to work leading models. When you deploy through Foundry, models run inside your Azure environment and are subject to the same network policies, identity controls, and audit processes your organization already has in place. Managed online endpoints handle serving, scaling, and monitoring without manually setting up and managing the underlying infrastructure. Serverless deployment with Azure Container Apps allows developers to deploy and run containerized applications while reducing infrastructure management and saving costs. Gated model access integrates directly with Hugging Face user tokens, so models that require license acceptance stay compliant can be accessed without manual approvals. Foundry Local lets you run optimized Hugging Face models directly on your own hardware using the same model catalog and SDK patterns as your cloud deployments. Read the documentation here: https://aka.ms/foundrylocal and https://aka.ms/HF/foundrylocal Microsoft’s approach to Responsible AI is grounded in our AI principles of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy new models responsibly in production environments. What are teams building with Gemma 4 in Foundry Gemma 4’s combination of multimodal input, agentic function calling, and long context offers a wide range of production use cases: Document intelligence: Processing PDFs, charts, invoices, and complex tables using native vision capabilities Multilingual enterprise apps: 140+ natively trained languages — ideal for multinational customer support, content platforms as well as language learning tools for grammar correction and writing practice Long-context analytics: Reasoning across entire codebases, legal documents, or multi-session conversation histories Getting started Try Gemma 4 in Microsoft Foundry today. New models from Hugging Face continue to roll out to Foundry on a regular basis through our ongoing collaboration. If there's a model you want to see added, let us know here. Stay connected to our developer community on Discord and stay up to date on what is new in Foundry through the Model Mondays series.1.6KViews1like0CommentsNow in Foundry: Microsoft Harrier and NVIDIA EGM-8B
This week's Model Mondays edition highlights three models that share a common thread: each achieves results comparable to larger leading models, as a result of targeted training strategies rather than scale. Microsoft Research's harrier-oss-v1-0.6b from achieves state-of-the-art results on the Multilingual MTEB v2 embedding benchmark at 0.6B parameters through contrastive learning and knowledge distillation. NVIDIA's EGM-8B scores 91.4 average IoU on the RefCOCO visual grounding benchmark by training a small Vision Language Model (VLM) with reinforcement learning to match the output quality of much larger models. Together they represent a practical argument for efficiency-first model development: the gap between small and large models continues to narrow when training methodology is the focus rather than parameter count alone. Models of the week Microsoft Research: harrier-oss-v1-0.6b Model Specs Parameters / size: 0.6B Context length: 32,768 tokens Primary task: Text embeddings (retrieval, semantic similarity, classification, clustering, reranking) Why it's interesting State-of-the-art on Multilingual MTEB v2 from Microsoft Research: harrier-oss-v1-0.6b is a new embedding model released by Microsoft Research, achieving a 69.0 score on the Multilingual MTEB v2 (Massive Text Embedding Benchmark) leaderboard—placing it at the top of its size class at release. It is part of the harrier-oss family spanning harrier-oss-v1-270m (66.5 MTEB v2), harrier-oss-v1-0.6b (69.0), and harrier-oss-v1-27b (74.3), with the 0.6B variant further trained with knowledge distillation from the larger family members. Benchmarks: Multilingual MTEB v2 Leaderboard. Decoder-only architecture with task-instruction queries: Unlike most embedding models that use encoder-only transformers, harrier-oss-v1-0.6b uses a decoder-only architecture with last-token pooling and L2 normalization. Queries are prefixed with a one-sentence task instruction (e.g., "Instruct: Retrieve relevant passages that answer the query\nQuery: ...") while documents are encoded without instructions—allowing the same deployed model to be specialized for retrieval, classification, or similarity tasks through the prompt alone. Broad task coverage across six embedding scenarios: The model is trained and evaluated on retrieval, clustering, semantic similarity, classification, bitext mining, and reranking—making it suitable as a general embedding backbone for multi-task pipelines rather than a single-use retrieval model. One endpoint, consistent embeddings across the stack. 100+ language support: Trained on a large-scale mixture of multilingual data covering Arabic, Chinese, Japanese, Korean, and 100+ additional languages, with strong cross-lingual transfer for tasks that span language boundaries. Try it Use Case Prompt Pattern Multilingual semantic search Prepend task instruction to query; encode documents without instruction; rank by cosine similarity Cross-lingual document clustering Embed documents across languages; apply clustering to group semantically related content Text classification with embeddings Encode labeled examples + new text; classify by nearest-neighbor similarity in embedding space Bitext mining Encode parallel corpora in source and target languages; align segments by embedding similarity Sample prompt for a global enterprise knowledge base deployment: You are building a multilingual internal knowledge base for a global professional services firm. Using the harrier-oss-v1-0.6b endpoint deployed in Microsoft Foundry, encode all internal documents—policy guides, project case studies, and technical documentation—across English, French, German, and Japanese. At query time, prepend the task instruction to each employee query: "Instruct: Retrieve relevant internal documents that answer the employee's question\nQuery: {question}". Retrieve the top-5 most similar documents by cosine similarity and pass them to a language model with the instruction: "Using only the provided documents, answer the question and cite the source document title for each claim. If no document addresses the question, say so." NVIDIA: EGM-8B Model Specs Parameters / size: ~8.8B Context length: 262,144 tokens Primary task: Image-text-to-text (visual grounding) Why it's interesting Preforms well on visual grounding compared to larger models even at its small size: EGM-8B achieves 91.4 average Intersection over Union (IoU) on the RefCOCO benchmark—the standard measure of how accurately a model localizes a described region within an image. Compared to its base model Qwen3-VL-8B-Thinking (87.8 IoU), EGM-8B achieves a +3.6 IoU gain through targeted Reinforcement Learning (RL) fine-tuning. Benchmarks: EGM Project Page. 5.9x faster than larger models at inference: EGM-8B achieves 737ms average latency. The research demonstrates that test-time compute can be scaled horizontally across small models—generating many medium-quality responses and selecting the best—rather than relying on a single expensive forward pass through a large model. Two-stage training: EGM-8B is trained first with Supervised Fine-Tuning (SFT) on detailed chain-of-thought reasoning traces generated by a proprietary VLM, then refined with Group Relative Policy Optimization (GRPO) using a reward function combining IoU accuracy and task success. The intermediate SFT checkpoint is available as nvidia/EGM-8B-SFT for developers who want to experiment with the intermediate stage. Addresses a root cause of small model grounding errors: The EGM research identifies that 62.8% of small model errors on visual grounding stem from complex multi-relational descriptions—where a model must reason about spatial relationships, attributes, and context simultaneously. By focusing test-time compute on reasoning through these complex prompts, EGM-8B closes the gap without increasing the underlying model size. Try it Use Case Prompt Pattern Object localization Submit image + natural language description; receive bounding box coordinates Document region extraction Provide scanned document image + field description; extract specific regions Visual quality control Submit product image + defect description; localize defect region for downstream classification Retail shelf analysis Provide shelf image + product description; return location of specified SKU Sample prompt for a retail and logistics deployment: You are building a visual inspection system for a logistics warehouse. Using the EGM-8B endpoint deployed in Microsoft Foundry, submit each incoming package scan image along with a natural language grounding query describing the region of interest: "Please provide the bounding box coordinate of the region this sentence describes: {description}". For example: "the label on the upper-left side of the box", "the barcode on the bottom face", or "the damaged corner on the right side". Use the returned bounding box coordinates to route each package to the appropriate inspection station based on the identified region. Getting started You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation: Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry639Views0likes0CommentsPower Up Your Open WebUI with Azure AI Speech: Quick STT & TTS Integration
Introduction Ever found yourself wishing your web interface could really talk and listen back to you? With a few clicks (and a bit of code), you can turn your plain Open WebUI into a full-on voice assistant. In this post, you’ll see how to spin up an Azure Speech resource, hook it into your frontend, and watch as user speech transforms into text and your app’s responses leap off the screen in a human-like voice. By the end of this guide, you’ll have a voice-enabled web UI that actually converses with users, opening the door to hands-free controls, better accessibility, and a genuinely richer user experience. Ready to make your web app speak? Let’s dive in. Why Azure AI Speech? We use Azure AI Speech service in Open Web UI to enable voice interactions directly within web applications. This allows users to: Speak commands or input instead of typing, making the interface more accessible and user-friendly. Hear responses or information read aloud, which improves usability for people with visual impairments or those who prefer audio. Provide a more natural and hands-free experience especially on devices like smartphones or tablets. In short, integrating Azure AI Speech service into Open Web UI helps make web apps smarter, more interactive, and easier to use by adding speech recognition and voice output features. If you haven’t hosted Open WebUI already, follow my other step-by-step guide to host Ollama WebUI on Azure. Proceed to the next step if you have Open WebUI deployed already. Learn More about OpenWeb UI here. Deploy Azure AI Speech service in Azure. Navigate to the Azure Portal and search for Azure AI Speech on the Azure portal search bar. Create a new Speech Service by filling up the fields in the resource creation page. Click on “Create” to finalize the setup. After the resource has been deployed, click on “View resource” button and you should be redirected to the Azure AI Speech service page. The page should display the API Keys and Endpoints for Azure AI Speech services, which you can use in Open Web UI. Settings things up in Open Web UI Speech to Text settings (STT) Head to the Open Web UI Admin page > Settings > Audio. Paste the API Key obtained from the Azure AI Speech service page into the API key field below. Unless you use different Azure Region, or want to change the default configurations for the STT settings, leave all settings to blank. Text to Speech settings (TTS) Now, let's proceed with configuring the TTS Settings on OpenWeb UI by toggling the TTS Engine to Azure AI Speech option. Again, paste the API Key obtained from Azure AI Speech service page and leave all settings to blank. You can change the TTS Voice from the dropdown selection in the TTS settings as depicted in the image below: Click Save to reflect the change. Expected Result Now, let’s test if everything works well. Open a new chat / temporary chat on Open Web UI and click on the Call / Record button. The STT Engine (Azure AI Speech) should identify your voice and provide a response based on the voice input. To test the TTS feature, click on the Read Aloud (Speaker Icon) under any response from Open Web UI. The TTS Engine should reflect Azure AI Speech service! Conclusion And that’s a wrap! You’ve just given your Open WebUI the gift of capturing user speech, turning it into text, and then talking right back with Azure’s neural voices. Along the way you saw how easy it is to spin up a Speech resource in the Azure portal, wire up real-time transcription in the browser, and pipe responses through the TTS engine. From here, it’s all about experimentation. Try swapping in different neural voices or dialing in new languages. Tweak how you start and stop listening, play with silence detection, or add custom pronunciation tweaks for those tricky product names. Before you know it, your interface will feel less like a web page and more like a conversation partner.2.2KViews3likes2CommentsAzure IoT Operations 2603 is now available: Powering the next era of Physical AI
Industrial AI is entering a new phase. For years, AI innovation has largely lived in dashboards, analytics, and digital decision support. Today, that intelligence is moving into the real world, onto factory floors, oil fields, and production lines, where AI systems don’t just analyze data, but sense, reason, and act in physical environments. This shift is increasingly described as Physical AI: intelligence that operates reliably where safety, latency, and real‑world constraints matter most. With the Azure IoT Operations 2603 (v1.3.38) release, Microsoft is delivering one of its most significant updates to date, strengthening the platform foundation required to build, deploy, and operate Physical AI systems at industrial scale. Why Physical AI needs a new kind of platform Physical AI systems are fundamentally different from digital‑only AI. They require: Real‑time, low‑latency decision‑making at the edge Tight integration across devices, assets, and OT systems End‑to‑end observability, health, and lifecycle management Secure cloud‑to‑edge control planes with governance built in Industry leaders and researchers increasingly agree that success in Physical AI depends less on isolated models, and more on software platforms that orchestrate data, assets, actions, and AI workloads across the physical world. Azure IoT Operations was built for exactly this challenge. What’s new in Azure IoT Operations 2603 The 2603 release delivers major advancements across data pipelines, connectivity, reliability, and operational control, enabling customers to move faster from experimentation to production‑grade Physical AI. Cloud‑to‑edge management actions Cloud‑to‑edge management actions enable teams to securely execute control and configuration operations on on‑premises assets, such as invoking methods, writing values, or adjusting settings, using Azure Resource Manager and Event Grid–based MQTT messaging. This capability extends the Azure control plane beyond the cloud, allowing intent, policy, and actions to be delivered reliably to physical systems while remaining decoupled from protocol and device specifics. For Physical AI, this closes the loop between perception and action: insights and decisions derived from models can be translated into governed, auditable changes in the physical world, even when assets operate in distributed or intermittently connected environments. Built‑in RBAC, managed identity, and activity logs ensure every action is authorized, traceable, and compliant, preserving safety, accountability, and human oversight as intelligence increasingly moves from observation to autonomous execution at the edge. No‑code dataflow graphs Azure IoT Operations makes it easier to build real‑time data pipelines at the edge without writing custom code. No‑code data flow graphs let teams design visual processing pipelines using built‑in transforms, with improved reliability, validation, and observability. Visual Editor – Build multi-stage data processing systems in the Operations Experience canvas. Drag and connect sources, transforms, and destinations visually. Configure map rules, filter conditions, and window durations inline. Deploy directly from the browser or define in Bicep/YAML for GitOps. Composable Transforms, Any Order – Chain map, filter, branch, concatenate, and window transforms in any sequence. Branch splits messages down parallel paths based on conditions. Concatenate merges them back. Route messages to different MQTT topics based on content. No fixed pipeline shape. Expressions, Enrichment, and Aggregation – Unit conversions, math, string operations, regex, conditionals, and last-known-value lookups, all built into the expression language. Enrich messages with external data from a state store. Aggregate high-frequency sensor data over tumbling time windows to compute averages, min/max, and counts. Open and Extensible – Connect to MQTT, Kafka, and OpenTelemetry (OTel) endpoints with built-in security through Azure Key Vault and managed identities. Need logic beyond what no-code covers? Drop a custom Wasm module (even embed and run ONNX AI ML models) into the middle of any graph alongside built-in transforms. You're never locked into declarative configuration. Together, these capabilities allow teams to move from raw telemetry to actionable signals directly at the edge without custom code or fragile glue logic. Expanded, production‑ready connectivity The MQTT connector enables customers to onboard MQTT devices as assets and route data to downstream workloads using familiar MQTT topics, with the flexibility to support unified namespace (UNS) patterns when desired. By leveraging MQTT’s lightweight publish/subscribe model, teams can simplify connectivity and share data across consumers without tight coupling between producers and applications. This is especially important for Physical AI, where intelligent systems must continuously sense state changes in the physical world and react quickly based on a consistent, authoritative operational context rather than fragmented data pipelines. Alongside MQTT, Azure IoT Operations continues to deliver broad, industrial‑grade connectivity across OPC UA, ONVIF, Media, REST/HTTP, and other connectors, with improved asset discovery, payload transformation, and lifecycle stability, providing the dependable connectivity layer Physical AI systems rely on to understand and respond to real‑world conditions. Unified health and observability Physical AI systems must be trustworthy. Azure IoT Operations 2603 introduces unified health status reporting across brokers, dataflows, assets, connectors, and endpoints, using consistent states and surfaced through both Kubernetes and Azure Resource Manager. This enables operators to see—not guess—when systems are ready to act in the physical world. Optional OPC UA connector deployment Azure IoT Operations 2603 introduces optional OPC UA connector deployment, reinforcing a design goal to keep deployments as streamlined as possible for scenarios that don’t require OPC UA from day one. The OPC UA connector is a discrete, native component of Azure IoT Operations that can be included during initial instance creation or added later as needs evolve, allowing teams to avoid unnecessary footprint and complexity in MQTT‑only or non‑OPC deployments. This reflects the broader architectural principle behind Azure IoT Operations: a platform built for composability and decomposability, where capabilities are assembled based on scenario requirements rather than assumed defaults, supporting faster onboarding, lower resource consumption, and cleaner production rollouts without limiting future expansion. Broker reliability and platform hardening The 2603 release significantly improves broker reliability through graceful upgrades, idempotent replication, persistence correctness, and backpressure isolation—capabilities essential for always‑on Physical AI systems operating in production environments. Physical AI in action: What customers are achieving today Azure IoT Operations is already powering real‑world Physical AI across industries, helping customers move beyond pilots to repeatable, scalable execution. Procter & Gamble Consumer goods leader P&G continually looks for ways to drive manufacturing efficiency and improve overall equipment effectiveness—a KPI encompassing availability, performance, and quality that’s tracked in P&G facilities around the world. P&G deployed Azure IoT Operations, enabled by Azure Arc, to capture real-time data from equipment at the edge, analyze it in the cloud, and deploy predictive models that enhance manufacturing efficiency and reduce unplanned downtime. Using Azure IoT Operations and Azure Arc, P&G is extrapolating insights and correlating them across plants to improve efficiency, reduce loss, and continue to drive global manufacturing technology forward. More info. Husqvarna Husqvarna Group faced increasing pressure to modernize its fragmented global infrastructure, gain real-time operational insights, and improve efficiency across its supply chain to stay competitive in a rapidly evolving digital and manufacturing landscape. Husqvarna Group implemented a suite of Microsoft Azure solutions—including Azure Arc, Azure IoT Operations, and Azure OpenAI—to unify cloud and on-premises systems, enable real-time data insights, and drive innovation across global manufacturing operations. With Azure, Husqvarna Group achieved 98% faster data deployment and 50% lower infrastructure imaging costs, while improving productivity, reducing downtime, and enabling real-time insights across a growing network of smart, connected factories. More info. Chevron With its Facilities and Operations of the Future initiative, Chevron is reimagining the monitoring of its physical operations to support remote and autonomous operations through enhanced capabilities and real-time access to data. Chevron adopted Microsoft Azure IoT Operations, enabled by Azure Arc, to manage and analyze data locally at remote facilities at the edge, while still maintaining a centralized, cloud-based management plane. Real-time insights enhance worker safety while lowering operational costs, empowering staff to focus on complex, higher-value tasks rather than routine inspections. More info. A platform purpose‑built for Physical AI Across manufacturing, energy, and infrastructure, the message is clear: the next wave of AI value will be created where digital intelligence meets the physical world. Azure IoT Operations 2603 strengthens Microsoft’s commitment to that future—providing the secure, observable, cloud‑connected edge platform required to build Physical AI systems that are not only intelligent, but dependable. Get started To explore the full Azure IoT Operations 2603 release, review the public documentation and release notes, and start building Physical AI solutions that operate and scale confidently in the real world.649Views3likes0Comments