Azure IoT Operations 2603 is now available: Powering the next era of Physical AI
Industrial AI is entering a new phase. For years, AI innovation has largely lived in dashboards, analytics, and digital decision support. Today, that intelligence is moving into the real world, onto factory floors, oil fields, and production lines, where AI systems don't just analyze data, but sense, reason, and act in physical environments. This shift is increasingly described as Physical AI: intelligence that operates reliably where safety, latency, and real-world constraints matter most. With the Azure IoT Operations 2603 (v1.3.38) release, Microsoft is delivering one of its most significant updates to date, strengthening the platform foundation required to build, deploy, and operate Physical AI systems at industrial scale.

Why Physical AI needs a new kind of platform

Physical AI systems are fundamentally different from digital-only AI. They require:

- Real-time, low-latency decision-making at the edge
- Tight integration across devices, assets, and OT systems
- End-to-end observability, health, and lifecycle management
- Secure cloud-to-edge control planes with governance built in

Industry leaders and researchers increasingly agree that success in Physical AI depends less on isolated models and more on software platforms that orchestrate data, assets, actions, and AI workloads across the physical world. Azure IoT Operations was built for exactly this challenge.

What's new in Azure IoT Operations 2603

The 2603 release delivers major advancements across data pipelines, connectivity, reliability, and operational control, enabling customers to move faster from experimentation to production-grade Physical AI.

Cloud-to-edge management actions

Cloud-to-edge management actions enable teams to securely execute control and configuration operations on on-premises assets, such as invoking methods, writing values, or adjusting settings, using Azure Resource Manager and Event Grid–based MQTT messaging. This capability extends the Azure control plane beyond the cloud, allowing intent, policy, and actions to be delivered reliably to physical systems while remaining decoupled from protocol and device specifics. For Physical AI, this closes the loop between perception and action: insights and decisions derived from models can be translated into governed, auditable changes in the physical world, even when assets operate in distributed or intermittently connected environments. Built-in RBAC, managed identity, and activity logs ensure every action is authorized, traceable, and compliant, preserving safety, accountability, and human oversight as intelligence increasingly moves from observation to autonomous execution at the edge.

No-code dataflow graphs

Azure IoT Operations makes it easier to build real-time data pipelines at the edge without writing custom code. No-code dataflow graphs let teams design visual processing pipelines using built-in transforms, with improved reliability, validation, and observability.

- Visual Editor – Build multi-stage data processing systems in the Operations Experience canvas. Drag and connect sources, transforms, and destinations visually. Configure map rules, filter conditions, and window durations inline. Deploy directly from the browser or define in Bicep/YAML for GitOps.
- Composable Transforms, Any Order – Chain map, filter, branch, concatenate, and window transforms in any sequence. Branch splits messages down parallel paths based on conditions. Concatenate merges them back. Route messages to different MQTT topics based on content. No fixed pipeline shape.
- Expressions, Enrichment, and Aggregation – Unit conversions, math, string operations, regex, conditionals, and last-known-value lookups, all built into the expression language. Enrich messages with external data from a state store. Aggregate high-frequency sensor data over tumbling time windows to compute averages, min/max, and counts (see the sketch after this list for what a tumbling window computes).
- Open and Extensible – Connect to MQTT, Kafka, and OpenTelemetry (OTel) endpoints with built-in security through Azure Key Vault and managed identities. Need logic beyond what no-code covers? Drop a custom Wasm module (even embed and run ONNX AI/ML models) into the middle of any graph alongside built-in transforms. You're never locked into declarative configuration.

Together, these capabilities allow teams to move from raw telemetry to actionable signals directly at the edge without custom code or fragile glue logic.
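To ground the windowing concept, here is a minimal, dependency-free Python sketch of what a tumbling-window aggregation computes. This is illustrative only; it is not Azure IoT Operations configuration, which provides windowing as a built-in no-code transform.

```python
from collections import defaultdict

def tumbling_window(readings, window_seconds=60):
    """Group (timestamp, value) samples into fixed, non-overlapping windows
    and compute count/avg/min/max per window."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[int(ts // window_seconds)].append(value)
    return {
        w * window_seconds: {
            "count": len(vals),
            "avg": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for w, vals in sorted(buckets.items())
    }

# Example: 1 Hz sensor samples collapsed into 60-second summaries
samples = [(t, 20.0 + (t % 7) * 0.1) for t in range(0, 180)]
for window_start, stats in tumbling_window(samples).items():
    print(window_start, stats)
```

Each incoming message falls into exactly one window, so the edge pipeline forwards a handful of summary records per minute instead of every raw sample.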
Expanded, production-ready connectivity

The MQTT connector enables customers to onboard MQTT devices as assets and route data to downstream workloads using familiar MQTT topics, with the flexibility to support unified namespace (UNS) patterns when desired. By leveraging MQTT's lightweight publish/subscribe model, teams can simplify connectivity and share data across consumers without tight coupling between producers and applications. This is especially important for Physical AI, where intelligent systems must continuously sense state changes in the physical world and react quickly based on a consistent, authoritative operational context rather than fragmented data pipelines. Alongside MQTT, Azure IoT Operations continues to deliver broad, industrial-grade connectivity across OPC UA, ONVIF, Media, REST/HTTP, and other connectors, with improved asset discovery, payload transformation, and lifecycle stability, providing the dependable connectivity layer Physical AI systems rely on to understand and respond to real-world conditions.

Unified health and observability

Physical AI systems must be trustworthy. Azure IoT Operations 2603 introduces unified health status reporting across brokers, dataflows, assets, connectors, and endpoints, using consistent states and surfaced through both Kubernetes and Azure Resource Manager. This enables operators to see, not guess, when systems are ready to act in the physical world.

Optional OPC UA connector deployment

Azure IoT Operations 2603 introduces optional OPC UA connector deployment, reinforcing a design goal to keep deployments as streamlined as possible for scenarios that don't require OPC UA from day one. The OPC UA connector is a discrete, native component of Azure IoT Operations that can be included during initial instance creation or added later as needs evolve, allowing teams to avoid unnecessary footprint and complexity in MQTT-only or non-OPC deployments. This reflects the broader architectural principle behind Azure IoT Operations: a platform built for composability and decomposability, where capabilities are assembled based on scenario requirements rather than assumed defaults, supporting faster onboarding, lower resource consumption, and cleaner production rollouts without limiting future expansion.

Broker reliability and platform hardening

The 2603 release significantly improves broker reliability through graceful upgrades, idempotent replication, persistence correctness, and backpressure isolation—capabilities essential for always-on Physical AI systems operating in production environments.
Physical AI in action: What customers are achieving today

Azure IoT Operations is already powering real-world Physical AI across industries, helping customers move beyond pilots to repeatable, scalable execution.

Procter & Gamble

Consumer goods leader P&G continually looks for ways to drive manufacturing efficiency and improve overall equipment effectiveness—a KPI encompassing availability, performance, and quality that's tracked in P&G facilities around the world. P&G deployed Azure IoT Operations, enabled by Azure Arc, to capture real-time data from equipment at the edge, analyze it in the cloud, and deploy predictive models that enhance manufacturing efficiency and reduce unplanned downtime. Using Azure IoT Operations and Azure Arc, P&G is extrapolating insights and correlating them across plants to improve efficiency, reduce loss, and continue to drive global manufacturing technology forward. More info.

Husqvarna

Husqvarna Group faced increasing pressure to modernize its fragmented global infrastructure, gain real-time operational insights, and improve efficiency across its supply chain to stay competitive in a rapidly evolving digital and manufacturing landscape. Husqvarna Group implemented a suite of Microsoft Azure solutions—including Azure Arc, Azure IoT Operations, and Azure OpenAI—to unify cloud and on-premises systems, enable real-time data insights, and drive innovation across global manufacturing operations. With Azure, Husqvarna Group achieved 98% faster data deployment and 50% lower infrastructure imaging costs, while improving productivity, reducing downtime, and enabling real-time insights across a growing network of smart, connected factories. More info.

Chevron

With its Facilities and Operations of the Future initiative, Chevron is reimagining the monitoring of its physical operations to support remote and autonomous operations through enhanced capabilities and real-time access to data. Chevron adopted Microsoft Azure IoT Operations, enabled by Azure Arc, to manage and analyze data locally at remote facilities at the edge, while still maintaining a centralized, cloud-based management plane. Real-time insights enhance worker safety while lowering operational costs, empowering staff to focus on complex, higher-value tasks rather than routine inspections. More info.

A platform purpose-built for Physical AI

Across manufacturing, energy, and infrastructure, the message is clear: the next wave of AI value will be created where digital intelligence meets the physical world. Azure IoT Operations 2603 strengthens Microsoft's commitment to that future—providing the secure, observable, cloud-connected edge platform required to build Physical AI systems that are not only intelligent, but dependable.

Get started

To explore the full Azure IoT Operations 2603 release, review the public documentation and release notes, and start building Physical AI solutions that operate and scale confidently in the real world.

Grok 4.20 is now available in Microsoft Foundry
Grok 4.20 from xAI is now available in Microsoft Foundry. Microsoft Foundry is designed to help teams move from model exploration to production-grade AI systems with consistency and control. With the addition of Grok 4.20, customers can evaluate advanced reasoning models within a governed environment, apply safety and content policies, and operationalize them with confidence.

About Grok 4.20

Grok 4.20 is a general-purpose large language model in the Grok 4.x family, designed for reasoning-intensive and real-world problem-solving tasks. A key architectural concept is its agentic "swarm" approach, where multiple specialized agents collaborate across workflows combining reasoning, coding, retrieval, and coordination to improve accuracy on complex tasks. This release also reflects a rapid iteration model, with frequent updates that teams can continuously evaluate in Foundry using consistent benchmarks and datasets.

Capabilities

According to xAI, Grok 4.20 introduces a set of capabilities that teams can evaluate directly in Foundry for their specific workloads:

- High reliability and reduced hallucination – Grok 4.20 is designed to prioritize truthfulness, favoring grounded responses and explicitly acknowledging uncertainty when needed. Multi-agent verification patterns help reduce errors and improve reliability for high-stakes applications.
- Strong instruction following and consistency – The model demonstrates high adherence to prompts, system instructions, and structured workflows, enabling more predictable and controllable outputs across agentic and multi-step tasks.
- Multi-agent reasoning (agentic swarms) – Specialized agents working in parallel enable stronger reasoning across complex workflows, improving performance in areas like coding, analysis, and decision-making.
- Fast performance and cost efficiency – Grok 4.20 is optimized for high-throughput, low-latency workloads, making it suitable for interactive applications and large-scale deployments while maintaining strong cost efficiency.
- Tool use and real-time retrieval – The model is designed to integrate with tools and external data sources, enabling real-time search, retrieval, and evidence-grounded responses for dynamic and time-sensitive scenarios.
- Coding and technical reasoning – Grok 4.20 performs well on code generation, debugging, and iterative development tasks, making it a strong fit for developer workflows and agent-driven engineering scenarios.
- Long-form and creative workflows – With strong instruction adherence and long context, the model supports consistent long-form generation, including document creation, storytelling, and structured content pipelines.

Pricing

| Model | Deployment | Input / 1M tokens | Output / 1M tokens | Availability |
|---|---|---|---|---|
| Grok 4.20 | Global Standard | $2.00 | $6.00 | Public Preview |

Using Grok 4.20 in Foundry

Grok 4.20 is available in the Foundry model catalog for teams that want to evaluate and operationalize the model within a standardized pipeline. Foundry provides:

- Tools to run repeatable evaluations on your own datasets
- The ability to apply scenario-specific safety and content-filtering policies
- Managed endpoints with monitoring and governance for production deployments

To get started, select Grok 4.20 in the Foundry model catalog and run an initial evaluation with a small prompt set. You can expand to broader and more complex scenarios, compare results across models, and deploy the configuration that best meets your requirements.
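A first evaluation call might look like the sketch below, which uses the `azure-ai-inference` client. The endpoint, key, and deployment name (`grok-4.20`) are placeholders; substitute the values shown on your deployment's page in the Foundry portal.

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholders: copy the endpoint and key from your Foundry deployment.
client = ChatCompletionsClient(
    endpoint=os.environ["FOUNDRY_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["FOUNDRY_API_KEY"]),
)

response = client.complete(
    model="grok-4.20",  # assumed deployment name
    messages=[
        SystemMessage("You are a concise reasoning assistant."),
        UserMessage("A train leaves at 9:40 and arrives at 11:05. How long is the trip?"),
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
print(response.usage)  # token counts, useful when comparing models in evaluation runs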
Conclusion

Microsoft Foundry enables teams to explore models like Grok 4.20 within a governed environment designed for evaluation and production deployment. By combining model choice with consistent evaluation, safety controls, and operational tooling, teams can move from experimentation to reliable AI systems faster. Try Grok 4.20 in Foundry Models today.

Advancing to Agentic AI with Azure NetApp Files VS Code Extension v1.2.0
The Azure NetApp Files VS Code Extension v1.2.0 introduces a major leap toward agentic, AI-informed cloud operations with the debut of the autonomous Volume Scanner. Moving beyond traditional assistive AI, this release enables intelligent infrastructure analysis that can detect configuration risks, recommend remediations, and execute approved changes under user governance. Complemented by an expanded natural language interface, the release lets developers manage, optimize, and troubleshoot Azure NetApp Files resources through conversational commands, from performance monitoring to cross-region replication, backup orchestration, and ARM template generation. Version 1.2.0 establishes the foundation for a multi-agent system built to reduce operational toil and accelerate a shift toward self-managing enterprise storage in the cloud.

Bringing GigaTIME to Microsoft Foundry: Unlocking Tumor Microenvironment Insights with Multimodal AI
Expanding Microsoft Foundry for Scientific AI Workloads

AI is increasingly being applied to model complex real-world systems, from climate and industrial processes to human biology. In healthcare, one of the biggest challenges is translating routinely available data into deeper biological insight at scale. GigaTIME is now available in Microsoft Foundry, bringing advanced multimodal capabilities to healthcare and life sciences, with Foundry Labs enabling early exploration and the Foundry platform supporting scalable deployment aligned to the model's intended use. Read on to understand how GigaTIME works and how you can start exploring it in Foundry.

From Routine Slides to Deep Biological Insight

Understanding how tumors interact with the immune system is central to modern cancer research. While techniques like multiplex immunofluorescence provide these insights, they are expensive and difficult to scale across large patient populations. GigaTIME addresses this challenge by translating widely available hematoxylin and eosin (H&E) pathology slides into spatially resolved protein activation maps. This allows researchers to infer biological signals such as immune activity, tumor growth, and cellular interactions at a much deeper level. Developed in collaboration with Providence and the University of Washington, GigaTIME enables analysis of tumor microenvironments across diverse cancer types, helping accelerate discovery and improve understanding of disease biology.

Use cases for GigaTIME

GigaTIME is designed to support research and evaluation workflows across a range of real-world scientific scenarios (a minimal scoring sketch follows this list):

- Population-Scale Tumor Microenvironment Analysis: Enable large-scale analysis of tumor–immune interactions by generating virtual multiplex immunofluorescence outputs from routine pathology slides.
- Biomarker Association Discovery: Identify relationships between protein activation patterns and clinical attributes such as mutations, biomarkers, and disease characteristics.
- Patient Stratification and Cohort Analysis: Segment patient populations across cancer types and subtypes using spatial and combinatorial signals for research and hypothesis generation.
- Clinical Trial Retrospective Analysis: Apply GigaTIME to H&E archives from completed clinical trials to retrospectively characterize tumor microenvironment features associated with treatment outcomes, enabling new insights from existing trial data without additional tissue processing.
- Tumor–Immune Interaction Analysis: Assess whether immune cells are infiltrating tumor regions or being excluded by analyzing spatial relationships between tumor and immune signals.
- Immune System Structure Characterization: Understand how immune cell populations are organized within tissue to evaluate coordination or fragmentation of immune response.
- Immune Checkpoint Context Interpretation: Examine how immune activity may be locally regulated by analyzing overlap between immune markers and checkpoint signals.
- Tumor Proliferation Analysis: Identify actively growing tumor regions by combining proliferation signals with tumor localization.
- Stromal and Vascular Context Understanding: Analyze how tissue architecture, such as vascular density and desmoplastic stroma, shapes immune cell access to tumor regions, helping characterize mechanisms of immune exclusion or infiltration.
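Most of these scenarios start the same way: submitting a slide image to a deployed endpoint for inference. The sketch below shows that pattern only in outline; the scoring URI and key come from your Foundry deployment, and the request/response schema shown here is an assumption — consult the GigaTIME model card for the actual contract.

```python
import base64
import json
import os

import requests

# Placeholders: set these from your Foundry deployment.
SCORING_URI = os.environ["GIGATIME_SCORING_URI"]
API_KEY = os.environ["GIGATIME_API_KEY"]

# Encode a single H&E tile; the payload field name is illustrative.
with open("he_slide_tile.png", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}

resp = requests.post(
    SCORING_URI,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    data=json.dumps(payload),
    timeout=120,
)
resp.raise_for_status()
result = resp.json()  # e.g., per-marker activation maps; shape depends on the model card
```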
Start with Exploration, Then Go Deeper

Explore in Foundry Labs

Foundry Labs provides a lightweight environment for early exploration of emerging AI capabilities. It allows developers and researchers to quickly understand how models like GigaTIME behave before integrating them into production workflows. With GigaTIME in Foundry Labs, you can engage with real-world healthcare scenarios and explore how multimodal models translate pathology data into meaningful biological insight. Through curated experiences, you can:

- Run inference on pathology images
- Visualize spatial protein activation patterns
- Explore tumor and immune interactions in context

This helps you build intuition and evaluate how the model can be applied to your specific use cases.

Go Deeper with GitHub Examples

For advanced scenarios, you can access the underlying notebooks and workflows via GitHub. These examples provide flexibility to customize pipelines, extend workflows, and integrate GigaTIME into broader research and application environments. Together, Foundry Labs and GitHub provide a path from guided exploration to deeper customization.

Discover and Deploy GigaTIME in Microsoft Foundry

Discover in the Foundry Catalog

GigaTIME is available in the Foundry model catalog alongside a growing set of domain-specific models across healthcare, geospatial intelligence, physical systems, and more.

Deploy for Research Workflows

For advanced usage, GigaTIME can be deployed as an endpoint within Foundry to support research and evaluation workflows such as:

- Biomarker discovery
- Patient stratification
- Clinical research pipelines

You can start with early exploration in Foundry Labs and transition to scalable deployment on the Foundry platform using the tools and workflows designed for each stage, in line with the intended use of the model.

A New Class of AI for Scientific Discovery

GigaTIME reflects a broader shift toward AI systems designed to model real-world phenomena. These systems are multimodal, deeply tied to domain-specific data, and designed to produce spatial and contextual outputs. They rely on workflows that combine data processing, model inference, and interpretation, which requires platforms that support the full lifecycle from exploration to production. Microsoft Foundry is built to support this evolution.

Learn More and Get Started

To explore GigaTIME in more detail:

- Read the Microsoft Research blog on the underlying research and population-scale findings
- Try hands-on scenarios in Foundry Labs
- Access GitHub examples for advanced workflows
- Explore and deploy the model through the Foundry catalog

Looking Ahead

As AI continues to expand into domains like healthcare, climate science, and industrial systems, the ability to connect models, data, and workflows becomes increasingly important. GigaTIME highlights what is possible when these elements come together, transforming routinely available data into actionable scientific insight. We are excited to see what you build next.

OneDrive Photos Restyle with AI: now rolling out on mobile and web
Photos capture real moments. With AI Restyle in OneDrive, you can reimagine them in fresh new styles, right where your photos already live.

Meet AI Restyle

With just a tap, transform everyday photos into cinematic posters, hand-painted artwork, pencil sketches, anime-inspired scenes, and more. Choose a style, watch a new version appear in seconds, and keep exploring until it feels just right. Through it all, the people, places, and memories you care about stay unmistakably yours, just seen in a fresh new light.

Your photos stay private

When you use AI Restyle in OneDrive, your photos remain under your control and are processed only to generate the style you choose. For more information on how AI Restyle works, its intended uses, and limitations, see Transparency note for AI Restyle in OneDrive - Microsoft Support.

What you can do with AI Restyle

- Create something beautiful instantly. Choose from a rotating set of one-tap styles designed to match the content of your photo, so it's easy to get a great result right away. New styles are added regularly, giving you fresh ways to reimagine your photos.
- Add a personal touch when you want. Include an optional prompt to guide the look; no design skills required.
- Explore until it feels right. Try multiple restyles, undo or redo changes, and keep experimenting until you find the look you love.
- Share in just a few taps. Go from viewing to restyling to sharing with your favorite apps, without ever leaving OneDrive Photos.

Availability

AI Restyle is rolling out on OneDrive for iOS, Android, and web for customers with a Microsoft 365 Premium subscription. Availability may vary by region as rollout continues.

What's next

We're continuing to expand AI-powered photo experiences in OneDrive, bringing AI Restyle to additional platforms and investing in new editing capabilities that help you create with confidence while keeping your photos authentic.

Try it today

Open OneDrive on iOS, Android, or web, sign in with a Microsoft 365 Premium account, open a photo, and tap 'AI Restyle' to start exploring new styles. Have fun creating something new today! Try it on the OneDrive mobile app:

- iOS: Download Microsoft OneDrive from the App Store
- Android: Download Microsoft OneDrive from Google Play

We'd love your feedback; use 👍👎 to help us improve AI Restyle.

#Microsoft #OneDrive #Photos #iOS #Android #Web #AI

* This blog was updated on April 7, 2026 to describe how AI Restyle in OneDrive protects users' privacy and ensures their photos remain secure and under their control.

Now in Foundry: Cohere Transcribe, Nanbeige 4.1-3B, and Octen Embedding
This week's Model Mondays edition spans three distinct layers of the AI application stack: Cohere's cohere-transcribe, a 2B Automatic Speech Recognition (ASR) model that ranks first on the Open ASR Leaderboard and supports 14 languages; Nanbeige's Nanbeige4.1-3B, a compact 3B reasoning model that outperforms models ten times its size on coding, math, and deep-search benchmarks; and Octen's Octen-Embedding-0.6B, a lightweight text embedding model that achieves strong retrieval scores across 100+ languages and industry-specific domains. Together, these three models illustrate how developers can build full AI pipelines, from audio ingestion to language reasoning to semantic retrieval, entirely with open-source models deployed through Microsoft Foundry. Each operates in a different modality and fills a distinct architectural role, making this week's selection especially well-suited for teams assembling production-grade systems across speech, text, and search.

Models of the week

Cohere's cohere-transcribe-03-2026

Model Specs

- Parameters / size: 2B
- Primary task: Automatic Speech Recognition (audio-to-text)

Why it's interesting

- Top-ranked on the Open ASR Leaderboard: cohere-transcribe-03-2026 achieves a 5.42% average Word Error Rate (WER) across 8 English benchmark datasets as of March 26, 2026, placing it first among open models. It reaches 1.25% WER on LibriSpeech Clean and 8.15% on AMI (meeting transcription), demonstrating consistent accuracy across both clean speech and real-world, multi-speaker environments. Benchmarks: Open ASR Leaderboard.
- 14 languages with a dedicated encoder-decoder architecture: The model uses a large Conformer encoder for acoustic representation extraction paired with a lightweight Transformer decoder for token generation, trained from scratch on 14 languages covering European, East Asian (Mandarin Chinese, Japanese, Korean, Vietnamese), and Arabic. Unlike general-purpose models adapted for ASR, this dedicated architecture makes it efficient without sacrificing accuracy.
- Long-form audio with automatic chunking: Audio longer than 35 seconds is automatically split into overlapping chunks and reassembled into a coherent transcript, with no manual preprocessing required. Batched inference, punctuation control, and per-language configuration are all supported through the standard API.

Try it

Upload an audio file to the embedded demo above and watch how quickly the model transcribes it, or open the Cohere Transcribe Space and record audio directly from your device.

| Use Case | Prompt Pattern |
|---|---|
| Meeting transcription | Submit recorded audio with language tag; retrieve timestamped transcript per speaker turn |
| Call center quality review | Batch-process customer call recordings, extract transcript, pass to classification model |
| Medical documentation | Transcribe clinical encounters; feed transcript into summarization or structured note pipeline |
| Multilingual content indexing | Process podcasts or video audio in any of 14 supported languages; store as searchable text |

Sample prompt for a legal services deployment:

You are building a contract negotiation assistant. A client submits a recorded audio of a 45-minute supplier negotiation call. Using the cohere-transcribe-03-2026 endpoint deployed in Microsoft Foundry, transcribe the call with punctuation enabled for the English audio. Once the transcript is available, pass it to a downstream language model with the following instruction: "Identify all pricing commitments, delivery deadlines, and liability clauses mentioned in this negotiation transcript. For each, note the speaker's position (client or supplier) and flag any terms that appear ambiguous or require legal review."
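In code, the transcription step of that workflow could look like the rough sketch below. Only the overall pattern is intended (POST the audio, read back the transcript); the endpoint URI, key, field names (`language`, `punctuation`), and response shape are all assumptions to adapt from your deployment's model card.

```python
import os

import requests

# Placeholders: copy the target URI and key from your Foundry deployment.
ENDPOINT = os.environ["TRANSCRIBE_ENDPOINT"]
API_KEY = os.environ["TRANSCRIBE_API_KEY"]

with open("supplier_negotiation.wav", "rb") as f:
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("supplier_negotiation.wav", f, "audio/wav")},
        data={"language": "en", "punctuation": "true"},  # illustrative field names
        timeout=300,
    )
resp.raise_for_status()
transcript = resp.json()  # shape depends on the deployment's response schema
print(transcript)
```

The resulting transcript then becomes plain text input for whichever downstream language model performs the clause extraction.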
Nanbeige's Nanbeige4.1-3B

Model Specs

- Parameters / size: 3B
- Context length: 131,072 tokens
- Primary task: Text generation (reasoning, coding, tool use, deep search)

Why it's interesting

- Reasoning performance that exceeds its size class: Nanbeige4.1-3B scores 76.9 on LiveCodeBench-V6. These results suggest that targeted post-training using Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on a focused dataset can yield improvements that scale-based approaches cannot replicate at equivalent parameter counts. Read the technical report: https://huggingface.co/papers/2602.13367.
- Strong preference alignment at the 3B scale: On Arena-Hard-v2, Nanbeige4.1-3B scores 73.2, compared to 56.0 for Qwen3-32B and 60.2 for Qwen3-30B-A3B, both significantly larger models. This indicates that the model's outputs consistently match human preference for response quality and helpfulness, not just accuracy on structured tasks.
- Deep-search capability previously absent from small general models: On xBench-DeepSearch-2505, Nanbeige4.1-3B scores 75, matching search-specialized small agents. The model can sustain complex agentic tasks involving more than 500 sequential tool invocations, a capability gap that previously required either specialized search agents or significantly larger models.
- Native tool-use support: The model's chat template and generation pipeline natively support tool call formatting, making it straightforward to connect to external APIs and build multi-step agentic workflows without additional scaffolding.

Try it

| Use Case | Prompt Pattern |
|---|---|
| Code review and fix | Provide failing test + stack trace; ask model to diagnose root cause and write corrected implementation |
| Competition-style math | Submit problem as structured prompt; use temperature 0.6, top-p 0.95 for consistent reasoning steps |
| Agentic task execution | Provide tool definitions as JSON + goal; let model plan and execute tool calls sequentially |
| Long-document Q&A | Pass full document (up to 131K tokens) with targeted factual questions; extract structured answers |

Sample prompt for a software engineering deployment:

You are automating pull request review for a backend engineering team. Using the Nanbeige4.1-3B endpoint deployed in Microsoft Foundry, provide the model with a unified diff of a proposed code change and the following system instruction: "You are a senior software engineer reviewing a pull request. For each modified function: (1) summarize what the change does, (2) identify any edge cases that are not handled, (3) flag any security or performance regressions relative to the original, and (4) suggest a specific improvement if one is warranted. Format your output as a structured list per function."
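Wired up with the `azure-ai-inference` client, that review step might look like the following sketch. The endpoint, key, and deployment name (`nanbeige4.1-3b`) are placeholders, and the sampling values follow the table above.

```python
import os
from pathlib import Path

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["FOUNDRY_ENDPOINT"],  # placeholder
    credential=AzureKeyCredential(os.environ["FOUNDRY_API_KEY"]),
)

diff_text = Path("change.diff").read_text()  # the unified diff under review

response = client.complete(
    model="nanbeige4.1-3b",  # assumed deployment name
    messages=[
        SystemMessage(
            "You are a senior software engineer reviewing a pull request. "
            "For each modified function, summarize the change, identify unhandled "
            "edge cases, flag regressions, and suggest one improvement."
        ),
        UserMessage(diff_text),
    ],
    temperature=0.6,  # sampling settings suggested above for reasoning tasks
    top_p=0.95,
)
print(response.choices[0].message.content)
```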
Octen's Octen-Embedding-0.6B

Model Specs

- Parameters / size: 0.6B
- Context length: 32,768 tokens
- Primary task: Text embeddings (semantic search, retrieval, similarity)

Why it's interesting

- Retrieval performance above larger proprietary models at 0.6B: On the RTEB (Retrieval Text Embedding Benchmark) public leaderboard, Octen-Embedding-0.6B achieves a mean task score of 0.7241, above voyage-3.5 (0.7139), Cohere-embed-v4.0 (0.6534), and text-embedding-3-large (0.6110), despite being a fraction of their parameter count. The model is fine-tuned from Qwen3-Embedding-0.6B via Low-Rank Adaptation (LoRA), demonstrating that targeted fine-tuning on retrieval-specific data can close the gap with larger embedding models.
- Vertical domain coverage across legal, finance, healthcare, and code: Octen-Embedding-0.6B was trained with explicit coverage of domain-specific retrieval scenarios: legal document matching, financial report Q&A, clinical dialogue retrieval, and code search including SQL. This makes it suitable for regulated-industry applications where generic embedding models tend to underperform on specialized terminology.
- 32,768-token context for long-document retrieval: The extended context window supports encoding entire legal contracts, earnings reports, or clinical case notes as single embeddings, removing the need to chunk long documents and re-aggregate scores at query time, which can introduce ranking errors.
- 100+ language support with cross-lingual retrieval: The model handles multilingual and cross-lingual retrieval natively, with strong coverage across languages including English, Chinese, and other major languages via its Qwen3-based architecture, which is practical for global enterprise applications that span multiple languages.

| Use Case | Prompt Pattern |
|---|---|
| Semantic search | Encode user query and document corpus; rank documents by cosine similarity to query embedding |
| Legal precedent retrieval | Embed case briefs and query with legal question; retrieve most semantically relevant precedents |
| Cross-lingual document search | Encode multilingual document set; submit query in any supported language for cross-lingual retrieval |
| Financial Q&A pipeline | Embed earnings reports or filings; retrieve relevant passages to ground downstream language model responses |

Sample prompt for a clinical decision support deployment:

You are building a clinical decision support tool. Using the Octen-Embedding-0.6B endpoint deployed in Microsoft Foundry, embed a corpus of 10,000 clinical case notes at ingestion time and store the resulting 1024-dimensional vectors in a vector database. At query time, encode an incoming patient presentation summary and retrieve the 5 most semantically similar historical cases. Pass the retrieved cases and the current presentation to a language model with the following instruction: "Based on these five similar cases and their documented outcomes, summarize the most common treatment approaches and flag any cases where the outcome differed significantly from the initial prognosis."
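The core embed-and-rank loop behind these scenarios is small. Here is a minimal sketch using the `azure-ai-inference` embeddings client and plain NumPy for cosine similarity; the endpoint, key, and deployment name (`octen-embedding-0.6b`) are placeholders.

```python
import os

import numpy as np
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

client = EmbeddingsClient(
    endpoint=os.environ["FOUNDRY_ENDPOINT"],  # placeholder
    credential=AzureKeyCredential(os.environ["FOUNDRY_API_KEY"]),
)

corpus = [
    "Patient presented with chest pain radiating to the left arm.",
    "Earnings grew 12% year over year, driven by cloud revenue.",
    "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;",
]
docs = client.embed(input=corpus, model="octen-embedding-0.6b")  # assumed name
doc_vecs = np.array([d.embedding for d in docs.data], dtype=float)

query = client.embed(input=["quarterly revenue growth"], model="octen-embedding-0.6b")
q = np.array(query.data[0].embedding, dtype=float)

# Cosine similarity = dot product of L2-normalized vectors
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)
q /= np.linalg.norm(q)
scores = doc_vecs @ q
print(corpus[int(scores.argmax())])  # expected best match: the earnings sentence
```

In production you would store the document vectors in a vector database rather than in memory, but the ranking math stays the same.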
Getting started

You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub: select any supported model and choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured.

Learn how to discover models and deploy them using Microsoft Foundry documentation:

- Follow along with the Model Mondays series and its GitHub repository to stay up to date on the latest
- Read the Hugging Face on Azure docs
- Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry
- Explore models in Microsoft Foundry

Answer synthesis in Foundry IQ: Quality metrics across 10,000 queries
With answers, you can control your entire RAG pipeline directly in Foundry IQ by Azure AI Search, without integrations. Responding only when the data supports it, answers delivers grounded, steerable, citation-rich responses and traces each piece of information to its original source. Here's how it works and how it performed across our experiments.

Tracking Every Token: Granular Cost and Usage Metrics for Microsoft Foundry Agents
As organizations scale their use of AI agents, one question keeps surfacing: how much is each agent actually costing us? Not at the subscription level. Not at the resource group level. Per agent, per model, per request. This post walks through a solution that answers that question by combining three Azure services (Microsoft Foundry, Azure API Management (APIM), and Application Insights) into an observable, metered AI gateway with granular token-level telemetry, including custom date ranges longer than a month for deeper analysis.

The Problem: AI Costs Can Be a Black Box

Foundry's built-in monitoring and cost views are ultimately powered by telemetry stored in Application Insights, but the out-of-the-box dashboards don't always provide the exact per-request and per-caller token breakdown, or the custom aggregations and joins teams may want for bespoke dashboards (for example, breaking down tokens by APIM subscription, product, tenant, user, route, or agent step). This solution uses APIM to stamp consistent caller and context metadata (headers/claims), Foundry to generate the agent and model run telemetry, and App Insights as the queryable store, letting you correlate gateway, agent-run, and tool/model-call telemetry and build custom KQL-driven dashboards.

With data captured in App Insights and custom KQL queries, questions such as these can be answered:

- Which agent consumed the most tokens last week?
- What's the average cost per request for a specific agent?
- How do prompt tokens vs. completion tokens break down per model?
- Is one agent disproportionately expensive compared to others?

Why This Solution Was Built

This solution was built to close the observability gap between "we deployed agents" and "we understand what those agents cost." The goals were straightforward:

- Per-agent, per-model cost attribution: Know exactly which agent is consuming what, down to the token.
- Real-time telemetry, not batch reports: Metrics flow into Application Insights within minutes; query via KQL.
- Zero agent modification: The agents themselves don't need to know about telemetry. The tracking happens at the gateway layer.
- Extensibility: Any agent hosted in Microsoft Foundry and exposed through APIM can be added with a single function call.

How It Works

The architecture is intentionally simple: three services, one data flow. The notebook serves as a testing and prototyping environment, but the same `call_agent()` and `track_llm_usage()` code can be lifted directly into any production Python application that calls Foundry agents.

Azure API Management acts as the AI gateway. Every request to a Foundry-hosted agent flows through APIM, which handles routing, rate limiting, authentication, and tracing. APIM adds its own trace headers (`Ocp-Apim-Trace-Location`) so you can correlate gateway-level diagnostics with your application telemetry. After the API request completes successfully, the necessary data can be extracted from response headers.

The notebook is designed for testing and rapid iteration: call an agent, inspect the response, verify that telemetry lands in App Insights. It uses `httpx` to call agents through APIM, authenticating with `DefaultAzureCredential` and an APIM subscription key. After each response, it extracts the `usage` object (`input_tokens`, `output_tokens`, `total_tokens`) and calculates an estimated cost based on built-in per-model pricing. Application Insights receives this telemetry via OpenTelemetry.
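As an illustration of how that telemetry could be emitted, here is one plausible shape for `track_llm_usage()`. The function name and the `llm.usage` message match the queries later in this post, but the body is a sketch: the pricing table holds example rates, and `APP_INSIGHTS_CONN` is assumed to be defined in your configuration.

```python
import logging
import os

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import metrics

APP_INSIGHTS_CONN = os.environ["APPLICATIONINSIGHTS_CONNECTION_STRING"]
configure_azure_monitor(connection_string=APP_INSIGHTS_CONN)  # once at startup

meter = metrics.get_meter("llm.usage")
token_counter = meter.create_counter("llm.total_tokens")
cost_counter = meter.create_counter("llm.cost_usd")
logger = logging.getLogger("llm.usage")

# Example rates only: (input, output) dollars per token.
PRICING = {"gpt-4o-mini": (0.15e-6, 0.60e-6)}

def track_llm_usage(agent_name, model, operation_id, prompt_tokens, completion_tokens):
    in_rate, out_rate = PRICING.get(model, (0.0, 0.0))
    cost = prompt_tokens * in_rate + completion_tokens * out_rate
    dims = {
        "agent_name": agent_name, "model": model, "operation_id": operation_id,
        "prompt_tokens": prompt_tokens, "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens, "cost_usd": cost,
    }
    # customMetrics: cumulative counters per agent/model
    token_counter.add(dims["total_tokens"], attributes={"agent_name": agent_name, "model": model})
    cost_counter.add(cost, attributes={"agent_name": agent_name, "model": model})
    # traces: str(dims) produces a single-quoted dict, which is why the KQL
    # below swaps single quotes for double quotes before parse_json.
    logger.info("llm.usage", extra={"custom_dimensions": str(dims)})
```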
The solution sends data to two tables:

- customMetrics: Cumulative counters for prompt tokens, completion tokens, total tokens, and cost in USD. These power dashboards and alerts.
- traces: Structured log entries with `custom_dimensions` containing agent name, model, operation ID, token counts, and cost per request, stored as queryable records in Azure Monitor Logs. These power ad-hoc KQL queries.

Demonstrating Granular Cost and Usage Metrics

This is where the solution shines. Once telemetry is flowing, you can answer detailed questions with simple KQL queries.

Per-Request Detail

Query the `traces` table to see every individual agent call with full token and cost breakdown:

```kusto
traces
| where message == "llm.usage"
| extend cd = parse_json(replace_string(
    tostring(customDimensions["custom_dimensions"]), "'", "\""))
| extend agent_name = tostring(cd["agent_name"]),
         model = tostring(cd["model"]),
         prompt_tokens = toint(cd["prompt_tokens"]),
         completion_tokens = toint(cd["completion_tokens"]),
         total_tokens = toint(cd["total_tokens"]),
         cost_usd = todouble(cd["cost_usd"])
| project timestamp, agent_name, model, prompt_tokens,
          completion_tokens, total_tokens, cost_usd
| order by timestamp desc
```

This gives you a line-item audit trail: every request, every agent, every token.

Aggregated Metrics Per Agent

Summarize across all requests to see averages and totals grouped by agent and model:

```kusto
traces
| where message == "llm.usage"
| extend cd = parse_json(replace_string(
    tostring(customDimensions["custom_dimensions"]), "'", "\""))
| extend agent_name = tostring(cd["agent_name"]),
         model = tostring(cd["model"]),
         prompt_tokens = toint(cd["prompt_tokens"]),
         completion_tokens = toint(cd["completion_tokens"]),
         total_tokens = toint(cd["total_tokens"]),
         cost_usd = todouble(cd["cost_usd"])
| summarize calls = count(),
            avg_prompt = avg(prompt_tokens),
            avg_completion = avg(completion_tokens),
            avg_total = avg(total_tokens),
            avg_cost = avg(cost_usd),
            total_cost = sum(cost_usd)
  by agent_name, model
| order by total_cost desc
```

Now you can see at a glance:

- Which agent is the most expensive across all calls
- Average token consumption per request, useful for prompt optimization
- Prompt-to-completion ratio: a high ratio may indicate verbose system prompts that could be trimmed
- Cost trends by model: is GPT-4.1 worth the premium over GPT-4o-mini for a particular agent?
The same can be done in code with your custom solution:

```python
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

KQL = r"""
traces
| where message == "llm.usage"
| extend cd_raw = tostring(customDimensions["custom_dimensions"])
| extend cd = parse_json(replace_string(cd_raw, "'", "\""))
| extend agent_name = tostring(cd["agent_name"]),
         model = tostring(cd["model"]),
         operation_id = tostring(cd["operation_id"]),
         prompt_tokens = toint(cd["prompt_tokens"]),
         completion_tokens = toint(cd["completion_tokens"]),
         total_tokens = toint(cd["total_tokens"]),
         cost_usd = todouble(cd["cost_usd"])
| project timestamp, agent_name, model, operation_id, prompt_tokens,
          completion_tokens, total_tokens, cost_usd
| order by timestamp desc
"""

def query_logs():
    credential = DefaultAzureCredential()
    client = LogsQueryClient(credential)
    resp = client.query_resource(
        resource_id=APP_INSIGHTS_RESOURCE_ID,  # defined in config cell
        query=KQL,
        timespan=None,  # No time filter: returns all available data (up to 90-day retention)
    )
    if resp.status != "Success":
        raise RuntimeError(f"Query failed: {resp.status} - {getattr(resp, 'error', None)}")
    table = resp.tables[0]
    rows = [dict(zip(table.columns, r)) for r in table.rows]
    return rows

if __name__ == "__main__":
    rows = query_logs()
    if not rows:
        print("No telemetry found. Wait 2-5 min after running the agent cell and try again.")
    else:
        print(f"Found {len(rows)} records\n")
        print(f"{'Timestamp':<28} {'Agent':<16} {'Model':<12} {'Op ID':<12} "
              f"{'Prompt':>8} {'Completion':>11} {'Total':>8} {'Cost ($)':>10}")
        print("-" * 110)
        for r in rows[:20]:
            ts = str(r.get("timestamp", ""))[:19]
            print(f"{ts:<28} {r.get('agent_name',''):<16} {r.get('model',''):<12} "
                  f"{r.get('operation_id',''):<12} {r.get('prompt_tokens',0):>8} "
                  f"{r.get('completion_tokens',0):>11} {r.get('total_tokens',0):>8} "
                  f"{r.get('cost_usd',0):>10.6f}")
```

What You Can Build on Top

- Azure Workbooks: Build interactive dashboards showing cost trends over time, agent comparison charts, and token distribution heatmaps.
- Alerts: Trigger notifications when a single agent exceeds a cost threshold or when token consumption spikes unexpectedly.
- Azure Dashboard pinning: Pin KQL query results directly to a shared Azure Dashboard for team visibility.
- Power BI integration: Export telemetry data for executive-level cost reporting across all AI agents.

Extensibility: Add Any Agent in One Line

The solution is designed to scale with your agent portfolio. Any agent hosted in Microsoft Foundry and exposed through APIM can be integrated without modifying the telemetry pipeline. Adding a new agent is a single function call:

```python
response = call_agent("YourNewAgent", "Your prompt here")
```

Token tracking, cost estimation, and telemetry export happen automatically. No additional configuration, no new infrastructure.

From Notebook to Production

The notebook is a testing harness, a fast way to validate agent connectivity, inspect raw responses, and confirm that telemetry arrives in App Insights. But the code isn't limited to notebooks. The core functions (`call_agent()`, `track_llm_usage()`, and the OpenTelemetry configuration) are plain Python. They can be dropped directly into any production application that calls Foundry agents through APIM:

- FastAPI / Flask web service: Wrap `call_agent()` in an endpoint and get per-request cost tracking out of the box.
- Azure Functions: Call agents from a serverless function with the same telemetry pipeline.
- Background workers or batch pipelines: Process multiple agent calls and aggregate cost data across runs.
- CLI tools or scheduled jobs: Run agent evaluations on a schedule with automatic cost logging.

The pattern stays the same regardless of where the code runs:

```python
# 1. Configure OpenTelemetry + App Insights (once at startup)
configure_azure_monitor(connection_string=APP_INSIGHTS_CONN)

# 2. Call any agent through APIM
response = call_agent("FinanceAgent", "Summarize Q4 earnings")

# 3. Token usage and cost are tracked automatically
#    -> customMetrics and traces tables in App Insights
```

Start with the notebook to prove the pattern works. Then move the same code into your production codebase; the telemetry travels with it.

Key Takeaways

- AI cost observability matters. As agent counts grow, per-agent cost attribution becomes essential for budgeting and optimization.
- APIM as an AI gateway gives you routing, rate limiting, and tracing in one place without touching agent code.
- OpenTelemetry + Application Insights provides a battle-tested telemetry pipeline that scales from a single notebook to production workloads.
- KQL makes the data actionable. Per-request audits, per-agent summaries, and cost trending are all a query away.
- The solution is additive, not invasive. Agents don't need modification; the telemetry layer wraps around them.

This approach gives developers the ability to view metrics per user, API key, agent, request or tool call, or business dimension (cost center, app, environment). If you're running AI agents in Microsoft Foundry and want to understand what they cost at a granular level, this pattern gives you the visibility to make informed decisions about model selection, prompt design, and budget allocation.

The full solution is available on GitHub: https://github.com/ccoellomsft/foundry-agents-apim-appinsights

Vector Drift in Azure AI Search: Three Hidden Reasons Your RAG Accuracy Degrades After Deployment
What Is Vector Drift?

Vector drift occurs when embeddings stored in a vector index no longer accurately represent the semantic intent of incoming queries. Because vector similarity search depends on relative semantic positioning, even small changes in models, data distribution, or preprocessing logic can significantly affect retrieval quality over time. Unlike schema drift or data corruption, vector drift is subtle:

- The system continues to function
- Queries return results
- But relevance steadily declines

Cause 1: Embedding Model Version Mismatch

What Happens

Documents are indexed using one embedding model, while query embeddings are generated using another. This typically happens due to:

- Model upgrades
- Shared Azure OpenAI resources across teams
- Inconsistent configuration between environments

Why This Matters

Embeddings generated by different models:

- Exist in different vector spaces
- Are not mathematically comparable
- Produce misleading similarity scores

As a result, documents that were previously relevant may no longer rank correctly.

Recommended Practice

A single vector index should be bound to one embedding model and one dimension size for its entire lifecycle. If the embedding model changes, the index must be fully re-embedded and rebuilt.

Cause 2: Incremental Content Updates Without Re-Embedding

What Happens

New documents are continuously added to the index, while existing embeddings remain unchanged. Over time, new content introduces:

- Updated terminology
- Policy changes
- New product or domain concepts

Because semantic meaning is relative, the vector space shifts—but older vectors do not.

Observable Impact

- Recently indexed documents dominate retrieval results
- Older but still valid content becomes harder to retrieve
- Recall degrades without obvious system errors

Practical Guidance

Treat embeddings as living assets, not static artifacts:

- Schedule periodic re-embedding for stable corpora
- Re-embed high-impact or frequently accessed documents
- Trigger re-embedding when domain vocabulary changes meaningfully

Declining similarity scores or reduced citation coverage are often early signals of drift.

Cause 3: Inconsistent Chunking Strategies

What Happens

Chunk size, overlap, or parsing logic is adjusted over time, but previously indexed content is not updated. The index ends up containing chunks created using different strategies.

Why This Causes Drift

Different chunking strategies produce:

- Different semantic density
- Different contextual boundaries
- Different retrieval behavior

This inconsistency reduces ranking stability and makes retrieval outcomes unpredictable.

Governance Recommendation

Chunking strategy should be treated as part of the index contract:

- Use one chunking strategy per index
- Store chunk metadata (for example, chunk_version)
- Rebuild the index when chunking logic changes

Design Principles

- Versioned embedding deployments
- Scheduled or event-driven re-embedding pipelines
- Standardized chunking strategy
- Retrieval quality observability
- Prompt and response evaluation

Key Takeaways

- Vector drift is an architectural concern, not a service defect
- It emerges from model changes, evolving data, and preprocessing inconsistencies
- Long-lived RAG systems require embedding lifecycle management
- Azure AI Search provides the controls needed to mitigate drift effectively
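One lightweight way to enforce the first and third causes' recommendations in code is to persist an explicit index contract and validate every query path against it. The sketch below is illustrative; the field names and model identifier are hypothetical, and this is application-side governance, not an Azure AI Search API.

```python
import hashlib
import json

# Hypothetical contract persisted alongside the index: one embedding model,
# one dimension size, one chunking strategy for the index's entire lifecycle.
INDEX_CONTRACT = {
    "embedding_model": "text-embedding-3-large",
    "dimensions": 3072,
    "chunk_version": "v2-512tok-64overlap",
}

def contract_fingerprint(contract: dict) -> str:
    """Stable short ID so dashboards and logs can reference the contract."""
    blob = json.dumps(contract, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]

def assert_query_compatible(query_model: str, query_dims: int, contract: dict) -> None:
    """Refuse to mix vector spaces: fail fast instead of returning bad rankings."""
    if query_model != contract["embedding_model"] or query_dims != contract["dimensions"]:
        raise ValueError(
            f"Vector drift risk: query embeddings from {query_model}/{query_dims}d are "
            f"not comparable with index contract {contract_fingerprint(contract)} "
            f"({contract['embedding_model']}/{contract['dimensions']}d). "
            "Re-embed and rebuild the index instead of mixing vector spaces."
        )

assert_query_compatible("text-embedding-3-large", 3072, INDEX_CONTRACT)  # passes
```

Stamping each chunk with `chunk_version` at ingestion makes it equally easy to detect, and re-embed, chunks produced under an older strategy.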
Conclusion

Vector drift is an expected characteristic of production RAG systems. Teams that proactively manage embedding models, chunking strategies, and retrieval observability can maintain reliable relevance as their data and usage evolve. Recognizing and addressing vector drift is essential to building and operating robust AI solutions on Azure.

Further Reading

The following Microsoft resources provide additional guidance on vector search, embeddings, and production-grade RAG architectures on Azure:

- Azure AI Search – Vector Search Overview: https://learn.microsoft.com/azure/search/vector-search-overview
- Azure OpenAI – Embeddings Concepts: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/embeddings?view=foundry-classic&tabs=csharp
- Retrieval-Augmented Generation (RAG) Pattern on Azure: https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview?tabs=videos
- Azure Monitor – Observability Overview: https://learn.microsoft.com/azure/azure-monitor/overview

Introducing OpenAI's GPT-image-1.5 in Microsoft Foundry
Developers building with visual AI often run into the same frustrations: images that drift from the prompt, inconsistent object placement, text that renders unpredictably, and editing workflows that break when iterating on a single asset. That's why we are excited to announce that OpenAI's GPT Image 1.5 is now generally available in Microsoft Foundry. The model brings sharper image fidelity, stronger prompt alignment, and faster image generation that supports iterative workflows. Starting today, customers can request access to the model and start building in the Foundry platform.

Meet GPT Image 1.5

AI-driven image generation began with early models like OpenAI's DALL-E, which introduced the ability to transform text prompts into visuals. Since then, image generation models have evolved to enhance multimodal AI across industries. GPT Image 1.5 represents continuous improvement in enterprise-grade image generation. Building on the success of GPT Image 1 and GPT Image 1 mini, this enhanced model introduces advanced capabilities that cater to both creative and operational needs. The new image model offers:

- Text-to-image: Stronger instruction following and highly precise editing.
- Image-to-image: Transform existing images to iteratively refine specific regions.
- Improved visual fidelity: More detailed scenes and realistic rendering.
- Accelerated creation times: Up to 4x faster generation speed.
- Enterprise integration: Deploy and scale securely in Microsoft Foundry.

GPT Image 1.5 delivers stronger image preservation and editing capabilities, maintaining critical details like facial likeness, lighting, composition, and color tone across iterative changes. You'll see more consistent preservation of branded logos and key visuals, making it especially powerful for marketing, brand design, and ecommerce workflows: from graphics and logo creation to generating full product catalogs (variants, environments, and angles) from a single source image.

Benchmarks

Based on an internal Microsoft dataset, GPT Image 1.5 scores higher than other image generation models on prompt alignment and infographics tasks. It focuses on making clear, strong edits, performing best on single-turn modification and delivering the highest visual quality in both single- and multi-turn settings. The following results were found across image generation and editing:

Text to image

| Model | Prompt alignment | Diagram / Flowchart |
|---|---|---|
| GPT Image 1.5 | 91.2% | 96.9% |
| GPT Image 1 | 87.3% | 90.0% |
| Qwen Image | 83.9% | 33.9% |
| Nano Banana Pro | 87.9% | 95.3% |

Image editing

| Turn | Model | Modification (BinaryEval) | Preservation (SC, semantic) | Preservation (DINO, visual) | Visual Quality (BinaryEval) | Face Preservation (AuraFace) |
|---|---|---|---|---|---|---|
| Single-turn | GPT Image 1 | 99.2% | 51.0% | 0.14 | 79.5% | 0.30 |
| Single-turn | Qwen Image | 81.9% | 63.9% | 0.44 | 76.0% | 0.85 |
| Single-turn | GPT Image 1.5 | 100% | 56.77% | 0.14 | 89.96% | 0.39 |
| Multi-turn | GPT Image 1 | 93.5% | 54.7% | 0.10 | 82.8% | 0.24 |
| Multi-turn | Qwen Image | 77.3% | 68.2% | 0.43 | 77.6% | 0.63 |
| Multi-turn | GPT Image 1.5 | 92.49% | 60.55% | 0.15 | 89.46% | 0.28 |

Using GPT Image 1.5 across industries

Whether you're creating immersive visuals for campaigns, accelerating UI and product design, or producing assets for interactive learning, GPT Image 1.5 gives modern enterprises the flexibility and scalability they need. Image models allow teams to drive deeper engagement through compelling visuals, speed up design cycles for apps, websites, and marketing initiatives, and support inclusivity by generating accessible, high-quality content for diverse audiences.
Watch how Foundry enables developers to iterate with multimodal AI across Black Forest Labs, OpenAI, and more.

Microsoft Foundry empowers organizations to deploy these capabilities at scale, integrating image generation seamlessly into enterprise workflows. Explore the use of AI image generation across industries like:

- Retail: Generate product imagery for catalogs, e-commerce listings, and personalized shopping experiences.
- Marketing: Create campaign visuals and social media graphics.
- Education: Develop interactive learning materials or visual aids.
- Entertainment: Edit storyboards, character designs, and dynamic scenes for films and games.
- UI/UX: Accelerate design workflows for apps and websites.

Microsoft Foundry provides security and compliance with built-in content safety filters, role-based access, network isolation, and Azure Monitor logging. Integrated governance via Azure Policy, Purview, and Sentinel gives teams real-time visibility and control, so privacy and safety are embedded in every deployment. Learn more about responsible AI at Microsoft.

Pricing

| Model (Global) | Input tokens (per 1M) | Cached input tokens (per 1M) | Output tokens (per 1M) |
|---|---|---|---|
| GPT-image-1.5 | $8 | $2 | $32 |

Cost efficiency improves as well: image inputs and outputs are now cheaper compared to GPT Image 1, enabling organizations to generate and iterate on more creative assets within the same budget. For detailed pricing, refer here.

Getting started

Learn more about image generation, explore code samples, and read about responsible AI protections here. Try GPT Image 1.5 in Microsoft Foundry and start building multimodal experiences today. Whether you're designing educational materials, crafting visual narratives, or accelerating UI workflows, this model delivers the flexibility and performance your organization needs.
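As a quick starting point, the sketch below generates a single image through the OpenAI Python SDK against an Azure endpoint. The deployment name (`gpt-image-1.5`) and API version are placeholders; use the values from your Foundry resource and check the documentation for current parameters.

```python
import base64
import os

from openai import AzureOpenAI

# Placeholders: endpoint, key, API version, and deployment name come from
# your Foundry resource.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2025-04-01-preview",  # assumed version
)

result = client.images.generate(
    model="gpt-image-1.5",  # your deployment name
    prompt=(
        "A product hero shot of a stainless-steel water bottle on a light "
        "studio background, soft shadows, brand logo area left blank"
    ),
    size="1024x1024",
)

# The response returns base64-encoded image data; write it to disk.
with open("hero_shot.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```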