## Transforming Video Content into Structured SOPs Using Graph-based RAG
### Introduction

In today's digital-first environments, a large portion of enterprise knowledge lives inside video content: training sessions, onboarding walkthroughs, and recorded operational procedures. While videos are great for learning, they are not ideal for quick reference, compliance, or repeatable processes. Converting that knowledge into structured documentation such as Standard Operating Procedures (SOPs) is often manual and time-consuming. What if this process could be automated using AI?

### The Problem

Transcripts alone don't solve the problem. When videos are converted into text, the output typically lacks:

- Clear structure (sections, headings, hierarchy)
- Context (relationships between steps, tools, and roles)
- Completeness (definitions and dependencies spread across the content)

This leads to a common challenge: teams spend significant effort manually reading transcripts, interpreting context, and restructuring them into usable documentation. As seen in modern architecture challenges, manual and repetitive configurations don't scale well and increase maintenance effort.

### Enter Graph-based RAG (GraphRAG)

GraphRAG extends traditional RAG by building a knowledge graph instead of treating content as disconnected chunks. What GraphRAG does:

- Extracts entities (tools, systems, roles, concepts)
- Maps relationships between them
- Groups related concepts into logical sections
- Preserves context across the entire document

### Architecture Overview

The high-level pipeline:

Video → Transcription → Knowledge Graph → LLM Generation → Structured SOP

### Implementation Approach (Step-by-Step)

#### Stage 1: Knowledge Graph Construction

1. Convert video to transcript
2. Split transcript into chunks
3. Feed chunks into GraphRAG

GraphRAG performs text unit extraction, entity recognition, relationship mapping, and community detection. Result: a structured knowledge graph representation of the transcript.

#### Stage 2: Structure Extraction

From the knowledge graph:

- Sequential steps: preserve procedural flow from transcript order
- Logical sections: derived using community detection
- Key concepts: identified using graph centrality (importance via connections)

This creates a framework for the SOP; a minimal sketch of this stage follows.
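To make Stage 2 concrete, here is a minimal sketch of deriving logical sections and key concepts from an entity graph. It uses networkx's greedy modularity communities and degree centrality as illustrative stand-ins for GraphRAG's own community detection and ranking; the entity and relationship lists are hypothetical examples, not output from a real transcript.

```python
# pip install networkx
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def extract_sop_framework(entities, relationships, top_n=5):
    """Derive logical sections and key concepts from an entity graph.

    entities: entity names extracted from transcript chunks.
    relationships: (source, target) pairs mapped between entities.
    """
    graph = nx.Graph()
    graph.add_nodes_from(entities)
    graph.add_edges_from(relationships)

    # Logical sections: communities of densely connected entities.
    sections = [sorted(c) for c in greedy_modularity_communities(graph)]

    # Key concepts: entities ranked by centrality (importance via connections).
    centrality = nx.degree_centrality(graph)
    key_concepts = sorted(centrality, key=centrality.get, reverse=True)[:top_n]

    return {"sections": sections, "key_concepts": key_concepts}

framework = extract_sop_framework(
    entities=["operator", "bioreactor", "cleanroom", "batch record", "QA review"],
    relationships=[("operator", "bioreactor"), ("operator", "batch record"),
                   ("batch record", "QA review"), ("bioreactor", "cleanroom")],
)
print(framework)
```

In a real pipeline, the entities and relationships would come from GraphRAG's extraction stage rather than being hand-written.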
#### Stage 3: Intelligent Document Generation

Using Azure OpenAI, each SOP section is generated:

| Section | Generated from |
| --- | --- |
| Title & Purpose | High-level concepts |
| Scope | Entity boundaries |
| Definitions | Entity descriptions |
| Responsibilities | Role-based entities |
| Procedures | Sequential steps |
| References | Linked content |

The key advantage: the LLM is grounded in graph structure, not raw text. (A minimal sketch of this stage appears at the end of this article.)

### Key Benefits

- Context preservation: relationships between concepts are maintained across sections.
- Comprehensive coverage: community detection ensures important topics are not missed.
- Reduced hallucination: LLM generation is grounded in structured knowledge.
- Scalability: works for 30-minute tutorials, 3-hour training sessions, and enterprise knowledge bases.

### Real-World Impact (Example)

In enterprise scenarios like pharmaceutical SOP generation:

- Processing time: ~15–20 minutes for a multi-hour video
- Output quality: 8–10 structured SOP sections
- Consistency: terminology and relationships preserved
- Coverage: minimal missing topics

### Where This Approach Works Best

- Training videos → SOPs
- Meeting recordings → action summaries
- Technical demos → documentation
- Interview recordings → knowledge bases
- Tutorials → reference guides

### Key Takeaway

This approach represents a shift from text processing to knowledge understanding. By combining knowledge graphs (structure) with LLMs (language generation), we can transform raw, unstructured content into usable, enterprise-grade knowledge assets.

### Resources

https://microsoft.github.io/graphrag/index/overview/

### Final Thoughts

Have you explored GraphRAG or similar approaches in your projects? What challenges did you face? How did you handle unstructured knowledge? Share your experiences — let's learn together.
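#### Appendix: A Stage 3 sketch

As a closing illustration of Stage 3, here is a minimal sketch of generating one SOP section with an LLM grounded in graph-derived context, using the Azure OpenAI chat completions API. The endpoint, key, deployment name, and API version are placeholders; adapt them to your resource.

```python
# pip install openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-KEY",
    api_version="2024-06-01",
)

def generate_sop_section(section_name, section_entities, sequential_steps):
    """Generate one SOP section, grounded in graph-derived structure."""
    context = (
        f"Entities and descriptions: {section_entities}\n"
        f"Procedural steps in transcript order: {sequential_steps}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # your deployment name
        messages=[
            {"role": "system",
             "content": "Write one SOP section. Use ONLY the supplied graph "
                        "context; do not invent steps, tools, or roles."},
            {"role": "user", "content": f"Section: {section_name}\n{context}"},
        ],
    )
    return response.choices[0].message.content
```

Constraining the model to the supplied graph context is what delivers the reduced-hallucination benefit described above.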
## Three tiers of Agentic AI - and when to use none of them

Every enterprise has an AI agent. Almost none of them work in production.

Walk into any enterprise technology review right now and you will find the same thing. Pilots running. Demos recorded. Steering committees impressed. And somewhere in the background, a quiet acknowledgment that the thing does not actually work at scale yet. OutSystems surveyed nearly 1,900 global IT leaders and found that 96% of organizations are already running AI agents in some capacity. Yet only one in nine has those agents operating in production at scale. The experiments are everywhere. The production systems are not.

That gap is not a capability problem. The infrastructure has matured. Tool calling is standard across all major models. Frameworks like LangGraph, CrewAI, and Microsoft Agent Framework abstract orchestration logic. Model Context Protocol standardizes how agents access external tools and data sources. Google's Agent-to-Agent protocol, now under Linux Foundation governance with over 50 enterprise technology partners including Salesforce, SAP, ServiceNow, and Workday, standardizes how agents coordinate with each other. The protocols are in place. The frameworks are production ready.

The gap is a selection and governance problem. Teams are building agents on problems that do not need them, choosing the wrong tier for the ones that do, and treating governance as a compliance checkbox to add after launch rather than an architectural input to design in from the start. The same OutSystems research found that 94% of organizations are concerned that AI sprawl is increasing complexity, technical debt, and security risk, and only 12% have a centralized approach to managing it. Teams are deploying agents the way shadow IT spread through enterprises a decade ago: fast, fragmented, and without a shared definition of what production-ready actually means.

I've built agentic systems across enterprise clients in logistics, retail, and B2B services. The failures I keep seeing are not technology failures. They are architecture and judgment failures: problems that existed before the first line of code was written, in the conversation where nobody asked the prior question. This article is the framework I use before any platform conversation starts.

### What has genuinely shifted in the agentic landscape

Three changes are shaping how enterprise agent architecture should be designed today, and they are not incremental improvements on what existed before.

The first is the move from single agents to multi-agent systems. Databricks' State of AI Agents report, drawing on data from over 20,000 organizations including more than 60% of the Fortune 500, found that multi-agent workflows on their platform grew 327% in just four months. This is not experimentation; it is production architecture shifting. A single agent handling everything (routing, retrieval, reasoning, execution) is being replaced by specialized agents coordinating through defined interfaces. A financial organization, for example, might run separate agents for intent classification, document retrieval, and compliance checking, each narrow in scope, each connected to the next through a standardized protocol rather than tightly coupled code.

The second is protocol standardization. MCP handles vertical connectivity: how agents access tools, data sources, and APIs through a typed manifest and standardized invocation pattern. A2A handles horizontal connectivity: how agents discover peer agents, delegate subtasks, and coordinate workflows. Production systems today use both.
The practical consequence is that multi-agent architectures can be composed and governed as a platform rather than managed as a collection of one-off integrations.

The third is governance as the differentiating factor between teams that ship and teams that stall. Databricks found that companies using AI governance tools get over 12 times more AI projects into production compared to those without. The teams running production agents are not running more sophisticated models. They built evaluation pipelines, audit trails, and human oversight gates before scaling, not after the first incident.

### Tier 1 - Low-code agents: fast delivery with a defined ceiling

The low-code tier is more capable than it was eighteen months ago. Copilot Studio, Salesforce Agentforce, and equivalent platforms now support richer connector libraries, better generative orchestration, and more flexible topic models. The ceiling is higher than it was. It is still a ceiling.

The core pattern remains: a visual topic model drives a platform-managed LLM that classifies intent and routes to named execution branches. Connectors abstract credential management and API surface. A business team — analyst, citizen developer, IT operations — can build, deploy, and iterate without engineering involvement on every change. For bounded conversational problems, this is the fastest path from requirement to production.

The production reality is documented clearly. Gartner data found that only 5% of Copilot Studio pilots moved to larger-scale deployment. A European telecom with dedicated IT resources and a full Microsoft enterprise agreement spent six months and did not deliver a single production agent. The visual builder works. The path from prototype to production (production-grade integrations, error handling, compliance logging, exception routing) is where most enterprises get stuck, because it requires Power Platform expertise that most business teams do not have.

The platform ceiling shows up predictably at four points:

- Async processing: anything beyond a synchronous connector call, including approval chains, document pipelines, or batch operations, cannot be handled natively.
- Full payload audit logs: platform logs give conversation transcripts and connector summaries, not structured records of every API call and its parameters.
- Production volume: concurrency limits and message throughput budgets bind faster than planning assumptions suggest.
- Root cause analysis in production: you cannot inspect the LLM's confidence score or the alternatives it considered, which makes diagnosing misbehavior significantly harder than it should be.

The correct diagnostic: can this use case be owned end-to-end by a business team, covered by standard connectors, with no latency SLA below three seconds and no payload-level compliance requirement? If yes, low-code is the correct tier, not a compromise. If no on any point, continue.

If low-code is the right call for your use case: Copilot Studio quickstart

### Tier 2 - Pro-code agents: the architecture the current landscape demands

The defining pattern in production pro-code architecture today is multi-agent: specialized agents per domain, coordinating through MCP for tool access and A2A for peer-to-peer delegation, with a governance layer spanning the entire system.

What this looks like in practice: a financial organization handling incoming compliance queries runs separate agents for intent classification, document retrieval, and the compliance check itself. None of these agents tries to do all three jobs.
Each has a narrow responsibility, a defined input/output contract typed against a JSON Schema, and a clear handoff boundary. The 327% growth in multi-agent workflows reflects production teams discovering that the failure modes of monolithic agents (topic collision, context overflow, degraded classification as scope expands) are solved by specialization, not by making a single agent more capable.

The discipline that makes multi-agent systems reliable is identical to what makes single-agent systems reliable, just enforced across more boundaries: the LLM layer reasons and coordinates; deterministic tool functions enforce. In a compliance pipeline, no LLM decides whether a document satisfies a regulatory requirement. That evaluation runs in a deterministic tool with a versioned rule set, testable outputs, and an immutable audit log. The LLM orchestrates the sequence. The tool produces the compliance record. Mixing these, letting an LLM evaluate whether a rule passes, collapses the audit trail and introduces probabilistic outputs on questions that have regulatory answers. (A minimal sketch of this separation follows this section.)

MCP is the tool interface standard today. An MCP server exposes a typed manifest any compliant agent runtime can discover at startup. Tools are versioned, independently deployable, and reusable across agents without bespoke integration code. A2A extends this horizontally: agents advertise capability cards, discover peers, and delegate subtasks through a standardised protocol. The practical consequence is that multi-agent systems built on both protocols can be composed and governed as a platform rather than managed as a collection of one-off integrations.

Observability is the architectural element that separates teams shipping production agents from teams perpetually in pilot. Build evaluation pipelines, distributed traces across all agent boundaries, and human review gates before scaling. The teams that add these after the first production incident spend months retrofitting what should have been designed in.

If pro-code is the right call for your use case: Foundry Agent Service
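To make the reason/enforce split concrete, here is a minimal sketch of a deterministic compliance tool of the kind described above: a versioned rule set, a testable pass/fail output, and an append-only audit record. The specific rule, field names, and seven-year retention threshold are illustrative assumptions, not a real regulatory requirement.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

RULESET_VERSION = "2025.02"  # versioned rule set; bump on every regulatory change
audit_log: list[dict] = []   # stands in for an immutable, append-only audit store

@dataclass
class ComplianceRecord:
    document_id: str
    rule_id: str
    ruleset_version: str
    passed: bool
    checked_at: str

def check_retention_rule(document: dict) -> ComplianceRecord:
    """Deterministic check: the same input always yields the same record.
    No LLM is involved; the LLM only orchestrates when this tool is called."""
    passed = document.get("retention_days", 0) >= 2555  # assumed 7-year rule
    record = ComplianceRecord(
        document_id=document["id"],
        rule_id="RETENTION-7Y",
        ruleset_version=RULESET_VERSION,
        passed=passed,
        checked_at=datetime.now(timezone.utc).isoformat(),
    )
    audit_log.append(asdict(record))  # every evaluation leaves an audit trail
    return record

print(check_retention_rule({"id": "doc-42", "retention_days": 3650}).passed)  # True
```

Because the rule lives in version-controlled code rather than a prompt, it can be unit-tested, and the audit log records exactly which rule set version produced each decision.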
### The hybrid pattern: still where production deployments land

The shift to multi-agent architecture does not change the hybrid pattern; it deepens it. Low-code at the conversational surface, pro-code multi-agent systems behind it, with a governance layer spanning both.

On a logistics client engagement, the brief was a sales assistant for account managers: shipment status, account health, and competitive context inside Teams. The business team wanted everything in Copilot Studio. Engineering wanted a custom agent runtime. Both were wrong.

What we built: Copilot Studio handled all high-frequency, low-complexity queries (shipment tracking, account status, open cases) through Power Platform connectors. Zero custom code. That covered roughly 78% of actual interaction volume. Requests requiring multi-source reasoning (competitive positioning on a specific lane, churn risk across an account portfolio, contract renewal analysis) were delegated via authenticated HTTP action to a pro-code multi-agent service on Azure. A retrieval agent pulled deal history and market intelligence through MCP-exposed tools. A synthesis agent composed the recommendation with confidence scoring. Structured JSON came back to the low-code layer, rendered as an adaptive card in Teams.

The HITL gate was non-negotiable and designed before deployment, not added after the first incident. No output reached a customer without a manager approval step. The agent drafts. A human sends.

This boundary (low-code for conversational volume, pro-code for reasoning depth) maps directly to what the research shows separates teams that ship from teams that stall. The organizations running agents in production drew the line correctly between what the platform can own and what engineering needs to own. Then they built governance into both sides before scaling.

### The four gates - the prior question that still gets skipped

Run every candidate use case through these four checks before the platform conversation begins. None of the recent infrastructure improvements change what they are checking, because none of them change the fundamental cost structure of agentic reasoning. (The sketch after this list encodes the four checks.)

- Gate 1 - is the logic fully deterministic? If every valid output for every valid input can be enumerated in unit tests, the problem does not need an LLM. A rules engine executes in microseconds at zero inference cost and cannot produce a plausible-but-wrong answer. NeuBird AI's production ops agents, which have resolved over a million alerts and saved enterprises over $2 million in engineering hours, work because alert triage logic that can be expressed as rules runs in deterministic code, and the LLM only handles cases where pattern-matching is insufficient. That boundary is not incidental to the system's reliability. It is the reason for it.
- Gate 2 - is zero hallucination tolerance required? With over 80% of databases now being built by AI agents, per Databricks' State of AI Agents report, the surface area for hallucination-induced data errors has grown significantly. In domains where a wrong answer is a compliance event (financial calculation, medical logic, regulatory determinations), irreducible LLM output uncertainty is disqualifying regardless of model version or prompt engineering effort. Exit to deterministic code or classical ML with bounded output spaces.
- Gate 3 - is a sub-100ms latency SLA required? LLM inference is faster than it was eighteen months ago. It is not fast enough for payment transaction processing, real-time fraud scoring, or live inventory management. A three-agent system with MCP tool calls has a P50 latency measured in seconds. These problems need purpose-built transactional architecture.
- Gate 4 - is regulatory explainability required? A2A enables complex agent coordination and delegation. It does not make LLM reasoning reproducible in a regulatory sense. Temperature above zero means the same input produces different outputs across invocations. Regulators in financial services, healthcare, and consumer credit require deterministic, auditable decision rationale. Exit to deterministic workflow with structured audit logging at every step.
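Here is a minimal sketch of the four gates as a pre-platform check. The field names and the example input are assumptions; the point is that every gate is an exit to a simpler, cheaper architecture, and only a use case that clears all four proceeds to tier selection.

```python
def needs_an_agent(use_case: dict) -> tuple[bool, str]:
    """Run the four gates before any platform conversation begins."""
    if use_case["fully_deterministic"]:
        return False, "Gate 1: use a rules engine, not an LLM"
    if use_case["zero_hallucination_tolerance"]:
        return False, "Gate 2: use deterministic code or classical ML"
    if use_case["latency_sla_ms"] < 100:
        return False, "Gate 3: use purpose-built transactional architecture"
    if use_case["regulatory_explainability"]:
        return False, "Gate 4: use deterministic workflow with audit logging"
    return True, "Agent warranted: continue to tier selection"

# Example: a conversational assistant with a 3-second SLA clears all four gates.
ok, verdict = needs_an_agent({
    "fully_deterministic": False,
    "zero_hallucination_tolerance": False,
    "latency_sla_ms": 3000,
    "regulatory_explainability": False,
})
print(ok, verdict)
```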
### Five production failure modes - one of them new

The four original anti-patterns are still showing up in production. A fifth has been added by scale.

- Routing data retrieval through a reasoning loop. A direct API call returns account status in under 10ms. Routing the same request through an LLM reasoning step adds hundreds of milliseconds, consumes tokens on every call, and introduces output parsing on data that is already structured. The agent calls a structured tool. The tool calls the API. The agent never acts as the integration layer.
- Encoding business rules in prompts. Rules expressed in prompt text drift as models update. They produce probabilistic output across invocations and fail in ways that are difficult to reproduce and diagnose. A rule that must evaluate correctly every time belongs in a deterministic tool function: unit-tested, version-controlled, independently deployable via MCP.
- No approval gate on CRUD operations. CRUD operations without a human approval step will eventually misfire on the input that testing did not cover. The gate needs to be designed before deployment, not added after the first incident involving a financial posting, a customer-facing communication, or a data deletion.
- Monolithic agent for all domains. A single agent accumulating every domain leads predictably to topic collision, context overflow, and maintenance that becomes impossible as scope expands. Specialized agents per domain, coordinating through A2A, is the architecture that scales.
- Ungoverned agent sprawl. This is the new one and currently the most prevalent. OutSystems found 94% of organizations concerned about it, with only 12% having a centralized response. Teams building agents independently across fragmented stacks, without shared governance, evaluation standards, or audit infrastructure, produce exactly the same organizational debt that shadow IT created, but with higher stakes, because these systems make autonomous decisions rather than just storing and retrieving data. The fix is treating governance as an architectural input before deployment, not a compliance requirement after something breaks.

### The infrastructure is ready. The judgment is not.

The tier decision sequence has not changed:

1. Does the problem need natural language understanding or dynamic generation? No: deterministic system, stop.
2. Can a business team own it through standard connectors, with no sub-3-second latency SLA and no payload-level compliance requirement? Yes: low-code.
3. Does it need custom orchestration, multi-agent coordination, or audit-grade observability? Yes: pro-code with MCP and A2A.
4. Does it need both a conversational surface and deep backend reasoning? Hybrid, with a governance layer spanning both.

What has changed is that governance is no longer optional infrastructure to add when you have time. The data is unambiguous: companies with governance tools get over 12 times more AI projects into production than those without. Evaluation pipelines, distributed tracing across agent boundaries, human oversight gates, and centralised agent lifecycle management are not overhead. They are what converts experiments into production systems. The teams still stuck in pilot are not stuck because the technology failed them. They are stuck because they skipped this layer.

The protocols are standardised. The frameworks are mature. The infrastructure exists. None of that is what is holding most enterprise agent programmes back. What is holding them back is a selection problem disguised as a technology problem — teams building agents before asking whether agents are warranted, choosing platforms before running the four gates, and treating governance as a checkpoint rather than an architectural input.

I have built agents that should have been workflow engines. Not because the technology was wrong, but because nobody stopped early enough to ask whether it was necessary. The four gates in this article exist because I learned those lessons at clients' expense, not mine. The most useful thing I can offer any team starting an agentic AI project is not a framework selection guide. It is permission to say no — and a clear basis for saying it. Take the four gates framework to your next architecture review.
If you have already shipped agents to production, I would like to hear what worked and what did not: comment below.

### What to do next

Three concrete steps depending on where you are right now.

If you have pilots that have not reached production: run them through the four gates in this article before the next sprint. Gate 1 alone will eliminate a meaningful percentage of them. The ones that survive all four are your real candidates for production investment. Download the attached file for the gated checklist and take it into your next architecture review.

If you are starting a new agent project: do not open a platform before you have answered the gate questions. Once you have confirmed an agent is warranted and identified the tier, start here: Copilot Studio guided setup for low-code scenarios, or Foundry Agent Service for pro-code patterns with MCP and multi-agent coordination built in. Build governance infrastructure (evaluation pipeline, distributed tracing, HITL gates) before you scale, not after.

If you have already shipped agents to production: share what worked and what did not in the Azure AI Tech Community — tag posts with #AgentArchitecture. The most useful signal for teams still in pilot is hearing from practitioners who have been through production, not vendors describing what production should look like.

### References

- OutSystems — State of AI Development Report: https://www.outsystems.com/1/state-ai-development-report
- Databricks — State of AI Agents Report: https://www.databricks.com/resources/ebook/state-of-ai-agents
- Gartner — 2025 Microsoft 365 and Copilot Survey: https://www.gartner.com/en/documents/6548002 (paywalled primary source; publicly reported via techpartner.news: https://www.techpartner.news/news/gartner-microsoft-copilot-hype-offset-by-roi-and-readiness-realities-618118)
- Anthropic — Model Context Protocol (MCP): https://modelcontextprotocol.io
- Google Cloud — Agent-to-Agent Protocol (A2A): https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability
- NeuBird AI — Production Operations Deployment Announcement: NeuBird AI Closes $19.3M Funding Round to Scale Agentic AI Across Enterprise Production Operations
- Yao et al. — ReAct: Synergizing Reasoning and Acting in Language Models: https://arxiv.org/abs/2210.03629
- Gregor Hohpe & Bobby Woolf — Enterprise Integration Patterns, Addison-Wesley: https://www.enterpriseintegrationpatterns.com
## Introducing OpenAI's GPT-image-2 in Microsoft Foundry

Take a small design team running a global social campaign. They have the creative vision to produce localized imagery for every market, but not the resources to reshoot, reformat, or outsource at that scale. Every asset needs to fit a different platform, a different dimension, a different cultural context, and they all need to ship at the same time. This is where flexible image generation comes in handy.

OpenAI's GPT-image-2 is now generally available and rolling out today to Microsoft Foundry, introducing a step change in image generation. Developers and designers now get more control over image output, so a small team can execute with the reach and flexibility of a much larger one.

### What is new in GPT-image-2?

GPT-image-2 brings real-world intelligence, multilingual understanding, improved instruction following, increased resolution support, and an intelligent routing layer, giving developers the tools to scale image generation for production workflows.

#### Real-world intelligence

GPT-image-2 has a knowledge cutoff of December 2025, meaning it can give you more contextually relevant and accurate outputs. The model also comes with enhanced thinking capabilities that allow it to search the web, check its own outputs, and create multiple images from just one prompt. These enhancements shift image generation models away from being simple tools and turn them into creative sidekicks.

#### Multilingual understanding

GPT-image-2 includes increased language support across Japanese, Korean, Chinese, Hindi, and Bengali, as well as new thinking capabilities. This means the model can create images and render text that feels localized.

#### Increased resolution support

GPT-image-2 introduces 4K resolution support, giving developers the ability to generate rich, detailed, and photorealistic images at custom dimensions. Resolution guidelines to keep in mind (a validation sketch follows the routing section below):

| Constraint | Detail |
| --- | --- |
| Total pixel budget | Maximum pixels in the final image cannot exceed 8,294,400; minimum pixels cannot be less than 655,360. Requests exceeding the budget are automatically resized to fit. |
| Resolutions | 4K, 1024x1024, 1536x1024, and 1024x1536 |
| Dimension alignment | Each dimension must be a multiple of 16 |

Note: if your requested resolution exceeds the pixel budget, the service will automatically resize it down.

#### Intelligent routing layer

GPT-image-2 also includes an expanded routing layer with two distinct modes, allowing the service to intelligently select the right generation configuration for a request without requiring an explicitly set size value.

Mode 1 — Legacy size selection. In Mode 1, the routing layer selects one of the three legacy size tiers to use for generation:

| Size tier | Description |
| --- | --- |
| smimage | Small image output |
| image | Standard image output |
| xlimage | Large image output |

This mode is useful for teams already familiar with the legacy size tiers who want to benefit from automatic selection without making any manual changes.

Mode 2 — Token size bucket selection. In Mode 2, the routing layer selects from six token size buckets — 16, 24, 36, 48, 64, 96 — which map roughly to the legacy size tiers:

| Token bucket | Approximate legacy size |
| --- | --- |
| 16, 24 | smimage |
| 36, 48 | image |
| 64, 96 | xlimage |

This approach allows more flexibility in the number of tokens generated, which in turn helps optimize output quality and efficiency for a given prompt.
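As a minimal pre-flight check against the resolution guidelines above, here is a sketch that validates a requested resolution using the pixel budget bounds and 16-pixel alignment taken directly from the table. (Note that the documented maximum budget of 8,294,400 pixels is exactly 3840x2160.)

```python
MAX_PIXELS = 8_294_400   # maximum pixels in the final image
MIN_PIXELS = 655_360     # minimum pixels in the final image
ALIGNMENT = 16           # each dimension must be a multiple of 16

def validate_resolution(width: int, height: int) -> list[str]:
    """Check a requested resolution against the documented constraints.
    Out-of-budget requests are resized by the service anyway; this check
    simply flags them before you send the request."""
    issues = []
    if width % ALIGNMENT or height % ALIGNMENT:
        issues.append(f"dimensions must be multiples of {ALIGNMENT}")
    pixels = width * height
    if pixels > MAX_PIXELS:
        issues.append(f"{pixels:,} px exceeds the budget of {MAX_PIXELS:,}")
    if pixels < MIN_PIXELS:
        issues.append(f"{pixels:,} px is below the minimum of {MIN_PIXELS:,}")
    return issues

print(validate_resolution(3840, 2160))  # [] -> 4K, exactly at the pixel budget
print(validate_resolution(4096, 2304))  # aligned, but over budget -> flagged
```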
### See it in action

GPT-image-2 shows improved image fidelity across visual styles, generating more detailed and refined images. But don't just take our word for it; let's see the model in action with a few prompts and edits. Here is the example we used:

Prompt: Interior of an empty subway car (no people). Wide-angle view looking down the aisle. Clean, modern subway car with seats, poles, route map strip, and ad frames above the windows. Realistic lighting with a slight cool fluorescent tone, realistic materials (metal poles, vinyl seats, textured floor).

As you can see, when using the same base prompt, the image quality and realism improved with each model. Now let's take a look at adding incremental changes to the same image:

Prompt: Populate the ad frames with a cohesive ad campaign for "Zava Flower Delivery" and use an array of flower types.

And our subway is now full of ads for the new Zava flower delivery service. Let's ask for another small change:

Prompt: In all Zava Flower Delivery advertisements, change the flowers shown to roses (red and pink roses).

And in three simple prompts, we've created a mockup of a flower delivery ad. From marketing material to website creation to UX design, GPT-image-2 now allows developers to deliver production-grade assets for real business use cases.

### Image generation across industries

These new capabilities open the door to richer, more production-ready image generation workflows across a range of enterprise scenarios:

- Retail & e-commerce: generate product imagery at exact platform-required dimensions, from square thumbnails to wide banners, without post-processing.
- Marketing: produce crisp, richly colored campaign visuals and social assets localized to different markets.
- Media & entertainment: generate storyboard panels and scenes at resolutions suited to production pipelines.
- Education & training: create visual learning aids and course materials formatted to exact display requirements across devices.
- UI/UX design: accelerate mockup and prototype workflows by generating interface assets at the precise dimensions your design system requires.

### Trust and safety

At Microsoft, our mission to empower people and organizations remains constant. As part of this commitment, models made available through Foundry undergo internal reviews and are deployed with safeguards designed to support responsible use at scale. Learn more about responsible AI at Microsoft. For GPT-image-2, Microsoft applied an in-depth safety approach that addresses disallowed content and misuse while maintaining human oversight. The deployment combines OpenAI's image generation safety mitigations with Azure AI Content Safety, including filters and classifiers for sensitive content.

### Pricing

| Model | Offer type | Pricing - Image | Pricing - Text |
| --- | --- | --- | --- |
| GPT-image-2 | Standard Global | Input tokens: $8; Cached input tokens: $2; Output tokens: $30 | Input tokens: $5; Cached input tokens: $1.25; Output tokens: $10 |

Note: all prices are per 1M tokens.

### Getting started

Whether you're building a personalized retail experience, automating visual content pipelines, or accelerating design workflows, GPT-image-2 gives your team the resolution control and intelligent routing to generate images that fit your exact needs. Try GPT-image-2 in Microsoft Foundry today (a minimal Python sketch follows the links below):

- Deploy the model in Microsoft Foundry
- Experiment with the model in the Image playground
- Read the documentation to learn more
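For orientation, here is a minimal sketch of calling an image deployment from Python with the openai SDK's Azure client. The deployment name, API version, and response handling are assumptions, not confirmed GPT-image-2 specifics; check the Foundry documentation linked above for the exact parameters your resource supports.

```python
# pip install openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-KEY",
    api_version="2025-04-01-preview",  # placeholder; use your resource's version
)

result = client.images.generate(
    model="gpt-image-2",  # your deployment name
    prompt=(
        "Interior of an empty subway car (no people). Wide-angle view down "
        "the aisle, modern seats and poles, cool fluorescent lighting."
    ),
    size="1536x1024",  # one of the documented resolutions
)

# gpt-image deployments typically return a base64-encoded payload.
print(result.data[0].b64_json[:80])
```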
## Centralizing Enterprise API Access for Agent-Based Architectures

### Problem Statement

When building AI agents or automation solutions, calling enterprise APIs directly often means configuring individual HTTP actions within each agent for every API. While this works for simple scenarios, it quickly becomes repetitive and difficult to manage as complexity grows. The challenge becomes more pronounced when a single business domain exposes multiple APIs, or when the same APIs are consumed by multiple agents. This leads to duplicated configurations, higher maintenance effort, inconsistent behavior, and increased governance and security risks.

A more scalable approach is to centralize and reuse API access. By grouping APIs by business domain using an API management layer, shaping those APIs through a Model Context Protocol (MCP) server, and exposing the MCP server as a standardized tool or connector, agents can consume business capabilities in a consistent, reusable, and governable manner. This pattern not only reduces duplication and configuration overhead but also enables stronger versioning, security controls, observability, and domain-driven ownership, making agent-based systems easier to scale and operate in enterprise environments.

### Designing Agent-Ready APIs with Azure API Management, an MCP Server, and Copilot Studio

As enterprises increasingly adopt AI-powered assistants and Copilots, API design must evolve to meet the needs of intelligent agents. Traditional APIs, often designed for user interfaces or backend integrations, can expose excessive data, lack intent-level abstraction, and increase security risk when consumed directly by AI systems. This document outlines a practical, enterprise-ready approach: organize APIs in Azure API Management (APIM), introduce a Model Context Protocol (MCP) server to shape and control context, and integrate the solution with Microsoft Copilot Studio. The goal is to make APIs truly agent-ready: secure, scalable, reusable, and easy to govern.

### Architecture at a glance

- Back-end services expose domain APIs.
- Azure API Management (APIM) groups and governs those APIs (products, policies, authentication, throttling, versions).
- An MCP server calls APIM, orchestrates/filters responses, and returns concise, model-friendly outputs.
- Copilot Studio connects to the MCP server and invokes a small set of predictable operations to satisfy user intents.

### Why Traditional API Designs Fall Short for AI Agents

Enterprise APIs have historically been built around CRUD operations and service-to-service integration patterns. While this works well for deterministic applications, AI agents work best with intent-driven operations and context-aware responses. When agents consume traditional APIs directly, common issues include overly verbose payloads, multiple calls to satisfy a single user intent, and insufficient guardrails for read vs. write operations. The result can be unpredictable agent behavior that is difficult to test, validate, and govern.

### Structuring APIs Effectively in Azure API Management

Azure API Management (APIM) is the control plane between enterprise systems and AI agents. A well-structured APIM instance improves security, discoverability, and governance through products, policies, subscriptions, and analytics. Key design principles for agent consumption:

- Organize APIs by business capability (for example, Customer, Orders, Billing) rather than technical layers.
- Expose agent-facing APIs via dedicated APIM products to enable controlled access, throttling, versioning, and independent lifecycle management.
- Prefer read-only operations where possible; scope write operations narrowly and protect them with explicit checks, approvals, and least-privilege identities. Read-only APIs should be prioritized, while action-oriented APIs must be carefully scoped and gated.

### The Role of the MCP Server in Agent-Based Architectures

APIM provides governance and security, but agents also need an intent-level interface and model-friendly responses. A Model Context Protocol (MCP) server fills this gap by acting as a mediator between Copilot Studio and APIM-exposed APIs. Instead of exposing many back-end endpoints directly to the agent, the MCP server can orchestrate multiple API calls, filter irrelevant fields, enforce business rules, enrich results with additional context, and format responses into concise, predictable, LLM-friendly schemas. This makes agent behavior more reliable and easier to validate.

By introducing this abstraction layer, Copilot interactions become simpler, safer, and more deterministic. The agent interacts with a small number of well-defined MCP operations that encapsulate enterprise logic without exposing internal complexity.

### Designing an Effective MCP Server

An MCP server should have a focused responsibility: shaping context for AI models. It should not replace core back-end services; it should adapt enterprise capabilities for agent consumption. MCP does not orchestrate enterprise workflows or apply business logic on its own; it standardizes how agents discover and invoke external tools and APIs by exposing them through a structured protocol interface. Orchestration, intent resolution, and policy-driven execution are handled by the agent runtime or host framework.

What MCP should do (a minimal sketch appears after the best-practices list below):

- Call APIM-managed APIs and orchestrate multi-step retrieval when needed.
- Apply security checks and business rules consistently.
- Filter and minimize payloads (return only fields needed for the intent).
- Normalize and reshape responses into stable, predictable JSON schemas.
- Handle errors and edge cases with safe, descriptive messages.

What MCP should not do: avoid implementing complex transactional workflows, long-running processes, or UI-specific formatting in MCP; those belong in backend systems. Keeping MCP lightweight ensures it remains scalable, testable, and easy to maintain.

### Step-by-step guide

#### 1) Create an MCP server in Azure API Management (APIM)

1. Open the Azure portal (portal.azure.com).
2. Go to your API Management instance.
3. In the left navigation, expand APIs.
4. Create (or select) an API group for the business domain you want to expose (for example, Orders or Customers).
5. Add the relevant APIs/operations to that API group.
6. Create or select an APIM product dedicated to agent usage, and ensure the product requires a subscription (subscription key).
7. Create an MCP server in APIM and map it to the API (or API group) you want to expose as MCP operations.
8. In the MCP server settings, ensure "Subscription key required" is enabled.
9. From the product's Subscriptions page, copy the subscription key you will use in Copilot Studio.

Screenshot placeholders: APIM API group, product configuration, MCP server mapping, subscription settings, subscription key location.

Note: using an API Management subscription key to access MCP operations is one supported way to authenticate and consume enterprise APIs. However, this approach is best suited for initial setups, demos, or scenarios where key-based access is explicitly required. For production-grade enterprise solutions, Microsoft recommends using managed identity-based access control. Managed identities for Azure resources eliminate the need to manage secrets such as subscription keys or client secrets, integrate natively with Microsoft Entra ID, and support fine-grained role-based access control (RBAC). This approach improves security posture while significantly reducing operational and governance overhead for agent and service-to-service integrations. Wherever possible, agents and MCP servers should authenticate using managed identities to ensure secure, scalable, and compliant access to enterprise APIs.

#### 2) Create a Copilot Studio agent and connect to the APIM MCP server using a subscription key

Copilot Studio natively supports Model Context Protocol (MCP) servers as tools. When an agent is connected to an MCP server, the tool metadata — including operation names, inputs, and outputs — is automatically discovered and kept in sync, reducing manual configuration and maintenance overhead.

1. Sign in to Copilot Studio.
2. Create a new agent and add clear instructions describing when to use the MCP tool and how to present results (for example, concise summaries plus key fields).
3. Open Tools > Add tool > Model Context Protocol, then choose Create.
4. Enter the MCP server details. Server endpoint URL: copy this from your MCP server in APIM. Authentication: select API Key. Header name: use the subscription key header required by your APIM configuration.
5. Select Create new connection, paste the APIM subscription key, and save.
6. Test the tool in the agent by prompting for a domain-specific task (for example, "Get order status for 12345"). Validate that responses are concise and that errors are handled safely.

Screenshot placeholders: MCP tool creation screen, endpoint + auth configuration, connection creation, test prompt and response.

### Operational best practices and guardrails

- Least privilege by default: create separate APIM products and identities for agent scenarios; avoid broad access to internal APIs.
- Prefer intent-level operations: expose fewer, higher-level MCP operations instead of many low-level endpoints.
- Protect write operations: require explicit parameters, validation, and (when appropriate) approval flows; keep "read" and "write" tools separate.
- Stable schemas: return predictable JSON shapes and limit optional fields to reduce prompt brittleness.
- Observability: log MCP requests/responses (with sensitive fields redacted), monitor APIM analytics, and set alerts for failures and throttling.
- Versioning: version MCP operations and APIM APIs; deprecate safely.
- Security hygiene: treat subscription keys as secrets, rotate regularly, and avoid exposing them in prompts or logs.
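The walkthrough above uses APIM's built-in MCP server. If you instead host your own MCP server in front of APIM, the shaping logic described in "What MCP should do" looks roughly like this sketch, built with the official MCP Python SDK's FastMCP helper. The base URL, payload fields, and tool name are assumptions; the Ocp-Apim-Subscription-Key header is APIM's standard subscription header.

```python
# pip install mcp httpx
import os
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("orders-domain")

APIM_BASE = "https://contoso-apim.azure-api.net/orders"  # hypothetical APIM product
HEADERS = {"Ocp-Apim-Subscription-Key": os.environ["APIM_SUBSCRIPTION_KEY"]}

@mcp.tool()
def get_order_status(order_id: str) -> dict:
    """Intent-level, read-only operation: fetch one order and return only
    the fields an agent needs, in a stable, predictable shape."""
    resp = httpx.get(f"{APIM_BASE}/{order_id}", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    order = resp.json()
    # Filter and minimize: never pass the raw back-end payload to the model.
    return {
        "order_id": order.get("id"),
        "status": order.get("status"),
        "expected_delivery": order.get("eta"),
    }

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport
```

The design choice to return a fixed three-field shape, rather than the raw payload, is what keeps agent behavior deterministic and reduces prompt brittleness.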
### Summary

As organizations scale agent-based and Copilot-driven solutions, directly exposing enterprise APIs to AI agents quickly becomes complex and risky. Centralizing API access through Azure API Management, shaping agent-ready context via a Model Context Protocol (MCP) server, and consuming those capabilities through Copilot Studio establishes a clean and governable architecture. This pattern reduces duplication, enforces consistent security controls, and enables intent-driven API consumption without exposing unnecessary backend complexity. By combining domain-aligned API products, lightweight MCP operations, and least-privilege identity-based access, enterprises can confidently scale AI agents while maintaining strong governance, observability, and operational control.

### References

- Azure API Management (APIM) – Overview
- Azure API Management – Key Concepts
- Azure MCP Server Documentation (Model Context Protocol)
- Extend your agent with Model Context Protocol
- Managed identities for Azure resources – Overview
## Enabling Agentic Data Governance with Hybrid Cloud Flexibility in Azure

### The "Why"

Do you manage data in a complex multi-cloud environment? Are you struggling with data silos, evolving regulations, and the pressure to maintain control and compliance across on-prem and multiple clouds? Do you ever wish an intelligent assistant could help shoulder the load of data governance? If so, I can relate. Let me tell you a story that might sound familiar.

Meet Mark (pictured above). He is a data governance officer at Contoso (a fictional but very representative enterprise). Mark's day job is ensuring data governance and compliance across his company's vast hybrid cloud estate: think around ~2 million data assets sprawled across 12+ datacenters on-premises and in different public clouds. Regulatory requirements are constantly shifting. Customer data is increasingly sensitive. Each department and region has its own way of doing things. Mark is fighting an uphill battle with data silos and disconnected cloud operations. He bounces between a patchwork of tools – spreadsheets, cloud consoles, governance portals – trying to answer basic questions: Where is our data? Who's using it? Are we in compliance? Armed with an old desk calculator and a pile of paper-based reports (a perfect 1990s backdrop), he is dealing with data that has exploded in volume and complexity.

What if Mark had a single pane of glass, one that both reflects and acts? It reflects your governance state and enforces compliance: a self-hydrating pane of glass accompanied by a conversational AI.

And he's not alone. We're all living in a data overload era. Every day, organizations generate and ingest more information than ever before. Transistors and mainframes gave way to the internet boom of the '90s, then an explosion of mobile devices in the 2000s, social media in the 2010s, and now widespread cloud computing, all funneling data into our systems at an exponential rate. On top of that, a new wave of AI and conversational interfaces has arrived here in the mid-2020s, making data more accessible but also increasing expectations for real-time insight. It's no wonder modern IT leaders feel overwhelmed. But these challenges are also opportunities. The way I see it, the incredible growth of data and cloud capabilities means we have a chance to reimagine data governance.

The fact that I'm writing about this right now is no coincidence. My customers are looking to resolve problems in this space. In my conversations with them, I hear the same needs: we want better governance, more visibility, streamlined oversight... and, cherry on top, we want it in an "agentic" fashion. In other words, they want to delegate the grunt work to the platform toolset augmented by AI, so they can focus on higher-value tasks.

### The "What"

That vision – agentic data governance with hybrid cloud flexibility – became the driver for this work. This is a modular solution built from building-block-style components (cloud services, governance tools, AI agents), which you can snap together into an intended solution. Think of it as a jumpstart kit for continuous data governance across multiple clouds, with autonomous ("agentic") assistance baked in that you can leverage and build upon. It's not the final, productized solution; it's more a vision of what's possible.
### Contoso's Requirements

These are the high-level requirements from Contoso:

- Data governance across clouds under one roof
- A single-pane-of-glass dashboard consolidating reporting on the five governance domains:
  - Visibility into data residency and lineage
  - PII (Personally Identifiable Information) must run on Confidential Compute (CC)
  - Security software (Defender) compliance
  - Resource tagging compliance (foundational for a good governance posture)
  - OS updates compliance
- Ability to enforce compliance in an agentic manner with a human in the loop
- Agentic enforcement of compliance pertaining to residency and confidential compute

### Solution – The breakdown

The solution is comprised of eight modules addressing these requirements:

1. Foundational (landing zones, data sources, operational setup, policies, etc.)
2. Dashboard Hydration + Agentic Reporting – Residency Compliance
3. Dashboard Hydration + Agentic Reporting – Confidential Compute for PII Compliance
4. Dashboard Hydration + Agentic Reporting – MS Defender Compliance
5. Dashboard Hydration + Agentic Reporting – Resource Tag Compliance
6. Dashboard Hydration + Agentic Reporting – OS Updates/Patch Compliance
7. Enforce Compliance via Copilot Agent – Residency Compliance
8. Enforce Compliance via Copilot Agent – CC PII Compliance

### Solution – The architecture view

These are the main technical components that make up the solution architecture:

- Data sources of all shapes and sizes on the left, governed by the native Azure or the Arc plane
- Additional Azure services across the bottom layer for the foundational governance posture
- Microsoft Purview, in the top middle, as the unified data governance platform
- Microsoft Fabric, in the bottom middle, as the end-to-end ingestion and analytics platform
- Microsoft Power Platform, on the right, as the low-code/no-code business flow and the Copilot agent experience

### Solution – The end user view

So how does Mark see this solution as a data governance officer? He doesn't see all the intricacies of the solution integration and the logic execution. He sees two things:

- A Power BI dashboard running on Microsoft Fabric, with:
  - A compliance dashboard showing an overall score in each of the five compliance domains, alongside scores for each of the data products across these domains
  - Additional reporting views for more granular reporting
  - A Fabric-based pipeline that hydrates the underlying semantic models from various sources to keep the reports fresh and current
- A Copilot agent (in Teams) for both:
  - Reporting on all compliance domains
  - Enforcing in-scope compliance across selected domains

The agent takes care of the rest: it queries Fabric's semantic model, calls Azure Function endpoints, updates Purview glossary terms, applies Azure tags, and sends Teams notifications.

### The "How" – Residency Compliance

Let's pick a few modules to walk through how they work together to give Mark a cohesive agentic governance experience. It's Monday morning, and Mark logs into the Contoso governance portal with a cup of coffee in hand. Instead of a dozen browser tabs, he has two main tools open: the Data Governance Dashboard and the Contoso Governance Copilot agent. To address some inquiries that came to him as an assigned action, he interacted with the agent. During this interaction, not only did he validate whether any residency information was missing in the unified data governance platform (Purview), he was also able to address a mismatch between Purview and an Azure resource, based on the designed principles.
Here is the snippet of the chat. Under the hood, several components worked on behalf of the agent in performing this governance check and applying the necessary course of action. Even before Mark's conversation with the agent, an ongoing hydration process keeps the Fabric Power BI dashboard up to date.

#### Dashboard Hydration + Agentic Reporting – Residency Compliance

1. A Fabric notebook runs the residency scorecard code block through a pipeline.
2. It reads two Lakehouse tables containing the latest residency information from Purview and the approved region list.
3. The notebook gets a Microsoft Entra bearer token.
4. Once acquired, the notebook calls an Azure Function endpoint.
5. This endpoint searches for the Azure resources associated with the data products in Purview, using an Azure resource tag.
6. The notebook compares the declared Purview residency with the approved region list and the associated resource's region.
7. The notebook then calculates the final 0 / 25 / 50 / 75 / 100 residency compliance score and a reason. For example: a data product without an associated Azure resource gets a 0, while a data product whose Purview residency is a region approved by Contoso and also matches the associated Azure resource's region gets a 100. (A sketch of this scoring logic appears after the enforcement steps below.)
8. It writes the results to the relevant residency compliance Lakehouse tables.
9. The dedicated compliance table then feeds the semantic model for reporting.
10. The compliance Power BI dashboard is hydrated.

#### Enforce Compliance via Copilot Agent – Residency Compliance

With the dashboard data regularly updated, the agent follows this logic, the updated reporting data, and the actions at its disposal during the earlier conversation with Mark:

- Mark initiates the conversation with the agent.
- The agent calls a Power Automate flow.
- This flow retrieves Purview's residency information stored in the Fabric semantic model.
- (Diagram steps 5-8) When Mark asks to investigate a data product further, the agent carries the conversation using a topic, which leverages a flow, which uses a Power Automate custom connector to access an Azure Function endpoint. This endpoint retrieves the latest glossary (residency) information about the data product in question from Purview and provides a preview back to the user.
- (Diagram steps 10-13) If the update criteria are met, there is no conflict, and Mark gives his blessing, the topic calls another flow to access the Function's Purview update endpoint and make the glossary (residency) update in Purview for that data product.
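A minimal sketch of the scoring logic the notebook applies is below. The 0 and 100 cases follow the example given above; the intermediate 25/50/75 tiers and the region names are illustrative assumptions, not Contoso's actual rules.

```python
APPROVED_REGIONS = {"westeurope", "germanywestcentral"}  # assumed approved list

def residency_score(declared_region: str | None,
                    resource_region: str | None) -> tuple[int, str]:
    """Score one data product on the 0/25/50/75/100 scale described above."""
    if resource_region is None:
        return 0, "No associated Azure resource found via tag lookup"
    if declared_region is None:
        return 25, "Resource found, but no residency declared in Purview"
    if declared_region not in APPROVED_REGIONS:
        return 50, "Declared residency is not an approved region"
    if declared_region != resource_region:
        return 75, "Declared residency mismatches the resource's region"
    return 100, "Approved region, and Purview and Azure agree"

score, reason = residency_score("westeurope", "westeurope")
print(score, reason)  # 100 Approved region, and Purview and Azure agree
```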
This gave Mark: Centralized Visibility into data assets across the landscape through Purview and a unified dashboard Proactive compliance enabled with automated checks - controlled with Purview exports and Fabric pipeline schedules And compliance enforcement using an agent Hybrid Cloud Consistency. By using Azure Arc and a foundational data plane management setup Reduced Operational overhead with agentic reporting and compliance Though the solution is comprised of wide variety of components/services, it is built from standard building blocks and is relatively simple to implement. In total, the solution combined around a dozen Azure services and over 40 distinct components (from Purview catalogs to data pipelines, to custom functions and flows). You can choose to implement some or all the compliance domains. Or, better yet, build upon and create new domains and pave new paths. Wrap-up I believe many enterprises could take a similar journey. If you’re facing these issues, consider this an invitation to think differently about data governance. Start with the pieces you already have – your own building blocks of cloud services and data – and imagine what you could build. Chances are that a lot of the heavy lifting can be orchestrated with today’s technology. And with the rise of AI copilots, the dream of agentic data governance – where your policies are continuously enforced by smart agents – is no longer science fiction. It’s here, right now, waiting for you to take it for a spin. Next steps Watch the video narrative on SAP on Azure YouTube channel: Build it with the GitHub Repository: https://github.com/moazmirza/data-sov-and-hyb-cloud Comments/questions: Here, or @ LinkedIn /moazmirza Solution Selfies Azure Policy Compliance - Foundational Governance Posture Purview Data Product Catalog and Data Lineage Purview Governance Metadata à Fabric Lakehouse Fabric Semantic Model Additional Fabric Power BI Dashboard Copilot Studio Topic Flow Azure Function Endpoints219Views0likes0CommentsResource Guide: Making Physical AI Practical for Real‑World Industrial Operations
## Resource Guide: Making Physical AI Practical for Real-World Industrial Operations

Microsoft's adaptive cloud approach enables organizations to turn operational technology (OT) data into intelligent actions, autonomously, without requiring everything to live in the cloud, by unifying the cloud-to-edge management plane, data plane, and intelligence platform. At the center of this approach are key foundational technologies:

| Key purpose | Offering |
| --- | --- |
| Direct-to-cloud device management + telemetry ingestion | Azure IoT Hub |
| Industrial connectivity + edge data plane | Azure IoT Operations |
| Unified analytics + real-time intelligence | Microsoft Fabric |
| On-device AI inferencing runtime | Foundry Local |

Gartner recognition: Microsoft was named a Leader in the 2025 Gartner® Magic Quadrant™ for Global Industrial IoT Platforms.

This blog walks through where to get started with each.

### 1. Manage Cloud-Connected Devices and Telemetry with Azure IoT Hub

Azure IoT Hub is a fully managed cloud service that enables secure bidirectional communication, device-to-cloud telemetry ingestion, cloud-to-device command execution, per-device authentication, remote management, and more. Telemetry from IoT Hub can also be routed downstream into analytics platforms like Microsoft Fabric for visualization or AI modeling. (A minimal telemetry sketch follows this section.)

Recommended usage: devices that utilize IoT Hub are distributed, stand-alone devices with fixed functions. These devices typically do not require cloud-managed containerized workloads or cloud-managed proximal industrial protocol connectivity. Examples of appropriate device-to-cloud IoT Hub endpoint devices include water monitoring stations, vehicle telematics, distributed fluid level sensors, etc.

Resources — current in-market services overview:

- IoT Hub: What is Azure IoT Hub? - Azure IoT Hub
- DPS: Overview of Azure IoT Hub Device Provisioning Service - Azure IoT Hub Device Provisioning Service
- ADU: Introduction to Device Update for Azure IoT Hub

Building scalable solutions with the Azure IoT platform:

- Best practices for large-scale IoT deployments - Azure IoT Hub Device Provisioning Service
- Scale Out an Azure IoT Hub-based Solution to Support Millions of Devices - Azure Architecture Center
- Azure IoT Hub scaling

Try out our preview of new IoT Hub capabilities (integration with Azure Device Registry and Certificate Management):

- Learn more about these capabilities on our blog post: Azure IoT Hub + Azure Device Registry (Preview Refresh): Device Trust and Management at Fleet Scale
- Integration with Azure Device Registry (preview): Integration with Azure Device Registry (preview) - Azure IoT Hub
- Microsoft-backed X.509 certificate management (preview): What is Microsoft-backed X.509 Certificate Management (Preview)? - Azure IoT Hub
- How to start with the preview: Deploy IoT Hub with ADR integration and certificate management (Preview) - Azure IoT Hub
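For orientation, here is a minimal device-to-cloud sketch using the azure-iot-device Python SDK. The connection string, device ID, and payload are placeholders for a device registered in your hub.

```python
# pip install azure-iot-device
import json
from azure.iot.device import IoTHubDeviceClient, Message

CONN_STR = "HostName=YOUR-HUB.azure-devices.net;DeviceId=pump-01;SharedAccessKey=..."

client = IoTHubDeviceClient.create_from_connection_string(CONN_STR)
client.connect()

# A fixed-function field device (e.g., a fluid level sensor) sends telemetry
# that IoT Hub can route downstream into Microsoft Fabric for analysis.
msg = Message(json.dumps({"deviceId": "pump-01", "levelPct": 72.4}))
msg.content_type = "application/json"
msg.content_encoding = "utf-8"
client.send_message(msg)

client.shutdown()
```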
2. Connect Industrial Assets with Azure IoT Operations

Azure IoT Operations provides a unified data plane for the edge that runs on Azure Arc–enabled Kubernetes clusters and supports open industrial standards. It allows organizations to connect and capture equipment telemetry, normalize OT data locally, route hot-path signals to real-time analytics, securely manage layered industrial networks, and more. Edge-processed data can then be sent upstream to Microsoft Fabric for AI-driven analysis.

Recommended usage: Azure IoT Operations is intended to be the data plane for an adaptive cloud deployment, extending the management, data, and AI capabilities of the Microsoft cloud to an on-premises device. This device binds to these cloud planes, providing a platform for local data processing and intermittent connectivity. The target for these devices ranges from a small gateway-style PC to a full data center. Azure IoT Operations endpoints enable cloud-managed containerized workloads and cloud-managed proximal industrial protocol connectivity. Examples of appropriate adaptive cloud and Azure IoT Operations endpoints include on-robot computers, industrial machine controllers, retail store sensor/vision processing, and top-of-factory site infrastructure for line-of-business applications.

Resources
- Azure IoT Operations Overview
- Azure IoT Operations Documentation Hub
- Quickstart: explore-iot-operations/quickstart at main · Azure-Samples/explore-iot-operations
- Open-source framework for scaling robotics from simulation to production on Azure + NVIDIA: microsoft/physical-ai-toolchain
- How we built the demo: explore-iot-operations/quickstart at main · Azure-Samples/explore-iot-operations
- Edge-AI: microsoft/edge-ai: Production-ready Infrastructure as Code, applications, pluggable components, and…

Latest announcements & blogs
- Making Physical AI Practical for Real-World Industrial Operations: Part 1 | Microsoft Community Hub
- Making Physical AI Practical for Real-World Industrial Operations: Part 2 | Microsoft Community Hub
- Unlock Industrial Intelligence | Microsoft Hannover Messe 2026
- From pilots to production: How Microsoft and partners are accelerating intelligent operations

3. Advanced Analytics with Microsoft Fabric

Microsoft Fabric delivers a unified, end-to-end analytics platform that transforms streaming OT telemetry into real-time insights and live dashboards. Fabric Operations Agents monitor industrial signals to recommend targeted actions, while Fabric IQ provides a shared semantic foundation that enables AI agents to reason over enterprise data with business context. Together, these capabilities turn live industrial data into AI-powered operational intelligence.

Get started
- Get Started with Microsoft Fabric Learning Path
- Fabric Real-Time Intelligence documentation - Microsoft Fabric | Microsoft Learn
- Create and Configure Operations Agents - Microsoft Fabric | Microsoft Learn
- Fabric IQ documentation - Microsoft Fabric | Microsoft Learn

4. Run AI Models On-Device with Foundry Local

Foundry Local extends on-device AI to Arc-enabled Kubernetes edge clusters, providing a Microsoft-validated inferencing layer for running AI models in industrial, disconnected, or sovereign environments.

Get started
- Foundry Local on Azure Local Documentation - link
- Participate in the Foundry Local on Azure Local preview - form
- Foundry Local on Azure Local: HELM deployment demo - link

Customer stories
- Chevron: Chevron plans facilities of the future with Azure IoT Operations
- Husqvarna: Husqvarna Group Boosts Operational Efficiency with Azure Adaptive Cloud
- Ecopetrol: Azure IoT Operations and Azure IoT for energy help Ecopetrol optimize energy distribution while lowering operational costs
- P&G: Procter & Gamble cuts model deployment time up to 90% with Azure IoT Operations
- Toyota: Toyota Industries innovates its paint shop processes with Azure industrial AI and Azure IoT Hub

Vector Drift in Azure AI Search: Three Hidden Reasons Your RAG Accuracy Degrades After Deployment
What Is Vector Drift?

Vector drift occurs when embeddings stored in a vector index no longer accurately represent the semantic intent of incoming queries. Because vector similarity search depends on relative semantic positioning, even small changes in models, data distribution, or preprocessing logic can significantly affect retrieval quality over time.

Unlike schema drift or data corruption, vector drift is subtle:
- The system continues to function
- Queries return results
- But relevance steadily declines

Cause 1: Embedding Model Version Mismatch

What happens: Documents are indexed using one embedding model, while query embeddings are generated using another. This typically happens due to:
- Model upgrades
- Shared Azure OpenAI resources across teams
- Inconsistent configuration between environments

Why this matters: Embeddings generated by different models:
- Exist in different vector spaces
- Are not mathematically comparable
- Produce misleading similarity scores

As a result, documents that were previously relevant may no longer rank correctly.

Recommended practice: A single vector index should be bound to one embedding model and one dimension size for its entire lifecycle. If the embedding model changes, the index must be fully re-embedded and rebuilt.

Cause 2: Incremental Content Updates Without Re-Embedding

What happens: New documents are continuously added to the index, while existing embeddings remain unchanged. Over time, new content introduces:
- Updated terminology
- Policy changes
- New product or domain concepts

Because semantic meaning is relative, the vector space shifts—but older vectors do not.

Observable impact:
- Recently indexed documents dominate retrieval results
- Older but still valid content becomes harder to retrieve
- Recall degrades without obvious system errors

Practical guidance: Treat embeddings as living assets, not static artifacts:
- Schedule periodic re-embedding for stable corpora
- Re-embed high-impact or frequently accessed documents
- Trigger re-embedding when domain vocabulary changes meaningfully

Declining similarity scores or reduced citation coverage are often early signals of drift.

Cause 3: Inconsistent Chunking Strategies

What happens: Chunk size, overlap, or parsing logic is adjusted over time, but previously indexed content is not updated. The index ends up containing chunks created using different strategies.

Why this causes drift: Different chunking strategies produce:
- Different semantic density
- Different contextual boundaries
- Different retrieval behavior

This inconsistency reduces ranking stability and makes retrieval outcomes unpredictable.

Governance recommendation: Chunking strategy should be treated as part of the index contract (a minimal guardrail sketch appears after the key takeaways below):
- Use one chunking strategy per index
- Store chunk metadata (for example, chunk_version)
- Rebuild the index when chunking logic changes

Design Principles
- Versioned embedding deployments
- Scheduled or event-driven re-embedding pipelines
- Standardized chunking strategy
- Retrieval quality observability
- Prompt and response evaluation

Key Takeaways
- Vector drift is an architectural concern, not a service defect
- It emerges from model changes, evolving data, and preprocessing inconsistencies
- Long-lived RAG systems require embedding lifecycle management
- Azure AI Search provides the controls needed to mitigate drift effectively
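The following is a minimal sketch of the "index contract" idea from Causes 1 and 3: pin the embedding model, dimension size, and chunking version an index was built with, and fail fast when the query pipeline drifts away from them. The contract fields and values are illustrative assumptions for the sketch, not a specific Azure AI Search API surface.

```python
# Illustrative index contract: the model name, dimensions, and chunk_version
# shown here are assumptions for this sketch, not prescribed values.
INDEX_CONTRACT = {
    "embedding_model": "text-embedding-3-large",
    "dimensions": 3072,
    "chunk_version": "v2",
}


def validate_query_pipeline(runtime_config: dict) -> None:
    """Raise if query-time settings no longer match what the index was built with."""
    for key, expected in INDEX_CONTRACT.items():
        actual = runtime_config.get(key)
        if actual != expected:
            raise RuntimeError(
                f"Drift risk: {key} is {actual!r}, but the index was built with {expected!r}. "
                "Re-embed and rebuild the index before changing this setting."
            )


# Example: a model upgrade that was not accompanied by a rebuild is caught early
try:
    validate_query_pipeline(
        {"embedding_model": "text-embedding-3-small", "dimensions": 1536, "chunk_version": "v2"}
    )
except RuntimeError as err:
    print(err)
```

A check like this can run at service startup or in CI, turning silent relevance decay into an explicit, actionable failure.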
Conclusion

Vector drift is an expected characteristic of production RAG systems. Teams that proactively manage embedding models, chunking strategies, and retrieval observability can maintain reliable relevance as their data and usage evolve. Recognizing and addressing vector drift is essential to building and operating robust AI solutions on Azure.

Further Reading

The following Microsoft resources provide additional guidance on vector search, embeddings, and production-grade RAG architectures on Azure:
- Azure AI Search – Vector Search Overview: https://learn.microsoft.com/azure/search/vector-search-overview
- Azure OpenAI – Embeddings Concepts: https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/embeddings?view=foundry-classic&tabs=csharp
- Retrieval-Augmented Generation (RAG) Pattern on Azure: https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview?tabs=videos
- Azure Monitor – Observability Overview: https://learn.microsoft.com/azure/azure-monitor/overview

Gemma 4 now available in Microsoft Foundry
Working with open-source models has become a core part of how innovative AI teams stay competitive: experimenting with the latest architectures and often fine-tuning them on proprietary data to achieve lower latency and cost. Today, we're happy to announce that the Gemma 4 family, Google DeepMind's newest model family, is now available in Microsoft Foundry via the Hugging Face collection. Azure customers can now discover, evaluate, and deploy Gemma 4 inside their Azure environment with the same policies they rely on for every other workload.

Foundry is the only hyperscaler platform where developers can access OpenAI, Anthropic, Gemma, and over 11,000 other models under a single control plane. Through our close collaboration with Hugging Face, Gemma 4 joining that collection continues Microsoft's push to bring customers the widest selection of models from any cloud – and fits in line with our enhanced investments in open-source development.

Frontier intelligence, open-source weights

Released by Google DeepMind on April 2, 2026, Gemma 4 is built from the same research foundation as Gemini 3 and packaged as open weights under an Apache 2.0 license. Key capabilities across the Gemma 4 family:

- Native multimodal: Text + image + video inputs across all sizes; analyze video by processing sequences of frames; audio input on edge models (E2B, E4B)
- Enhanced reasoning and coding capabilities: Multi-step planning, deep logic, and improvements in math and instruction-following, enabling autonomous agents
- Trained for global deployment: Pretrained on 140+ languages, with support for 35+ languages out of the box
- Long context: Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B) allow developers to reason across extensive codebases, lengthy documents, or multi-session histories

Why choose Foundry?

Foundry is built to give developers breadth -- access to models from major model providers, open and proprietary, under one roof. Stay within Azure to work with leading models:

- When you deploy through Foundry, models run inside your Azure environment and are subject to the same network policies, identity controls, and audit processes your organization already has in place.
- Managed online endpoints handle serving, scaling, and monitoring without requiring you to manually set up and manage the underlying infrastructure.
- Serverless deployment with Azure Container Apps lets developers deploy and run containerized applications while reducing infrastructure management and saving costs.
- Gated model access integrates directly with Hugging Face user tokens, so models that require license acceptance stay compliant and can be accessed without manual approvals.
- Foundry Local lets you run optimized Hugging Face models directly on your own hardware using the same model catalog and SDK patterns as your cloud deployments. Read the documentation here: https://aka.ms/foundrylocal and https://aka.ms/HF/foundrylocal

Microsoft's approach to Responsible AI is grounded in our AI principles of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy new models responsibly in production environments.
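For a sense of what calling a Foundry-deployed Gemma 4 model looks like, here is a minimal sketch using the azure-ai-inference Python SDK against a managed endpoint. The environment variables and the deployment name gemma-4-27b are illustrative assumptions; substitute the values from your own Foundry deployment.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Assumed environment variables pointing at your Foundry deployment
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),
)

response = client.complete(
    model="gemma-4-27b",  # hypothetical deployment name for this sketch
    messages=[
        SystemMessage(content="You answer concisely."),
        UserMessage(content="Summarize the key capabilities of the Gemma 4 family."),
    ],
)
print(response.choices[0].message.content)
```

Because this is a standard managed endpoint, the same identity and network controls described above apply to these calls.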
What are teams building with Gemma 4 in Foundry?

Gemma 4's combination of multimodal input, agentic function calling, and long context supports a wide range of production use cases:

- Document intelligence: Processing PDFs, charts, invoices, and complex tables using native vision capabilities
- Multilingual enterprise apps: 140+ natively trained languages — ideal for multinational customer support and content platforms, as well as language-learning tools for grammar correction and writing practice
- Long-context analytics: Reasoning across entire codebases, legal documents, or multi-session conversation histories

Getting started

Try Gemma 4 in Microsoft Foundry today. New models from Hugging Face continue to roll out to Foundry on a regular basis through our ongoing collaboration. If there's a model you want to see added, let us know here. Stay connected to our developer community on Discord and stay up to date on what's new in Foundry through the Model Mondays series.

Introducing MAI-Image-2-Efficient: Faster, More Efficient Image Generation
Building on our momentum

Just last week, we celebrated a major milestone: the public preview launch of three new first-party Microsoft AI models in Microsoft Foundry: MAI-Image-2, MAI-Voice-1, and MAI-Transcribe-1. Together, they represent a comprehensive multimedia AI stack purpose-built for developers, spanning image generation, natural speech synthesis, and enterprise-grade transcription across 25 languages. The response from the developer community has been incredible, and we're not slowing down.

Fast on the heels of that launch, we're thrilled to introduce the next addition to the MAI image generation family: MAI-Image-2-Efficient – or Image-2e for short. It's now available in public preview in Microsoft Foundry and MAI Playground.

What makes MAI-Image-2-Efficient unique?

MAI-Image-2-Efficient is built on the same architecture as MAI-Image-2, the model that debuted at #3 on the Arena.ai leaderboard for image model families. Based on customer feedback, we've now improved it and engineered it for speed and efficiency. It's up to 22% faster, with 4x more efficiency than MAI-Image-2 when normalized by latency and GPU usage (1). It also outpaces leading text-to-image models by 40% on average (2). In short, MAI-Image-2-Efficient gives developers more output for less compute, unlocking a whole new category of use cases.

Who is MAI-Image-2-Efficient for?

MAI-Image-2-Efficient is designed for builders who need high-quality image generation at speed and scale. Here are the top use cases where Image-2-Efficient shines:

- High-volume production workflows: E-commerce platforms, media companies, and marketing teams often need to generate thousands of images per day as part of their business processes, for targeted advertisements, concept art, and mood boards. MAI-Image-2-Efficient's superior efficiency means larger batches at lower GPU cost, so your team can think and iterate as fast as you want and reach the end product faster.
- Real-time and conversational experiences: When users expect images to appear mid-conversation (in a chatbot, a creative copilot, or an AI-powered design tool), every millisecond counts. Thanks to its lower latency, MAI-Image-2-Efficient serves as an excellent backbone for interactive applications that require fast response times.
- Rapid prototyping and creative iteration: MAI-Image-2-Efficient enables your team to quickly and affordably test new pipelines, experiment with creative ideas, or refine prompts. You don't need the full model to validate a concept; what you need is speed, and that's exactly what MAI-Image-2-Efficient provides.

MAI-Image-2 vs. MAI-Image-2-Efficient — which should you use?

MAI-Image-2-Efficient and MAI-Image-2 are built for different strengths, so choosing the right model depends on the needs of your workflow.

- MAI-Image-2-Efficient is the ideal choice for high-volume workflows where latency and speed are priorities. If your pipeline needs to generate images quickly and at scale, MAI-Image-2-Efficient delivers without compromise.
- MAI-Image-2 is the recommended option when your images require precise, detailed text rendering, or when scenes demand the deepest photorealistic contrast and smoothness.

The two models also have distinct visual signatures:

- MAI-Image-2-Efficient renders with sharpness and defined lines, making it a strong choice for illustration, animation, and photoreal images designed to grab attention.
- MAI-Image-2 delivers smoother, more nuanced contrast, making it the go-to for photorealistic imagery that prioritizes depth and subtlety.

Try it today

MAI-Image-2-Efficient is available now in Microsoft Foundry and MAI Playground. For builders in Foundry, MAI-Image-2-Efficient starts at $5 USD per 1M tokens for text input and $19.50 USD per 1M tokens for image output.

And this is just the beginning. We have more exciting announcements lined up; stay tuned for what we're bringing to Microsoft Build 2026.

References:
1. As tested on April 13, 2026. Compared to MAI-Image-2 when normalized by latency and GPU usage. Throughput per GPU vs. MAI-Image-2 on NVIDIA H100 at 1024×1024; measured with optimized batch sizes and matched latency targets. Results vary with batch size, concurrency, and latency constraints.
2. As tested on April 13, 2026. Compared to Gemini 3.1 Flash (high reasoning), Gemini 3.1 Flash Image, and Gemini 3 Pro Image: measured at p50 latency via AI Studio API (1:1, 1K images; minimal reasoning unless noted; web search disabled). MAI-Image-2, MAI-Image-2-Efficient, and GPT-Image-1.5-High: measured at p50 latency via Foundry API.

Simplifying Image Classification with Azure AutoML for Images: A Practical Guide
1. The Challenge of Traditional Image Classification

Anyone who has worked with computer vision knows the drill: you need to classify images, so you dive into TensorFlow or PyTorch, spend days architecting a convolutional neural network, experiment with dozens of hyperparameters, and hope your model generalizes well. It's time-consuming, requires deep expertise, and often feels like searching for a needle in a haystack. What if there was a better way?

2. Enter Azure AutoML for Images

Azure AutoML for Images is a game-changer in the computer vision space. It's a feature within Azure Machine Learning that automatically builds high-quality vision models from your image data with minimal code. Think of it as having an experienced ML engineer working alongside you, handling all the heavy lifting while you focus on your business problem.

What Makes AutoML for Images Special?

1. Automatic model selection: Instead of manually choosing between ResNet, EfficientNet, or dozens of other architectures, AutoML for Images (Azure ML) evaluates multiple state-of-the-art deep learning models and selects the best one for your specific dataset. It's like having access to an entire model zoo with an intelligent curator.

2. Intelligent hyperparameter tuning: The system doesn't just pick a model — it optimizes it. Learning rates, batch sizes, augmentation strategies, and more are automatically tuned to squeeze out the best possible performance. What would take weeks of manual experimentation happens in hours.

3. Built-in best practices: Data preprocessing, augmentation techniques, and training strategies that would require extensive domain knowledge are pre-configured and applied automatically. You get enterprise-grade ML without needing to be an ML expert.

Key Capabilities

The repository demonstrates several powerful features:

- Multi-class and multi-label classification: Whether you need to classify an image into a single category or tag it with multiple labels, AutoML handles both scenarios seamlessly (see the labeled-data sketch at the end of this section).
- Format flexibility: Works with standard image formats, including JPEG and PNG, making it easy to integrate with existing datasets.
- Full transparency: Unlike black-box solutions, you maintain complete visibility and control over the training process. You can monitor metrics, understand model decisions, and fine-tune as needed.
- Production-ready deployment: Once trained, models can be easily deployed to Azure endpoints, ready to serve predictions at scale.

Real-World Applications

The practical applications are vast:

- E-commerce: Automatically categorize product images for better search and recommendations.
- Healthcare: Classify medical images for diagnostic support.
- Manufacturing: Detect defects in production-line images.
- Agriculture: Identify crop diseases or estimate yield from aerial imagery.
- Content moderation: Automatically flag inappropriate visual content.

3. A Practical Example: Metal Defect Detection

The repository includes a complete end-to-end example of detecting defects in metal surfaces — a critical quality control task in manufacturing.
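To make the multi-class versus multi-label distinction concrete: AutoML for Images consumes labeled training data as JSONL rows (packaged as an MLTable data asset), where each row pairs an image_url with a label. In the multi-class case each image carries a single label; in the multi-label case the label field is a list. The datastore path and file names below are illustrative assumptions, and a real dataset would use one labeling style consistently per task.

```json
{"image_url": "azureml://datastores/workspaceblobstore/paths/metal/img001.jpg", "label": "Scratches"}
{"image_url": "azureml://datastores/workspaceblobstore/paths/metal/img002.jpg", "label": ["Scratches", "Pitted"]}
```

The first row shows the multi-class shape; the second shows the multi-label shape.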
The notebooks demonstrate how to:

- Download and organize image data from sources like Kaggle
- Create training and validation splits with the proper directory structure
- Upload data to Azure ML as versioned datasets
- Configure GPU compute that scales based on demand
- Train multiple models with automated hyperparameter tuning
- Evaluate results with comprehensive metrics and visualizations
- Deploy the best model as a production-ready REST API
- Export to ONNX for edge deployment scenarios

The metal defect use case is particularly instructive because it mirrors real industrial applications where quality control is critical but expertise is scarce. The notebooks show how a small team can build production-grade computer vision systems without a dedicated ML research team.

Getting Started: What You Need

The prerequisites are straightforward:

- An Azure subscription (free tier available for experimentation)
- An Azure Machine Learning workspace
- Python 3.7 or later

That's it. No local GPU clusters to configure, no complex deep learning frameworks to master.

Repository Structure

The repository is thoughtfully organized into three progressive notebooks:

1. Downloading images.ipynb
- Shows how to acquire and prepare image datasets
- Demonstrates the proper directory structure for classification tasks
- Includes data exploration and visualization techniques
- image-classification-azure-automl-for-images/1. Downloading images.ipynb at main · retkowsky/image-classification-azure-automl-for-images

2. Azure ML AutoML for Images.ipynb
- The core workflow: connect to Azure ML, upload data, configure training
- Covers both simple model training and advanced hyperparameter tuning
- Shows how to evaluate models and select the best-performing one
- Demonstrates deployment to managed online endpoints
- image-classification-azure-automl-for-images/2. Azure ML AutoML for Images.ipynb at main · retkowsky/image-classification-azure-automl-for-images

3. Edge with ONNX local model.ipynb
- Exports trained models to ONNX format
- Shows how to run inference locally without cloud connectivity
- Perfect for edge computing and IoT scenarios
- image-classification-azure-automl-for-images/3. Edge with ONNX local model.ipynb at main · retkowsky/image-classification-azure-automl-for-images

Each Python notebook is self-contained with clear explanations, making it easy to understand each step of the process. You can run them sequentially to build a complete solution, or jump to specific sections relevant to your use case.

The Developer Experience

What sets this approach apart is the developer experience. The repository provides Python notebooks that guide you through the entire workflow. You're not just reading documentation — you're working with practical, runnable examples that demonstrate real scenarios. Let's walk through the code to see how straightforward this actually is.

Use-case description

This image classification model is designed to identify and classify defects on metal surfaces in a manufacturing context. We want to classify defective images into Crazing, Inclusion, Patches, Pitted, Rolled, and Scratches.
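For a dataset like this, notebook 1 organizes the images one folder per class, split into training and validation sets. The exact folder and file names below are illustrative assumptions; only the six class names come from the use case above.

```text
metal_defects/
├── train/
│   ├── Crazing/      img_0001.jpg ...
│   ├── Inclusion/    ...
│   ├── Patches/      ...
│   ├── Pitted/       ...
│   ├── Rolled/       ...
│   └── Scratches/    ...
└── val/
    ├── Crazing/      ...
    └── ...           (same six classes)
```

This class-per-folder layout keeps labels implicit in the directory structure, which makes the upload step that follows straightforward.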
All code and images are available here: retkowsky/image-classification-azure-automl-for-images: Azure AutoML for images — Image classification

Step 1: Connect to Azure ML Workspace

First, establish a connection to your Azure ML workspace using Azure credentials:

```python
import os

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

print("Connection to the Azure ML workspace…")
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    os.getenv("subscription_id"),
    os.getenv("resource_group"),
    os.getenv("workspace"),
)
print("✅ Done")
```

That's it.

Step 2: Upload Your Dataset

Upload your image dataset to Azure ML. The code handles this elegantly:

```python
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

# TRAIN_DIR is the local folder of class-organized images prepared earlier
my_images = Data(
    path=TRAIN_DIR,
    type=AssetTypes.URI_FOLDER,
    description="Metal defects images for images classification",
    name="metaldefectimagesds",
)
uri_folder_data_asset = ml_client.data.create_or_update(my_images)

print("🖼️ Informations:")
print(uri_folder_data_asset)
print("\n🖼️ Path to folder in Blob Storage:")
print(uri_folder_data_asset.path)
```

Your local images are now versioned data assets in Azure, ready for training.

Step 3: Create GPU Compute Cluster

AutoML needs compute power. Here's how you create a GPU cluster that auto-scales:

```python
from azure.ai.ml.entities import AmlCompute
from azure.core.exceptions import ResourceNotFoundError

compute_name = "gpucluster"

try:
    _ = ml_client.compute.get(compute_name)
    print("✅ Found existing Azure ML compute target.")
except ResourceNotFoundError:
    print(f"🛠️ Creating a new Azure ML compute cluster '{compute_name}'…")
    compute_config = AmlCompute(
        name=compute_name,
        type="amlcompute",
        size="Standard_NC16as_T4_v3",  # GPU VM
        idle_time_before_scale_down=1200,
        min_instances=0,  # Scale to zero when idle
        max_instances=4,
    )
    ml_client.begin_create_or_update(compute_config).result()
    print("✅ Done")
```

The cluster scales from 0 to 4 instances based on workload, so you only pay for what you use.

Step 4: Configure AutoML Training

Now comes the magic. Here's the entire configuration for an AutoML image classification job using a specific model (here, a ResNet-34). It is also possible to access all the available models from the image classification AutoML library:

https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?view=azureml-api-2&tabs=python#supported-model-architectures

```python
from azure.ai.ml import automl

# my_training_data_input / my_validation_data_input are MLTable inputs
# prepared earlier in the notebook
image_classification_job = automl.image_classification(
    compute=compute_name,
    experiment_name=exp_name,
    training_data=my_training_data_input,
    validation_data=my_validation_data_input,
    target_column_name="label",
)

# Set training parameters
image_classification_job.set_limits(timeout_minutes=60)
image_classification_job.set_training_parameters(model_name="resnet34")
```

That's approximately 10 lines of code to configure what would traditionally require hundreds of lines and deep expertise.

Step 5: Hyperparameter Tuning (Optional)

Want to explore multiple models and configurations?
```python
from azure.ai.ml import automl
from azure.ai.ml.automl import ClassificationPrimaryMetrics, SearchSpace
from azure.ai.ml.sweep import BanditPolicy, Choice, Uniform

image_classification_job = automl.image_classification(
    compute=compute_name,                                   # Compute cluster
    experiment_name=exp_name,                               # Azure ML job
    training_data=my_training_data_input,                   # Training
    validation_data=my_validation_data_input,               # Validation
    target_column_name="label",                             # Target
    primary_metric=ClassificationPrimaryMetrics.ACCURACY,   # Metric
    tags={
        "usecase": "metal defect",
        "type": "computer vision",
        "product": "azure ML",
        "ai": "image classification",
        "hyper": "YES",
    },
)

image_classification_job.set_limits(
    timeout_minutes=60,       # Timeout in minutes
    max_trials=5,             # Max number of models to try
    max_concurrent_trials=2,  # Concurrent trainings
)

image_classification_job.extend_search_space(
    [
        SearchSpace(
            model_name=Choice(["vitb16r224", "vits16r224"]),
            learning_rate=Uniform(0.001, 0.01),  # LR
            number_of_epochs=Choice([15, 30]),   # Epochs
        ),
        SearchSpace(
            model_name=Choice(["resnet50"]),
            learning_rate=Uniform(0.001, 0.01),  # LR
            layers_to_freeze=Choice([0, 2]),     # Layers to freeze
        ),
    ]
)

image_classification_job.set_sweep(
    sampling_algorithm="Random",  # Random sampling over hyperparameter combinations
    early_termination=BanditPolicy(
        evaluation_interval=2,  # Evaluate every 2 iterations
        slack_factor=0.2,       # Terminate runs 20% worse than the best so far
        delay_evaluation=6,     # Wait 6 iterations before evaluating
    ),
)
```

AutoML will now automatically try different model architectures, learning rates, and augmentation strategies to find the best configuration.

Step 6: Launch Training

Submit the job and monitor progress:

```python
# Submit the job
returned_job = ml_client.jobs.create_or_update(image_classification_job)
print(f"✅ Created job: {returned_job}")

# Stream the logs in real time
ml_client.jobs.stream(returned_job.name)
```

While training runs, you can monitor metrics, view logs, and track progress through the Azure ML Studio UI or programmatically.

Step 7: Results

(The original post shows the run's trial metrics and best-model summary from Azure ML Studio here.)

Step 8: Deploy to Production

Once training completes, deploy the best model as a REST endpoint:

```python
from azure.ai.ml.entities import ManagedOnlineEndpoint

# Create the endpoint configuration
online_endpoint_name = "metal-defects-classification"
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    description="Metal defects image classification",
    auth_mode="key",
    tags={"usecase": "metal defect", "type": "computer vision"},
)

# Create the endpoint (the best model is then attached as a deployment)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Your model is now a production API endpoint, ready to classify images at scale.
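Once the endpoint has a deployment behind it, you can smoke-test it from the same MLClient. The request file name and payload shape here are illustrative assumptions; the repo's notebook shows the exact request format the deployment expects.

```python
# Hypothetical scoring call; "sample_request.json" would contain the image
# payload in the format the deployment's scoring script expects.
result = ml_client.online_endpoints.invoke(
    endpoint_name="metal-defects-classification",
    request_file="sample_request.json",
)
print(result)  # predicted class (and scores), as returned by the deployment
```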
Beyond the Cloud: Edge Deployment with ONNX

One of the most powerful aspects of this approach is flexibility in deployment. The repository includes a third notebook demonstrating how to export your trained model to ONNX (Open Neural Network Exchange) format for edge deployment. This means you can:

- Deploy models on IoT devices for real-time inference without cloud connectivity
- Reduce latency by processing images locally on edge hardware
- Lower costs by eliminating constant cloud API calls
- Ensure privacy by keeping sensitive images on-premises

The ONNX export process is straightforward and integrates seamlessly with the AutoML workflow. Your cloud-trained model can run anywhere ONNX Runtime is supported — from Raspberry Pi devices to industrial controllers.

```python
import numpy as np
import onnxruntime

# Load the ONNX model exported from the AutoML run
session = onnxruntime.InferenceSession("model.onnx")

# The input name (and expected shape) come from the session itself
input_name = session.get_inputs()[0].name

# image_data must be a preprocessed float array matching the model's input;
# the zero array below is a placeholder, and the shape is illustrative
image_data = np.zeros((1, 3, 224, 224), dtype=np.float32)

# Run inference locally
results = session.run(None, {input_name: image_data})
```

This cloud-to-edge workflow is particularly valuable for manufacturing, retail, and remote monitoring scenarios where edge processing is essential.

Interactive webapp for image classification

(Screenshots of the interactive demo app appear in the original post.)

Interpreting model predictions

The deployed endpoint returns a base64-encoded image string if both model_explainability and visualizations are set to True.

Why This Matters

In the AI era, the competitive advantage isn't about who can build the most complex models — it's about who can deploy effective solutions fastest. Azure AutoML for Images democratizes computer vision by making sophisticated ML accessible to a broader audience. Small teams can now accomplish what previously required dedicated ML specialists. Prototypes that took months can be built in days. And the quality? Often on par with or better than manually crafted solutions, thanks to AutoML's systematic approach and access to cutting-edge techniques.

What the Code Reveals

Looking at the actual implementation reveals several important insights:

- Minimal boilerplate: The entire training pipeline — from data upload to model deployment — requires fewer than 50 lines of meaningful code. Compare this to traditional PyTorch or TensorFlow implementations that often exceed several hundred lines.
- Built-in best practices: Notice how the code automatically manages concerns like data versioning, experiment tracking, and compute auto-scaling. These aren't afterthoughts — they're integral to the platform.
- Production-ready from day one: The deployed endpoint isn't a prototype. It includes authentication, scaling, monitoring, and all the infrastructure needed for production workloads. You're building production systems, not demos.
- Flexibility without complexity: The simple API hides complexity without sacrificing control. Need to specify a particular model architecture? One parameter. Want hyperparameter tuning? Add a few lines. The abstraction level is perfectly calibrated.
- Observable and debuggable: The .stream() method and comprehensive logging mean you're never in the dark about what's happening. You can monitor training progress, inspect metrics, and debug issues — all critical for real projects.

The Cost of Complexity

Traditional ML projects fail not because of technology limitations but because of complexity. The learning curve is steep, the iteration cycles are long, and the resource requirements are high. By abstracting away this complexity, AutoML for Images changes the economics of computer vision projects. You can now:

- Validate ideas quickly: Test whether image classification solves your problem before committing significant resources
- Iterate faster: Experiment with different approaches in hours rather than weeks
- Scale expertise: Enable more team members to work with computer vision, not just ML specialists

Conclusion

Image classification is a fundamental building block for countless AI applications. Azure AutoML for Images makes it accessible, practical, and production-ready. Whether you're a seasoned data scientist looking to accelerate your workflow or a developer taking your first steps into computer vision, this approach offers a compelling path forward. The future of ML isn't about writing more complex code — it's about writing smarter code that leverages powerful platforms to deliver business value faster. This repository shows you exactly how to do that.
Practical Tips from the Code

After reviewing the notebooks, here are some key takeaways for your own projects:

- Start with a single model: The basic configuration with model_name="resnet34" is perfect for initial experiments. Only move to hyperparameter sweeps once you've validated your data and use case.
- Use tags strategically: The code demonstrates adding tags to jobs and endpoints (e.g., "usecase": "metal defect"). This becomes invaluable when managing multiple experiments and models in production.
- Leverage auto-scaling: The compute configuration with min_instances=0 means you're not paying for idle resources. The cluster scales up when needed and scales down to zero when idle.
- Monitor training live: The ml_client.jobs.stream() method is your best friend during development. You see exactly what's happening and can catch issues early.
- Version your data: Creating named data assets (name="metaldefectimagesds") means your experiments are reproducible. You can always trace back which data version produced which model.
- Think cloud-to-edge: Even if you're deploying to the cloud initially, the ONNX export capability gives you flexibility for future edge scenarios without retraining.

Resources

- Azure ML: https://azure.microsoft.com/en-us/products/machine-learning
- Demo notebooks: https://github.com/retkowsky/image-classification-azure-automl-for-images
- AutoML for Images documentation: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models
- Available models: Set up AutoML for computer vision — Azure Machine Learning | Microsoft Learn
- Connect with the author: https://www.linkedin.com/in/serger/