artificial intelligence
166 TopicsFoundry IQ: boost response relevance by 36% with agentic retrieval
The latest RAG performance evaluations and results for knowledge bases and built-in agentic retrieval engine. Foundry IQ by Azure AI Search is a unified knowledge layer for agents, designed to improve response performance, automate RAG workflows and enable enterprise-ready grounding. These evaluations tested RAG performance for knowledge bases and new features including retrieval reasoning effort and federated sources like web and SharePoint for M365. Foundry IQ and Azure AI Search are part of Microsoft Foundry.2.7KViews4likes0CommentsSecuring Azure AI Applications: A Deep Dive into Emerging Threats | Part 1
Why AI Security Can’t Be Ignored? Generative AI is rapidly reshaping how enterprises operate—accelerating decision-making, enhancing customer experiences, and powering intelligent automation across critical workflows. But as organizations adopt these capabilities at scale, a new challenge emerges: AI introduces security risks that traditional controls cannot fully address. AI models interpret natural language, rely on vast datasets, and behave dynamically. This flexibility enables innovation—but also creates unpredictable attack surfaces that adversaries are actively exploiting. As AI becomes embedded in business-critical operations, securing these systems is no longer optional—it is essential. The New Reality of AI Security The threat landscape surrounding AI is evolving faster than any previous technology wave. Attackers are no longer focused solely on exploiting infrastructure or APIs; they are targeting the intelligence itself—the model, its prompts, and its underlying data. These AI-specific attack vectors can: Expose sensitive or regulated data Trigger unintended or harmful actions Skew decisions made by AI-driven processes Undermine trust in automated systems As AI becomes deeply integrated into customer journeys, operations, and analytics, the impact of these attacks grows exponentially. Why These Threats Matter? Threats such as prompt manipulation and model tampering go beyond technical issues—they strike at the foundational principles of trustworthy AI. They affect: Confidentiality: Preventing accidental or malicious exposure of sensitive data through manipulated prompts. Integrity: Ensuring outputs remain accurate, unbiased, and free from tampering. Reliability: Maintaining consistent model behavior even when adversaries attempt to deceive or mislead the system. When these pillars are compromised, the consequences extend across the business: Incorrect or harmful AI recommendations Regulatory and compliance violations Damage to customer trust Operational and financial risk In regulated sectors, these threats can also impact audit readiness, risk posture, and long-term credibility. Understanding why these risks matter builds the foundation. In the upcoming blogs, we’ll explore how these threats work and practical steps to mitigate them using Azure AI’s security ecosystem. Why AI Security Remains an Evolving Discipline? Traditional security frameworks—built around identity, network boundaries, and application hardening—do not fully address how AI systems operate. Generative models introduce unique and constantly shifting challenges: Dynamic Model Behavior: Models adapt to context and data, creating a fluid and unpredictable attack surface. Natural Language Interfaces: Prompts are unstructured and expressive, making sanitization inherently difficult. Data-Driven Risks: Training and fine-tuning pipelines can be manipulated, poisoned, or misused. Rapidly Emerging Threats: Attack techniques evolve faster than most defensive mechanisms, requiring continuous learning and adaptation. Microsoft and other industry leaders are responding with robust tools—Azure AI Content Safety, Prompt Shields, Responsible AI Frameworks, encryption, isolation patterns—but technology alone cannot eliminate risk. True resilience requires a combination of tooling, governance, awareness, and proactive operational practices. Let's Build a Culture of Vigilance: AI security is not just a technical requirement—it is a strategic business necessity. Effective protection requires collaboration across: Developers Data and AI engineers Cybersecurity teams Cloud platform teams Leadership and governance functions Security for AI is a shared responsibility. Organizations must cultivate awareness, adopt secure design patterns, and continuously monitor for evolving attack techniques. Building this culture of vigilance is critical for long-term success. Key Takeaways: AI brings transformative value, but it also introduces risks that evolve as quickly as the technology itself. Strengthening your AI security posture requires more than robust tooling—it demands responsible AI practices, strong governance, and proactive monitoring. By combining Azure’s built-in security capabilities with disciplined operational practices, organizations can ensure their AI systems remain secure, compliant, and trustworthy, even as new threats emerge. What’s Next? In future blogs, we’ll explore two of the most important AI threats—Prompt Injection and Model Manipulation—and share actionable strategies to mitigate them using Azure AI’s security capabilities. Stay tuned for practical guidance, real-world scenarios, and Microsoft-backed best practices to keep your AI applications secure. Stay Tuned.!531Views3likes0CommentsGPT‑5.1 in Foundry: A Workhorse for Reasoning, Coding, and Chat
The pace of AI innovation is accelerating, and developers—across startups and global enterprises—are at the heart of this transformation. Today marks a significant moment for enterprise AI innovation: Azure AI Foundry is unveiling OpenAI’s GPT-5.1 series, the next generation of reasoning, analytics, and conversational intelligence. The following models will be rolling out in Foundry today: GPT-5.1: adaptive, more efficient reasoning GPT-5.1-chat: chat with new chain-of-thought for end-users GPT-5.1-codex: optimized for long-running conversations with enhanced tools and agentic workflows GPT-5.1-codex-mini: a compact variant for resource-constrained environments What’s new with GPT-5.1 series The GPT-5.1 series is built to respond faster to users in a variety of situations with adaptive reasoning, improving latency and cost efficiency across the series by varying thinking time more significantly. This, combined with other tooling improvements, enhanced stepwise reasoning visibility, multimodal intelligence, and enterprise-grade compliance. GPT-5.1: Adaptive and Efficient Reasoning GPT-5.1 is the mainline model engineered to deliver adaptive, stepwise reasoning that adjusts its approach based on the complexity of each task. Core capabilities included: Adaptive reasoning for nuanced, context-aware thinking time Multimodal intelligence: supporting text, image, and audio inputs/outputs Enterprise-grade performance, security, and compliance This model’s flexibility empowers developers to tackle a wide spectrum of tasks—from simple queries to deep, multi-step workflows for enterprise-grade solutions. With its ability to intelligently balance speed, cost, and intelligence, GPT-5.1 sets a new standard for both performance and efficiency in AI-powered development. GPT-5.1-chat: Elevating Interactive Experiences with Smart, Safe Conversations GPT-5.1-chat powers fast, context-aware chat experiences with adaptive reasoning and robust safety guardrails. With chain-of-thought added in the chat for the first time, it brings an interactive experience to the next level. It’s tuned for safety and instruction-following, making it ideal for customer support, IT helpdesk, HR, and sales enablement. Multimodal chat (text, image, and audio) improves long-turn consistency for real problem solving, delivering brand-aligned, safe conversations, and supporting next-best-action recommendations. GPT-5.1-codex and GPT-5.1-codex-mini: Frontier Models for Agentic Coding GPT-5.1-codex builds on the foundation set by GPT-5-codex, advancing developer tooling with: Enhanced reasoning frameworks for stepwise, context-aware code analysis and generation; plus Enhanced tool handling for certain development scenario's Multimodal intelligence for richer developer experiences when coding With Foundry’s enterprise-grade security and governance, GPT-5.1-codex is ideal for automated code generation and review, accelerating development cycles with intelligent code suggestions, refactoring, and bug detection. GPT-5.1-codex-mini is a compact, efficient variant optimized for resource-constrained environments. It maintains near state-of-the-art performance, multimodal intelligence, and the same safety stack and tool access as GPT-5.1-codex, making it best for cost-effective, scalable solutions in education, startups, and cost-conscience settings. Together, these Codex models empower teams to innovate faster and with greater confidence. Selecting Your AI Engine: Match Model Strengths to Your Business Goals One of the advantages of the GPT-5.1 series is unified access to deep reasoning, adaptive chat, and advanced coding—all in one place. Here’s how to match model strengths to your needs: Opt for GPT-5.1 for general ai application use—tasks like analytics, research, legal/financial review, or consolidating large documents and codebases. It’s the model of choice for reliability and high-impact outputs. Go with GPT-5.1-chat for interactive assistants and product UX, especially when adaptive reasoning is required for complex cases. Reasoning hints and adaptive reasoning help with customer latency perception. Leverage GPT-5.1-codex for deep, stepwise reasoning in complex code generation, refactoring, or multi-step analysis—ideal for demanding agentic workflows and enterprise automation. Utilize GPT-5.1-codex-mini for efficient, cost-effective coding intelligence in broad-scale deployment, education, or resource-constrained environments—delivering near-mainline performance in a compact model. Deployment and Pricing Model Deployment Available Regions Pricing ($/million tokens) Input Cached Input Output GPT-5.1 Standard Global Global $1.25 $0.125 $10.00 Standard Data Zone Data Zone (US & EU) $1.38 $0.14 $11.00 GPT-5.1-chat Standard Global Global $1.25 $0.125 $10.00 GPT-5.1-codex Standard Global Global $1.25 $0.125 $10.00 GPT-5.1-codex-mini Standard Global Global $0.25 $0.025 $2.00 Start Building Today The GPT-5.1 series is now available in Foundry Models. Whether you’re building for enterprise, small and medium-sized business, or launching the next digital-native app, these models and the Foundry platform are designed to help you innovate faster, safer, and at scale.16KViews1like22CommentsMiniMax-M2: The Open-Source Innovator in Coding and Agentic Workflows Now in Azure AI Foundry
We’re thrilled to announce that MiniMax-M2, the latest breakthrough from MiniMax, is now available in Azure AI Foundry through Hugging Face. Built for developers, this model advances capabilities for what’s possible in coding, multi-turn reasoning, and agentic workflows—while delivering enhanced efficiency and scalability. What makes MiniMax-M2 different? MiniMax-M2 isn’t just another large language model—it’s a 230B-parameter Mixture of Experts (MoE) architecture that activates 10B parameters per task, ensuring better performance at a lower cost. This design enables: Enhanced efficiency: Achieve top-tier results up to 8% of the cost of comparable models. Increased context handling: With an industry-leading 204K token context window and 131K output capacity, MiniMax-M2 can process entire codebases, multi-file projects, and long-form documentation without losing coherence. Commercial ready: Released under Apache 2.0, MiniMax-M2 is open-source and ready to deploy into your workflow. The model was ranked #5 overall on the Artificial Analysis Intelligence Index, making MiniMax-M2 one of the highest-ranked open-source model globally, outperforming many proprietary systems in reasoning, coding, and language understanding. For organizations seeking high-throughput, low-latency deployments, MiniMax-M2 runs seamlessly on an 8xH100 setup using vLLM, making it both powerful and practical. The graphic above compares MiniMax-M2’s performance across multiple industry-standard benchmarks against leading models like DeepSeek-V3.2, GLM-4.6, and Gemini 2.5 Pro. While proprietary models such as GPT-5 (thinking) and Claude Sonnet 4.5 remain strong in certain areas, MiniMax-M2 delivers competitive results as an open-source solution, offering enterprise-grade performance for organizations seeking high-quality AI without compromising scalability or flexibility. Why it matters for developers MiniMax-M2 is built for modern development workflows. Whether you’re generating production-ready code, automating agentic tasks, or managing large-scale projects, this model delivers accuracy, speed, and flexibility while keeping infrastructure costs in check. Mixture of Experts Architecture: 230B total parameters, 10B active per task for cost-effective scalability. Ultra-Large Context Window: 204K tokens for comprehensive project understanding. Advanced Coding Intelligence: Optimized for code generation, debugging, multi-file editing, and test-driven development. Agentic Workflow Support: Handles complex tool integrations and multi-step problem-solving with ease. Open Source Freedom: Apache 2.0 license for commercial use. MiniMax-M2 can support finance and legal workflows by automating document-heavy tasks. In finance, it could help generate audit reports, investment summaries, and portfolio analyses by processing large datasets and regulatory guidelines in a single pass, which can improve accuracy and reduce manual effort. In legal, it could assist with case law research by summarizing extensive statutes and precedents, extracting relevant insights, and providing context-specific recommendations. With its large context window and reasoning capabilities, MiniMax-M2 can enable faster, more efficient handling of complex information, allowing professionals to focus on higher-value activities. Get started today MiniMax-M2 is now live in Azure AI Foundry, explore its capabilities and try it today.687Views0likes0CommentsAnnouncing GPT‑5‑Codex: Redefining Developer Experience in Azure AI Foundry
Today, we’re excited to announce OpenAI’s GPT‑5‑Codex is generally available in Azure AI Foundry, and in public preview for GitHub Copilot in Visual Studio Code. This release is the next step in our continuous commitment to empower developers with the latest model innovation, now building on the proven strengths of the earlier Codex generation along with the speed and CLI fluency many teams have adopted with the latest codex‑mini. Next-level features for developers Multimodal coding in a single flow: GPT-5-Codex accepts multimodal inputs including text and image. With this multimodal intelligence, developers are now empowered to tackle complex tasks, delivering context-aware, repository-scale solutions in one single workflow. Advanced tool use across various experiences: GPT-5-Codex is built for real-world developer experiences. Developers in Azure AI Foundry can get seamless automation and deep integration via the Response API, improving developers’ productivity and reducing development time. Code review expertise: GPT‑5‑Codex is specially trained to conduct code reviews and surface critical flows, helping developers catch issues early and improve code quality with AI-powered insights. It transforms code review from a manual bottleneck into an intelligent, adaptive and integrated process, empowering developers to deliver high-quality code experience. How GPT‑5‑Codex makes your life easier Stay in flow, not in friction: With GPT‑5‑Codex, move smoothly from reading issues to writing code and checking UI; all in one place. It keeps context, so developers stay focused and productive. No more jumping between tools or losing track of what they were doing. Refactor and migrate with confidence: Whether cleaning up code or moving to a new framework, GPT‑5‑Codex helps stage updates, run tests, and fix issues as you go. It’s like having a digital colleague for those tricky transitions. Hero use cases: real impact for developers Repo‑aware refactoring assistant: Feed repo and architecture diagrams to GPT‑5‑Codex. Get cohesive refactors, automated builds, and visual verification via screenshots. Flaky test hunter: Target failing test matrices. The model executes runs, polls status, inspects logs, and recommends fixes looping until stability. Cloud migration copilot: Edit IaC scripts, kick off CLI commands, and iterate on errors in a controlled loop, reducing manual toil. Pricing and Deployment available at GA Deployment Available Region Pricing ($/million tokens) Standard Global East US 2 Sweden Central Input Cached Input Output $1.25 $0.125 $10.00 GPT-5-Codex is bringing developers’ coding experience to a new level. Don’t just write code. Let’s redefine what’s possible. Start building with GPT-5-Codex today and turn your bold ideas into reality now powered by the latest innovation in Azure AI Foundry.6.4KViews2likes2CommentsThe Future of AI: The Model is Key, but the App is the Doorway
This post explores the real-world impact of GPT-5 beyond benchmark scores, focusing on how application design shapes user experience. It highlights early developer feedback, common integration challenges, and practical strategies for adapting apps to leverage the advanced capabilities of GPT-5 in Foundry Models. From prompt refinement to fine-tuning to new API controls, learn how to make the most of this powerful model.568Views3likes0CommentsReal-Time Speech Intelligence for Global Scale: gpt-4o-transcribe-diarize in Azure AI Foundry
Voice is a natural interface for communication. Now, with the general availability of gpt-4o-transcribe-diarize, the new automatic speech recognition (ASR) model in Azure AI Foundry, transforming speech into actionable text is faster, smarter, and more accurate than ever. This launch marks a significant milestone in our mission to empower organizations with AI that delivers speed, accuracy, and enterprise-grade reliability. With gpt-4o-transcribe-diarize seamlessly integrated, businesses can unlock critical insights from conversations, instantly converting audio into text with ultra-low latency and outstanding accuracy across 100+ languages. Whether you're enhancing live event accessibility, analyzing customer interactions, or enabling intelligent voice-driven applications, gpt-4o-transcribe-diarize helps capture spoken word and leverages it for real-time decision-making. Experience how Azure AI’s innovation in speech technology is helping to redefine productivity and global reach, setting a new standard for audio intelligence in the enterprise landscape. Why gpt-4o-transcribe-diarize Matters Businesses today operate in a world where conversations drive decisions. From customer support calls to virtual meetings, audio data holds critical insights. Gpt-4o-transcribe-diarize unlocks these insights, converting speech to text with ultra-low latency and high accuracy across 100+ languages. Whether you’re captioning live events, analyzing call center interactions, or building voice-driven applications, gpt-4o-transcribe-diarize offers the opportunity to help your workflows be powered by real-time intelligence. Key Features Lightning-Fast Transcription: Convert 10 minutes of audio in ~15 seconds with our new Fast Transcription API. Global Language Coverage: Support for 100+ languages and dialects for inclusive, global experiences. Seamless Integration: Available in Azure AI Foundry with managed endpoints for easy deployment and scale. Real-World Impact Imagine a reporter summarizing interviews in real time, a financial institution transcribing calls instantly, or a global retailer powering multilingual voice assistants; all with the speed and security of Azure AI Foundry. gpt-4o-transcribe-diarize can make these scenarios possible today. Pricing and regional availability for gpt-4o-transcribe-diarize Model Deployment Regions Price $/1m tokens gpt-4o-transcribe-diarize Global Standard (Paygo) East US 2, Sweden Central Text input: $2.50 Audio input: $6.00 Output: $10.00 gpt-4o-transcribe-diarize in audio AI innovation context gpt-4o-transcribe-diarize is part of a broader wave of audio AI innovation on Azure, joining new models like OpenAI gpt-realtime and gpt-audio that are purpose-built for expressive, low-latency voice experiences. While gpt-4o-transcribe-diarize delivers ultra-fast transcription with enterprise-grade accuracy, gpt-realtime enables natural, emotionally rich voice interactions with millisecond responsiveness—ideal for live conversations, voice agents, and multimodal applications. Meanwhile, audio models like gpt-4o-transcribe mini, and mini-tts extend the platform’s capabilities with customizable speech synthesis and real-time captioning, making Azure AI a comprehensive solution for building intelligent, production-ready voice systems. gpt-realtime Features OpenAI claims the gpt-realtime model introduces a new standard for voice-first applications, combining expressive audio generation with ultra-low latency and natural conversational flow. It’s designed to power real-time interactions that feel like natural, responsive speech. Key Features: Millisecond Latency: Enables live responsiveness suitable for real-time conversations, kiosks, and voice agents. Emotionally Expressive Voices: Supports nuanced speech delivery with voices like Marin and Cedar, capable of conveying tone, emotion, and intent. Natural Turn-Taking: Built-in mechanisms for detecting pauses and transitions, allowing fluid back-and-forth dialogue. Function Calling Support: Seamlessly integrates with backend systems to trigger actions based on voice input. Multimodal Readiness: Designed to work with text, audio, and visual inputs for rich, interactive experiences. Stable APIs for Production: Enterprise-grade reliability with consistent behavior across sessions and deployments. These features make gpt-realtime a foundational model for building intelligent voice interfaces that go beyond transcription—delivering conversational intelligence in real time. gpt-realtime Use Cases With its expressive audio capabilities and real-time responsiveness, gpt-realtime unlocks new possibilities across industries. Whether enhancing customer engagement or streamlining operations, it brings voice AI into the heart of enterprise workflows. Examples include: Customer Service Agents: Power virtual agents that respond instantly with natural, tones for rich expressiveness, improving customer satisfaction and reducing wait times. Retail Kiosks & Smart Devices: Enable voice-driven product discovery, troubleshooting, and checkout experiences with real-time feedback. Multilingual Voice Assistants: Deliver localized, expressive voice experiences across global markets with support for multiple languages and dialects. Live Captioning & Accessibility: Combine gpt-4o-transcribe-diarize gpt-realtime to provide real-time captions and voice synthesis for inclusive experiences. These use cases demonstrate how gpt-realtime transforms voice into a strategic interface—bridging human communication and intelligent systems with speed and accuracy. Ready to transform voice into value? Learn more and start building with gpt-4o-transcribe-diarize3.7KViews0likes1CommentQuick look at journey of Agentic Solutions, from No‑code to Developer tools
Why this journey matters My journey with Bot, virtual agents and personal assistants has been quite long and, in this time, not only has the usage and user scenario evolved but the technology and platforms that fueled it significantly changed as well. Agentic solutions are no longer just “chat with documents, knowledgebases or hand curate the decision making into the AI services” - The bar has moved to systems that understand context, invoke tools, and complete workflows—with the governance and telemetry your business requires, and the new tools that are at our disposal. In this article, I’m going through the notes that I have made and formulated approaches that I go through as I work on new AI solutions and AI projects. I have also added a checklist and a 90-day plan, if you are lucky enough to launch an AI Agentic project and want to start in a structured way from small wins to big bang. While navigating various scenarios and projects, I have developed and refined this practical approach/progression. This methodology gradually evolved as I encountered different timeline constraints and use cases. No‑code for rapid wins inside Microsoft 365 Low‑code for richer conversation design and workflow orchestration Pro‑code for robust model choice, evaluation, safety, and operations on Azure Use it as a blueprint to decide where to start, when to step up, and how to land production quality without over‑engineering day one. With this approach, I have seen team formation evolve as well. While some use cases will hit fruition at Low-code stage itself, there will be few that will be adopted for Pro-code and involve larger Development team and more matured, DevOps processes. The spectrum at a glance Layer Primary Builder Best For Integration Depth Time‑to‑Value Microsoft 365 Copilot – Agent Builder (No‑code) Smart users, business leads Q&A, task helpers, quick pilots in Teams/Outlook Connect org content and simple actions Fastest Microsoft Copilot Studio (Low‑code) Citizen developers, power users Multi‑turn conversations, API actions, enterprise data Custom connectors, policies, orchestration Weeks Azure AI Foundry (Pro‑code) Developers, architects Model selection, evaluation, safety, observability Prompt flows, CI/CD, monitoring, scale Project lifecycle Start: No‑code with Microsoft 365 Copilot Agent Builder When you need impact now, or something that you want to automate quickly, including your daily routine or a quick business process - embedded intelligence where people work every day. What you can achieve Answer policy and product questions grounded in your internal content Automate simple tasks (drafts, reminders, status messages) Share quickly in Teams to capture user feedback Collaborate and share with your teammates. How to approach Define one job to be done (e.g., “answer 80% of field FAQs”). Attach one high‑quality content source (structured SharePoint library beats scattered files). Add one action that saves clicks (create a task, send a summary). Pilot with a small group; measure deflection, satisfaction, and turnaround time. Guardrails from day one Keep scope narrow, content curated, and responses concise. Document the agent’s mandate and what it won’t do (set expectations). Level up: Low‑code with Copilot Studio Transition to this approach when your project requires designed conversations, conditional logic, and system actions—all without needing to move into full pro-code development. This method is especially effective for quickly deploying agents across a department, particularly for straightforward use cases, simple automations, and workflows that require more extensive reach. It enables broader automation and process improvement while maintaining a low-code approach that remains accessible to a wider range of users. What you can achieve Model topics/intents and multi‑turn dialogues. Call internal and external APIs via custom connectors Apply business rules before actions are carried out. Design tips Structure the conversation: greet → clarify → retrieve/act → confirm → summarize. Separate knowledge from behavior: keep content where it’s governed; keep logic in Studio. Instrument outcomes: track successful task completion, not just messages exchanged. Deep analytics into usage etc. Integration patterns Internal systems (HR, finance, CRM) through connectors. Event-driven flows (create tickets, update records, trigger notifications). Approval handoffs when confidence is low. Production grade: Pro‑code with Azure AI Foundry When correctness, safety, scale, and cost matter, graduate to developer tooling on Azure. Why this layer Model choice: right‑fit models (capability, latency, cost) for each task. Prompt orchestration: multi‑step reasoning and tool calling. Evaluation: offline tests before release and live monitoring after. Safety: input/output filtering and policy enforcement. Operations: CI/CD, observability, and performance management. Standard development process and tooling: I emphasize largely AI Models and Azure AI Foundry here, however the standard development practices, code security, Identity and access, compliance, testing etc. will remain same. Engineering flow that works Frame the objective: Define success metrics (quality, safety, and business KPIs). Prototype prompt flows: Start small, version them, and add tool calls only where needed. Evaluate before you ship: Use curated datasets for offline tests; include tricky edge cases. Harden safety: Enable content filters, set thresholds, and log decisions for auditability. Ship with telemetry: Track latency, cost per task, answer accuracy, and user feedback. Continuously improve: Roll updates behind flags, watch for drift, and retrain or return when needed. Reference architecture (conceptual) Experience → Teams/web/app Orchestration → Copilot Studio (dialog, routing, actions) AI Services → Azure AI Foundry (models, prompt flows, evaluation, safety, monitoring) Enterprise systems → Data platforms, line‑of‑business APIs, automation services Key principles Separation of concerns: UI ≠ Conversation logic ≠ Model/runtime ≠ Business systems. Least privilege: Only the permissions and scopes the agent truly needs. Observability first: Logs, traces, and quality events from day one. Human‑in‑the‑loop: Escalation paths for low‑confidence or sensitive requests. My 90‑day plan Days 1–30: Prove value Ship two no‑code agents for different teams. Measure deflection %, response helpfulness, and time saved. Days 31–60: Orchestrate actions Rebuild one agent in Copilot Studio with a clear dialog flow. Add a secure API action and an approval fallback. Days 61–90: Operationalize Port the highest‑impact scenario to Foundry. Implement offline evaluation, enable safety filters, deploy to a controlled audience, and set up monitoring dashboards. Design checklists (save for later) No-code launch checklist ☐ One job to be done ☐ Single, high quality knowledge source ☐ One user visible action ☐ Pilot cohort & feedback channel Low-code orchestration checklist ☐ Dialog flow defined (happy path + clarifications) ☐ Input validation before actions ☐ Connector secrets managed securely ☐ Outcome metrics (task completion, reengagement) Pro-code readiness checklist ☐ Model fit (capability, latency, cost) documented ☐ Offline evaluation set with edge cases ☐ Safety filters configured and logged ☐ Monitoring, alerting, and rollback plan Common pitfalls and how to avoid them Starting big: Begin with one clearly defined outcome; expand only after you see measurable impact. Over‑indexing on chat: Instrument task completion, not just message counts. Hidden coupling: Don’t bury business logic inside prompts; keep rules visible and testable. Skipping eval: Always gate releases with a small, representative test set. No feedback loop: Capture user feedback in‑product and close the loop with updates. Final take Stay on the course and go progressive: 1) No‑code for momentum and adoption, 2) Low‑code for richer conversations and actions, and 3) Pro‑code for the rigor that production demands. Treat evaluation, safety, and observability as core features and focus on it from day 1, not afterthoughts. That’s how you build agentic solutions that are useful on day one and trustworthy on day one hundred. These links cover the full journey from no-code to pro-code, including responsible AI practices: Microsoft 365 Copilot Agent Builder Overview https://learn.microsoft.com/en-us/microsoft-365-copilot/extensibility/agents-overview Microsoft Copilot Studio Documentation https://learn.microsoft.com/en-us/microsoft-copilot-studio/ Azure AI Foundry Documentation https://learn.microsoft.com/en-us/azure/ai-foundry/ Responsible AI and Content Safety in Azure https://learn.microsoft.com/en-us/azure/ai-services/content-safety/ Introduction to Microsoft AI Agent Solutions (Microsoft Learn module) https://learn.microsoft.com/en-us/training/modules/introduction-microsoft-ai-agent-solutions/ Software Development best practices & using AI in software development AI in Software Development | Microsoft Copilot Architecture strategies for formalizing software development management practices - Microsoft Azure Well-Architected Framework | Microsoft Learn About the Author Dipanjan Ghosh is a seasoned technology leader at Microsoft with extensive experience in AI solutions, enterprise architecture, and modern developer practices. He enables organizations to adopt Microsoft AI platforms such as Copilot, Copilot Studio, and Azure AI Foundry, ensuring scalability, security, and operational excellence. With a strong foundation in cloud architecture and automation, Dipanjan bridges innovation with practical implementation. Passionate about evangelizing technology innovations, he simplifies complex concepts and inspires businesses to embrace responsible, cutting-edge solutions. #SkilledByMTT, #MSLearn, #MTTBloggingGroup469Views0likes0CommentsDeepening our Partnership with Mistral AI on Azure AI Foundry
We’re excited to mark a new chapter in our collaboration with Mistral AI, a leading European AI innovator, with the launch of Mistral Document AI in Azure AI Foundry Models. This marks the first in a series of Mistral models coming to Azure as a serverless API, giving customers seamless access to Mistral’s cutting-edge capabilities, fully hosted, managed, and integrated into the Foundry ecosystem. This launch also deepens our support for sovereign cloud customers —especially in Europe. At Microsoft, we believe Sovereign AI is essential for enabling organizations and regulated industries to harness the full potential of AI while maintaining control over their security, data, and governance. As Satya Nadella has said, “We want every country, every organization, to build AI in a way that respects their sovereignty—of data, of applications, and of infrastructure.” By combining Mistral’s state-of-the-art models with Azure’s enterprise-grade reliability and scale we’re enabling customers to confidently deploy AI that meets strict regulatory and data sovereignty requirements. Mistral Document AI By the Mistral AI Team “Enterprises today are overwhelmed with documents—contracts, forms, research papers, invoices—holding critical information that’s often trapped in scanned images and PDFs. With nearly 90% of enterprise data stored in unstructured formats, traditional OCR simply can’t keep up. Mistral Document AI is built with a multimodal approach that combines vision and language understanding, it interprets documents with contextual intelligence and delivers structured outputs that reflect the original layout—tables remain tables, headings remain headings, and images are preserved alongside the text.” Key Capabilities Document Parsing: Mistral Document AI interprets complex layouts and extracts rich structures such as tables, charts, and LaTeX-formatted equations with markdown-style clarity. Multilingual & Multimodal: The model supports dozens of languages and understands both text and visual elements, making it well-suited for global, diverse datasets. Structured Output & Doc-as-Prompt: Mistral Document AI delivers results in structured formats like JSON, enabling easy downstream integration with databases or AI agents. This supports use cases like Retrieval-Augmented Generation (RAG), where document content becomes a prompt for subsequent queries. Use Cases Document Digitization: Process archives of scanned PDFs or handwritten forms into structured digital records. Knowledge Extraction: Transform research papers, technical manuals, or customer guides into machine-readable formats. RAG pipelines and Intelligent Agents: Integrate structured output into pipelines that feed AI systems for Q&A, summarization, and more. Mistral Document AI on Azure AI Foundry You can now access Mistral Document AI’s capabilities through Azure AI Foundry as a serverless Azure model, sold directly from Microsoft. One-Click Deployment (Serverless) – With a few clicks, you can deploy the model as a serverless REST API, without needing to provision any GPU machines or container hosts. This makes it easy to get started. Enterprise-Grade Security & Privacy – Because the model runs within your Azure environment, you get network isolation and data security out of the box. All inferencing happens in Azure’s cloud under your account, so your documents aren’t sent to a third-party server. Azure AI Foundry ensures your data stays private (no data leaves the Azure region you choose) and offers compliance with enterprise security standards. This is critical for sensitive use cases like banking or healthcare documents. Integrated Responsible AI Capabilities – With Mistral Doc AI running in Azure AI Foundry, you can apply Azure’s built-in Responsible AI tools—such as content filtering, safety system monitoring, and evaluation frameworks—to ensure your deployments align with your organization’s ethical and compliance standards. Observability & Monitoring – Foundry’s monitoring features give you full visibility into model usage, performance, and cost. You can track API calls, latency, and error rates, enabling proactive troubleshooting and optimization. Agent Services Enablement – You can connect Mistral Document AI to Azure AI Agent Service, enabling intelligent agents to process, reason over, and act on extracted document data—unlocking new automation and decision-making scenarios. Azure Ecosystem Integration – Once deployed, the Mistral Document AI endpoint can easily plug into your existing Azure workflows. And because it’s part of Foundry, you can manage it alongside other models in a unified way. This interoperability accelerates the development of intelligent applications. Getting Started: Deploying and Using Mistral Document AI on Azure Setting up Mistral Document AI on Azure AI Foundry is straightforward. Here’s a quick guide to get you up and running: Create an Azure AI Foundry workspace – Ensure you have an Azure subscription (pay-as-you-go, not a free trial) and create an AI Foundry hub and project in the Azure portal Deploy the Mistral Document AI model – In the Azure AI Foundry Model Catalog, search for “mistral-document-ai-2505”. Then click the Deploy button. You’ll be prompted to select a pricing plan – choose deploy. Call the Mistral Document AI API – Once deployed, using the model is as easy as calling a REST API. You can do this from any programming language or even a command-line tool like cURL. Integrate and iterate – With the OCR results in hand, you can integrate Mistral Document AI into your workflows. Conclusion Mistral Document AI joins Azure AI Foundry as one of the several tools available to help organizations unlock insights from unstructured documents. This launch reflects our continued commitment to bringing the latest, most capable models into Foundry, giving developers and enterprises more choice than ever. Whether you’re digitizing records, building knowledge bases, or enhancing your AI workflows, Azure AI Foundry offers powerful and accessible solutions. Pricing Model Name Pricing /1K pages mistral-document-ai-2505 Global $3 mistral-document-ai-2505 DataZone $3.3 Mistral OCR Global $1 Resources Explore Mistral Document AI MS Learn Github Code Samples11KViews3likes3CommentsContext-Aware RAG System with Azure AI Search to Cut Token Costs and Boost Accuracy
🚀 Introduction As AI copilots and assistants become integral to enterprises, one question dominates architecture discussions: “How can we make large language models (LLMs) provide accurate, source-grounded answers — without blowing up token costs?” Retrieval-Augmented Generation (RAG) is the industry’s go-to strategy for this challenge. But traditional RAG pipelines often use static document chunking, which breaks semantic context and drives inefficiencies. To address this, we built a context-aware, cost-optimized RAG pipeline using Azure AI Search and Azure OpenAI, leveraging AI-driven semantic chunking and intelligent retrieval. The result: accurate answers with up to 85% lower token consumption. Majorly in this blog we are considering: Tokenization Chunking The Problem with Naive Chunking Most RAG systems split documents by token or character count (e.g., every 1,000 tokens). This is easy to implement but introduces real-world problems: 🧩 Loss of context — sentences or concepts get split mid-idea. ⚙️ Retrieval noise — irrelevant fragments appear in top results. 💸 Higher cost — you often send 5× more text than necessary. These issues degrade both accuracy and cost efficiency. 🧠 Context-Aware Chunking: Smarter Document Segmentation Instead of breaking text arbitrarily, our system uses an LLM-powered preprocessor to identify semantic boundaries — meaning each chunk represents a complete and coherent concept. Example Naive chunking: “Azure OpenAI Service offers… [cut] …integrates with Azure AI Search for intelligent retrieval.” Context-aware chunking: “Azure OpenAI Service provides access to models like GPT-4o, enabling developers to integrate advanced natural language understanding and generation into their applications. It can be paired with Azure AI Search for efficient, context-aware information retrieval.” ✅ The chunk is self-contained and semantically meaningful. This allows the retriever to match queries with conceptually complete information rather than partial sentences — leading to precision and fewer chunks needed per query. Architecture Diagram Chunking Service: Purpose: Transforms messy enterprise data (wikis, PDFs, transcripts, repos, images) into structured, model-friendly chunks for Retrieval-Augmented Generation (RAG). ChallengeChunking FixLLM context limitsBreaks docs into smaller piecesEmbedding sizeKeeps within token boundsRetrieval accuracyGranular, relevant sections onlyNoiseRemoves irrelevant blocksTraceabilityChunk IDs for auditabilityCost/latencyRe-embed only changed chunks The Chunking Flow (End-to-End) The Chunking Service sits in the ingestion pipeline and follows this sequence: Ingestion: Raw text arrives from sources (wiki, repo, transcript, PDF, image description). Token-aware splitting: Large text is cut into manageable pre-chunks with a 100-token overlap, ensuring no semantic drift across boundaries. Semantic segmentation: Each pre-chunk is passed to an Azure OpenAI Chat model with a structured prompt. Output = JSON array of semantic chunks (sectiontitle, speaker, content). Optional overlap injection: Character-level overlap can be applied across chunks for discourse-heavy text like meeting transcripts. Embedding generation: Each chunk is passed to Azure OpenAI Embeddings API (text-embedding-3-small), producing a 1536-dimension vector. Indexing: Chunks (text + vectors) are uploaded to Azure AI Search. Retrieval: During question answering or document generation, the system pulls top-k chunks, concatenates them, and enriches the prompt for the LLM. Resilience & Traceability The service is built to handle real-world pipeline issues. It retries once on rate limits, validates JSON outputs, and fails fast on malformed data instead of silently dropping chunks. Each chunk is assigned a unique ID (chunk_<sequence>_<sourceTag>), making retrieval auditable and enabling selective re-embedding when only parts of a document change. ☁️ Why Azure AI Search Matters Here Azure AI Search (formerly Cognitive Search) is the heart of the retrieval pipeline. Key Roles: Vector Search Engine: Stores embeddings of chunks and performs semantic similarity search. Hybrid Search (Keyword + Vector): Combines lexical and semantic matching for high precision and recall. Scalability: Supports millions of chunks with blazing-fast search latency. Metadata Filtering: Enables fine-grained retrieval (e.g., by document type, author, section). Native Integration with Azure OpenAI: Allows a seamless, end-to-end RAG pipeline without third-party dependencies. In short, Azure AI Search provides the speed, scalability, and semantic intelligence to make your RAG pipeline enterprise-grade. 💡 Importance of Azure OpenAI Azure OpenAI complements Azure AI Search by providing: High-quality embeddings (text-embedding-3-large) for accurate vector search. Powerful generative reasoning (GPT-4o or GPT-4.1) to craft contextually relevant answers. Security and compliance within your organization’s Azure boundary — critical for regulated environments. Together, these two services form the retrieval (Azure AI Search) and generation (Azure OpenAI) halves of your RAG system. 💰 Token Efficiency By limiting the model’s input to only the most relevant, semantically meaningful chunks, you drastically reduce prompt size and cost. Approach Tokens per Query Typical Cost Accuracy Full-document prompt ~15,000–20,000 Very high Medium Fixed-size RAG chunks ~5,000–8,000 Moderate Medium-high Context-aware RAG (this approach) ~2,000–3,000 Low High 💰 Token Cost Reduction Analysis Let’s quantify it: Step Naive Approach (no RAG) Your Approach (Context-Aware RAG) Prompt context size Entire document (e.g., 15,000 tokens) Top 3 chunks (e.g., 2,000 tokens) Tokens per query ~16,000 (incl. user + system) ~2,500 Cost reduction — ~84% reduction in token usage Accuracy Often low (hallucinations) Higher (targeted retrieval) That’s roughly an 80–85% reduction in token usage while improving both accuracy and response speed. 🧱 Tech Stack Overview Component Service Purpose Chunking Engine Azure OpenAI (GPT models) Generate context-aware chunks Embedding Model Azure OpenAI Embedding API Create high-dimensional vectors Retriever Azure AI Search Perform hybrid and vector search Generator Azure OpenAI GPT-4o Produce final answer Orchestration Layer Python / FastAPI / .NET c# Handle RAG pipeline 🔍 The Bottom Line By adopting context-aware chunking and Azure AI Search-powered RAG, you achieve: ✅ Higher accuracy (contextually complete retrievals) 💸 Lower cost (token-efficient prompts) ⚡ Faster latency (smaller context per call) 🧩 Scalable and secure architecture (fully Azure-native) This is the same design philosophy powering Microsoft Copilot and other enterprise AI assistants today. 🧪 Real-Life Example: Context-Aware RAG in Action To bring this architecture to life, let’s walk through a simple example of how documents can be chunked, embedded, stored in Azure AI Search, and then queried to generate accurate, cost-efficient answers. Imagine you want to build an internal knowledge assistant that answers developer questions from your company’s Azure documentation. ⚙️ Step 1: Intelligent Document Chunking We’ll use a small LLM call to segment text into context-aware chunks — rather than fixed token counts //Context Aware Chunking //text can be your retrieved text from any page/ document private async Task<List<SemanticChunk>> AzureOpenAIChunk(string text) { try { string prompt = $@" Divide the following text into logical, meaningful chunks. Each chunk should represent a coherent section, topic, or idea. Return the result as a JSON array, where each object contains: - sectiontitle - speaker (if applicable, otherwise leave empty) - content Do not add any extra commentary or explanation. Only output the JSON array. Do not give content an array, try to keep all in string. TEXT: {text}" var client = GetAzureOpenAIClient(); var chatCompletionsOptions = new ChatCompletionOptions { Temperature = 0, FrequencyPenalty = 0, PresencePenalty = 0 }; var Messages = new List<OpenAI.Chat.ChatMessage> { new SystemChatMessage("You are a text processing assistant."), new UserChatMessage(prompt) }; var chatClient = client.GetChatClient( deploymentName: _appSettings.Agent.Model); var response = await chatClient.CompleteChatAsync(Messages, chatCompletionsOptions); string responseText = response.Value.Content[0].Text.ToString(); string cleaned = Regex.Replace(responseText, @"```[\s\S]*?```", match => { var match1 = match.Value.Replace("```json", "").Trim(); return match1.Replace("```", "").Trim(); }); // Try to parse the response as JSON array of chunks return CreateChunkArray(cleaned); } catch (JsonException ex) { _logger.LogError("Failed to parse GPT response: " + ex.Message); throw; } catch (Exception ex) { _logger.LogError("Error in AzureOpenAIChunk: " + ex.Message); throw; } } 🧠 Step 2: Adding Overlaps for better result We are adding overlapping between chunks for better and accurate answers. Overlapping window can be modified based on the documents. public List<SemanticChunk> AddOverlap(List<SemanticChunk> chunks, string IDText, int overlapChars = 0) { var overlappedChunks = new List<SemanticChunk>(); for (int i = 0; i < chunks.Count; i++) { var current = chunks[i]; string previousOverlap = i > 0 ? chunks[i - 1].Content[^Math.Min(overlapChars, chunks[i - 1].Content.Length)..] : ""; string combinedText = previousOverlap + "\n" + current.Content; var Id = $"chunk_{i + '_' + IDText}"; overlappedChunks.Add(new SemanticChunk { Id = Regex.Replace(Id, @"[^A-Za-z0-9_\-=]", "_"), Content = combinedText, SectionTitle = current.SectionTitle }); } return overlappedChunks; } 🧠 Step 3: Generate and Store Embeddings in Azure AI Search We convert each chunk into an embedding vector and push it to an Azure AI Search index. public async Task<List<SemanticChunk>> AddEmbeddings(List<SemanticChunk> chunks) { var client = GetAzureOpenAIClient(); var embeddingClient = client.GetEmbeddingClient("text-embedding-3-small"); foreach (var chunk in chunks) { // Generate embedding using the EmbeddingClient var embeddingResult = await embeddingClient.GenerateEmbeddingAsync(chunk.Content).ConfigureAwait(false); chunk.Embedding = embeddingResult.Value.ToFloats(); } return chunks; } public async Task UploadDocsAsync(List<SemanticChunk> chunks) { try { var indexClient = GetSearchindexClient(); var searchClient = indexClient.GetSearchClient(_indexName); var result = await searchClient.UploadDocumentsAsync(chunks); } catch (Exception ex) { _logger.LogError("Failed to upload documents: " + ex); throw; } } 🤖 Step 4: Generate the Final Answer with Azure OpenAI Now we combine the top chunks with the user query to create a cost-efficient, context-rich prompt. P.S. : Here in this example we have used semantic kernel agent , in real time any agent can be used and any prompt can be updated. var context = await _aiSearchService.GetSemanticSearchresultsAsync(UserQuery); // Gets chunks from Azure AI Search //here UserQuery is query asked by user/any question prompt which need to be answered. string questionWithContext = $@"Answer the question briefly in short relevant words based on the context provided. Context : {context}. \n\n Question : {UserQuery}?"; var _agentModel = new AgentModel() { Model = _appSettings.Agent.Model, AgentName = "Answering_Agent", Temperature = _appSettings.Agent.Temperature, TopP = _appSettings.Agent.TopP, AgentInstructions = $@"You are a cloud Migration Architect. " + "Analyze all the details from top to bottom in context based on the details provided for the Migration of APP app using Azure Services. Do not assume anything." + "There can be conflicting details for a question , please verify all details of the context. If there are any conflict please start your answer with word - **Conflict**." + "There might not be answers for all the questions, please verify all details of the context. If there are no answer for question just mention - **No Information**" }; _agentModel = await _agentService.CreateAgentAsync(_agentModel); _agentModel.QuestionWithContext = questionWithContext; var modelWithResponse = await _agentService.GetAnswerAsync(_agentModel); 🧠 Final Thoughts Context-aware RAG isn’t just a performance optimization — it’s an architectural evolution. It shifts the focus from feeding LLMs more data to feeding them the right data. By letting Azure AI Search handle intelligent retrieval and Azure OpenAI handle reasoning, you create an efficient, explainable, and scalable AI assistant. The outcome: Smarter answers, lower costs, and a pipeline that scales with your enterprise. Wiki Link: Tokenization and Chunking IP Link: AI Migration Accelerator1.2KViews4likes0Comments