ai agents
Context-Aware RAG System with Azure AI Search to Cut Token Costs and Boost Accuracy
Introduction

As AI copilots and assistants become integral to enterprises, one question dominates architecture discussions: "How can we make large language models (LLMs) provide accurate, source-grounded answers without blowing up token costs?"

Retrieval-Augmented Generation (RAG) is the industry's go-to strategy for this challenge. But traditional RAG pipelines often use static document chunking, which breaks semantic context and drives inefficiencies. To address this, we built a context-aware, cost-optimized RAG pipeline using Azure AI Search and Azure OpenAI, leveraging AI-driven semantic chunking and intelligent retrieval. The result: accurate answers with up to 85% lower token consumption.

This blog focuses on two topics: tokenization and chunking.

The Problem with Naive Chunking

Most RAG systems split documents by token or character count (for example, every 1,000 tokens). This is easy to implement but introduces real-world problems:

- Loss of context: sentences or concepts get split mid-idea.
- Retrieval noise: irrelevant fragments appear in top results.
- Higher cost: you often send 5x more text than necessary.

These issues degrade both accuracy and cost efficiency.

Context-Aware Chunking: Smarter Document Segmentation

Instead of breaking text arbitrarily, our system uses an LLM-powered preprocessor to identify semantic boundaries, so each chunk represents a complete and coherent concept.

Example

Naive chunking: "Azure OpenAI Service offers… [cut] …integrates with Azure AI Search for intelligent retrieval."

Context-aware chunking: "Azure OpenAI Service provides access to models like GPT-4o, enabling developers to integrate advanced natural language understanding and generation into their applications. It can be paired with Azure AI Search for efficient, context-aware information retrieval."

The second chunk is self-contained and semantically meaningful. This allows the retriever to match queries with conceptually complete information rather than partial sentences, leading to higher precision and fewer chunks needed per query.

Architecture Diagram

Chunking Service

Purpose: Transforms messy enterprise data (wikis, PDFs, transcripts, repos, images) into structured, model-friendly chunks for Retrieval-Augmented Generation (RAG).

| Challenge | Chunking Fix |
| --- | --- |
| LLM context limits | Breaks docs into smaller pieces |
| Embedding size | Keeps within token bounds |
| Retrieval accuracy | Granular, relevant sections only |
| Noise | Removes irrelevant blocks |
| Traceability | Chunk IDs for auditability |
| Cost/latency | Re-embed only changed chunks |

The Chunking Flow (End-to-End)

The Chunking Service sits in the ingestion pipeline and follows this sequence:

1. Ingestion: Raw text arrives from sources (wiki, repo, transcript, PDF, image description).
2. Token-aware splitting: Large text is cut into manageable pre-chunks with a 100-token overlap, ensuring no semantic drift across boundaries (see the sketch after this list).
3. Semantic segmentation: Each pre-chunk is passed to an Azure OpenAI chat model with a structured prompt. Output = JSON array of semantic chunks (sectiontitle, speaker, content).
4. Optional overlap injection: Character-level overlap can be applied across chunks for discourse-heavy text like meeting transcripts.
5. Embedding generation: Each chunk is passed to the Azure OpenAI Embeddings API (text-embedding-3-small), producing a 1536-dimension vector.
6. Indexing: Chunks (text + vectors) are uploaded to Azure AI Search.
7. Retrieval: During question answering or document generation, the system pulls top-k chunks, concatenates them, and enriches the prompt for the LLM.
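To make the token-aware splitting step concrete, here is a minimal sketch of how pre-chunks with a 100-token overlap could be produced before semantic segmentation. It is illustrative only: the tokenizer name (cl100k_base), chunk size, and overlap values are assumptions, not values taken from the production service.

```python
# Minimal sketch of token-aware pre-chunking with overlap (assumed parameters).
import tiktoken

def split_with_overlap(text: str, max_tokens: int = 1000, overlap_tokens: int = 100) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")  # assumed tokenizer
    tokens = enc.encode(text)
    chunks = []
    step = max_tokens - overlap_tokens
    for start in range(0, len(tokens), step):
        # Each window re-includes the last `overlap_tokens` tokens of the previous window.
        window = tokens[start:start + max_tokens]
        chunks.append(enc.decode(window))
        if start + max_tokens >= len(tokens):
            break
    return chunks

# Each pre-chunk is then sent to the chat model for semantic segmentation.
pre_chunks = split_with_overlap("example text " * 2000)
print(len(pre_chunks))
```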
Resilience & Traceability

The service is built to handle real-world pipeline issues. It retries once on rate limits, validates JSON outputs, and fails fast on malformed data instead of silently dropping chunks. Each chunk is assigned a unique ID (chunk_<sequence>_<sourceTag>), making retrieval auditable and enabling selective re-embedding when only parts of a document change.

Why Azure AI Search Matters Here

Azure AI Search (formerly Cognitive Search) is the heart of the retrieval pipeline. Key roles:

- Vector search engine: Stores embeddings of chunks and performs semantic similarity search.
- Hybrid search (keyword + vector): Combines lexical and semantic matching for high precision and recall (a minimal query sketch appears just before the walkthrough below).
- Scalability: Supports millions of chunks with low search latency.
- Metadata filtering: Enables fine-grained retrieval (for example, by document type, author, or section).
- Native integration with Azure OpenAI: Allows a seamless, end-to-end RAG pipeline without third-party dependencies.

In short, Azure AI Search provides the speed, scalability, and semantic intelligence to make your RAG pipeline enterprise-grade.

Importance of Azure OpenAI

Azure OpenAI complements Azure AI Search by providing:

- High-quality embeddings (text-embedding-3-small in this pipeline, or text-embedding-3-large) for accurate vector search.
- Powerful generative reasoning (GPT-4o or GPT-4.1) to craft contextually relevant answers.
- Security and compliance within your organization's Azure boundary, which is critical for regulated environments.

Together, these two services form the retrieval (Azure AI Search) and generation (Azure OpenAI) halves of your RAG system.

Token Efficiency

By limiting the model's input to only the most relevant, semantically meaningful chunks, you drastically reduce prompt size and cost.

| Approach | Tokens per Query | Typical Cost | Accuracy |
| --- | --- | --- | --- |
| Full-document prompt | ~15,000–20,000 | Very high | Medium |
| Fixed-size RAG chunks | ~5,000–8,000 | Moderate | Medium-high |
| Context-aware RAG (this approach) | ~2,000–3,000 | Low | High |

Token Cost Reduction Analysis

Let's quantify it:

| Step | Naive Approach (no RAG) | This Approach (Context-Aware RAG) |
| --- | --- | --- |
| Prompt context size | Entire document (e.g., 15,000 tokens) | Top 3 chunks (e.g., 2,000 tokens) |
| Tokens per query | ~16,000 (incl. user + system) | ~2,500 |
| Cost reduction | Baseline | ~84% reduction in token usage |
| Accuracy | Often low (hallucinations) | Higher (targeted retrieval) |

That's roughly an 80–85% reduction in token usage while improving both accuracy and response speed.

Tech Stack Overview

| Component | Service | Purpose |
| --- | --- | --- |
| Chunking engine | Azure OpenAI (GPT models) | Generate context-aware chunks |
| Embedding model | Azure OpenAI Embedding API | Create high-dimensional vectors |
| Retriever | Azure AI Search | Perform hybrid and vector search |
| Generator | Azure OpenAI GPT-4o | Produce final answer |
| Orchestration layer | Python / FastAPI / .NET (C#) | Handle RAG pipeline |

The Bottom Line

By adopting context-aware chunking and Azure AI Search-powered RAG, you achieve:

- Higher accuracy (contextually complete retrievals)
- Lower cost (token-efficient prompts)
- Faster latency (smaller context per call)
- Scalable and secure architecture (fully Azure-native)

This is the same design philosophy powering Microsoft Copilot and other enterprise AI assistants today.

Real-Life Example: Context-Aware RAG in Action

To bring this architecture to life, let's walk through a simple example of how documents can be chunked, embedded, stored in Azure AI Search, and then queried to generate accurate, cost-efficient answers.
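Before the step-by-step walkthrough (which uses C#), here is a compact preview in Python of the retrieval half referenced in Step 4: a hybrid keyword + vector query against Azure AI Search. The index name, field names, endpoints, and deployment names are assumptions for illustration only.

```python
# Minimal hybrid retrieval sketch (assumed index, field, and deployment names).
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

aoai = AzureOpenAI(azure_endpoint="https://<your-aoai>.openai.azure.com",
                   api_key="<aoai-key>", api_version="2024-06-01")
search = SearchClient(endpoint="https://<your-search>.search.windows.net",
                      index_name="docs-chunks",  # assumed index name
                      credential=AzureKeyCredential("<search-key>"))

def retrieve(query: str, k: int = 3) -> str:
    # Embed the user query with the same model used at indexing time.
    vector = aoai.embeddings.create(model="text-embedding-3-small", input=query).data[0].embedding
    results = search.search(
        search_text=query,  # keyword half of the hybrid query
        vector_queries=[VectorizedQuery(vector=vector, k_nearest_neighbors=k, fields="embedding")],
        select=["sectiontitle", "content"],
        top=k,
    )
    # Concatenate the top chunks into a compact grounding context.
    return "\n\n".join(doc["content"] for doc in results)

print(retrieve("How does Azure OpenAI integrate with Azure AI Search?"))
```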
Imagine you want to build an internal knowledge assistant that answers developer questions from your company's Azure documentation.

Step 1: Intelligent Document Chunking

We'll use a small LLM call to segment text into context-aware chunks, rather than splitting at fixed token counts.

````csharp
// Context-aware chunking
// "text" can be text retrieved from any page or document
private async Task<List<SemanticChunk>> AzureOpenAIChunk(string text)
{
    try
    {
        string prompt = $@"
Divide the following text into logical, meaningful chunks.
Each chunk should represent a coherent section, topic, or idea.
Return the result as a JSON array, where each object contains:
- sectiontitle
- speaker (if applicable, otherwise leave empty)
- content
Do not add any extra commentary or explanation. Only output the JSON array.
Do not make content an array; keep it as a single string.

TEXT:
{text}";

        var client = GetAzureOpenAIClient();

        var chatCompletionsOptions = new ChatCompletionOptions
        {
            Temperature = 0,
            FrequencyPenalty = 0,
            PresencePenalty = 0
        };

        var messages = new List<OpenAI.Chat.ChatMessage>
        {
            new SystemChatMessage("You are a text processing assistant."),
            new UserChatMessage(prompt)
        };

        var chatClient = client.GetChatClient(deploymentName: _appSettings.Agent.Model);
        var response = await chatClient.CompleteChatAsync(messages, chatCompletionsOptions);

        string responseText = response.Value.Content[0].Text;

        // Strip Markdown code fences (```json ... ```) that the model may wrap around the JSON
        string cleaned = Regex.Replace(responseText, @"```[\s\S]*?```", match =>
        {
            var inner = match.Value.Replace("```json", "").Trim();
            return inner.Replace("```", "").Trim();
        });

        // Parse the response as a JSON array of chunks
        return CreateChunkArray(cleaned);
    }
    catch (JsonException ex)
    {
        _logger.LogError("Failed to parse GPT response: " + ex.Message);
        throw;
    }
    catch (Exception ex)
    {
        _logger.LogError("Error in AzureOpenAIChunk: " + ex.Message);
        throw;
    }
}
````

Step 2: Add Overlap Between Chunks

We add overlap between adjacent chunks so that answers spanning chunk boundaries stay accurate. The overlap window can be tuned per document type.

```csharp
public List<SemanticChunk> AddOverlap(List<SemanticChunk> chunks, string IDText, int overlapChars = 0)
{
    var overlappedChunks = new List<SemanticChunk>();

    for (int i = 0; i < chunks.Count; i++)
    {
        var current = chunks[i];

        // Take the last N characters of the previous chunk as the overlap
        string previousOverlap = i > 0
            ? chunks[i - 1].Content[^Math.Min(overlapChars, chunks[i - 1].Content.Length)..]
            : "";

        string combinedText = previousOverlap + "\n" + current.Content;

        // Chunk IDs follow the chunk_<sequence>_<sourceTag> pattern
        var id = $"chunk_{i}_{IDText}";

        overlappedChunks.Add(new SemanticChunk
        {
            Id = Regex.Replace(id, @"[^A-Za-z0-9_\-=]", "_"),
            Content = combinedText,
            SectionTitle = current.SectionTitle
        });
    }

    return overlappedChunks;
}
```

Step 3: Generate and Store Embeddings in Azure AI Search

We convert each chunk into an embedding vector and push it to an Azure AI Search index.
```csharp
public async Task<List<SemanticChunk>> AddEmbeddings(List<SemanticChunk> chunks)
{
    var client = GetAzureOpenAIClient();
    var embeddingClient = client.GetEmbeddingClient("text-embedding-3-small");

    foreach (var chunk in chunks)
    {
        // Generate the embedding for each chunk using the EmbeddingClient
        var embeddingResult = await embeddingClient.GenerateEmbeddingAsync(chunk.Content).ConfigureAwait(false);
        chunk.Embedding = embeddingResult.Value.ToFloats();
    }

    return chunks;
}

public async Task UploadDocsAsync(List<SemanticChunk> chunks)
{
    try
    {
        var indexClient = GetSearchindexClient();
        var searchClient = indexClient.GetSearchClient(_indexName);
        var result = await searchClient.UploadDocumentsAsync(chunks);
    }
    catch (Exception ex)
    {
        _logger.LogError("Failed to upload documents: " + ex);
        throw;
    }
}
```

Step 4: Generate the Final Answer with Azure OpenAI

Now we combine the top chunks with the user query to create a cost-efficient, context-rich prompt.

Note: this example uses a Semantic Kernel agent; any agent framework can be substituted and the prompt adapted to your scenario.

```csharp
// Gets the top chunks from Azure AI Search; UserQuery is the question asked by the user
// (or any question prompt that needs to be answered).
var context = await _aiSearchService.GetSemanticSearchresultsAsync(UserQuery);

string questionWithContext = $@"Answer the question briefly in short relevant words based on the context provided.
Context : {context}.

Question : {UserQuery}?";

var _agentModel = new AgentModel()
{
    Model = _appSettings.Agent.Model,
    AgentName = "Answering_Agent",
    Temperature = _appSettings.Agent.Temperature,
    TopP = _appSettings.Agent.TopP,
    AgentInstructions = "You are a cloud Migration Architect. " +
        "Analyze all the details in the context from top to bottom, based on the details provided for the migration of the APP app using Azure services. Do not assume anything. " +
        "There can be conflicting details for a question; verify all details of the context. If there is any conflict, start your answer with the word **Conflict**. " +
        "There might not be answers for all questions; verify all details of the context. If there is no answer for a question, reply with **No Information**."
};

_agentModel = await _agentService.CreateAgentAsync(_agentModel);
_agentModel.QuestionWithContext = questionWithContext;
var modelWithResponse = await _agentService.GetAnswerAsync(_agentModel);
```

Final Thoughts

Context-aware RAG isn't just a performance optimization; it's an architectural evolution. It shifts the focus from feeding LLMs more data to feeding them the right data. By letting Azure AI Search handle intelligent retrieval and Azure OpenAI handle reasoning, you create an efficient, explainable, and scalable AI assistant.

The outcome: smarter answers, lower costs, and a pipeline that scales with your enterprise.

Wiki Link: Tokenization and Chunking
IP Link: AI Migration Accelerator
Fine-tuning at Ignite 2025: new models, new tools, new experience

Fine-tuning isn't just "better prompts." It's how you tailor a foundation model to your domain and tasks to get higher accuracy, lower cost, and faster responses, and then run it at scale. As agents become more critical to businesses, we're seeing growing demand for fine-tuning to ensure agents are low latency, low cost, and call the right tools at the right time. At Ignite 2025, we saw how Docusign fine-tuned the models that power their document management system to achieve major gains: more than 50% cost reduction per document, 2x faster inference time, and significant improvements in accuracy.

At Ignite, we launched several new features in Microsoft Foundry that make fine-tuning easier, more scalable, and more impactful than ever, with the goal of making agents unstoppable in the real world:

- New open-source models (Qwen3 32B, Ministral 3B, GPT-OSS-20B, and Llama 3.3 70B) that give users access to open-source models in the same low-friction experience as OpenAI models.
- Synthetic data generation to jump-start your training journey: just upload your documents and our multi-agent system takes care of the rest.
- Developer Training tier to reduce the barrier to entry by offering discounted training (50% off Global) on spot capacity.
- Agentic Reinforcement Fine-Tuning with GPT-5: leverage tool calling during chain of thought to teach reasoning models to use your tools to solve complex problems.

And if that wasn't enough, we also released a re-imagined fine-tuning experience in Foundry (new), providing access to all these capabilities in a simplified and unified UI.

New Open-Source Models for Fine-tuning (Public Preview): Bringing open-source innovation to your fingertips

We've expanded our model lineup to new open-source models you can fine-tune without worrying about GPUs or compute. Ministral-3B and Qwen3 32B are now available to fine-tune with Supervised Fine-Tuning (SFT) in Microsoft Foundry, enabling developers to adapt open-source models to their enterprise-specific domains with ease. Look out for Llama 3.3 70B and GPT-OSS-20B, coming next week!

These OSS models are offered through a unified interface with OpenAI models via the UI or the Foundry SDK, which means the same experience regardless of model choice. They can be used alongside your favorite Foundry tools, from AI Search to Evaluations, or to power your agents. Note: the new OSS models are only available in the "new" Foundry, so upgrade today!

Like our OpenAI models, open-source models in Foundry charge per token for training, making it simple to forecast and estimate your costs. All models are available on the Global Standard tier, making discoverability easy. For more details on pricing, please see the Microsoft Foundry Models pricing page.

Customers like CoStar Group have already seen success leveraging fine-tuning with Mistral models to power their home search experience on Homes.com. They selected Ministral-3B as a small, efficient model to power high-volume, low-latency processing with lower costs and faster deployment times than frontier models, while still meeting their needs for accuracy, scalability, and availability thanks to fine-tuning in Foundry.

Synthetic data generation (Public Preview): Create high-quality training data automatically

Developers can now generate high-quality, domain-specific synthetic datasets to close those persistent data gaps with synthetic data generation.
One of the biggest challenges we hear teams face during fine-tuning is not having enough data, or the right kind of data, because it's scarce, sensitive, or locked behind compliance constraints (think healthcare and finance). Our new synthetic data generation capability solves this by giving you a safe, scalable way to create realistic, diverse datasets tailored to your use case, so you can fine-tune and evaluate models without waiting for perfect real-world data. Now you can produce realistic question-answer pairs from your documents, or simulate multi-turn tool-use dialogues that include function calls, without touching sensitive production data.

How it works:

- Fine-tuning datasets: Upload a reference file (PDF/Markdown/TXT) and Foundry converts it into SFT-formatted Q&A pairs that reflect your domain's language and nuances, so your model learns from the right examples.
- Agent tool-use datasets: Provide an OpenAPI (Swagger) spec, and Foundry simulates multi-turn assistant-user conversations with tool calls, producing SFT-ready examples that teach models to call your APIs reliably.
- Evaluation datasets: Generate distinct test queries tailored to your scenarios so you can measure model and agent quality objectively, separate from your training data to avoid false confidence.

Agents succeed when they reliably understand domain intent and call the right tools at the right time. Foundry's synthetic data generation does exactly that: it creates task-specific training and test data so your agent learns from the right examples, and you can prove it works before you go live so it is reliable in the real world.

Developer Training Tier (Public Preview): 50% discount on training jobs

Fine-tuning can be expensive, especially when you may need to run multiple experiments to create the right model for your production agents. To make it easier than ever to get started, we're introducing the Developer Training tier, providing users with a 50% discount when they choose to run workloads on pre-emptible capacity. It also lets users iterate faster: we support up to 10 concurrent jobs on the Developer tier, making it ideal for running experiments in parallel. Because it uses reclaimable capacity, jobs may be pre-empted and automatically resumed, so they may take longer to complete.

When to use the Developer Training tier:

- When cost matters: great for early experimentation or hyperparameter tuning thanks to 50% lower training cost.
- When you need high concurrency: supports up to 10 simultaneous jobs, ideal for running multiple experiments in parallel.
- When the workload is non-urgent: suitable for jobs that can tolerate pre-emption and longer, capacity-dependent runtimes.

Agentic Reinforcement Fine-Tuning (RFT) (Private Preview): Train reasoning models to use your tools through outcome-based optimization

Building reliable AI agents requires more than copying correct behavior; models need to learn which reasoning paths lead to successful outcomes. While supervised fine-tuning trains models to imitate demonstrations, reinforcement fine-tuning optimizes models based on whether their chain of thought actually produces a successful outcome. It teaches them to think in new ways, about new domains, to solve complex problems. Agentic RFT applies this to tool-using workflows: the model generates multiple reasoning traces (including tool calls and planning steps), receives feedback on which attempts solved the problem correctly, and updates its reasoning patterns accordingly.
This helps models learn effective strategies for tool sequencing, error recovery, and multi-step planning: behaviors that are difficult to capture through demonstrations alone. The difference now is that you can provide your own custom tools for use during chain of thought: models can interact with your internal systems, retrieve the data they need, and access your proprietary APIs to solve your unique problems. Agentic RFT is currently available in private preview for o4-mini and GPT-5, with configurable reasoning effort, sampling rates, and per-run telemetry. Request access at aka.ms/agentic-rft-preview.

What are customers saying?

Fine-tuning is critical to achieve the accuracy and latency needed for enterprise agentic workloads. Decagon is used by many of the world's most respected enterprises to build, manage, and scale AI agents that can resolve millions of customer inquiries across chat, email, and voice, 24 hours a day, seven days a week. This experience is powered by fine-tuning:

"Providing accurate responses with minimal latency is fundamental to Decagon's product experience. We saw an opportunity to reduce latency while improving task-specific accuracy by fine-tuning models using our proprietary datasets. Via fine-tuning, we were able to exceed the performance of larger state-of-the-art models with smaller, lighter-weight models which could be served significantly faster." -- Cyrus Asgari, Lead Research Engineer for fine-tuning at Decagon

But it's not just agent-first startups seeing results. Companies like Discovery Bank are using fine-tuned models to provide better customer experiences with personal banking agents:

"We consolidated three steps into one; response times that were previously five or six seconds came down to one and a half to two seconds on average. This approach made the system more efficient, and the 50% reduction in latency made conversations with Discovery AI feel seamless." -- Stuart Emslie, Head of Actuarial and Data Science at Discovery Bank

Fine-tuning has evolved from an optimization technique to essential infrastructure for production AI. Whether building specialized agents or enhancing existing products, the pattern is clear: custom-trained models deliver the accuracy and speed that general-purpose models can't match. As techniques like Agentic RFT and synthetic data generation mature, the question isn't whether to fine-tune, but how to build the systems to do it systematically.

Learn More

- Get started with fine-tuning with Azure AI Foundry on Microsoft Learn Docs
- Watch on demand: https://ignite.microsoft.com/en-US/sessions/BRK188?source=sessions
- Try the demos: aka.ms/FT-ignite-demos
- Continue the conversation on Discord
Foundry IQ: Unlocking ubiquitous knowledge for agents

Introducing Foundry IQ by Azure AI Search in Microsoft Foundry. Foundry IQ is a centralized knowledge layer that connects agents to data with the next generation of retrieval-augmented generation (RAG). Foundry IQ includes the following features:

- Knowledge bases: Available directly in the new Foundry portal, knowledge bases are reusable, topic-centric collections that ground multiple agents and applications through a single API.
- Automated indexed and federated knowledge sources: Expand what data an agent can reach by connecting to both indexed and remote knowledge sources. For indexed sources, Foundry IQ delivers automatic indexing, vectorization, and enrichment for text, images, and complex documents.
- Agentic retrieval engine in knowledge bases: A self-reflective query engine that uses AI to plan, select sources, search, rank, and synthesize answers across sources with configurable "retrieval reasoning effort."
- Enterprise-grade security and governance: Support for document-level access control, alignment with existing permissions models, and options for both indexed and remote data.

Foundry IQ is available in public preview through the new Foundry portal and the Azure portal with Azure AI Search. Foundry IQ is part of Microsoft's intelligence layer with Fabric IQ and Work IQ.
Foundry IQ: boost response relevance by 36% with agentic retrieval

The latest RAG performance evaluations and results for knowledge bases and the built-in agentic retrieval engine. Foundry IQ by Azure AI Search is a unified knowledge layer for agents, designed to improve response performance, automate RAG workflows, and enable enterprise-ready grounding. These evaluations tested RAG performance for knowledge bases and new features including retrieval reasoning effort and federated sources like web and SharePoint for M365. Foundry IQ and Azure AI Search are part of Microsoft Foundry.
Securing Azure AI Applications: A Deep Dive into Emerging Threats | Part 1

Why AI Security Can't Be Ignored?

Generative AI is rapidly reshaping how enterprises operate: accelerating decision-making, enhancing customer experiences, and powering intelligent automation across critical workflows. But as organizations adopt these capabilities at scale, a new challenge emerges: AI introduces security risks that traditional controls cannot fully address. AI models interpret natural language, rely on vast datasets, and behave dynamically. This flexibility enables innovation, but it also creates unpredictable attack surfaces that adversaries are actively exploiting. As AI becomes embedded in business-critical operations, securing these systems is no longer optional; it is essential.

The New Reality of AI Security

The threat landscape surrounding AI is evolving faster than any previous technology wave. Attackers are no longer focused solely on exploiting infrastructure or APIs; they are targeting the intelligence itself: the model, its prompts, and its underlying data. These AI-specific attack vectors can:

- Expose sensitive or regulated data
- Trigger unintended or harmful actions
- Skew decisions made by AI-driven processes
- Undermine trust in automated systems

As AI becomes deeply integrated into customer journeys, operations, and analytics, the impact of these attacks grows exponentially.

Why These Threats Matter?

Threats such as prompt manipulation and model tampering go beyond technical issues; they strike at the foundational principles of trustworthy AI. They affect:

- Confidentiality: Preventing accidental or malicious exposure of sensitive data through manipulated prompts.
- Integrity: Ensuring outputs remain accurate, unbiased, and free from tampering.
- Reliability: Maintaining consistent model behavior even when adversaries attempt to deceive or mislead the system.

When these pillars are compromised, the consequences extend across the business:

- Incorrect or harmful AI recommendations
- Regulatory and compliance violations
- Damage to customer trust
- Operational and financial risk

In regulated sectors, these threats can also impact audit readiness, risk posture, and long-term credibility. Understanding why these risks matter builds the foundation. In the upcoming blogs, we'll explore how these threats work and practical steps to mitigate them using Azure AI's security ecosystem.

Why AI Security Remains an Evolving Discipline?

Traditional security frameworks, built around identity, network boundaries, and application hardening, do not fully address how AI systems operate. Generative models introduce unique and constantly shifting challenges:

- Dynamic model behavior: Models adapt to context and data, creating a fluid and unpredictable attack surface.
- Natural language interfaces: Prompts are unstructured and expressive, making sanitization inherently difficult.
- Data-driven risks: Training and fine-tuning pipelines can be manipulated, poisoned, or misused.
- Rapidly emerging threats: Attack techniques evolve faster than most defensive mechanisms, requiring continuous learning and adaptation.

Microsoft and other industry leaders are responding with robust tools (Azure AI Content Safety, Prompt Shields, Responsible AI frameworks, encryption, isolation patterns), but technology alone cannot eliminate risk. True resilience requires a combination of tooling, governance, awareness, and proactive operational practices.

Let's Build a Culture of Vigilance

AI security is not just a technical requirement; it is a strategic business necessity.
Effective protection requires collaboration across:

- Developers
- Data and AI engineers
- Cybersecurity teams
- Cloud platform teams
- Leadership and governance functions

Security for AI is a shared responsibility. Organizations must cultivate awareness, adopt secure design patterns, and continuously monitor for evolving attack techniques. Building this culture of vigilance is critical for long-term success.

Key Takeaways

AI brings transformative value, but it also introduces risks that evolve as quickly as the technology itself. Strengthening your AI security posture requires more than robust tooling; it demands responsible AI practices, strong governance, and proactive monitoring. By combining Azure's built-in security capabilities with disciplined operational practices, organizations can ensure their AI systems remain secure, compliant, and trustworthy, even as new threats emerge.

What's Next?

In future blogs, we'll explore two of the most important AI threats, Prompt Injection and Model Manipulation, and share actionable strategies to mitigate them using Azure AI's security capabilities. Stay tuned for practical guidance, real-world scenarios, and Microsoft-backed best practices to keep your AI applications secure.
Building Secure, Governable AI Agents with Microsoft Foundry

The era of AI agents has arrived, and it's accelerating fast. Organizations are moving beyond simple chatbots that merely respond to requests and toward intelligent agents that can reason, act, and adapt without human supervision. These agents can analyze data, call tools, orchestrate workflows, and make autonomous decisions in real time. As a result, agents are becoming integral members of teams, augmenting and amplifying human capabilities across organizations at scale.

But the very strengths that make agents so powerful (their autonomy, intelligence, and ability to operate like virtual teammates) also introduce new risks. Enterprises need a platform that doesn't just enable agent development but governs it from the start, ensuring security, accountability, and trust at every layer. That's where Microsoft Foundry comes in.

A unified platform for enterprise-ready agents

Microsoft Foundry is a unified, interoperable platform for building, optimizing, and governing AI apps and agents at scale. At its core is Foundry Agent Service, which connects models, tools, knowledge, and frameworks into a single, observable runtime. Microsoft Foundry enables companies to shift left, with security, safety, and governance integrated from the beginning of a developer's workflow. It delivers enterprise-grade controls from setup through production, giving customers the trust, flexibility, and confidence to innovate.

1. Setup: Start with the right foundation

Enterprise customers have stringent networking, compliance, and security requirements that must be met before they can even start testing AI capabilities. Microsoft Foundry Agent Service provides a flexible setup experience designed to meet organizations where they are, whether you're a startup prioritizing speed and simplicity or an enterprise with strict data and compliance needs.

Data Control

- Basic setup: Ideal for rapid prototyping and getting started quickly. This mode uses platform-managed storage.
- Standard setup: Enables fine-grained control over your data by using your own Azure resources and configurations.

Networking

- Bring Your Own Virtual Network (BYO VNet) or enable a Managed Virtual Network (Managed VNet) for full network isolation and strict data exfiltration control, ensuring that sensitive information remains within your organization's trusted boundaries.
- Using your own virtual network for agents and evaluation workloads in Foundry puts the networking controls in your hands, including setting up your own firewall to control egress traffic, virtual network peering, and setting NSGs and UDRs to manage network traffic.
- Managed virtual network (preview) creates a virtual network in the Microsoft tenant to handle the egress traffic of your agents. The managed virtual network handles the hassle of setting up network isolation for your Foundry resource and agents, such as choosing the subnet range, IP selection, and subnet delegation.

Secrets Management

Choose between a managed Key Vault or Bring Your Own Key Vault to manage secrets and access credentials in a way that aligns with your organization's security policies. These credentials are critical for establishing secure connections to external resources and tools integrated via the Model Context Protocol (MCP).

Encryption

Data is always encrypted in transit and at rest using Microsoft-managed keys by default. For enhanced ownership and control, customers can opt for Customer Managed Keys (CMK) to enable key rotation and fine-tuned data governance.
Model Governance with AI Gateway

Foundry supports Bring Your Own AI Gateway (preview) so enterprises can integrate their existing Foundry and Azure OpenAI model endpoints into Foundry Agent Service behind an AI Gateway for maximum flexibility, control, and governance.

Authentication

Foundry enforces keyless authentication using Microsoft Entra ID for all end users who want to access agents.

2. Development: Build agents you trust

Once the environment is configured, Foundry provides tools to develop, control, and evaluate agents before putting them into production.

Microsoft Entra Agent ID

Every agent in Foundry is assigned a Microsoft Entra Agent ID, a new identity type purpose-built for the security and operational needs of enterprise-scale AI agents. With an agent identity, agents can be recognized, authenticated, and governed just like users, allowing IT teams to enforce familiar controls such as Conditional Access, Identity Protection, Identity Governance, and network policies. In the Microsoft Entra admin center, you manage your agent inventory, which lists all agents in your tenant, including those created in Foundry, Copilot Studio, and any third-party agent you register.

Unpublished agents (shared agent identity): All unpublished or in-development Foundry agents within the same project share a common agent identity. This design simplifies permission management because unpublished agents typically require the same access patterns and permission configurations. The shared identity approach provides several benefits:

- Simplified administration: Administrators can centrally manage permissions for all in-development agents within a project.
- Reduced identity sprawl: Using a single identity per project prevents unnecessary identity creation during early experimentation.
- Developer autonomy: Once the shared identity is configured, developers can independently build and test agents without repeatedly configuring new permissions.

Published agents (unique agent identity): When you want to share an agent with others as a stable offering, publish it to an agent application. Once published, the agent is assigned a unique agent identity tied to the agent application. This establishes durable, auditable boundaries for production agents and enables independent lifecycle, compliance, and monitoring controls.

Observability: Tracing, Evaluation, and Monitoring

Microsoft Foundry provides a comprehensive observability layer that gives teams deep visibility into agent performance, quality, and operational health across development and production. Foundry's observability stack brings together traces, logs, evaluations, and safety signals to help developers and administrators understand exactly how an agent arrived at an answer, which tools it used, and where issues may be emerging. This includes:

- Tracing: Track every step of an agent response, including prompts, tool calls, tool responses, and output generation, to understand decision paths, latency contributors, and failure points.
- Evaluations: Foundry provides a comprehensive library of built-in evaluators that measure coherence, groundedness, relevance, safety risks, security vulnerabilities, and agent-specific behaviors such as task adherence or tool-call accuracy. These evaluations help teams catch regressions early, benchmark model quality, and validate that agents behave as intended before moving to production.
- Monitoring: The Agent Monitoring Dashboard in Microsoft Foundry provides real-time insights into the operational health, performance, and compliance of your AI agents. This dashboard can track token usage, latency, evaluation metrics, and security posture across multi-agent systems.
- AI red teaming: Foundry's AI Red Teaming Agent can probe agents with adversarial queries to detect jailbreaks, prompt attacks, and security vulnerabilities.

Agent Guardrails and Controls

Microsoft Foundry offers safety and security guardrails that can be applied to core models, including image generation models, and to agents. Guardrails consist of controls that define three things:

- What risk to detect (e.g., harmful content, prompt attacks, data leakage)
- Where to scan for it (user input, tool calls, tool responses, or model output)
- What action to take (annotate or block)

Foundry automatically applies a default safety guardrail to all models and agents, mitigating a broad range of risks, including hate and fairness issues, sexual or violent content, self-harm, protected text/code material, and prompt-injection attempts. For organizations that require more granular control, Foundry supports custom guardrails. These allow teams to tune detection levels, selectively enable or disable risks, and apply different safety policies at the model or agent level.

Tool Controls with AI Gateway

To enforce tool-level controls, connect AI Gateway to your Foundry project. Once connected, all MCP and OpenAPI tools automatically receive an AI Gateway endpoint, allowing administrators to control how agents call these tools, where they can be accessed from, and who is authorized to use them. You can configure inbound, backend, outbound, and error-handling policies, for example restricting which IPs can call an API, setting error-handling rules, or applying rate-limiting policies to control how often a tool can be invoked within a given time window.

3. Publish: Securely share your agents with end users

Once the proper controls are in place and testing is complete, the agent is ready to be promoted to production. At this stage, enterprises need a secure, governed way to publish and share agents with teammates or customers.

Publishing an agent to an Agent Application

Anyone with the Azure AI User role on a Foundry project can interact with all agents inside that project, with conversations and state shared across users. This model is ideal for development scenarios like authoring, debugging, and testing, but it is not suitable for distributing agents to broader audiences. Publishing promotes an agent from a development asset into a managed Azure resource with a dedicated endpoint, independent identity, and governance capabilities. When an agent is published, Foundry creates an Agent Application resource designed for secure, scalable distribution. This resource provides a stable endpoint, a unique agent identity with full audit trails, cross-team sharing capabilities, integration with the Entra Agent Registry, and isolation of user data. Instead of granting access to the entire Foundry project, you grant users access only to the Agent Application resource.

Integrate with M365/A365

Once your agent is published, you can integrate it into Microsoft 365 or Agent 365. This enables developers to seamlessly deploy agents from Foundry into Microsoft productivity experiences like M365 Copilot or Teams.
Users can access and interact with these agents in the canvases they already use every day, providing enterprise-ready distribution with familiar governance and trust boundaries.

4. Production: Govern your agent fleet at scale

As organizations expand from a handful of agents to hundreds or thousands, visibility and control become essential. The Foundry Control Plane delivers a unified, real-time view of a company's entire agent ecosystem, spanning Foundry-built and third-party agents. Key capabilities include:

- Comprehensive agent inventory: View and govern 100% of your agent fleet with sortable, filterable data views. Foundry Control Plane gives developers a clear understanding of every agent, whether built in Foundry, Microsoft Agent Framework, LangChain, or LangGraph, and surfaces them in one place, regardless of where they're hosted or which cloud they run on.
- Operational control: Pause or disable an agent when a risk is detected.
- Real-time alerts: Get notified about policy, evaluation, and security alerts.
- Policy compliance management: Enforce organization-wide AI content policies and model policies so developers can only build agents with approved models in your enterprise.
- Cost and ROI insights: Real-time cost charts in Foundry give an accurate view of spending across all agents in a project or subscription, with drill-down capabilities to see costs at the individual agent or run level.
- Agent behavior controls: Apply consistent guardrails across inputs, outputs, and now tool interactions.
- Health and quality metrics: Review performance and reliability scores for each agent, with drilldowns for deeper analysis and corrective action.

Foundry Control Plane brings everything together into a single, connected experience, enabling teams to observe, control, secure, and operate their entire agent fleet. Its capabilities work together seamlessly to help organizations build and manage AI systems that are both powerful and responsibly governed.

Build agents with confidence

Microsoft Foundry unifies identity, governance, security, observability, and operational control into a single end-to-end platform purpose-built for enterprise AI. With Foundry, companies can choose the setup model that matches their security and compliance posture, apply agent-level guardrails and tool-level controls with AI Gateway, securely publish and share agents across Microsoft 365 and A365, and govern their entire agent fleet through the Foundry Control Plane. At the center of this system is Microsoft Entra Agent ID, ensuring every agent has a managed identity.

With these capabilities, organizations can deploy autonomous agents at scale, knowing every interaction is traceable, every risk is mitigated, and every agent is fully accountable. Whether you're building your first agent or managing a fleet of thousands, Foundry provides the foundation to innovate boldly while meeting the trust, compliance, and operational excellence enterprises require. The future of work is one where people and agents collaborate seamlessly, and Microsoft Foundry gives you the platform to build it with confidence.

Learn more

To dive deeper, watch the Ignite 2025 session: Entra Agent ID and other enterprise superpowers in Microsoft Foundry. To learn more, visit Microsoft Learn and explore resources including AI Agents for Beginners, Microsoft Agent Framework, and course materials that help you build and operate agents responsibly.
Announcing Elastic MCP Server in Microsoft Foundry Tool Catalog

Introduction

The future of enterprise AI is agentic, driven by intelligent, context-aware agents that deliver real business value. Microsoft Foundry is committed to enabling developers with the tools and integrations they need to build, deploy, and govern these advanced AI solutions. Today, we are excited to announce that Elastic MCP Server is now discoverable in the Microsoft Foundry Tool Catalog, unlocking seamless access to Elastic's industry-leading vector search capabilities for Retrieval-Augmented Generation (RAG) scenarios.

Seamless Integration: Elastic Meets Microsoft Foundry

This integration is a major milestone in our ongoing effort to foster an open, extensible AI ecosystem. With Elastic MCP Server now available in the Azure MCP registry, developers can easily connect their agents to Elastic's powerful search and analytics engine using the Model Context Protocol (MCP). This ensures that agents built on Microsoft Foundry are grounded in trusted, enterprise-grade data, delivering accurate, relevant, and verifiable responses.

- Create Elastic Cloud hosted deployments or serverless Search projects through the Microsoft Marketplace or the Azure portal.
- Discoverability: Elastic MCP Server is listed as a remote MCP server in the Azure MCP Registry and the Foundry Tool Catalog.
- Multi-agent workflows: Enable collaborative agent scenarios via the A2A protocol.

Unlocking Vector Search for RAG

Elastic's advanced vector search capabilities are now natively accessible to Foundry agents, enabling powerful Retrieval-Augmented Generation (RAG) workflows:

- Semantic search: Agents can perform hybrid and vector-based searches over enterprise data, retrieving the most relevant context for grounding LLM responses.
- Customizable retrieval: With Elastic's Agent Builder, you can define custom tools specific to your indices and datasets and expose them to Foundry agents via MCP.
- Enterprise grounding: Ensure agent outputs are always based on proprietary, up-to-date data, reducing hallucinations and improving trust.

Deployment: Getting Started

Follow these steps to integrate Elastic MCP Server with your Foundry agents. Within your Foundry project, you can either:

1. Go to Build in the top menu, then select Tools.
2. Click Connect a Tool.
3. Select the Catalog tab, search for Elasticsearch, and click Create.
4. When prompted, configure the Elasticsearch details by providing a name, your Kibana endpoint, and your Elasticsearch API key.
5. Click Use in an agent and select an existing agent to integrate Elastic MCP Server.

Alternatively, within your agent:

1. Click Tools.
2. Click Add, then select Custom.
3. Search for Elasticsearch, add it, and configure the tool as described above.
4. The tool will now appear in your agent's configuration.

You are now all set to interact with your Elasticsearch projects and deployments!

Conclusion & Next Steps

The addition of Elastic MCP Server to the Foundry Tool Catalog empowers developers to build the next generation of intelligent, grounded AI agents, combining Microsoft's agentic platform with Elastic's cutting-edge vector search. Whether you're building RAG-powered copilots, automating workflows, or orchestrating multi-agent systems, this integration accelerates your journey from prototype to production.

Ready to get started?

- Get started with Elastic via the Azure Marketplace or the Azure portal. New users get a 7-day free trial!
- Explore agent creation in the Microsoft Foundry portal and try the Foundry Tool Catalog.
- Deep dive into Elastic MCP and Agent Builder.
- Join us at Microsoft Ignite 2025 for live demos, deep dives, and more on building agentic AI with Elastic and Microsoft Foundry!
Hybrid AI Using Foundry Local, Microsoft Foundry and the Agent Framework - Part 2

Background

In Part 1, we explored how a local LLM running entirely on your GPU can call out to the cloud for advanced capabilities. The theme was: keep your data local, and pull intelligence in only when necessary. That was local-first AI calling cloud agents as needed.

This time, the cloud is in charge, and the user interacts with a Microsoft Foundry hosted agent, but whenever it needs private, sensitive, or user-specific information, it securely "calls back home" to a local agent running next to the user via MCP.

Think of it as:

- The cloud agent = a specialist doctor
- The local agent = your health coach, who you trust and who knows your medical history

The cloud never sees your raw medical history; the local agent only shares the minimum amount of information needed to support the cloud agent's reasoning. This hybrid intelligence pattern respects privacy while still benefiting from hosted frontier-level reasoning.

Disclaimer: The diagnostic results, symptom checker, and any medical guidance provided in this article are for illustrative and informational purposes only. They are not intended to provide medical advice, diagnosis, or treatment.

Architecture Overview

The diagram illustrates a hybrid AI workflow where a Microsoft Foundry-hosted agent in Azure works together with a local MCP server running on the user's machine. The cloud agent receives user symptoms and uses a frontier model (GPT-4.1) for reasoning, but when it needs personal context, like medical history, it securely calls back into the local MCP Health Coach via a dev tunnel. The local MCP server queries a local GPU-accelerated LLM (Phi-4-mini via Foundry Local) along with stored health-history JSON, returning only the minimal structured background the cloud model needs. The cloud agent then combines both pieces (its own reasoning plus the local context) to produce tailored recommendations, all while sensitive data stays fully on the user's device.

Hosting the agent in Microsoft Foundry on Azure provides a reliable, scalable orchestration layer that integrates directly with Azure identity, monitoring, and governance. It lets you keep your logic, policies, and reasoning engine in the cloud, while still delegating private or resource-intensive tasks to your local environment. This gives you the best of both worlds: enterprise-grade control and flexibility with edge-level privacy and efficiency.

Demo Setup

Create the Cloud Hosted Agent

Using Microsoft Foundry, I created an agent in the UI and picked gpt-4.1 as the model. I provided rigorous instructions as the system prompt:

You are a medical-specialist reasoning assistant for non-emergency triage.
You do NOT have access to the patient's identity or private medical history. A privacy firewall limits what you can see.
A local "Personal Health Coach" LLM exists on the user's device. It holds the patient's full medical history privately.
You may request information from this local model ONLY by calling the MCP tool: get_patient_background(symptoms)
This tool returns a privacy-safe, PII-free medical summary, including:
- chronic conditions
- allergies
- medications
- relevant risk factors
- relevant recent labs
- family history relevance
- age group

Rules:
1. When symptoms are provided, ALWAYS call get_patient_background BEFORE reasoning.
2. NEVER guess or invent medical history; always retrieve it from the local tool.
3. NEVER ask the user for sensitive personal details. The local model handles that.
4.
After the tool runs, combine (a) the patient_background output and (b) the user's reported symptoms to deliver high-level triage guidance.
5. Do not diagnose or prescribe medication.
6. Always end with: "This is not medical advice."

You MUST display the section "Local Health Coach Summary:" containing the JSON returned from the tool before giving your reasoning.

Build the Local MCP Server (Local LLM + Personal Medical Memory)

The full code for the MCP server is available here, but these are the main parts.

HTTP JSON-RPC Wrapper ("MCP Gateway")

The first part of the server exposes a minimal HTTP API that accepts MCP-style JSON-RPC messages and routes them to handler functions:

- Listens on a local port
- Accepts POST JSON-RPC requests
- Normalizes the payload
- Passes requests to handle_mcp_request()
- Returns JSON-RPC responses
- Exposes initialize and tools/list

```python
class MCPHandler(BaseHTTPRequestHandler):
    def _set_headers(self, status=200):
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()

    def do_GET(self):
        self._set_headers()
        self.wfile.write(b"OK")

    def do_POST(self):
        content_len = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(content_len)

        print("---- RAW BODY ----")
        print(raw)
        print("-------------------")

        try:
            req = json.loads(raw.decode("utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            self._set_headers(400)
            self.wfile.write(b'{"error":"Invalid JSON"}')
            return

        resp = handle_mcp_request(req)
        self._set_headers()
        self.wfile.write(json.dumps(resp).encode("utf-8"))
```

Tool Definition: get_patient_background

This section defines the tool contract exposed to Azure AI Foundry. The hosted agent sees this tool exactly as if it were a cloud function:

- Advertises the tool via tools/list
- Accepts arguments passed from the cloud agent
- Delegates local reasoning to the GPU LLM
- Returns structured JSON back to the cloud

```python
def handle_mcp_request(req):
    method = req.get("method")
    req_id = req.get("id")

    if method == "tools/list":
        return {
            "jsonrpc": "2.0",
            "id": req_id,
            "result": {
                "tools": [
                    {
                        "name": "get_patient_background",
                        "description": "Returns anonymized personal medical context using your local LLM.",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "symptoms": {"type": "string"}
                            },
                            "required": ["symptoms"]
                        }
                    }
                ]
            }
        }

    if method == "tools/call":
        tool = req["params"]["name"]
        args = req["params"]["arguments"]

        if tool == "get_patient_background":
            symptoms = args.get("symptoms", "")
            summary = summarize_patient_locally(symptoms)

            return {
                "jsonrpc": "2.0",
                "id": req_id,
                "result": {
                    "content": [
                        {
                            "type": "text",
                            "text": json.dumps(summary)
                        }
                    ]
                }
            }
```
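Before exposing the gateway through a tunnel, it can help to sanity-check it locally. Here is a small test script; the port (8081, matching the dev tunnel port used later) and the example symptoms are assumptions for illustration.

```python
# Quick local test of the MCP gateway before exposing it through a tunnel.
# Assumes the handler above is serving on port 8081; adjust if yours differs.
import json
import requests

BASE = "http://127.0.0.1:8081"

# 1) Ask the server which tools it advertises.
tools = requests.post(BASE, json={"jsonrpc": "2.0", "id": 1, "method": "tools/list"}, timeout=30)
print(json.dumps(tools.json(), indent=2))

# 2) Invoke the tool the way the cloud agent would.
call = requests.post(
    BASE,
    json={
        "jsonrpc": "2.0",
        "id": 2,
        "method": "tools/call",
        "params": {
            "name": "get_patient_background",
            "arguments": {"symptoms": "fever, fatigue, shortness of breath"},
        },
    },
    timeout=60,
)
print(json.dumps(call.json(), indent=2))
```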
Local GPU LLM Caller (Foundry Local Integration)

This is where personalization happens, entirely on your machine, not in the cloud:

- Calls the local GPU LLM through Foundry Local
- Injects private medical data (loaded from a file or memory)
- Produces anonymized structured outputs
- Logs debug info so you can see when local inference is running

```python
FOUNDRY_LOCAL_BASE_URL = "http://127.0.0.1:52403"
FOUNDRY_LOCAL_CHAT_URL = f"{FOUNDRY_LOCAL_BASE_URL}/v1/chat/completions"
FOUNDRY_LOCAL_MODEL_ID = "Phi-4-mini-instruct-cuda-gpu:5"

def summarize_patient_locally(symptoms: str):
    print("[LOCAL] Calling Foundry Local GPU model...")

    payload = {
        "model": FOUNDRY_LOCAL_MODEL_ID,
        "messages": [
            {"role": "system", "content": PERSONAL_SYSTEM_PROMPT},
            {"role": "user", "content": symptoms}
        ],
        "max_tokens": 300,
        "temperature": 0.1
    }

    resp = requests.post(
        FOUNDRY_LOCAL_CHAT_URL,
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload),
        timeout=60
    )

    llm_content = resp.json()["choices"][0]["message"]["content"]
    print("[LOCAL] Raw content:\n", llm_content)

    cleaned = _strip_code_fences(llm_content)
    return json.loads(cleaned)
```

Start a Dev Tunnel

Now we need to do some plumbing work to make sure the cloud can resolve the MCP endpoint. I used Azure Dev Tunnels for this demo. The snippet below shows how to set that up in four PowerShell commands:

```powershell
PS C:\Windows\system32> winget install Microsoft.DevTunnel
PS C:\Windows\system32> devtunnel create mcp-health
PS C:\Windows\system32> devtunnel port create mcp-health -p 8081 --protocol http
PS C:\Windows\system32> devtunnel host mcp-health
```

I now have a public endpoint: https://xxxxxxxxx.devtunnels.ms:8081

Wire Everything Together in Azure AI Foundry

Now let's use the UI to add a new custom tool as MCP for our agent and point it to the public endpoint created previously. Voila, we're done with the setup; let's test it.

Demo: The Cloud Agent Talks to Your Local Private LLM

I am going to use a simple prompt in the agent:

"Hi, I've been feeling feverish, fatigued, and a bit short of breath since yesterday. Should I be worried?"

Disclaimer: The diagnostic results and any medical guidance provided in this article are for illustrative and informational purposes only. They are not intended to provide medical advice, diagnosis, or treatment.

Below is the sequence shown in real time.

Conclusion: Why This Hybrid Pattern Matters

Hybrid AI lets you place intelligence exactly where it belongs: high-value reasoning in the cloud, sensitive or contextual data on the local machine. This protects privacy while reducing cloud compute costs; routine lookups, context gathering, and personal history retrieval can all run on lightweight local models instead of expensive frontier models.

This pattern also unlocks powerful real-world applications: local financial data paired with cloud financial analysis, on-device coding knowledge combined with cloud-scale refactoring, or local corporate context augmenting cloud automation agents. In industrial and edge environments, local agents can sit directly next to the action, embedded in factory sensors, cameras, kiosks, or ambient IoT devices, providing instant, private intelligence while the cloud handles complex decision-making. Hybrid AI turns every device into an intelligent collaborator, and every cloud agent into a specialist that can safely leverage local expertise.

References

- Get started with Foundry Local: https://learn.microsoft.com/en-us/azure/ai-foundry/foundry-local/get-started?view=foundry-classic
- Using MCP tools with Agents (Microsoft Agent Framework): https://learn.microsoft.com/en-us/agent-framework/user-guide/model-context-protocol/using-mcp-tools
- Build Agents using Model Context Protocol on Azure: https://learn.microsoft.com/en-us/azure/developer/ai/intro-agents-mcp
- Full demo repo available here.
Models are limited to their own knowledge and reasoning capabilities; they can't access real-time data or perform actions on their own. Tools are what make agents truly powerful. By connecting agents to live information, enabling automation, and integrating with the apps and services you use every day, tools transform agents from passive responders into active problem-solvers. With tools, agents can deliver faster, smarter, and more connected experiences that drive real results.
Microsoft Foundry is now your central hub for discovering, testing, and integrating powerful AI tools, designed to accelerate every stage of your development journey. Whether you're prototyping new ideas, optimizing production workflows, or extending the capabilities of your AI solutions, Microsoft Foundry puts everything you need at your fingertips.
We are excited to announce the following capabilities to empower your experience of building agents with tools:
- Discover from Foundry Tools (preview): Browse a growing, curated, enterprise-ready list of 1400+ Microsoft and partner-provided tools, spanning categories from databases and developer tools to analytics and more.
- Build your organization's tool catalog (preview): Build and manage a tool catalog for your own enterprise with your private and custom tools.
- Enhanced Enterprise Support:
  - Connect MCP servers and A2A endpoints with comprehensive authentication support, via OAuth identity passthrough, Microsoft Entra Agent Identity, Microsoft Entra Managed Identity, and more
  - Govern your tools via AI Gateway (preview) with customizable policies
  - Leverage Azure Policy to enforce security behaviors
- Bring A2A endpoints to Foundry Agent Service (preview): Via the A2A tool, you can easily bring an A2A endpoint into Foundry Agent Service with comprehensive authentication support.
Foundry Tools (preview)
Building AI agents isn't just about intelligence; it's about action. To deliver real business value, agents need to connect with the systems where work happens, access enterprise data securely, and orchestrate workflows across tools your teams already use. That's where Foundry Tools in Microsoft Foundry come in. With 1400+ officially curated tools from Microsoft and trusted partners, Foundry Tools give developers and enterprises everything they need to build agents that are powerful, integrated, and ready for scale.
"Celonis and Microsoft are creating the tech stack for the AI-driven, composable enterprise. By bringing the Celonis MCP Server into the Foundry Tool Catalog, we're enabling organizations to infuse real-time process intelligence directly into their AI workflows. This integration empowers AI agents to understand business processes in context and take intelligent action through scalable, process-aware automation. Together we're helping customers transform and continuously improve operations, delivering meaningful ROI."
– Dan Brown, Chief Product Officer, Celonis
"Integrating Sophos Intelix with Foundry brings threat intelligence directly into the heart of security workflows. By embedding real-time file, URL, and IP reputation insights into AI agents built in Foundry, we're enabling analysts to make faster, smarter decisions without leaving their existing tools.
This collaboration with Microsoft transforms incident response by combining domain-specific cybersecurity expertise with the power of AI, helping security teams stay ahead of evolving threats."
– Simon Reed, Chief Research and Scientific Officer, Sophos
"As AI agents reshape enterprise workflows, organizations increasingly need to connect data across multiple platforms that have historically operated in isolation," said Ben Kus, Chief Technology Officer at Box. "By integrating with Microsoft Foundry, Box is deepening its partnership with Microsoft to enable developers to build sophisticated agents on Azure that can intelligently leverage customers' enterprise content in Box. Instead of relying on complex, one-off integrations to access proprietary data, the Box MCP Server provides a standardized bridge, empowering the use of interoperable agents while maintaining Box's enterprise-grade security and permissions."
We are excited to offer tools across several categories in Foundry Tools:
Databases: Fuel Intelligent Decisions
Agents thrive on data. With tools connecting to data in Azure Cosmos DB, Azure Databricks Genie, Azure Managed Redis, Azure Database for PostgreSQL, Azure SQL, Elastic, MongoDB, and Pinecone, your agents can query, analyze, and act on your own data, powering smarter decisions and automated insights. All through natural language, without compromising enterprise security or compliance.
Agent 365 MCP Servers: Where Work Happens
Starting at Ignite this year, we are excited to bring Agent 365 MCP servers to select Frontier customers. Agent 365 MCP servers are the backbone of enterprise-grade agentic automation, seamlessly connecting Outlook Calendar, Outlook Email, SharePoint, Teams, and a growing suite of productivity and collaboration tools to digital worker agents built in Foundry and other leading platforms. With the Agent 365 Tooling Gateway, agent builders gain secure, scalable access to certified MCP servers, each designed to unlock rich, governed interactions with core business data while enforcing IT policies and compliance at every step. Whether orchestrating meetings, managing documents, or collaborating across channels, agents leverage these servers to deliver intelligent workflows with built-in observability, audit trails, and granular policy enforcement. This unified approach fulfills the Agent 365 promise: empowering organizations to innovate confidently, knowing every agent action is traceable, compliant, and ready to scale across the enterprise.
Developer Tools: Build Faster, Deploy Smarter
Accelerate your agent development lifecycle with tools like GitHub, Vercel, and Postman. From version control to API testing and front-end deployment, these integrations help you ship high-quality experiences faster.
Custom tools from Azure Logic Apps Connectors: Automate at Scale
Agents often need to orchestrate multi-step workflows across diverse systems. With hundreds of Azure Logic Apps connectors, you can automate approvals, sync data, and integrate SaaS apps without writing custom code.
Build your organization's tool catalog (preview)
In large enterprises, the teams that build APIs and tools are often separate from those that consume them to create AI agents, leading to friction, delays, and governance challenges. The private tool catalog, powered by Azure API Center, solves this by providing a single, secure hub where internal tools are published, discovered, and managed with confidence.
Key features include:
- Centralized publishing with metadata, authentication profiles, and version control.
- Easy discovery for agent developers, reducing integration time from weeks to minutes.
With this organizational catalog, enterprises can turn internal capabilities into enterprise-ready AI solutions, accelerating innovation while maintaining control.
Enhanced Enterprise Support for Tools
As enterprises build AI agents that connect to a growing universe of APIs and agent endpoints, robust authentication and governance are non-negotiable. Microsoft Foundry delivers comprehensive support for tools and protocols such as Model Context Protocol (MCP), A2A, and OpenAPI, ensuring every integration meets your organization's security and compliance standards.
Flexible Authentication for Every Enterprise Scenario
Foundry supports a full spectrum of authentication methods for MCP servers and A2A endpoints, including key-based, Microsoft Entra Agent Identity, Microsoft Entra Foundry Project Managed Identity, and OAuth Identity Passthrough. This flexibility means you can choose shared authentication for organization-wide access, or individual authentication to persist user context and enforce least-privilege access. For example, OAuth Identity Passthrough allows users to sign into the MCP server and grant the agent access to their credentials, while Microsoft Entra Agent Identity and Managed Identity enable seamless, secure service-to-service connections.
AI Gateway: Governance and Policy Enforcement (preview)
For advanced governance, Foundry integrates with the AI Gateway powered by Azure API Management (APIM). Once an AI Gateway is integrated with the Foundry resource, all eligible MCP servers are automatically governed by the AI Gateway with admin-configured custom policies. This ensures that every tool invocation adheres to enterprise-grade governance, without requiring any manual configuration from developers. When enabled, tool traffic is routed through the AI Gateway, where enterprise policies are enforced at runtime. The AI Gateway enforces powerful enterprise controls, including:
- Authentication and authorization: Enforce OAuth, subscription key, or IP-based authentication, with token validation and RBAC controls.
- Traffic management: Apply quotas, rate limits, and throttling to control usage and prevent abuse.
- Security and compliance: Add data loss prevention, request/response validation, and custom policies for sensitive workloads.
- Observability: Capture unified telemetry for every tool invocation, with integrated logging and metrics through Azure Monitor and Foundry analytics.
- Policy management: Admins can manage and update policies in the Azure API Management portal.
A2A tool in Foundry Agent Service (preview)
The Foundry Agent Service introduces a powerful pattern: you can add an A2A-compatible agent to your Foundry agents simply by adding it via the A2A tool. With A2A tools, your Foundry agent can securely invoke another A2A-compatible agent's capabilities, enabling modular, reusable workflows across teams and business units. This approach leverages comprehensive authentication options, including OAuth passthrough and Microsoft Entra, to ensure every agent call is authorized, auditable, and governed by enterprise policy.
Connecting agents via A2A tool vs. multi-agent workflow
Via A2A tool: When Agent A calls Agent B via a tool, Agent B's answer is passed back to Agent A, which then summarizes the answer and generates a response to the user. Agent A retains control and continues to handle future user input.
Via workflow: When Agent A calls Agent B via a workflow or other multi-agent orchestration, the responsibility of answering the user is completely transferred to Agent B. Agent A is effectively out of the loop, and all subsequent user input will be answered by Agent B.
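To make the control-flow difference concrete, here is a minimal, framework-agnostic sketch in plain Python. It does not use the Foundry SDK; the AgentA and AgentB classes and their method names are purely illustrative assumptions, meant only to show who owns the conversation in each pattern.

```python
class AgentB:
    """A specialist agent that can answer questions on its own."""

    def answer(self, question: str) -> str:
        return f"Agent B's answer to: {question!r}"


class AgentA:
    """The user-facing agent."""

    def __init__(self, specialist: AgentB) -> None:
        self.specialist = specialist

    def handle_via_tool(self, user_input: str) -> str:
        # A2A-tool style: Agent B is invoked like any other tool.
        # Its result comes back here, and Agent A keeps owning the conversation.
        tool_result = self.specialist.answer(user_input)
        return f"Agent A, summarizing the tool result: {tool_result}"

    def handle_via_workflow(self, user_input: str) -> AgentB:
        # Workflow / hand-off style: the conversation is transferred.
        # Agent A returns the new owner; later turns go straight to Agent B.
        # (In a real orchestration, user_input would be forwarded as part of the hand-off.)
        return self.specialist


agent_a = AgentA(AgentB())

# Tool pattern: Agent A answers the user and stays in control.
print(agent_a.handle_via_tool("Summarize the incident report"))

# Workflow pattern: control moves to Agent B for the rest of the dialogue.
owner = agent_a.handle_via_workflow("Summarize the incident report")
print(owner.answer("Any follow-up actions?"))
```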
Get started today
Start building with tools for intelligent agents today with Microsoft Foundry. If you're attending Microsoft Ignite 2025, or watching on-demand content later, be sure to check out these sessions:
- Innovation Session: Build & Manage AI Apps with Your Agent Factory
- AI agents in Microsoft Foundry, ship fast, scale fearlessly
- AI powered automation & multi-agent orchestration in Microsoft Foundry
- AI builder's guide to agent development in Foundry Agent Service
To learn more, visit Microsoft Learn and explore resources including AI Agents for Beginners, the Microsoft Agent Framework, and course materials that help you build and operate agents responsibly.