Building Agentic Systems on Azure: Microsoft Foundry Agents SDK vs Microsoft Agent Framework
In my recent experience as a Senior Consultant at Microsoft, I’ve been actively involved in designing and delivering AI-driven solutions, with a strong focus on building intelligent agents using modern frameworks. Along the way, I've built agents using both Microsoft Foundry Agents SDK (hereafter "Agents SDK") and Microsoft Agent Framework (MAF).

Both approaches are powerful and capable. However, once you move beyond simple proofs of concept, the developer experience and architectural patterns start to differ significantly. This article provides a practical comparison based on real implementation experience and aims to help developers choose the right approach.

Approach 1: Agents SDK

Agents SDK provides a straightforward way to create agents with integrated tools and models.

Example: Creating an Agent

    import os

    from azure.ai.agents.models import AzureAISearchQueryType, AzureAISearchTool
    from azure.ai.projects import AIProjectClient
    from azure.identity import DefaultAzureCredential

    client = AIProjectClient(
        credential=DefaultAzureCredential(),
        endpoint=os.getenv("AZURE_AI_PROJECT_ENDPOINT"),
    )

    # Configure tools
    # conn_id is the ID of the project's Azure AI Search connection, obtained elsewhere
    ai_search = AzureAISearchTool(
        index_connection_id=conn_id,
        index_name="my-index",
        query_type=AzureAISearchQueryType.SEMANTIC,
    )

    # Create agent (persisted in Foundry portal)
    agent = client.agents.create_agent(
        model=os.getenv("AZURE_AI_AGENT_DEPLOYMENT_NAME"),
        name="MyAgent",
        instructions="You are a helpful assistant.",
        tool_resources=ai_search.resources,
        tools=ai_search.definitions,
    )

    # Run conversation
    thread = client.agents.threads.create()
    client.agents.messages.create(thread_id=thread.id, role="user", content="Hello")
    run = client.agents.runs.create(thread_id=thread.id, agent_id=agent.id)

What this approach provides
• Native integration with Azure AI services (OpenAI, AI Search, MCP)
• Managed execution environment
• Simple and quick agent setup

Conceptually, this approach can be summarized as: Model + Tools + Execution

Strengths
✅ Rapid development and onboarding
✅ Strong integration within the Azure ecosystem
✅ Well-suited for single-agent or tool-driven use cases
✅ Minimal infrastructure overhead

Challenges observed in practice

As the complexity of scenarios increases, certain limitations become more visible:
• Multi-agent workflows require custom orchestration logic
• Agent handoffs must be implemented manually
• Context sharing across agents requires additional design effort

While this approach offers flexibility, it shifts orchestration complexity to the developer.

Approach 2: Microsoft Agent Framework (MAF)

Microsoft Agent Framework introduces a higher-level abstraction, focused on agent orchestration and system design.

Creating an Agent

    import asyncio
    import os

    from agent_framework import Agent, Message, WorkflowBuilder
    from agent_framework.foundry import FoundryChatClient
    from azure.identity import DefaultAzureCredential

    client = FoundryChatClient(
        project_endpoint=os.getenv("FOUNDRY_PROJECT_ENDPOINT"),
        model=os.getenv("FOUNDRY_MODEL_DEPLOYMENT_NAME"),
        credential=DefaultAzureCredential(),
    )

    # Create agents (in-process only, not persisted in portal)
    researcher = Agent(client, name="ResearcherAgent", instructions="Research topics thoroughly.")
    writer = Agent(client, name="WriterAgent", instructions="Write concise summaries.")

    async def main():
        # Build and run multi-agent workflow
        workflow = WorkflowBuilder(start_executor=researcher).add_edge(researcher, writer).build()
        # "async for" must run inside a coroutine, hence the main() wrapper
        async for event in workflow.run(Message("user", "Summarize migration best practices"), stream=True):
            print(event.content)

    asyncio.run(main())

What this approach provides
• Built-in orchestration capabilities
• Native support for multi-agent workflows
• Structured agent lifecycle management
• Context and memory handling

Conceptually, this can be viewed as: Agents + Orchestration + System Design

Observations from implementation

When implementing similar use cases using MAF:
• Agent responsibilities became clearly defined
• Routing and delegation patterns were significantly simplified
• Overall system architecture became easier to maintain and scale

This approach encourages thinking in terms of agent ecosystems rather than isolated agents.

Architecture Comparison

[Figures: architecture diagrams for Agents SDK and Microsoft Agent Framework (MAF)]

Choosing the Right Approach

Use Agents SDK when:
• You need rapid development for a single-agent use case
• The workflow is relatively straightforward
• You prefer flexibility and lower-level control

Use Microsoft Agent Framework when:
• You are designing multi-agent systems
• Your solution requires routing, delegation, or handoffs
• Long-term scalability and maintainability are essential

Pros and Cons Summary

Agents SDK
Pros:
• Easy to get started
• Strong Azure integration
• Flexible design
Cons:
• Manual orchestration required
• Limited native multi-agent support
• Complexity increases as scenarios grow

Microsoft Agent Framework (MAF)
Pros:
• Built-in orchestration
• Native multi-agent support
• Scalable and structured architecture
Cons:
• Learning curve for new developers
• More opinionated framework design
• Reduced low-level control compared to the SDK-based approach

References and Repositories

🔗 Microsoft Agent Framework (MAF)
• Microsoft Agent Framework – GitHub Repository
• Microsoft Agent Framework Samples – Tutorials & Examples
• Workflow Samples (Multi-agent patterns)
• FoundryChatClient sample (Python)
• Agent Framework demos - GitHub Source

📘 Documentation
• Microsoft Agent Framework Overview (Microsoft Learn)
• Agent Framework + Microsoft Foundry provider docs

🔗 Azure AI Projects / Agents SDK
• Azure AI Projects SDK – Python (GitHub Source)
• Azure AI Projects Agents (.NET SDK repo)

📘 Documentation
• Azure AI Projects SDK (Python) – Microsoft Learn
• Azure AI Agents SDK – Microsoft Learn

Conclusion

The Agents SDK and Microsoft Agent Framework both play important roles in the modern agent development landscape.
• Agents SDK enables quick and flexible agent development
• Microsoft Agent Framework enables structured, scalable agent systems

In practice, the choice depends on whether you are building a single agent feature or a multi-agent system.

Final Thought

Agents SDK helps you get started quickly. Microsoft Agent Framework helps you scale with confidence.

In a follow-up blog, I’ll dive into how the M365 Agents SDK compares with Microsoft Agent Framework, especially in the context of enterprise productivity and Copilot experiences.

Building an End-to-End Azure RAG Strategy Agent with MS Foundry

High-Level Architecture

This architecture represents an end-to-end Retrieval-Augmented Generation (RAG) pipeline where raw documents are ingested from Azure Blob Storage, processed using Document Intelligence, transformed into embeddings via Azure OpenAI, and indexed in Azure AI Search for hybrid retrieval. A Foundry/MAF-based agent orchestrates query processing by combining user input with relevant search results and generates contextual responses, which are exposed through a FastAPI or CLI interface.

This solution is composed of two main layers:

1. Data Ingestion Layer (RAG Pipeline)

This layer transforms raw enterprise documents into searchable knowledge.

Flow:
• Raw documents stored in Azure Blob Storage
    • Supported formats: PDF, DOCX, PPTX, images, etc.
• Document Intelligence extraction
    • Extracts: text, tables, key-value pairs, structure
    • Writes output as structured JSON back to Blob (processed/)
• Chunking + Embedding
    • Documents are split into chunks
    • Each chunk is embedded using Azure OpenAI (text-embedding-*)
• Indexing into Azure AI Search
    • Creates a hybrid index: keyword search, semantic ranking, vector search
    • Enables flexible retrieval strategies

2. Query Layer (Strategy Agents)

This layer enables intelligent query answering.

Flow:
• User sends a query via a FastAPI endpoint or the CLI interface
• Query is handled by a Microsoft Agent Framework (MAF) agent running on Azure AI Foundry
• The agent queries Azure AI Search, retrieves the top relevant chunks, and injects them into the LLM prompt
• The LLM generates a grounded response

This follows the standard RAG pattern: Retrieval → Augmentation → Generation

End-to-End Flow

[Figure: end-to-end flow diagram]

Key Azure Services Used

Service | Purpose
Azure Blob Storage | Raw + processed document storage
Azure AI Document Intelligence | Extract structured content
Azure OpenAI | Embeddings + LLM generation
Azure AI Search | Hybrid retrieval engine
Azure AI Foundry | Agent orchestration
Microsoft Agent Framework | Agent execution layer

Why this Architecture Matters

This solution goes beyond basic RAG and provides:
• Hybrid Retrieval: combines keyword + semantic + vector search; improves recall and accuracy
• Structured Document Parsing: handles complex enterprise documents; extracts tables and metadata
• Agent-Based Orchestration: enables reasoning over retrieval results; extensible for multi-agent workflows
• Scalable Data Pipeline: supports continuous ingestion; works with large document collections

Enterprise Considerations
• Use Managed Identity for secure service access
• Apply RBAC on Cosmos DB / Search / Storage
• Enable Private Endpoints for network isolation
• Use Guardrails + Evaluations in Foundry

Summary

This repository demonstrates a production-ready Azure RAG architecture:
• Ingest → Extract → Chunk → Embed → Index
• Retrieve → Reason → Generate
• Powered by Azure AI Foundry + Agent Framework

By combining data engineering + AI orchestration, it enables enterprise AI systems that are accurate, grounded, and extensible.

Repo: https://github.com/snd94/azure-rag-strategy-agent

Please refer to the Microsoft Learn Documentation for further information:
• Azure AI Search documentation - Azure AI Search | Microsoft Learn
• Document Intelligence documentation - Quickstarts, Tutorials, API Reference - Foundry Tools | Microsoft Learn
• How to generate embeddings with Azure OpenAI in Microsoft Foundry Models - Microsoft Foundry | Microsoft Learn
• Microsoft Agent Framework Overview | Microsoft Learn
• What is Microsoft Foundry? - Microsoft Foundry | Microsoft Learn

When RAG Hits the Wall: Designing Systems That Scale from 1,000 to 1 million Documents

Introduction

Retrieval-Augmented Generation (RAG) has quickly become the default architecture for grounding Large Language Models (LLMs) in enterprise data. And at small scale, it works exceptionally well.

• 100 documents → Excellent accuracy
• 1,000 documents → Still predictable

With around 100 documents, RAG systems tend to produce highly accurate responses. Even at 1,000 documents, behavior remains predictable and reliable. However, as systems grow beyond tens of thousands - and especially into the range of hundreds of thousands or millions of documents - many implementations begin to degrade in surprising ways. Latency begins to rise nonlinearly. Retrieval precision declines, costs increase, and responses grow inconsistent. What looks like a model issue is usually an architectural one.

The Hidden Theory Behind Early RAG Success

Early RAG systems work well not because they are perfectly designed, but because small datasets are forgiving. In smaller corpora, irrelevant retrieval is naturally rare. Semantic similarity remains tightly clustered, and noise does not overwhelm signal. This creates an illusion of robustness - systems seem accurate even when the underlying retrieval strategy is weak. As scale increases, this illusion disappears.

Breaking Point #1: Chunk Explosion (Entropy Growth)

What Happens

Most ingestion pipelines rely on token-based chunking:

Document -> Fixed-size chunks -> Embed everything

As document count increases, the system experiences entropy growth. Every document yields many chunks, so the number of vectors grows much faster than the number of documents, leading to a dense and noisy vector space. Similar information becomes fragmented, and retrieval precision drops. The effect resembles the distance concentration associated with the curse of dimensionality: in a crowded high-dimensional space, many vectors sit at nearly the same distance from the query, so the “nearest neighbors” stop being reliably relevant.

The Shift: Structural Information Retrieval

To solve this, production-grade RAG systems reintroduce structure.
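As a rough picture of what reintroducing structure can look like at ingestion time, the sketch below splits a document at heading lines rather than fixed token windows, then drops sections whose normalized text has already been seen. It assumes Markdown-style `#` headings, and the names `split_by_headings` and `dedupe_sections` are hypothetical; a production pipeline would more likely rely on a layout-aware splitter.

```python
import hashlib
import re

HEADING = re.compile(r"^#{1,6}\s+(.*)$")  # assumption: Markdown-style headings


def split_by_headings(text: str) -> list[dict]:
    """Split text into one section per heading, keeping the heading as metadata."""
    sections = []
    title, body = "(preamble)", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section if it has any non-blank content
            if any(part.strip() for part in body):
                sections.append({"title": title, "content": "\n".join(body).strip()})
            title, body = match.group(1).strip(), []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        sections.append({"title": title, "content": "\n".join(body).strip()})
    return sections


def dedupe_sections(sections: list[dict]) -> list[dict]:
    """Drop sections whose case- and whitespace-normalized content was already seen."""
    seen, unique = set(), []
    for section in sections:
        normalized = " ".join(section["content"].lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(section)
    return unique
```

Each surviving section can then be embedded with its heading attached as metadata, which keeps related sentences together and avoids indexing the same boilerplate many times.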
Instead of blindly splitting text, semantic chunking aligns content with logical boundaries like headings and sections. This preserves meaning and improves retrieval quality. Deduplication removes repeated templates and boilerplate, reducing unnecessary noise in the system. Hierarchical indexing allows retrieval to operate at multiple levels - document, section, and chunk - making search both more efficient and more accurate. These changes restore order in the vector space and significantly improve retrieval performance.

Breaking Point #2: Vector Search Saturation

What Happens

As data grows, latency becomes one of the biggest bottlenecks. Many systems rely on runtime-heavy operations such as generating embeddings on demand or querying large, unpartitioned indexes. This leads to unbounded computation and poor scalability. Over time, retrieval cost trends toward linear complexity. Cache inefficiencies increase, and tail latency begins to dominate the user experience.

The Shift: Systems Thinking

Scaling RAG requires applying distributed systems principles. Partitioned indexes reduce the search space, allowing queries to operate on smaller, more relevant subsets of data. Precomputed embeddings shift expensive computation to ingestion time, eliminating runtime overhead. Caching strategies, informed by real-world usage patterns, significantly improve performance by reusing frequent query results. Together, these changes make latency predictable and systems more cost-efficient.

The Final Trap: Context Does Not Equal Intelligence

What Happens

A common mistake in RAG systems is assuming that more context leads to better answers. In reality, LLMs are attention-limited. As more tokens are added, attention becomes diluted, and the model struggles to focus on what matters. Excessive context introduces noise, reducing the overall quality of responses.

The Shift: Information Compression

Effective systems focus on quality over quantity.
By limiting retrieval to the most relevant chunks, summarizing context, and grounding responses with citations, RAG systems achieve higher information density and better reasoning performance.

What a Scalable RAG System Actually Represents

At scale, RAG is no longer an LLM feature. It becomes a retrieval system with an LLM as a reasoning layer.

Prototype RAG | Production RAG
Token chunking | Structured IR
Vector-only search | Hybrid retrieval
No ranking theory | Reranking models
Runtime-heavy | Precomputed pipelines
More context | Information compression

Final Insight

Scaling RAG is not primarily a machine learning problem. It is a combination of information retrieval and distributed systems engineering, with the LLM acting as the final layer.

Closing Thought

If your RAG system works with 1,000 documents, you’ve validated an idea. If it works with 1 million documents, you’ve respected theory - and built an architecture.

References
• RAG and Generative AI - Azure AI Search | Microsoft Learn
• Chunk and Vectorize by Document Layout - Azure AI Search | Microsoft Learn
• Chunk Documents - Azure AI Search | Microsoft Learn
• Hybrid Search Overview - Azure AI Search | Microsoft Learn

Confidence-Aware RAG: Teaching Your AI Pipeline to Acknowledge Uncertainty

Introduction

Retrieval-Augmented Generation (RAG) has become the standard architecture for grounding Large Language Models (LLMs) with enterprise data. By retrieving relevant documents before generating a response, RAG helps reduce hallucinations compared to relying on model knowledge alone.

However, an important limitation remains in most implementations: RAG systems can produce confident-sounding answers even when the underlying data is incomplete, irrelevant, or missing. This happens when:
• Retrieved documents are loosely related to the query
• The answer exists partially but lacks key details
• Retrieved sources contradict each other
• The query falls entirely outside the knowledge base

In enterprise environments, this behavior carries real risk. A reliable AI system must not only answer well - it must also know when not to answer.

This article presents a practical confidence-aware RAG architecture using three layered strategies: retrieval confidence scoring, citation validation, and LLM-based abstention - all implemented with Azure AI Search and Azure OpenAI.

The Problem: Confident Hallucination

Consider a real-world enterprise scenario. An employee asks:

"What is our company's parental leave policy for contractors?"

The knowledge base contains parental leave policies for full-time employees - but nothing specific to contractors. A standard RAG pipeline retrieves the closest matching document and confidently presents the full-time employee policy as the answer.

This outcome is worse than returning no answer. The user trusts the system, acts on incorrect information, and the error may not surface until real consequences follow. This pattern is sometimes called hallucination laundering - the RAG architecture creates the appearance of factual grounding while the response is not actually supported by the retrieved evidence.
Fixing this requires deliberate confidence checkpoints at each stage of the pipeline.

Architecture Overview

A standard RAG pipeline follows a simple path:

User Query → Retrieve Documents → Generate Answer

A confidence-aware pipeline adds two explicit decision checkpoints:

[Figure: confidence-aware pipeline diagram]

Each layer catches failures the previous one may miss. Together, they form a defense-in-depth approach to output reliability.

Strategy 1: Retrieval Confidence Scoring

The first checkpoint evaluates whether retrieved documents are genuinely relevant before passing them to the LLM. Azure AI Search returns a @search.rerankerScore when semantic ranking is enabled - a value on the 0-4 scale that reflects how well each document matches the query intent, not just keyword overlap.

    from azure.identity import DefaultAzureCredential
    from azure.search.documents import SearchClient

    # AZURE_SEARCH_ENDPOINT is assumed to be defined elsewhere (for example, loaded from configuration)
    search_client = SearchClient(
        endpoint=AZURE_SEARCH_ENDPOINT,
        index_name="enterprise-knowledge-base",
        credential=DefaultAzureCredential()
    )

    def retrieve_with_confidence(query: str, threshold: float = 1.5, top_k: int = 5):
        results = search_client.search(
            search_text=query,
            query_type="semantic",
            semantic_configuration_name="default",
            top=top_k,
            select=["content", "title", "source"]
        )
        confident_results = []
        for result in results:
            # Treat a missing or null reranker score as 0 so the document is filtered out
            reranker_score = result.get("@search.rerankerScore") or 0
            if reranker_score >= threshold:
                confident_results.append({
                    "content": result["content"],
                    "title": result["title"],
                    "source": result["source"],
                    "score": reranker_score
                })
        return confident_results

If no documents clear the threshold, the pipeline abstains rather than forcing a low-quality answer:

    results = retrieve_with_confidence(user_query, threshold=1.5)
    if not results:
        return {
            "answer": (
                "I don't have enough information in the knowledge base to answer "
                "this question. Please contact the relevant team for assistance."
            ),
            "status": "abstained_retrieval"
        }

Threshold tuning: Start at 1.5 on the 0-4 scale.
Evaluate against a labeled test set and adjust based on your precision/recall requirements. Higher thresholds reduce false positives but may increase abstention on edge cases.

Strategy 2: Citation Validation

Even when retrieval scores are high, the LLM may synthesize information that does not exist in the retrieved context. Citation validation addresses this by requiring the model to ground every factual claim in a specific named source - and then programmatically verifying those citations exist in the retrieved set.

    from openai import AzureOpenAI

    # AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT and AZURE_DEPLOYMENT_NAME are assumed to be defined elsewhere
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_API_KEY,
        azure_endpoint=AZURE_OPENAI_ENDPOINT,
        api_version="2025-12-01-preview"
    )

    ANSWER_WITH_CITATIONS_PROMPT = """
    You are an enterprise assistant. Answer the question using ONLY the provided context.

    RULES:
    1. Every factual claim MUST include a citation in the format [Source: <title>].
    2. If the context does not contain enough information, respond with: "I don't have sufficient information to answer this question."
    3. Do NOT infer, assume, or use knowledge outside the provided context.
    4. If context partially answers the question, state what you know and explicitly note what information is missing.

    Context: {context}
    Question: {question}
    Answer:
    """

    def generate_answer(question: str, context: str, sources: list) -> dict:
        prompt = ANSWER_WITH_CITATIONS_PROMPT.format(
            context=context,
            question=question
        )
        response = client.chat.completions.create(
            model=AZURE_DEPLOYMENT_NAME,
            messages=[{"role": "user", "content": prompt}],
            temperature=0
        )
        answer = response.choices[0].message.content.strip()
        validation = validate_citations(answer, sources)
        return {"answer": answer, "citation_check": validation}

The validation function checks that every citation in the answer maps to a document that was actually retrieved:

    import re

    def validate_citations(answer: str, sources: list) -> dict:
        # Extract every title cited as [Source: <title>]
        cited = re.findall(r'\[Source:\s*(.+?)\]', answer)
        source_titles = {s["title"].lower().strip() for s in sources}
        valid, invalid = [], []
        for citation in cited:
            if citation.lower().strip() in source_titles:
                valid.append(citation)
            else:
                invalid.append(citation)
        return {
            "total_citations": len(cited),
            "valid": valid,
            "invalid": invalid,
            # Trust the answer only if it cites something and every citation was retrieved
            "is_trustworthy": len(invalid) == 0 and len(cited) > 0
        }

If is_trustworthy is False, the pipeline flags the response for review or suppresses it:

    if not generation["citation_check"]["is_trustworthy"]:
        return {
            "answer": "I found related information but cannot provide a reliable answer based on the available sources.",
            "status": "abstained_citation"
        }

Strategy 3: LLM-Based Abstention Scoring

The third layer adds a second LLM call that acts as a quality judge - explicitly evaluating whether the generated answer is well-supported by the retrieved context, independent of citation formatting.

    import json

    ABSTENTION_JUDGE_PROMPT = """
    You are an answer quality judge. Given a question, retrieved context, and a generated answer, evaluate whether the answer is fully supported by the context.

    Respond ONLY in JSON format:
    {{
        "verdict": "supported" | "partial" | "unsupported",
        "confidence": <float between 0.0 and 1.0>,
        "reasoning": "<brief explanation>"
    }}

    Question: {question}
    Context: {context}
    Answer: {answer}
    """

    def judge_answer(question: str, context: str, answer: str) -> dict:
        prompt = ABSTENTION_JUDGE_PROMPT.format(
            question=question,
            context=context,
            answer=answer
        )
        response = client.chat.completions.create(
            model=AZURE_DEPLOYMENT_NAME,
            messages=[{"role": "user", "content": prompt}],
            temperature=0
        )
        try:
            return json.loads(response.choices[0].message.content.strip())
        except json.JSONDecodeError:
            # If the judge output is not valid JSON, fail closed so the pipeline abstains
            return {"verdict": "unsupported", "confidence": 0.0, "reasoning": "Judge output was not valid JSON."}

Integrate the judge with a confidence threshold of 0.6:

    judgement = judge_answer(user_query, context, generation["answer"])
    if judgement["verdict"] == "unsupported" or judgement["confidence"] < 0.6:
        return {
            "answer": "I don't have sufficient information to answer this question confidently.",
            "status": "abstained_judge"
        }
    if judgement["verdict"] == "partial":
        generation["answer"] += (
            "\n\nNote: This answer may be incomplete. "
            "Some aspects of your question were not covered in the available documents."
        )

End-to-End Pipeline

Combining all three strategies gives a complete confidence-aware pipeline:

    def confidence_aware_rag(user_query: str) -> dict:
        # Layer 1: Retrieve with confidence gating
        results = retrieve_with_confidence(user_query, threshold=1.5)
        if not results:
            return {
                "answer": "I don't have enough information in the knowledge base to answer this.",
                "status": "abstained_retrieval"
            }
        context = "\n\n".join(r["content"] for r in results)

        # Layer 2: Generate with citation requirements
        generation = generate_answer(user_query, context, results)
        if not generation["citation_check"]["is_trustworthy"]:
            return {
                "answer": "I found related information but cannot provide a reliable answer.",
                "status": "abstained_citation"
            }

        # Layer 3: Judge the answer
        judgement = judge_answer(user_query, context, generation["answer"])
        if judgement["verdict"] == "unsupported" or judgement["confidence"] < 0.6:
            return {
                "answer": "I don't have sufficient information to answer this question confidently.",
                "status": "abstained_judge"
            }
        if judgement["verdict"] == "partial":
            generation["answer"] += (
                "\n\nNote: This answer may be incomplete. "
                "Some aspects of your question were not covered in available documents."
            )
        return {
            "answer": generation["answer"],
            "status": "answered",
            "confidence": judgement["confidence"],
            "sources": [r["source"] for r in results[:3]]
        }

Choosing the Right Strategies for Your Use Case

Each strategy adds a layer of safety at a different cost. The right combination depends on the stakes involved in your deployment.

Strategy | Added Cost | Latency | Best For
Retrieval Confidence Scoring | None (uses existing search scores) | None | All RAG applications - this should be universal
Citation Validation | Minimal (regex post-processing) | Negligible | Regulated industries, compliance, audit trails
LLM Abstention Judge | One additional LLM call | +1-3 seconds | High-stakes decisions - financial, legal, medical

For most enterprise applications, combining retrieval scoring and citation validation provides a strong baseline with minimal overhead. The judge layer is most valuable when incorrect answers carry significant business or compliance risk.

Threshold calibration

There is a meaningful tradeoff in threshold selection. Setting thresholds too high reduces hallucination but increases abstention - the system may refuse to answer even when reliable information is available. The recommended approach is to build a labeled evaluation set of query/answer pairs, run the pipeline at multiple threshold values, and select the point that meets your precision/recall requirements for the specific domain.

When to Apply This Pattern

Confidence-aware RAG is most valuable in deployments where:
• Data coverage is uneven - the knowledge base may have detailed coverage in some areas and gaps in others, making it difficult to predict when retrieval will be reliable
• Errors carry downstream consequences - healthcare documentation, legal and compliance search, financial reporting, and regulated industries where a wrong answer is worse than no answer
• Users have varying expertise - non-expert users may not recognize a plausible-sounding but incorrect response, making transparent uncertainty signals especially important
• Audit or traceability requirements apply - the ability to trace each answer back to a specific source with a confidence signal supports governance and review workflows

Conclusion

Building a RAG system that retrieves documents and generates responses is relatively straightforward.
Building one that understands the limits of its own knowledge requires deliberate design. The three strategies covered here - retrieval confidence scoring, citation validation, and LLM-based abstention - form a layered defense against the most common failure mode in production RAG systems: the confident, well-formatted, completely unreliable answer.

The most dangerous AI system is not one that fails openly. It is one that fails silently, with confidence. Teaching your pipeline to say "I don't know" is not a limitation. It is a feature that builds user trust and makes enterprise AI adoption sustainable over time.

Build AI RAG Apps with LangChain, Azure DocumentDB and Microsoft Foundry: Step-by-Step Guide

Scenario

Imagine you are building your company’s RAG chat application using Microsoft Foundry - Azure OpenAI and orchestrating the flow with LangChain. The chat experience works, but now it needs to be grounded in your company’s data. You generate embeddings and want to store and query them without adding another database or complex sync pipeline. Instead of stitching services together, you use Azure DocumentDB (with MongoDB compatibility) with built-in vector search to store your JSON data and embeddings in one place. You deploy the app to Azure App Service and quickly compare vector search alone versus a full RAG pipeline, sharing it with your team for testing.

What will you learn?

In this blog, you'll learn to:
• Create an Azure DocumentDB (with MongoDB compatibility) resource.
• Create an embedding deployment and a chat deployment in the Microsoft Foundry Azure OpenAI portal.
• Create an Azure App Service website with continuous deployment from GitHub.
• Configure Azure App Service application settings to enable communication between Azure resources.
• Configure the GitHub workflow to work successfully.

What is the main objective?

Build an AI-powered RAG application using LangChain, Microsoft Foundry Azure OpenAI, and Azure DocumentDB (with MongoDB compatibility).

Prerequisites
• An Azure subscription. If you don’t already have one, you can sign up for an Azure free account. For students, you can use the free Azure for Students offer, which doesn’t require a credit card, only your school email.
• A GitHub account.

Summary of the steps:
• Step 1: Create an Azure DocumentDB (with MongoDB compatibility) resource
• Step 2: Create a Microsoft Foundry - Azure OpenAI resource and Deploy chat and embedding Models
• Step 3: Create an Azure App Service and Deploy the RAG Chat Application

Step 1: Create an Azure DocumentDB (with MongoDB compatibility) resource

In this step, you'll:
• Open the Azure Portal.
• Create an Azure DocumentDB (with MongoDB compatibility) resource.

Open the Azure Portal

1. Visit the Azure Portal https://portal.azure.com in your browser and sign in.

Now you are inside the Azure portal!

Create a new Azure DocumentDB (with MongoDB compatibility) resource

In this step, you create an Azure DocumentDB (with MongoDB compatibility) resource to store your data and vector embeddings, and to perform vector search.

1. Type documentdb in the search bar at the top of the portal page and select Azure DocumentDB (with MongoDB compatibility) from the available options.
2. Select Create from the toolbar to start provisioning your new cluster.
3. Add the following information to create a resource:

What | Value
Subscription | Use your preferred subscription. It's advised to use the same subscription across all the resources that communicate with each other on Azure.
Resource group | Select Create new to create a new resource group. Enter a unique name for the resource group.
Cluster name | Enter a globally unique name.
Location | Select a region close to you for the best response time. For example, select UK South.
MongoDB version | Select the latest available version of MongoDB.

4. Select Configure to configure your cluster tier.
5. Add the following information to configure the cluster tier. You can scale it up later:

What | Value
Cluster tier | Select M25 tier, 2 (Burstable) vCores.
Storage | Select 32 GiB.

6. Select Save.
7. Enter the cluster Admin Username and Password and store them in a secure location.
8. Select Next to configure the networking settings.
9. Select Allow public access from Azure services and resources within Azure to this cluster.
10. Select Add current IP address to the firewall rules to allow local access to the cluster.
11. Select Review + create.
12. Confirm your configuration settings and select Create to start provisioning the resource.

Note: The cluster creation can take up to 10 minutes. It's recommended to move on with the rest of the steps and get back to it later.
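The admin username and password saved in step 7 are typically needed again as part of the cluster's connection string. As a small sketch, the helper below percent-encodes the credentials before substituting them, since characters such as `@` or `/` in a password would otherwise break the URI. The `<user>` and `<password>` placeholders, the example host, and the function name `fill_connection_string` are assumptions for illustration only; copy the actual connection string from your cluster's page in the Azure portal.

```python
from urllib.parse import quote_plus


def fill_connection_string(template: str, username: str, password: str) -> str:
    """Substitute percent-encoded credentials into a connection string template."""
    return (
        template
        .replace("<user>", quote_plus(username))
        .replace("<password>", quote_plus(password))
    )


# Hypothetical template; the real one comes from the portal.
template = "mongodb+srv://<user>:<password>@example-cluster.example.com/?tls=true"
uri = fill_connection_string(template, "ragadmin", "p@ss/word!")
print(uri)
```

Keeping the filled-in string in an application setting or secret store, rather than in source code, fits the advice above to store the credentials in a secure location.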
Step 2: Create a Microsoft Foundry - Azure OpenAI resource and Deploy chat and embedding Models In this step, you'll: Create a Microsoft Foundry Azure OpenAI resource. Create chat and embedding model deployments. Create an Azure OpenAI resource In this step, you create an Azure OpenAI Service resource that enables you to interact with different large language models (LLMs). 1. Type openai in the search bar at the top of the portal page and select Azure OpenAI from the available options. 2. Select Create from the toolbar then select Azure OpenAI to provision a new Azure OpenAI resource. 3. Add the following information to create a resource: What Value Subscription Use the same subscription you used to apply for Azure OpenAI access. Resource group Use the resource group you created in the previous step. Region Select a region close to you for the best response time. For example, Select UK South. Name Enter a globally unique name. Pricing tier Select S0. Currently, this is the only available pricing tier. 4. Now that the basic information is added, select Next to confirm your details and proceed to the next page. 5. Select Next to confirm your network details. 6. Select Next to confirm your tag details. 7. Confirm your configuration settings and select Create to start provisioning the resource. Wait for the deployment to finish. 8. After the deployment finishes, select Go to resource to inspect your created resource. Here, you can manage your resource and find important information like the endpoint URL and API keys. Create chat and embedding model deployments In this step, you create an Azure OpenAI embedding model deployment and a chat model deployment. Creating a deployment on your previously provisioned resource allows you to generate text embeddings (i.e. numerical representation for text) and have a natural language conversation with your data. 1. Select Go to Foundry portal from the toolbar to open the studio. 2. 
Select Deployments from the Shared resources left side menu to go to the deployments tab. 3. Select + Deploy model from the toolbar then select Deploy base model from the options. A Deploy model window opens. 4. Type gpt-4o-mini to search for the model then select it then select Use model. 5. Select Continue with existing setup to proceed to next step. 6. Refresh page and repeat previous steps to select the model then select Confirm. 7. Review selected options then select Deploy. 8. Select + Deploy model from the toolbar then select Deploy base model from the options. A Deploy model window opens. 9. Type text-embedding-3-small to search for the model then select it then select Confirm. 10. Review selected options then select Deploy. Step 3: Create an Azure App Service and Deploy the RAG Chat Application In this step, you'll: Fork the sample repository on GitHub. Create an Azure App Service resource with a deployment from GitHub. Modify Azure App Service Application settings in the Azure portal. Configure the workflow to deploy your application from GitHub. Test the website before and after adding the data. Fork the Sample Repository on GitHub In this step, you create a copy from the source code on your GitHub account to be able to edit it and use it later. 1. Visit the sample github.com/Azure-Samples/Cosmic-Food-RAG-app in your browser and sign in. 2. Select Fork from the top of the sample page. 3. Select an owner for the fork then, select Create fork. Create an Azure App Service resource with a deployment from GitHub In this step, you create an Azure App service resource and connect it with your GitHub account to deploy a Python application. 1. Type app service in the search bar at the top of the portal page and select App Services from the available options. 2. Select Create Web App from the toolbar to start provisioning a new web application. 3. 
Add the following information to fill in the basic configuration of the application:
| What | Value |
|---|---|
| Subscription | Use the same subscription you used to apply for Azure OpenAI access. |
| Resource group | Use the same resource group you created before. |
| Name | Enter a unique name for your website. For example, cosmic-food-rag. |
| Publish | Select Code. This option specifies whether your deployment consists of code or a container. |
| Runtime stack | Select Python 3.12. |
| Operating System | Select Linux. |
| Region | Select UK South. This is the region where the rest of the resources you created reside. |
4. Add the following information to create the app service plan. You can scale it up later:
| What | Value |
|---|---|
| Linux Plan | Select a pre-existing plan or create a new plan. |
| Pricing Plan | Select Basic B1. |
5. Select Deployment from the toolbar to move to the deployment configuration tab.
6. Add the following information to enable continuous deployment from GitHub:
| What | Value |
|---|---|
| Continuous deployment | Select Enable. |
| GitHub account | Select your GitHub account. |
| Organization | Select your organization. If you are using your personal account, select it. |
| Repository | Select Cosmic-Food-RAG-app. |
| Branch | Select main. |
7. Select Review + create.
8. Confirm your configuration settings and select Create to start provisioning the resource. Wait for the deployment to finish.
9. After the deployment finishes, select Go to resource to inspect your created resource. Here, you can manage your resource and find important information like the application settings and logs.
Modify Azure App Service Application settings in the Azure portal
In this step, you configure the Application settings so that the website can communicate with the other cloud resources.
1. In the Web App resource, select Environment variables from the left side menu.
2. Select + Add to add new environment variables to the web app configuration.
3. Add the following names and values one by one and select Ok. Make sure to add your own values.
These application settings are for the Azure OpenAI resources that you created:
| What | Value |
|---|---|
| OPENAI_API_VERSION | 2024-10-21 |
| AZURE_OPENAI_CHAT_DEPLOYMENT_NAME | gpt-4o-mini |
| AZURE_OPENAI_CHAT_MODEL_NAME | gpt-4o-mini |
| AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME | text-embedding-3-small |
| AZURE_OPENAI_EMBEDDINGS_MODEL_NAME | text-embedding-3-small |
| AZURE_OPENAI_EMBEDDINGS_DIMENSIONS | 1536 |
| AZURE_OPENAI_DEPLOYMENT_NAME | <azureOpenAiResourceName> |
| AZURE_OPENAI_ENDPOINT | https://<azureOpenAiResourceName>.openai.azure.com/ |
| AZURE_OPENAI_API_KEY | <azureOpenAiResourceKey> |
You can get the Azure OpenAI key from the Azure OpenAI resource page. Select Keys and Endpoint from the Resource Management section and copy any of the available keys.
These application settings are for Azure DocumentDB (with MongoDB compatibility):
| What | Value |
|---|---|
| AZURE_COSMOS_USERNAME | <documentUsername> |
| AZURE_COSMOS_PASSWORD | <documentPassword> |
| AZURE_COSMOS_CONNECTION_STRING | mongodb+srv://<user>:<password>@<clusterName>.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000 |
You can get the DocumentDB connection string from the Azure DocumentDB (with MongoDB compatibility) resource page. Select Connection strings and copy the connection string. Make sure to replace the user and password with the ones you created.
These application settings are new and are used for resources that will be created when the application starts. You can use any value for them:
| What | Value |
|---|---|
| AZURE_COSMOS_DATABASE_NAME | <documentDatabaseName> ex. CosmicDB |
| AZURE_COSMOS_COLLECTION_NAME | <documentContainerName> ex. CosmicFoodCollection |
| AZURE_COSMOS_INDEX_NAME | <documentIndexName> ex. CosmicIndex |
4. Select Apply to save your newly added environment variables.
5. Select Configuration then Stack settings to edit the application startup command.
6. Type entrypoint.sh in the startup command field then select Apply.
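A missing or misspelled setting usually shows up only when the app starts, so it can help to check the names first. The following is a minimal sketch, not part of the sample repository: it assumes the setting names listed above, and the helper name `missing_settings` is hypothetical.

```python
import os

# Setting names copied from the lists above. The helper below is a
# hypothetical illustration and is not part of the Cosmic-Food-RAG-app sample.
REQUIRED_SETTINGS = [
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
    "AZURE_OPENAI_CHAT_MODEL_NAME",
    "AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDINGS_MODEL_NAME",
    "AZURE_OPENAI_EMBEDDINGS_DIMENSIONS",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_COSMOS_USERNAME",
    "AZURE_COSMOS_PASSWORD",
    "AZURE_COSMOS_CONNECTION_STRING",
    "AZURE_COSMOS_DATABASE_NAME",
    "AZURE_COSMOS_COLLECTION_NAME",
    "AZURE_COSMOS_INDEX_NAME",
]


def missing_settings(env=None):
    """Return the required setting names that are absent or blank in env."""
    env = os.environ if env is None else env
    return [
        name for name in REQUIRED_SETTINGS
        if not str(env.get(name, "")).strip()
    ]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing application settings: " + ", ".join(missing))
    else:
        print("All required application settings are present.")
```

Running this script in the App Service SSH session (or locally with the same variables exported) lists any setting that still needs a value.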
Configure the Workflow to deploy your application from GitHub
In this step, you modify the GitHub deployment workflow to point to the folder that contains the application.
1. Visit your forked repository on GitHub and notice the failing workflow.
2. Open the workflow file .github/workflows/main_cosmic-food-rag.yml.
3. Select the pen icon to edit the file.
4. Modify line 41 from . to src/.
5. Remove the optional Local Build Section since the application already has tests that cover this part.
6. Add a section that installs Node 22 and builds the static frontend (the Set up Node 22 and Install Node Packages & Build Static Site steps in the final workflow below).
7. Select Commit changes, and review your commit message and description. Select Commit changes.
The final workflow file should look like this:

    # Docs for the Azure Web Apps Deploy action: https://github.com/Azure/webapps-deploy
    # More GitHub Actions for Azure: https://github.com/Azure/actions
    # More info on Python, GitHub Actions, and Azure App Service: https://aka.ms/python-webapps-actions

    name: Build and deploy Python app to Azure Web App - cosmic-food-rag

    on:
      push:
        branches:
          - main
      workflow_dispatch:

    jobs:
      build:
        runs-on: ubuntu-latest
        permissions:
          contents: read #This is required for actions/checkout

        steps:
          - uses: actions/checkout@v4

          - name: Set up Node 22
            uses: actions/setup-node@v6
            with:
              node-version: 22

          - name: Install Node Packages & Build Static Site
            run: cd frontend && npm install && npm run build

          # By default, when you enable GitHub CI/CD integration through the Azure portal, the platform automatically sets the SCM_DO_BUILD_DURING_DEPLOYMENT application setting to true. This triggers the use of Oryx, a build engine that handles application compilation and dependency installation (e.g., pip install) directly on the platform during deployment. Hence, we exclude the antenv virtual environment directory from the deployment artifact to reduce the payload size.
          - name: Upload artifact for deployment jobs
            uses: actions/upload-artifact@v4
            with:
              name: python-app
              path: |
                src/
                !antenv/

          # 🚫 Opting Out of Oryx Build
          # If you prefer to disable the Oryx build process during deployment, follow these steps:
          # 1. Remove the SCM_DO_BUILD_DURING_DEPLOYMENT app setting from your Azure App Service Environment variables.
          # 2. Refer to sample workflows for alternative deployment strategies: https://github.com/Azure/actions-workflow-samples/tree/master/AppService

      deploy:
        runs-on: ubuntu-latest
        needs: build
        permissions:
          id-token: write #This is required for requesting the JWT
          contents: read #This is required for actions/checkout

        steps:
          - name: Download artifact from build job
            uses: actions/download-artifact@v4
            with:
              name: python-app

          - name: Login to Azure
            uses: azure/login@v2
            with:
              client-id: ${{ secrets.AZUREAPPSERVICE_CLIENTID_5672547ED09F46D59DD431ACF5A29F28 }}
              tenant-id: ${{ secrets.AZUREAPPSERVICE_TENANTID_0059913572C8467882D3999D0E0DD5B8 }}
              subscription-id: ${{ secrets.AZUREAPPSERVICE_SUBSCRIPTIONID_7C42E3352C5D47F084CB0CD14F549D27 }}

          - name: 'Deploy to Azure Web App'
            uses: azure/webapps-deploy@v3
            id: deploy-to-webapp
            with:
              app-name: 'cosmic-food-rag'
              slot-name: 'Production'

8. Select Actions to review the workflow run status.
Test the website before and after adding the data
In this step, you test the application before adding the data, add the data, and test again.
1. Select the workflow name to open it and get the website URL.
2. Select any of the suggested messages or type your own. The app should respond with No results found.
3. Navigate to your Azure App Service resource page and select SSH then select Go to open a new SSH page.
4. In the SSH terminal, run these commands:

    uv sync --active
    uv run --active ./scripts/add_data.py --file="./data/food_items.json"

5. Navigate back to the live website and type the chat message Do you have any vegan food dishes? The app should now respond with the correct answer.
Congratulations!!
You successfully built the full application.
Clean Up
Once you finish experimenting on Microsoft Azure, you might want to delete the resources so they stop consuming credit from your subscription. You can delete the resource group, which deletes everything inside it, or delete the resources one by one; that's up to you.
Conclusion
Congratulations! You've learned how to create an Azure DocumentDB (with MongoDB compatibility) cluster, how to create a Microsoft Foundry - Azure OpenAI resource, how to deploy an embedding model and a chat model from the Foundry portal, how to create an Azure App Service and configure continuous deployment with GitHub, and how to modify application settings to enable communication across Azure resources. With these technologies, you can build a RAG chat application, with the option of plain vector search, over your own data and provide grounded (relevant) responses.
Next steps
Documentation
- Azure OpenAI in Microsoft Foundry models
- Understand embeddings in Azure OpenAI in Microsoft Foundry Models (classic)
- Azure DocumentDB (with MongoDB compatibility) documentation
- Integrated vector store in Azure DocumentDB
- LangChain Python documentation
Training Content
- Develop generative AI apps in Azure
Found this useful? Share it with others and follow me to get updates on:
- Twitter (twitter.com/john00isaac)
- LinkedIn (linkedin.com/in/john0isaac)
Feel free to share your comments and/or inquiries in the comment section below. See you in future demos!

From Test Cases to Trust: Elevating Enterprise Quality with GitHub Copilot
The Traditional QA Bottleneck In complex enterprise systems, QA teams often face familiar challenges: Time‑consuming test case creation from evolving requirements Repetitive automation scripting and refactoring Heavy regression cycles under tight release timelines Limited bandwidth for deeper risk analysis and exploratory testing None of these issues are caused by lack of skill—they’re symptoms of scale and complexity. This is where GitHub Copilot entered our workflow—not as a “magic button,” but as a thinking partner. Where GitHub Copilot Actually Helped Used responsibly, Copilot added value in very specific QA scenarios: Faster Test Design from Requirements Transforming business or technical requirements into structured test scenarios is intellectually demanding but time-intensive. Copilot helped accelerate: Initial test scenario drafting Gherkin-style acceptance criteria Coverage identification for edge and negative cases The result wasn’t “auto-generated tests,” but faster starting points, reviewed and refined by humans. Accelerating Automation Without Losing Control Whether working with UI automation or API tests, a significant portion of effort goes into boilerplate code, assertions, and structuring. Copilot assisted with: Suggesting test skeletons Refactoring repetitive code Improving readability and consistency This freed engineers to focus on test intent, not syntax. Supporting Debugging and Maintenance Automation maintenance is often underestimated. Copilot helped: Identify potential fixes during test failures Suggest improvements during refactoring Reduce turnaround time during regression cycles Again, nothing was auto‑merged. Human review remained non‑negotiable. The Most Important Shift: QA Mindset The real impact of Copilot wasn’t just efficiency—it changed how QA engineers spent their time. 
Instead of:
- Writing repetitive scripts
- Manually expanding similar test cases
The team could focus more on:
- Risk‑based testing
- Failure pattern analysis
- Cross‑team quality discussions
- Improving test strategy and coverage depth
In short, AI didn’t remove QA effort; it redirected it to higher‑value work.
Responsible AI Was Not Optional
In an enterprise setup, responsible AI usage matters. Key principles we followed:
- No blind acceptance of AI suggestions
- Strict human validation of all test logic
- Awareness of data sensitivity and compliance boundaries
- Treating Copilot as an assistant, not an authority
This balance ensured quality and trust were never compromised in the pursuit of speed.
What This Means for QA Teams
From this experience, one thing became clear: AI won’t replace QA engineers. But QA engineers who use AI effectively will redefine quality. GitHub Copilot helped shift QA from execution to enablement, from writing tests faster to thinking about quality better. For enterprise teams, this is a powerful evolution.
Final Thoughts
Quality engineering is no longer just about finding defects; it’s about enabling confidence in delivery. Tools like GitHub Copilot, when used responsibly, can become catalysts in that transformation. The future of QA isn’t manual vs automation. It’s human judgment amplified by AI assistance. And that’s where real quality lives.
References
- Create and manage manual test cases - Azure Test Plans | Microsoft Learn
- What is Azure Test Plans? Manual, exploratory, and automated test tools. - Azure Test Plans | Microsoft Learn
- Create and manage test plans - Azure Test Plans | Microsoft Learn

The Future of Agentic AI: Inside Microsoft Agent Framework 1.0
Agentic AI is rapidly moving beyond demos and chatbots toward long‑running, autonomous systems that reason, call tools, collaborate with other agents, and operate reliably in production. On April 3, 2026, Microsoft marked a major milestone with the General Availability (GA) release of Microsoft Agent Framework 1.0, a production‑ready, open‑source framework for building agents and multi‑agent workflows in .NET and Python.
In this post, we’ll deep‑dive into:
- What Microsoft Agent Framework actually is
- Its core architecture and design principles
- What’s new in version 1.0
- How it differs from other agent frameworks
- When and how to use it, with real code examples
What Is Microsoft Agent Framework?
According to the official announcement, Microsoft Agent Framework is an open‑source SDK and runtime for building AI agents and multi‑agent workflows with strong enterprise foundations. Agent Framework provides two primary capability categories:
1. Agents
Agents are long‑lived runtime components that:
- Use LLMs to interpret inputs
- Call tools and MCP servers
- Maintain session state
- Generate responses
They are not just prompt wrappers, but stateful execution units.
2. Workflows
Workflows are graph‑based orchestration engines that:
- Connect agents and functions
- Enforce execution order
- Support checkpointing and human‑in‑the‑loop scenarios
This leads to a clean separation of responsibilities:
| Concern | Handled By |
|---|---|
| Reasoning & interpretation | Agent |
| Execution policy & control flow | Workflow |
This separation is a foundational design decision.
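To make the split concrete, here is a minimal plain-Python sketch. It does not use the Agent Framework API: `EchoAgent` and `Workflow` are hypothetical names chosen for illustration, and the "agent" is a stub that tags its input instead of calling an LLM. The agent owns interpretation and session state, while the workflow owns the order of execution and records a checkpoint after each step.

```python
from dataclasses import dataclass, field


@dataclass
class EchoAgent:
    """Hypothetical stand-in for an LLM-backed agent with session state."""
    name: str
    history: list = field(default_factory=list)  # session state

    def run(self, message: str) -> str:
        self.history.append(message)
        # A real agent would call a model here; this stub only tags the input.
        return f"{self.name}({message})"


@dataclass
class Workflow:
    """Hypothetical stand-in for a workflow: a graph that fixes execution order."""
    steps: dict  # step name -> agent
    edges: dict  # step name -> next step name (absent for the last step)
    start: str
    checkpoints: list = field(default_factory=list)

    def run(self, message: str) -> str:
        current = self.start
        while current is not None:
            message = self.steps[current].run(message)
            # Record progress after every step so a run could be resumed.
            self.checkpoints.append((current, message))
            current = self.edges.get(current)
        return message


workflow = Workflow(
    steps={"research": EchoAgent("research"), "write": EchoAgent("write")},
    edges={"research": "write"},
    start="research",
)
result = workflow.run("topic")
print(result)  # write(research(topic))
```

Changing the order of work means editing `edges`, not the agents, which is the point of keeping control flow out of the reasoning component.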
High‑Level Architecture
From the official overview, Agent Framework is composed of several core building blocks:
- Model clients (chat completions & responses)
- Agent sessions (state & conversation management)
- Context providers (memory and retrieval)
- Middleware pipeline (interception, filtering, telemetry)
- MCP clients (tool discovery and invocation)
- Workflow engine (graph‑based orchestration)
[Figure: Conceptual Flow]
🌟 What’s New in Version 1.0
Version 1.0 marks the transition from "Release Candidate" to "General Availability" (GA).
- Production-Ready Stability: Unlike the earlier experimental packages, 1.0 offers stable APIs, versioned releases, and a commitment to long-term support (LTS).
- A2A Protocol (Agent-to-Agent): A new structured messaging protocol that allows agents to communicate across different runtimes. For example, an agent built in Python can seamlessly coordinate with an agent running in a .NET environment.
- MCP (Model Context Protocol) Support: Full integration with the Model Context Protocol, enabling agents to dynamically discover and invoke external tools and data sources without manual integration code.
- Multi-Agent Orchestration Patterns: Stable implementations of complex patterns, including:
  - Sequential: Linear handoffs between specialized agents.
  - Group Chat: Collaborative reasoning where agents discuss and solve problems.
  - Magentic-One: A sophisticated pattern for task-oriented reasoning and planning.
- Middleware Pipeline: The new middleware architecture lets you inject logic into the agent's execution loop without modifying the core prompts. This is essential for Responsible AI (RAI), allowing you to add content safety filters, logging, and compliance checks globally.
- DevUI Debugger: A browser-based local debugger that provides a real-time visual representation of agent message flows, tool calls, and state changes.
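The middleware idea can be illustrated without the framework. The sketch below is generic Python, not the Agent Framework middleware API: `build_pipeline`, `logging_middleware`, `safety_middleware`, and the toy `BLOCKED_TERMS` set are all hypothetical. It only shows how logging and a content filter can wrap an agent call while the agent's prompt stays untouched.

```python
from functools import partial
from typing import Callable, List

Handler = Callable[[str], str]


def build_pipeline(middlewares: List[Callable], core: Handler) -> Handler:
    """Wrap core so that the first middleware in the list runs outermost."""
    handler = core
    for middleware in reversed(middlewares):
        handler = partial(middleware, next_handler=handler)
    return handler


audit_log: List[str] = []


def logging_middleware(message: str, next_handler: Handler) -> str:
    audit_log.append(f"in: {message}")
    response = next_handler(message)
    audit_log.append(f"out: {response}")
    return response


BLOCKED_TERMS = {"password"}  # toy stand-in for a content safety service


def safety_middleware(message: str, next_handler: Handler) -> str:
    if any(term in message.lower() for term in BLOCKED_TERMS):
        # Short-circuit: the core agent is never called for blocked input.
        return "Request blocked by content filter."
    return next_handler(message)


def core_agent(message: str) -> str:
    # A real agent would call a model; its prompt is not modified by middleware.
    return f"answer to: {message}"


agent = build_pipeline([logging_middleware, safety_middleware], core_agent)
print(agent("What is MCP?"))                # answer to: What is MCP?
print(agent("Tell me the admin password"))  # Request blocked by content filter.
```

Because logging is outermost, the blocked request still appears in `audit_log`, which is usually what a compliance check needs.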
Code Examples
Creating a Simple Agent (C#)
From Microsoft Learn:

    using Azure.AI.Projects;
    using Azure.Identity;
    using Microsoft.Agents.AI;

    AIAgent agent = new AIProjectClient(
            new Uri("https://your-foundry-service.services.ai.azure.com/api/projects/your-project"),
            new AzureCliCredential())
        .AsAIAgent(
            model: "gpt-5.4-mini",
            instructions: "You are a friendly assistant. Keep your answers brief.");

    Console.WriteLine(await agent.RunAsync("What is the largest city in France?"));

This shows:
- Provider‑agnostic model access
- Session‑aware agent execution
- Minimal setup for production agents
Creating a Simple Agent (Python)

    import asyncio

    from agent_framework.foundry import FoundryChatClient
    from azure.identity import AzureCliCredential


    async def main() -> None:
        client = FoundryChatClient(
            project_endpoint="https://your-foundry-service.services.ai.azure.com/api/projects/your-project",
            model="gpt-5.4-mini",
            credential=AzureCliCredential(),
        )
        agent = client.as_agent(
            name="HelloAgent",
            instructions="You are a friendly assistant. Keep your answers brief.",
        )
        # agent.run is awaited, so it has to be called from inside an async function.
        result = await agent.run("What is the largest city in France?")
        print(result)


    asyncio.run(main())

The same agent abstraction applies across languages.
When to Use Agents vs Workflows
Microsoft provides clear guidance:
| Use an Agent when… | Use a Workflow when… |
|---|---|
| Task is open‑ended | Steps are well‑defined |
| Autonomous tool use is needed | Execution order matters |
| Single decision point | Multiple agents/functions collaborate |
Key principle: If you can solve the task with deterministic code, do that instead of using an AI agent.
🔄 How It Differs from Other Frameworks
Microsoft Agent Framework 1.0 distinguishes itself by focusing on "Enterprise Readiness" and "Interoperability."
| Feature | Microsoft Agent Framework 1.0 | Semantic Kernel / AutoGen | LangChain / CrewAI |
|---|---|---|---|
| Philosophy | Unified, production-ready SDK. | Research-focused or tool-specific. | High-level, developer-friendly abstractions. |
| Integration | Deeply integrated with Microsoft Foundry and Azure. | Varied; often requires more glue code. | Generally cloud-agnostic. |
| Interoperability | Native A2A and MCP for cross-framework tasks. | Limited to internal ecosystem. | Uses proprietary connectors. |
| Runtime | Identical API parity for .NET and Python. | Primarily Python-first (SK has C#). | Primarily Python. |
| Control | Graph-based deterministic workflows. | More non-deterministic/experimental. | Mixture of role-based and agentic. |
🛠️ Key Technical Components
- Agent Harness: The execution layer that provides agents with controlled access to the shell, file system, and messaging loops.
- Agent Skills: A portable, file-based or code-defined format for packaging domain expertise.
Implementation Tip: If you are coming from Semantic Kernel, Microsoft provides migration assistants that analyze your existing code and generate step-by-step plans to upgrade to the new Agent Framework 1.0 standards.
- Microsoft Agent Framework Version 1.0 | Microsoft Agent Framework
- Agent Framework documentation
🎯 Summary
Microsoft Agent Framework 1.0 is the "grown-up" version of AI orchestration. By standardizing the way agents talk to each other (A2A), discover tools (MCP), and process information (Middleware), Microsoft has provided a clear path for taking AI experiments into production. For more detailed guides, check out the official resources:
- Microsoft Agent Framework Documentation
- Microsoft Agent Framework - .NET AI Community Standup

Claim your IQ Series: Foundry IQ badge
The IQ Series kicked off with three Foundry IQ episodes, each paired with a hands-on cookbook. If you've worked through all three or you're planning to, there's now a digital badge waiting for you to claim! What the badge represents The IQ Series: Foundry IQ badge recognizes developers who've completed the full Foundry IQ curriculum end-to-end: not just watched the episodes, but deployed the Azure resources, run every notebook, and built working knowledge bases against live data. Earners have: Deployed AI Search, Azure OpenAI, a Foundry project, and Azure Blob Storage with seeded sample data Connected structured and unstructured sources into Foundry IQ Built and queried multi-source AI knowledge bases Grounded agent responses in permission-aware enterprise knowledge Badges are issued by the Global AI Community, so you'll want an account there before you submit. What the three episodes cover Episode 1 — Unlocking Knowledge for Your Agents. Introduces Foundry IQ and the core ideas behind it. The episode explains how AI agents work with knowledge and walks through the main components of Foundry IQ that support knowledge-driven applications. Episode 2 — Building the Data Pipeline with Knowledge Sources. Focuses on Knowledge Sources and how different types of content flow into Foundry IQ across SharePoint, Fabric, OneLake, Azure Blob Storage, Azure AI Search, and the web. Episode 3 — Querying the Multi-Source AI Knowledge Bases. Dives into Knowledge Bases and how multiple knowledge sources can be organized behind a single endpoint. The episode demonstrates how AI systems query across these sources and synthesize information to answer complex questions. Each episode is paired with a cookbook for you to learn hands-on and each of them reuses the same Azure deployment, so you set up once and build across all three. How to claim the badge Four steps, in order: Fork the IQ Series repo and work through all three episode cookbooks in your fork. 
Commit your notebooks with cell outputs saved! That's the proof of completion.
Capture a final output screenshot for each episode. Your GitHub username or Azure resource name needs to be visible in the screenshot.
Submit a badge request issue. The template walks you through fork URLs, screenshots, and one brief technical takeaway per episode.
Complete the badge form. This step is required. Without the form, we can't issue the badge.
Why this badge is worth your time
The IQ Series recognizes your hands-on learning with real infrastructure, real indexed data, real agents and queries. If you're working on enterprise AI (grounding, retrieval, knowledge-aware agents), this is a concrete artifact that says: I've built this, end to end, on the actual platform. Work IQ and Fabric IQ are coming next, and each phase will have its own badge. Foundry IQ is your head start on the full IQ Series.
👉 Start with Episodes or jump straight to the cookbooks if you prefer to learn by doing.
Questions along the way? Create an issue in the repo or drop into our Discord. The Foundry IQ team and community are there to help.

Building an Enterprise HR Chatbot with Multi-Strategy RAG and Live Agent Handoff on Azure
HR teams deal with thousands of employee questions every day — policy lookups, leave balances, case escalations, and sensitive topics like harassment or misconduct. AI chatbots can handle the routine stuff and free up HR advisors for the hard cases. But most chatbot projects get stuck at basic Q&A. They can't handle multi-country policies, employee slang, or smooth handoffs to a real person. This post covers how we built Eva, a production HR chatbot using Microsoft Bot Framework and Semantic Kernel on Azure. I'll focus on three problems and how we solved them: Getting accurate answers when employees and policy documents use different words Handing off to a live human advisor in real time Catching answer quality regressions automatically Why basic RAG isn't enough for HR Retrieval-Augmented Generation (RAG) — fetching relevant documents and feeding them to an LLM — is the standard approach. But plain RAG breaks down in HR for a few reasons: Vocabulary mismatch. An employee asks "How does misconduct affect my ACB?" but the policy document says "Annual Cash Bonus eligibility criteria." The search doesn't connect the two. Multi-country ambiguity. The same question can have different answers depending on the employee's country, grade, or role. Sensitive topics. Questions about harassment, disability, or whistleblowing should go to a human, not get an AI-generated answer. Ranking noise. Search results often include globally relevant but locally irrelevant documents. Eva handles these with a layered pipeline: query augmentation → multi-index search → LLM reranking → answer generation → citation handling. 
Architecture at a glance
| Layer | Technology |
|---|---|
| Bot framework | Microsoft Agents SDK (aiohttp) |
| LLM orchestration | Semantic Kernel |
| Primary LLM | Azure OpenAI Service (GPT-4.1 / GPT-4o) |
| Knowledge search | Azure AI Search (hybrid + vector) |
| Live agent chat | Salesforce MIAW via server-sent events |
| Evaluation | Azure AI Evaluation SDK + custom LLM judge |
| Config | Pydantic-settings + Azure App Configuration + Key Vault |
Four retrieval strategies, controlled by feature flags
Instead of one search approach, Eva supports four, toggled by feature flags so we can A/B test per country without code changes. They run in a priority cascade:
1. HyDE (Hypothetical Document Embeddings). Instead of searching with the employee's question, the LLM first generates a hypothetical policy document that would answer it. We embed that synthetic document and use it as the search query. Since a hypothetical answer is closer in embedding space to the real answer than the original question is, this bridges vocabulary gaps effectively.
2. Step-back prompting. The LLM broadens the question. "How does misconduct affect my ACB?" becomes "What is the Annual Cash Bonus policy and what factors affect eligibility?" This works well when answers live in broader policy sections.
3. Query rewrite. The LLM expands abbreviations and adds HR domain context, then runs a hybrid (text + vector) search.
4. Standard search (fallback). Basic intent classification with hybrid search. No augmentation.
All four strategies return the same Pydantic model, so the rest of the pipeline doesn't care which one ran. The team can enable HyDE globally, roll out step-back to specific countries, or revert instantly if something underperforms.
LLM reranking
After pulling results from both a country-specific index and a global index, Eva optionally reranks them using a RankGPT-style approach: the LLM scores document relevance with a bias toward local content. If reranking fails for any reason, it falls back to the original ordering so the pipeline keeps moving.
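Here is a minimal sketch of how a flag-driven cascade and the reranking fallback could fit together. It is not Eva's code: the function and flag names are hypothetical, the strategies are stubs that make no LLM or search calls, a dataclass stands in for the shared Pydantic model, and it assumes "priority cascade" means the first enabled strategy in the listed order is used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SearchResult:
    """Common result shape returned by every strategy (stand-in for the Pydantic model)."""
    strategy: str
    documents: List[str]


# Stub strategies: real ones would call the LLM and Azure AI Search.
def hyde_search(query: str) -> SearchResult:
    return SearchResult("hyde", [f"doc for hypothetical answer to: {query}"])


def step_back_search(query: str) -> SearchResult:
    return SearchResult("step_back", [f"doc for broader form of: {query}"])


def query_rewrite_search(query: str) -> SearchResult:
    return SearchResult("query_rewrite", [f"doc for expanded: {query}"])


def standard_search(query: str) -> SearchResult:
    return SearchResult("standard", [f"doc for: {query}"])


# Priority order as listed above; standard search is the fallback.
CASCADE = [
    ("hyde", hyde_search),
    ("step_back", step_back_search),
    ("query_rewrite", query_rewrite_search),
]


def retrieve(query: str, flags: Dict[str, bool]) -> SearchResult:
    """Run the first enabled strategy; fall back to standard search."""
    for flag_name, strategy in CASCADE:
        if flags.get(flag_name, False):
            return strategy(query)
    return standard_search(query)


def rerank_or_keep(documents: List[str],
                   reranker: Callable[[List[str]], List[str]]) -> List[str]:
    """Apply the reranker, keeping the original order if it raises."""
    try:
        return reranker(documents)
    except Exception:
        return list(documents)


# Hypothetical per-country flag sets.
FLAGS_BY_COUNTRY = {"GB": {"hyde": True}, "DE": {"step_back": True}, "FR": {}}
for country, flags in FLAGS_BY_COUNTRY.items():
    print(country, retrieve("How does misconduct affect my ACB?", flags).strategy)
```

Because every strategy returns the same `SearchResult` shape, turning a flag on or off for one country changes which stub runs without touching the code that consumes the result.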
Answer generation with local vs. global context The answer stage separates retrieved documents into local context (country-specific) and global context (company-wide), injected as distinct prompt sections. The LLM returns a structured response with reasoning, the actual answer, citations, and a coverage classification (full, partial, or denial). Prompts are stored as version-controlled .txt files with per-model variants (e.g., gpt-4o.txt, gpt-4.1.txt), resolved at runtime. This makes prompts reviewable in PRs and deployable without code changes. Live agent handoff with Salesforce When Eva determines a question needs a human — sensitive topic, complex case, or the employee simply asks — it hands off to a Salesforce advisor in real time. SSE streaming. Eva keeps a persistent HTTP connection to Salesforce for real-time messages, typing indicators, and session end signals. Session resilience. Session state persists across three layers — in-memory cache, Azure Cosmos DB, and Bot Framework turn state — to survive restarts and failovers. Message delivery workers. Each session has a dedicated async worker with exponential backoff retry. Overflow messages go to a failed messages list rather than being silently dropped. Queue position updates. While employees wait, Eva queries Salesforce for queue position and sends rate-limited updates. Context handoff. On session start, Eva sends the full conversation transcript so advisors don't ask employees to repeat themselves. Automated evaluation Eva includes an evaluation framework that runs as a separate process, testing against ground-truth Q&A pairs from CSV files. Factual questions are scored using Azure AI's SimilarityEvaluator on a 1–5 scale, with optional relevance and groundedness checks. Sensitive questions (harassment, disability, whistleblowing) use a custom LLM judge that checks whether the response acknowledges sensitivity and directs the employee to create a case or speak with an advisor. 
A deviation detector flags score drops between runs. SQLite stores results for trending, and Application Insights powers dashboards. Long evaluation runs support resume: the framework skips already-completed test cases on restart.

Key takeaways

- Make retrieval strategies swappable. Feature flags let you A/B test without redeploying.
- Separate local and global knowledge explicitly. Don't rely on the LLM to figure out which country's policy applies.
- Invest in evaluation early. Ground-truth datasets with factual and behavioral scoring catch regressions that manual testing misses.
- Build resilience into live agent handoff. Multi-tier session recovery and retry logic prevent dropped conversations.
- Treat prompts as code. File-based, model-variant-aware prompts are easier to maintain than inline strings.
- Use Pydantic for structured LLM outputs. Typed models catch bad output at the validation boundary instead of letting it propagate.

Get started

- Semantic Kernel documentation: LLM orchestration with plugins and structured outputs
- Azure OpenAI Service quickstart: Deploy GPT-4o or GPT-4.1
- Azure AI Search vector search tutorial: Hybrid and vector search indices
- Microsoft Bot Framework SDK: Build bots for Teams and web
- Azure AI Evaluation SDK: Score for similarity, relevance, and groundedness

Why Data Platforms Must Become Intelligence Platforms for AI Agents to Work
The promise and the gap

Your organization has invested in an AI agent. You ask it: "Prepare a summary of Q3 revenue by region, including year-over-year trends and top product lines." The agent finds revenue numbers in a SQL warehouse, product metadata in Dataverse, regional mappings in SharePoint, historical data in Azure Blob Storage, and organizational context in Microsoft Graph. Five data sources. Five schemas. No shared definitions.

The result? The agent hallucinates, returns incomplete data, or asks a dozen clarifying questions that defeat its purpose. This isn't a model limitation; modern AI models are highly capable. The real constraint is that enterprise data is not structured for reasoning.

Traditional data platforms were built for humans to query. Intelligence platforms must be built for agents to _reason_ over. That distinction is the subject of this post.

What you'll understand

- Why fragmented enterprise data blocks effective AI agents
- What distinguishes a storage platform from an intelligence platform
- How Microsoft Fabric and Azure AI Foundry work together to enable trustworthy, agent-ready data access

The enterprise pain: Fragmented data breaks AI agents

Enterprise data is spread across relational databases, data lakes, business applications, collaboration platforms, third-party APIs, and Microsoft Graph, each with its own schema and security model. Humans navigate this fragmentation through institutional knowledge and years of muscle memory. A seasoned analyst knows that "revenue" in the data warehouse means net revenue after returns, while "revenue" in the CRM means gross bookings. An AI agent does not.

The cost of this fragmentation isn't hypothetical. Each new AI agent deployment can trigger another round of bespoke data preparation: custom integrations and transformation pipelines just to make data usable, let alone agent-ready. This approach doesn't scale.
Why agents struggle without a semantic layer

To produce a trustworthy answer, an AI agent needs: (1) **data access** to reach relevant sources, (2) **semantic context** to understand what the data _means_ (business definitions, relationships, hierarchies), and (3) **trust signals** like lineage, permissions, and freshness metadata. Traditional platforms provide the first but rarely the second or third, leaving agents to infer meaning from column names and table structures. This is fragile at best and misleading at worst.

Figure 1: Without a shared semantic layer, AI agents must interpret raw, disconnected data across multiple systems, often leading to inconsistent or incomplete results.

From storage to intelligence: What must change

The fix isn't another ETL pipeline or another data integration tool. The fix is a fundamental shift in what we expect from a data platform.

A storage platform asks: "Where is the data, and how do I access it?"

An intelligence platform asks: "What does the data mean, who can use it, and how can an agent reason over it?"

This shift requires four foundational pillars:

Pillar 1: Unified data access

OneLake, the data lake built into Microsoft Fabric, provides a single logical namespace across an organization. Whether data originates in a Fabric lakehouse, a warehouse, or an external storage account, OneLake makes it accessible through one interface, using shortcuts and mirroring rather than requiring data migration. This respects existing investments while reducing fragmentation.

Pillar 2: Shared semantic layer

Semantic models in Microsoft Fabric define business measures, table relationships, human-readable field descriptions, and row-level security. When an agent queries a semantic model instead of raw tables, it gets _answers_, such as `Total Revenue = $42.3M for North America in Q3`, rather than raw result sets requiring interpretation and aggregation.

Before vs After: What changes for an agent?
Without semantic layer:

- Queries raw tables
- Infers business meaning
- Risk of incorrect aggregation

With semantic layer:

- Queries `[Total Revenue]`
- Uses business-defined logic
- Gets consistent, governed results

Pillar 3: Context enrichment

Microsoft Graph adds organizational signals (people and roles, activity patterns, and permissions), helping agents produce responses that are not just accurate, but _relevant_ and _appropriately scoped_ to the person asking.

Pillar 4: Agent-ready APIs

Data Agents in Microsoft Fabric (currently in preview) provide a natural-language interface to semantic models and lakehouses. Instead of generating SQL, an AI agent can ask: "What was Q3 revenue by region?" and receive a structured, sourced response. This is the critical difference: the platform provides structured context and business logic, helping reduce the reasoning burden on the agent.

Figure 2: An intelligence platform adds semantic context, trust signals, and agent-ready APIs on top of unified data access, enabling AI agents to combine structured data, business definitions, and relationships to produce more consistent responses.

Microsoft Fabric as the intelligence layer

Microsoft Fabric is often described as a unified analytics platform. That description is accurate but incomplete. In the context of AI agents, Fabric's role is better understood as an **intelligence layer**: a platform that doesn't just store and process data, but _makes data understandable_ to autonomous systems. Let's look at each capability through the lens of agent readiness.

OneLake: One namespace, many sources

OneLake provides a single logical namespace backed by Azure Data Lake Storage Gen2. For AI agents, this means one authentication context, one discovery mechanism, and one governance surface. Key capabilities: **shortcuts** (reference external data without copying), **mirroring** (replicate from Azure SQL, Cosmos DB, or Snowflake), and a **unified security model**.
For more on OneLake architecture, see [OneLake documentation on Microsoft Learn](https://learn.microsoft.com/fabric/onelake/onelake-overview).

Semantic models: Business logic that agents can understand

Semantic models (built on the Analysis Services engine) transform raw tables into business concepts:

| Raw Table Column | Semantic Model Measure |
|---|---|
| `fact_sales.amount` | `[Total Revenue]`: Sum of net sales after returns |
| `fact_sales.amount / dim_product.cost` | `[Gross Margin %]`: Revenue minus COGS as a percentage |
| `fact_sales.qty` YoY comparison | `[YoY Growth %]`: Year-over-year quantity growth |

Code Snippet 1: Querying a Fabric Semantic Model with Semantic Link (Python)

```python
import sempy.fabric as fabric

# Query business-defined measures; no need to know underlying table schemas
dax_query = """
EVALUATE
SUMMARIZECOLUMNS(
    'Geography'[Region],
    'Calendar'[FiscalQuarter],
    "Total Revenue", [Total Revenue],
    "YoY Growth %", [YoY Growth %]
)
"""

result_df = fabric.evaluate_dax(
    dataset="Contoso Sales Analytics",
    workspace="Contoso Analytics Workspace",
    dax_string=dax_query
)
print(result_df.head())

# NOTE: Output shown is illustrative and based on the semantic model definition
# Output (illustrative):
# Region          FiscalQuarter   Total Revenue   YoY Growth %
# North America   Q3 FY2026       42300000        8.2
# Europe          Q3 FY2026       31500000        5.7
```

Key takeaway: The agent doesn’t need to know that revenue is in `fact_sales.amount` or that fiscal quarters don’t align with calendar quarters. The semantic model handles all of this.

Code Snippet 2: Discovering Available Models and Measures (Python)

Before an agent can query, it needs to _discover_ what data is available. Semantic Link provides programmatic access to model metadata, enabling agents to find relevant measures without hardcoded knowledge.
```python
import sempy.fabric as fabric

# Discover available semantic models in the workspace
datasets = fabric.list_datasets(workspace="Contoso Analytics Workspace")
print(datasets[["Dataset Name", "Description"]])

# NOTE: Output shown is illustrative and based on the semantic model definition
# Output (illustrative):
# Dataset Name              Description
# Contoso Sales Analytics   Revenue, margins, and growth metrics
# Contoso HR Analytics      Headcount, attrition, and hiring pipeline
# Contoso Supply Chain      Inventory, logistics, and supplier data

# Inspect available measures: these are the business-defined metrics an agent can query
measures = fabric.list_measures(
    dataset="Contoso Sales Analytics",
    workspace="Contoso Analytics Workspace"
)
print(measures[["Table Name", "Measure Name", "Description"]])

# Output (illustrative):
# Table Name   Measure Name     Description
# Sales        Total Revenue    Sum of net sales after returns
# Sales        Gross Margin %   Revenue minus COGS as a percentage
# Sales        YoY Growth %     Year-over-year quantity growth
```

Key takeaway: An agent can programmatically discover which semantic models exist and what measures they expose, turning the platform into a self-describing data catalog that agents can navigate autonomously.

For more on Semantic Link, see the Semantic Link documentation on Microsoft Learn.

Data Agents: Natural-language access for AI (preview)

Note: Fabric Data Agents are currently in preview. See [Microsoft preview terms](https://learn.microsoft.com/legal/microsoft-fabric-preview) for details.

A Data Agent wraps a semantic model and exposes it as a natural-language-queryable endpoint. An AI Foundry agent can register a Fabric Data Agent as a tool; when it needs data, it calls the Data Agent like any other tool.

Important: In production scenarios, use managed identities or Microsoft Entra ID authentication.
Always follow the [principle of least privilege](https://learn.microsoft.com/entra/identity-platform/secure-least-privileged-access) when configuring agent access.

Microsoft Graph: Organizational context

Microsoft Graph adds the final layer: who is asking (role-appropriate detail), what’s relevant (trending datasets), and who should review (data stewards). Fabric’s integration with Graph brings these signals into the data platform so agents produce contextually appropriate responses.

Tying it together: Azure AI Foundry + Microsoft Fabric

The real power of the intelligence platform concept emerges when you see how Azure AI Foundry and Microsoft Fabric are designed to work together.

The integration pattern

Azure AI Foundry provides the orchestration layer (conversations, tool selection, safety, response generation). Microsoft Fabric provides the data intelligence layer (data access, semantic context, structured query resolution). The integration follows a tool-calling pattern:

1. User prompt → End user asks a question through an AI Foundry-powered application.
2. Tool call → The agent selects the appropriate Fabric Data Agent and sends a natural-language query.
3. Semantic resolution → The Data Agent translates the query into DAX against the semantic model and executes it via OneLake.
4. Structured response → Results flow back through the stack, with each layer adding context (business definitions, permissions verification, data lineage).
5. User response → The AI Foundry agent presents a grounded, sourced answer to the user.
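The five steps above can be traced in a small, dependency-free simulation. This is only a sketch of the control flow and does not use the Azure AI Foundry or Fabric APIs. `ToolResult`, `sales_data_agent`, `TOOLS`, and `handle_prompt` are hypothetical names, the keyword match stands in for model-driven tool selection, and the canned rows reuse the illustrative figures from this post.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolResult:
    rows: list[dict]  # structured result set returned by the tool
    source: str       # source attribution carried back to the user


def sales_data_agent(question: str) -> ToolResult:
    """Hypothetical stand-in for a Data Agent (step 3).

    A real Data Agent would resolve the question against a semantic model;
    this stub returns canned, illustrative rows.
    """
    return ToolResult(
        rows=[
            {"region": "North America", "revenue_musd": 42.3},
            {"region": "Europe", "revenue_musd": 31.5},
        ],
        source="Contoso Sales Analytics semantic model",
    )


# Tool registry keyed by a trigger keyword. A real agent lets the model choose
# the tool; the keyword match here is a deliberate simplification.
TOOLS: dict[str, Callable[[str], ToolResult]] = {"revenue": sales_data_agent}


def handle_prompt(prompt: str) -> str:
    # Step 1: the user prompt arrives.
    # Step 2: select a tool for the question.
    tool = next((t for kw, t in TOOLS.items() if kw in prompt.lower()), None)
    if tool is None:
        return "No data tool matches this question."
    # Steps 3 and 4: the tool resolves the query and returns structured rows.
    result = tool(prompt)
    # Step 5: present a grounded answer that cites its source.
    lines = [f"- {row['region']}: ${row['revenue_musd']}M" for row in result.rows]
    return "\n".join(lines + [f"Source: {result.source}"])


print(handle_prompt("What was Q3 revenue by region?"))
```

The orchestration code never touches table names or query syntax. It only routes a question to a tool and formats a structured, attributed result, which is the division of labor the pattern describes.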
Why this matters

- No custom ETL for agents: Agents query the intelligence platform directly
- No prompt-stuffing: The semantic model provides business context at query time
- No trust gap: Governed semantic models enforce row-level security and lineage
- No one-off integrations: Multiple agents reuse the same Data Agents

Code Snippet 3: Azure AI Foundry Agent with Fabric Data Agent Tool (Python)

The following example shows how an Azure AI Foundry agent registers a Fabric Data Agent as a tool and uses it to answer a business question. The agent handles tool selection, query routing, and response grounding automatically.

```python
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import FabricTool
from azure.identity import DefaultAzureCredential

# Connect to Azure AI Foundry project
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<your-ai-foundry-connection-string>"
)

# Register a Fabric Data Agent as a grounding tool
# The connection references a Fabric workspace with semantic models
fabric_tool = FabricTool(connection_id="<fabric-connection-id>")

# Create an agent that uses the Fabric Data Agent for data queries
agent = project_client.agents.create_agent(
    model="gpt-4o",
    name="Contoso Revenue Analyst",
    instructions="""You are a business analytics assistant for Contoso.
    Use the Fabric Data Agent tool to answer questions about revenue, margins, and growth.
    Always cite the source semantic model.""",
    tools=fabric_tool.definitions
)

# Start a conversation
thread = project_client.agents.create_thread()
message = project_client.agents.create_message(
    thread_id=thread.id,
    role="user",
    content="What was Q3 revenue by region, and which region grew fastest?"
)

# The agent automatically calls the Fabric Data Agent tool,
# queries the semantic model, and returns a grounded response
run = project_client.agents.create_and_process_run(
    thread_id=thread.id,
    agent_id=agent.id
)

# Retrieve the agent's response
messages = project_client.agents.list_messages(thread_id=thread.id)
print(messages.data[0].content[0].text.value)

# NOTE: Output shown is illustrative and based on the semantic model definition
# Output (illustrative):
# "Based on the Contoso Sales Analytics model, Q3 FY2026 revenue by region:
# - North America: $42.3M (+8.2% YoY)
# - Europe: $31.5M (+5.7% YoY)
# - Asia Pacific: $18.9M (+12.1% YoY), fastest growing
# Source: Contoso Sales Analytics semantic model, OneLake"
```

Key takeaway: The AI Foundry agent never writes SQL or DAX. It calls the Fabric Data Agent as a tool, which resolves the query against the semantic model. The response comes back grounded with source attribution, matching the five-step integration pattern described above.

Figure 3: Each layer adds context. Semantic models provide business definitions, Graph adds permissions awareness, and Data Agents provide the natural-language interface.

Getting started: Practical next steps

You don't need to redesign your entire data platform to begin this shift. Start with one high-value domain and expand incrementally.

Step 1: Consolidate data access through OneLake

Create OneLake shortcuts to your most critical data sources: core business metrics, customer data, financial records. No migration needed.

[Create OneLake shortcuts](https://learn.microsoft.com/fabric/onelake/create-onelake-shortcut)

Step 2: Build semantic models with business definitions

For each major domain (sales, finance, operations), create a semantic model with key measures, table relationships, human-readable descriptions, and row-level security.
[Create semantic models in Microsoft Fabric](https://learn.microsoft.com/fabric/data-warehouse/semantic-models)

Step 3: Enable Data Agents (preview)

Expose your semantic models as natural-language endpoints. Start with a single domain to validate the pattern.

Note: Review the [preview terms](https://learn.microsoft.com/legal/microsoft-fabric-preview) and plan for API changes.

[Fabric Data Agents overview](https://learn.microsoft.com/fabric/data-science/concept-data-agent)

Step 4: Connect Azure AI Foundry agents

Register Data Agents as tools in your AI Foundry agent configuration.

Azure AI Foundry documentation

Conclusion: The bottleneck isn't the model, it's the platform

Models can reason, plan, and hold multi-turn conversations. But in the enterprise, the bottleneck for effective AI agents is the data platform underneath. Agents can’t reason over data they can’t find, apply business logic that isn’t encoded, respect permissions that aren’t enforced, or cite sources without lineage.

The shift from storage to intelligence requires unified data access, a shared semantic layer, organizational context, and agent-ready APIs. Microsoft Fabric provides these capabilities, and its integration with Azure AI Foundry makes this intelligence layer accessible to AI agents.

Disclaimer: Some features described in this post, including Fabric Data Agents, are currently in preview. Preview features may change before general availability, and their availability, functionality, and pricing may differ from the final release. See [Microsoft preview terms](https://learn.microsoft.com/legal/microsoft-fabric-preview) for details.