retrieval augmented generation
If You're Building AI on Azure, ECS 2026 is Where You Need to Be
Let me be direct: there's a lot of noise in the conference calendar. Generic cloud events. Vendor showcases dressed up as technical content. Sessions that look great on paper but leave you with nothing you can actually ship on Monday. ECS 2026 isn't that. As someone who will be on stage at Cologne this May, I can tell you the European Collaboration Summit combined with the European AI & Cloud Summit and European Biz Apps Summit is one of the few events I've seen where engineers leave with real, production-applicable knowledge. Three days. Three summits. 3,000+ attendees. One of the largest Microsoft-focused events in Europe, and it keeps getting better. If you're building AI systems on Azure, designing cloud-native architectures, or trying to figure out how to take your AI experiments to production — this is where the conversation is happening.

What ECS 2026 Actually Is
ECS 2026 runs May 5–7 at Confex in Cologne, Germany. It brings together three co-located summits under one roof:
European Collaboration Summit — Microsoft 365, Teams, Copilot, and governance
European AI & Cloud Summit — Azure architecture, AI agents, cloud security, responsible AI
European BizApps Summit — Power Platform, Microsoft Fabric, Dynamics
For Azure engineers and AI developers, the European AI & Cloud Summit is your primary destination. But don't ignore the overlap: some of the most interesting AI conversations happen at the intersection of collaboration tooling and cloud infrastructure. The scale matters here: 3,000+ attendees, 100+ sessions, multiple deep-dive tracks, and a speaker lineup that includes Microsoft executives, Regional Directors, and MVPs who have built, broken, and rebuilt production systems.

The Azure + AI Track - What's Actually On the Agenda
The AI & Cloud Summit agenda is built around real technical depth. Not "intro to AI" content, but actual architecture decisions, patterns that work, and lessons from things that didn't. Here's what you can expect:

AI Agents and Agentic Systems
This is where the energy is right now, and ECS is leaning in. Expect sessions covering how to design agent workflows, chain reasoning steps, handle memory and state, and integrate with Azure AI services. Marco Casalaina, VP of Products for Azure AI at Microsoft, is speaking. If you want to understand the direction of the Azure AI platform from the people building it, this is a direct line.

Azure Architecture at Scale
Cloud-native patterns, microservices, containers, and the architectural decisions that determine whether your system holds up under real load. These sessions go beyond theory: you'll hear from engineers who've shipped these designs at enterprise scale.

Observability, DevOps, and Production AI
Getting AI to production is harder than the demos suggest. Sessions here cover monitoring AI systems, integrating LLMs into CI/CD pipelines, and building the operational practices that keep AI in production reliable and governable.

Cloud Security and Compliance
Security isn't optional when you're putting AI in front of users or connecting it to enterprise data. Tracks cover identity, access patterns, responsible AI governance, and how to design systems that satisfy compliance requirements without becoming unmaintainable.

Pre-Conference Deep Dives
One underrated part of ECS: the pre-conference workshops. These are extended, hands-on sessions, typically 3–6 hours, that let you go deep on a single topic with an expert.
Think of them as intensive short courses where you can actually work through the material, not just watch slides. If you're newer to a particular area of Azure AI, or you want to build fluency in a specific pattern before the main conference sessions, these are worth the early travel.

The Speaker Quality Is Different Here
The ECS speaker roster includes Microsoft executives, Microsoft MVPs, and Regional Directors, people who have real accountability for the products and patterns they're presenting. You'll hear from over 20 Microsoft speakers:
Marco Casalaina — VP of Products, Azure AI at Microsoft
Adam Harmetz — VP of Product at Microsoft, Enterprise Agent
And dozens of MVPs and Regional Directors who are in the field every day, solving the same problems you are. These aren't keynote-only speakers — they're in the session rooms, at the hallway track, available for real conversations.

The Hallway Track Is Not a Cliché
I know "networking" sounds like a corporate afterthought. At ECS it genuinely isn't. When you put 3,000 practitioners (engineers, architects, DevOps leads, security specialists) in one venue for three days, the conversations between sessions are often more valuable than the sessions themselves. You get candid answers to "how are you actually handling X in production?" that you won't find in documentation. The European Microsoft community is tight-knit and collaborative. ECS is where that community concentrates.

Why This Matters Right Now
We're in a period where AI development is moving fast but the engineering discipline around it is still maturing. Most teams are figuring out:
How to move from AI prototype to production system
How to instrument and observe AI behaviour reliably
How to design agent systems that don't become unmaintainable
How to satisfy security and compliance requirements in AI-integrated architectures
ECS 2026 is one of the few places where you can get direct answers to these questions from people who've solved them — not theoretically, but in production, on Azure, in the last 12 months. If you go, you'll come back with practical patterns you can apply immediately. That's the bar I hold events to. ECS consistently clears it.

Register and Explore the Agenda
Register for ECS 2026: ecs.events
Explore the AI & Cloud Summit agenda: cloudsummit.eu/en/agenda
Dates: May 5–7, 2026 | Location: Confex, Cologne, Germany
Early registration is worth it: the pre-conference workshops fill up. And if you're coming, find me; I'll be the one talking too much about AI agents and Azure deployments. See you in Cologne.

Building an Enterprise HR Chatbot with Multi-Strategy RAG and Live Agent Handoff on Azure
HR teams deal with thousands of employee questions every day — policy lookups, leave balances, case escalations, and sensitive topics like harassment or misconduct. AI chatbots can handle the routine stuff and free up HR advisors for the hard cases. But most chatbot projects get stuck at basic Q&A. They can't handle multi-country policies, employee slang, or smooth handoffs to a real person. This post covers how we built Eva, a production HR chatbot using Microsoft Bot Framework and Semantic Kernel on Azure. I'll focus on three problems and how we solved them:
Getting accurate answers when employees and policy documents use different words
Handing off to a live human advisor in real time
Catching answer quality regressions automatically

Why basic RAG isn't enough for HR
Retrieval-Augmented Generation (RAG) — fetching relevant documents and feeding them to an LLM — is the standard approach. But plain RAG breaks down in HR for a few reasons:
Vocabulary mismatch. An employee asks "How does misconduct affect my ACB?" but the policy document says "Annual Cash Bonus eligibility criteria." The search doesn't connect the two.
Multi-country ambiguity. The same question can have different answers depending on the employee's country, grade, or role.
Sensitive topics. Questions about harassment, disability, or whistleblowing should go to a human, not get an AI-generated answer.
Ranking noise. Search results often include globally relevant but locally irrelevant documents.
Eva handles these with a layered pipeline: query augmentation → multi-index search → LLM reranking → answer generation → citation handling.

Architecture at a glance
Bot framework: Microsoft Agents SDK (aiohttp)
LLM orchestration: Semantic Kernel
Primary LLM: Azure OpenAI Service (GPT-4.1 / GPT-4o)
Knowledge search: Azure AI Search (hybrid + vector)
Live agent chat: Salesforce MIAW via server-sent events
Evaluation: Azure AI Evaluation SDK + custom LLM judge
Config: Pydantic-settings + Azure App Configuration + Key Vault

Four retrieval strategies, controlled by feature flags
Instead of one search approach, Eva supports four — toggled by feature flags so we can A/B test per country without code changes. They run in a priority cascade:
HyDE (Hypothetical Document Embeddings). Instead of searching with the employee's question, the LLM first generates a hypothetical policy document that would answer it. We embed that synthetic document and use it as the search query. Since a hypothetical answer is closer in embedding space to the real answer than the original question is, this bridges vocabulary gaps effectively.
Step-back prompting. The LLM broadens the question. "How does misconduct affect my ACB?" becomes "What is the Annual Cash Bonus policy and what factors affect eligibility?" This works well when answers live in broader policy sections.
Query rewrite. The LLM expands abbreviations and adds HR domain context, then runs a hybrid (text + vector) search.
Standard search (fallback). Basic intent classification with hybrid search. No augmentation.
All four strategies return the same Pydantic model, so the rest of the pipeline doesn't care which one ran. The team can enable HyDE globally, roll out step-back to specific countries, or revert instantly if something underperforms.

LLM reranking
After pulling results from both a country-specific index and a global index, Eva optionally reranks them using a RankGPT-style approach — the LLM scores document relevance with a bias toward local content.
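As a rough illustration of that kind of reranking step, here is a hedged sketch (not Eva's actual code: the client setup, prompt, and document fields are placeholders):

import json
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<your-api-key>",
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_version="2024-06-01",
)

def rerank(query: str, docs: list[dict], deployment: str = "gpt-4o") -> list[dict]:
    """Ask the LLM to reorder candidate documents by relevance,
    preferring country-specific ("local") content over global content."""
    try:
        listing = "\n".join(
            f"[{i}] ({d.get('scope', 'global')}) {d['title']}: {d['snippet']}"
            for i, d in enumerate(docs)
        )
        prompt = (
            "Rank the documents below by how well they answer the question. "
            "Prefer country-specific (local) documents over global ones when both apply.\n"
            f"Question: {query}\n{listing}\n"
            'Reply with JSON only: {"order": [indices, most relevant first]}'
        )
        response = client.chat.completions.create(
            model=deployment,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        order = json.loads(response.choices[0].message.content)["order"]
        return [docs[i] for i in order if 0 <= i < len(docs)]
    except Exception:
        # Reranking is an optimisation, not a dependency: any failure keeps the original order.
        return docs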
If reranking fails for any reason, it falls back to the original ordering so the pipeline keeps moving.

Answer generation with local vs. global context
The answer stage separates retrieved documents into local context (country-specific) and global context (company-wide), injected as distinct prompt sections. The LLM returns a structured response with reasoning, the actual answer, citations, and a coverage classification (full, partial, or denial). Prompts are stored as version-controlled .txt files with per-model variants (e.g., gpt-4o.txt, gpt-4.1.txt), resolved at runtime. This makes prompts reviewable in PRs and deployable without code changes.

Live agent handoff with Salesforce
When Eva determines a question needs a human — sensitive topic, complex case, or the employee simply asks — it hands off to a Salesforce advisor in real time.
SSE streaming. Eva keeps a persistent HTTP connection to Salesforce for real-time messages, typing indicators, and session end signals.
Session resilience. Session state persists across three layers — in-memory cache, Azure Cosmos DB, and Bot Framework turn state — to survive restarts and failovers.
Message delivery workers. Each session has a dedicated async worker with exponential backoff retry. Overflow messages go to a failed messages list rather than being silently dropped.
Queue position updates. While employees wait, Eva queries Salesforce for queue position and sends rate-limited updates.
Context handoff. On session start, Eva sends the full conversation transcript so advisors don't ask employees to repeat themselves.

Automated evaluation
Eva includes an evaluation framework that runs as a separate process, testing against ground-truth Q&A pairs from CSV files. Factual questions are scored using Azure AI's SimilarityEvaluator on a 1–5 scale, with optional relevance and groundedness checks. Sensitive questions (harassment, disability, whistleblowing) use a custom LLM judge that checks whether the response acknowledges sensitivity and directs the employee to create a case or speak with an advisor. A deviation detector flags score drops between runs. SQLite stores results for trending, and Application Insights powers dashboards. Long evaluation runs support resume — the framework skips already-completed test cases on restart.

Key takeaways
Make retrieval strategies swappable. Feature flags let you A/B test without redeploying.
Separate local and global knowledge explicitly. Don't rely on the LLM to figure out which country's policy applies.
Invest in evaluation early. Ground-truth datasets with factual and behavioral scoring catch regressions that manual testing misses.
Build resilience into live agent handoff. Multi-tier session recovery and retry logic prevent dropped conversations.
Treat prompts as code. File-based, model-variant-aware prompts are easier to maintain than inline strings.
Use Pydantic for structured LLM outputs. Typed models catch bad output at the validation boundary instead of letting it propagate.

Get started
Semantic Kernel documentation — LLM orchestration with plugins and structured outputs
Azure OpenAI Service quickstart — Deploy GPT-4o or GPT-4.1
Azure AI Search vector search tutorial — Hybrid and vector search indices
Microsoft Bot Framework SDK — Build bots for Teams and web
Azure AI Evaluation SDK — Score for similarity, relevance, and groundedness
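To make the evaluation idea above concrete, here is a small, hedged sketch of a ground-truth similarity check with the Azure AI Evaluation SDK. The endpoint, deployment name, and Q&A pair are placeholders, not Eva's actual configuration:

from azure.ai.evaluation import SimilarityEvaluator

# Placeholder Azure OpenAI settings used by the judge model.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "gpt-4o",
}

similarity = SimilarityEvaluator(model_config=model_config)

# One ground-truth pair; in a framework like the one described above,
# these would be loaded from the CSV test set.
result = similarity(
    query="How many days of parental leave do I get?",
    response="Employees are entitled to 16 weeks of paid parental leave.",
    ground_truth="The policy grants 16 weeks of paid parental leave.",
)
print(result)  # includes a similarity score on a 1-5 scale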
Build an Offline Hybrid RAG Stack with ONNX and Foundry Local

If you are building local AI applications, basic retrieval augmented generation is often only the starting point. This sample shows a more practical pattern: combine lexical retrieval, ONNX based semantic embeddings, and a Foundry Local chat model so the assistant stays grounded, remains offline, and degrades cleanly when the semantic path is unavailable.

Why this sample is worth studying
Many local RAG samples rely on a single retrieval strategy. That is usually enough for a proof of concept, but it breaks down quickly in production. Exact keywords, acronyms, and document codes behave differently from natural language questions and paraphrased requests. This repository keeps the original lexical retrieval path, adds local ONNX embeddings for semantic search, and fuses both signals in a hybrid ranking mode. The generation step runs through Foundry Local, so the entire assistant can remain on device.
Lexical mode handles exact terms and structured vocabulary.
Semantic mode handles paraphrases and more natural language phrasing.
Hybrid mode combines both and is usually the best default.
Lexical fallback protects the user experience if the embedding pipeline cannot start.

Architectural overview
The sample has two main flows: an offline ingestion pipeline and a local query pipeline. The architecture splits cleanly into offline ingestion at the top and runtime query handling at the bottom.
Offline ingestion pipeline:
Read Markdown files from docs/.
Parse front matter and split each document into overlapping chunks.
Generate dense embeddings when the ONNX model is available.
Store chunks in SQLite with both sparse lexical features and optional dense vectors.
Local query pipeline:
The browser posts a question to the Express API.
ChatEngine resolves the requested retrieval mode.
VectorStore retrieves lexical, semantic, or hybrid results.
The prompt is assembled with the retrieved context and sent to a Foundry Local chat model.
The answer is returned with source references and retrieval metadata.
The sequence diagram shows the difference between lexical retrieval and hybrid retrieval. In hybrid mode, the query is embedded first, then lexical and semantic scores are fused before prompt assembly.

Repository structure and core components
The implementation is compact and readable. The main files to understand are listed below.
src/config.js: retrieval defaults, paths, and model settings.
src/embeddingEngine.js: local ONNX embedding generation through Transformers.js.
src/vectorStore.js: SQLite storage plus lexical, semantic, and hybrid ranking.
src/chatEngine.js: retrieval mode resolution, prompt assembly, and Foundry Local model execution.
src/ingest.js: document ingestion and embedding generation during indexing.
src/server.js: REST endpoints, streaming endpoints, upload support, and health reporting.

Getting started
To run the sample, you need Node.js 20 or newer, Foundry Local, and a local ONNX embedding model. The default model path is models/embeddings/bge-small-en-v1.5.

cd c:\Users\leestott\local-hybrid-retrival-onnx
npm install
huggingface-cli download BAAI/bge-small-en-v1.5 --local-dir models/embeddings/bge-small-en-v1.5
npm run ingest
npm start

Ingestion writes the local SQLite database to data/rag.db. If the embedding model is available, each chunk gets a dense vector as well as lexical features. If the embedding model is missing, ingestion still succeeds and the application remains usable in lexical mode.
Best practice: local AI applications should treat model files, SQLite data, and native runtime compatibility as part of the deployable system, not as optional developer conveniences.

Code walkthrough
1. Retrieval configuration
The sample makes its retrieval behaviour explicit in configuration. That is useful for testing and for operator visibility.

export const config = {
  model: "phi-3.5-mini",
  docsDir: path.join(ROOT, "docs"),
  dbPath: path.join(ROOT, "data", "rag.db"),
  chunkSize: 200,
  chunkOverlap: 25,
  topK: 3,
  retrievalMode: process.env.RETRIEVAL_MODE || "hybrid",
  retrievalModes: ["lexical", "semantic", "hybrid"],
  fallbackRetrievalMode: "lexical",
  retrievalWeights: {
    lexical: 0.45,
    semantic: 0.55,
  },
};

Those defaults tell you a lot about the intended operating profile. Chunks are small, the number of returned chunks is low, and the fallback path is explicit.

2. Local ONNX embeddings
The embedding engine disables remote model loading and only uses local files. That matters for privacy, repeatability, and air gapped operation.

// Only resolve models from the local filesystem; never download at runtime.
env.allowLocalModels = true;
env.allowRemoteModels = false;

this.extractor = await pipeline("feature-extraction", resolvedPath, {
  local_files_only: true,
});

const output = await this.extractor(text, {
  pooling: "mean",
  normalize: true,
});

The mean pooling and normalisation steps make the vectors suitable for cosine similarity based ranking.

3. Hybrid storage and ranking in SQLite
Instead of adding a separate vector database, the sample stores lexical and semantic representations in the same SQLite table. That keeps the local footprint low and the implementation easy to debug.

searchHybrid(query, queryEmbedding, topK = 5, weights = { lexical: 0.45, semantic: 0.55 }) {
  const lexicalResults = this.searchLexical(query, topK * 3);
  const semanticResults = this.searchSemantic(queryEmbedding, topK * 3);

  if (semanticResults.length === 0) {
    return lexicalResults.slice(0, topK).map((row) => ({
      ...row,
      retrievalMode: "lexical",
    }));
  }

  // ...fusion step omitted in this excerpt: the two result lists are merged into
  // a `combined` map keyed by chunk, and lexicalWeight / semanticWeight are taken
  // from the `weights` argument...

  const fused = [...combined.values()].map((row) => ({
    ...row,
    score: (row.lexicalScore * lexicalWeight) + (row.semanticScore * semanticWeight),
  }));

  fused.sort((a, b) => b.score - a.score);
  return fused.slice(0, topK);
}

The important point is not just the weighted fusion. It is the fallback behaviour. If semantic retrieval cannot provide results, the user still gets lexical grounding instead of an empty context window.

4. Retrieval mode resolution in ChatEngine
ChatEngine keeps the runtime behaviour predictable. It validates the requested mode and falls back to lexical search when semantic retrieval is unavailable.

resolveRetrievalMode(requestedMode) {
  const desiredMode = config.retrievalModes.includes(requestedMode)
    ? requestedMode
    : config.retrievalMode;
  if ((desiredMode === "semantic" || desiredMode === "hybrid") && !this.semanticAvailable) {
    return config.fallbackRetrievalMode;
  }
  return desiredMode;
}

This is a sensible production design because local runtime failures are common. Missing model files or native dependency mismatches should reduce quality, not crash the entire assistant.

5. Foundry Local model management
The sample uses FoundryLocalManager to discover, download, cache, and load the configured chat model.
const manager = FoundryLocalManager.create({ appName: "gas-field-local-rag" });
const catalog = manager.catalog;

this.model = await catalog.getModel(config.model);
if (!this.model.isCached) {
  await this.model.download((progress) => {
    const pct = Math.round(progress * 100);
    this._emitStatus("download", `Downloading ${this.modelAlias}... ${pct}%`, progress);
  });
}

await this.model.load();
this.chatClient = this.model.createChatClient();
this.chatClient.settings.temperature = 0.1;

This gives the app a better local startup experience. The server can expose a status stream while the model initialises in the background.

User experience and screenshots
The client is intentionally simple, which makes it useful during evaluation. You can switch retrieval mode, test questions quickly, and inspect the retrieved sources. The landing page exposes retrieval mode directly in the UI. That makes it easy to compare lexical, semantic, and hybrid behaviour during testing. The sources panel shows grounding evidence and retrieval scores, which is useful when validating whether better answers are coming from better retrieval or just model phrasing.

Best practices for ONNX RAG and Foundry Local
Keep lexical fallback alive. Exact identifiers and runtime failures both make this necessary.
Persist sparse and dense features together where possible. It simplifies debugging and operational reasoning.
Use small chunks and conservative topK values for local context budgets.
Expose health and status endpoints so users can see when the model is still loading or embeddings are unavailable.
Test retrieval quality separately from generation quality.
Pin and validate native runtime dependencies, especially ONNX Runtime, before tuning prompts.
Practical warning: this repository already shows why runtime validation matters. A local app can ingest documents successfully and still fail at model initialisation if the native runtime stack is misaligned.

How this compares with RAG and CAG
The strongest value in this sample comes from where it sits between a basic local RAG baseline and a curated CAG design.

Context assembly
- Classic local RAG: Retrieve chunks at query time, often lexically, then inject them into the prompt.
- This hybrid ONNX RAG sample: Retrieve chunks at query time with lexical, semantic, or fused scoring, then inject the strongest results into the prompt.
- CAG: Use a prepared or cached context pack instead of fresh retrieval for every request.

Main strength
- Classic local RAG: Easy to implement and easy to explain.
- This hybrid ONNX RAG sample: Better recall for paraphrases without giving up exact match behaviour or offline execution.
- CAG: Predictable prompts and low query time overhead.

Main weakness
- Classic local RAG: Misses synonyms and natural language reformulations.
- This hybrid ONNX RAG sample: More moving parts, larger local asset footprint, and native runtime compatibility to manage.
- CAG: Coverage depends on curation quality and goes stale more easily.

Failure behaviour
- Classic local RAG: Weak retrieval leads to weak grounding.
- This hybrid ONNX RAG sample: Semantic failure can degrade to lexical retrieval if designed properly, which this sample does.
- CAG: Prepared context can be too narrow for new or unexpected questions.

Best fit
- Classic local RAG: Simple local assistants and proof of concept systems.
- This hybrid ONNX RAG sample: Offline copilots and technical assistants that need stronger recall across varied phrasing.
- CAG: Stable workflows with tightly bounded, curated knowledge.
Samples
Related samples:
- Foundry Local RAG - https://github.com/leestott/local-rag
- Foundry Local CAG - https://github.com/leestott/local-cag
- Foundry Local hybrid-retrival-onnx - https://github.com/leestott/local-hybrid-retrival-onnx

Specific benefits of this hybrid approach over classic RAG
It captures paraphrased questions that lexical search would often miss.
It still preserves exact match performance for codes, terms, and product names.
It gives operators a controlled degradation path when the semantic stack is unavailable.
It stays local and inspectable without introducing a separate hosted vector service.

Specific differences from CAG
CAG shifts effort into context curation before the request. This sample retrieves evidence dynamically at runtime. CAG can be faster for fixed workflows, but it is usually less flexible when the document set changes. This hybrid RAG design is better suited to open ended knowledge search and growing document collections.

What to validate before shipping
Measure retrieval quality in each mode using exact term, acronym, and paraphrase queries.
Check that sources shown in the UI reflect genuinely distinct evidence, not repeated chunks.
Confirm the application remains usable when semantic retrieval is unavailable.
Verify ONNX Runtime compatibility on the real target machines, not only on the development laptop.
Test model download, cache, and startup behaviour with a clean environment.

Final take
For developers getting started with ONNX RAG and Foundry Local, this sample is a good technical reference because it demonstrates a realistic local architecture rather than a minimal demo. It shows how to build a grounded assistant that remains offline, supports multiple retrieval modes, and fails gracefully. Compared with classic local RAG, the hybrid design provides better recall and better resilience. Compared with CAG, it remains more flexible for changing document sets and less dependent on pre curated context packs. If you want a practical starting point for offline grounded AI on developer workstations or edge devices, this is the most balanced pattern in the repository set.

Vectorless Reasoning-Based RAG: A New Approach to Retrieval-Augmented Generation
Introduction
Retrieval-Augmented Generation (RAG) has become a widely adopted architecture for building AI applications that combine Large Language Models (LLMs) with external knowledge sources. Traditional RAG pipelines rely heavily on vector embeddings and similarity search to retrieve relevant documents. While this works well for many scenarios, it introduces challenges such as:
Requires chunking documents into small segments
Important context can be split across chunks
Embedding generation and vector databases add infrastructure complexity
A new paradigm called Vectorless Reasoning-Based RAG is emerging to address these challenges. One framework enabling this approach is PageIndex, an open-source document indexing system that organizes documents into a hierarchical tree structure and allows Large Language Models (LLMs) to perform reasoning-based retrieval over that structure.

Vectorless Reasoning-Based RAG
Instead of vectors, this approach uses structured document navigation.
User Query -> Document Tree Structure -> LLM Reasoning -> Relevant Nodes Retrieved -> LLM Generates Answer
This mimics how humans read documents:
Look at the table of contents
Identify relevant sections
Read the relevant content
Answer the question

Core features
No Vector Database: It relies on document structure and LLM reasoning for retrieval. It does not depend on vector similarity search.
No Chunking: Documents are not split into artificial chunks. Instead, they are organized using their natural structure, such as pages and sections.
Human-like Retrieval: The system mimics how human experts read documents. It navigates through sections and extracts information from relevant parts.
Better Explainability and Traceability: Retrieval is based on reasoning. The results can be traced back to specific pages and sections. This makes the process easier to interpret.
It avoids opaque and approximate vector search, often called "vibe retrieval."

When to Use Vectorless RAG
Vectorless RAG works best when:
Data is structured or semi-structured
Documents have clear metadata
Knowledge sources are well organized
Queries require reasoning rather than semantic similarity
Examples: enterprise knowledge bases, internal documentation systems, compliance and policy search, healthcare documentation, financial reporting.

Implementing Vectorless RAG with Azure AI Foundry
Step 1: Install the PageIndex package with pip, then initialize the client with your API key:

from pageindex import PageIndexClient
import pageindex.utils as utils

# Get your PageIndex API key from https://dash.pageindex.ai/api-keys
PAGEINDEX_API_KEY = "YOUR_PAGEINDEX_API_KEY"
pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)

Step 2: Set up your LLM. Example using Azure OpenAI:

from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_version=AZURE_OPENAI_API_VERSION
)

async def call_llm(prompt, temperature=0):
    response = await client.chat.completions.create(
        model=AZURE_DEPLOYMENT_NAME,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature
    )
    return response.choices[0].message.content.strip()

Step 3: Page tree generation

import os, requests

pdf_url = "https://arxiv.org/pdf/2501.12948.pdf"  # PDF URL to index; this one is just an example
pdf_path = os.path.join("../data", pdf_url.split('/')[-1])
os.makedirs(os.path.dirname(pdf_path), exist_ok=True)

response = requests.get(pdf_url)
with open(pdf_path, "wb") as f:
    f.write(response.content)
print(f"Downloaded {pdf_url}")

doc_id = pi_client.submit_document(pdf_path)["doc_id"]
print('Document Submitted:', doc_id)

Step 4: Print the generated PageIndex tree structure

if pi_client.is_retrieval_ready(doc_id):
    tree = pi_client.get_tree(doc_id, node_summary=True)['result']
    print('Simplified Tree Structure of the Document:')
    utils.print_tree(tree)
else:
    print("Processing document, please try again later...")

Step 5: Use the LLM for tree search and identify nodes that might contain relevant context

import json

query = "What are the conclusions in this document?"
tree_without_text = utils.remove_fields(tree.copy(), fields=['text'])

search_prompt = f"""
You are given a question and a tree structure of a document.
Each node contains a node id, node title, and a corresponding summary.
Your task is to find all nodes that are likely to contain the answer to the question.

Question: {query}

Document tree structure:
{json.dumps(tree_without_text, indent=2)}

Please reply in the following JSON format:
{{
  "thinking": "<Your thinking process on which nodes are relevant to the question>",
  "node_list": ["node_id_1", "node_id_2", ..., "node_id_n"]
}}

Directly return the final JSON structure. Do not output anything else.
""" tree_search_result = await call_llm(search_prompt) Step 6 : Print retrieved nodes and reasoning process node_map = utils.create_node_mapping(tree) tree_search_result_json = json.loads(tree_search_result) print('Reasoning Process:') utils.print_wrapped(tree_search_result_json['thinking']) print('\nRetrieved Nodes:') for node_id in tree_search_result_json["node_list"]: node = node_map[node_id] print(f"Node ID: {node['node_id']}\t Page: {node['page_index']}\t Title: {node['title']}") Step 7: Answer generation node_list = json.loads(tree_search_result)["node_list"] relevant_content = "\n\n".join(node_map[node_id]["text"] for node_id in node_list) print('Retrieved Context:\n') utils.print_wrapped(relevant_content[:1000] + '...') answer_prompt = f""" Answer the question based on the context: Question: {query} Context: {relevant_content} Provide a clear, concise answer based only on the context provided. """ print('Generated Answer:\n') answer = await call_llm(answer_prompt) utils.print_wrapped(answer) When to Use Each Approach Both vector-based RAG and vectorless RAG have their strengths. Choosing the right approach depends on the nature of the documents and the type of retrieval required. When to Use Vector Database–Based RAG Vector-based retrieval works best when dealing with large collections of unrelated or loosely structured documents. In such cases, semantic similarity is often sufficient to identify relevant information quickly. Use vector RAG when: Searching across many independent documents Semantic similarity is sufficient to locate relevant content Real-time retrieval is required over very large datasets Common use cases include: Customer support knowledge bases Conversational chatbots Product and content search systems When to Use Vectorless RAG Vectorless approaches such as PageIndex are better suited for long, structured documents where understanding the logical organization of the content is important. Use vectorless RAG when: Documents contain clear hierarchical structure Logical reasoning across sections is required High retrieval accuracy is critical Typical examples include: Financial filings and regulatory reports Legal documents and contracts Technical manuals and documentation Academic and research papers In these scenarios, navigating the document structure allows the system to identify the exact section that logically contains the answer, rather than relying only on semantic similarity. Conclusion Vector databases significantly advanced RAG architectures by enabling scalable semantic search across large datasets. However, they are not the optimal solution for every type of document. Vectorless approaches such as PageIndex introduce a different philosophy: instead of retrieving text that is merely semantically similar, they retrieve text that is logically relevant by reasoning over the structure of the document. As RAG architectures continue to evolve, the future will likely combine the strengths of both approaches. Hybrid systems that integrate vector search for broad retrieval and reasoning-based navigation for precision may offer the best balance of scalability and accuracy for enterprise AI applications.5.2KViews2likes0CommentsAnnouncing the IQ Series: Foundry IQ
AI agents are rapidly becoming a new way to build applications. But for agents to be truly useful, they need access to the knowledge and context that helps them reason about the world they operate in. That's where Foundry IQ comes in. Today we're announcing the IQ Series: Foundry IQ, a new set of developer-focused episodes exploring how to build knowledge-centric AI systems using Foundry IQ. The series focuses on the core ideas behind how modern AI systems work with knowledge: how they retrieve information, reason across sources, synthesize answers, and orchestrate multi-step interactions. Instead of treating retrieval as a single step in a pipeline, Foundry IQ approaches knowledge as something that AI systems actively work with throughout the reasoning process. The IQ Series breaks down these concepts and shows how they come together when building real AI applications. You can explore the series and all the accompanying samples here:
👉 https://aka.ms/iq-series

What is Foundry IQ?
Foundry IQ helps AI systems work with knowledge in a more structured and intentional way. Rather than wiring retrieval logic directly into every application, developers can define knowledge bases that connect to documents, data sources, and other information systems. AI agents can then query these knowledge bases to gather the context they need to generate responses, make decisions, or complete tasks. This model allows knowledge to be organized, reused, and combined across applications, instead of being rebuilt for each new scenario.

What's covered in the IQ Series?
The Foundry IQ episodes in the IQ Series explore the key building blocks behind knowledge-driven AI systems, from how knowledge enters the system to how agents ultimately query and use it. The series is released as three weekly episodes:
Foundry IQ: Unlocking Knowledge for Your Agents — March 18, 2026: Introduces Foundry IQ and the core ideas behind it. The episode explains how AI agents work with knowledge and walks through the main components of Foundry IQ that support knowledge-driven applications.
Foundry IQ: Building the Data Pipeline with Knowledge Sources — March 25, 2026: Focuses on Knowledge Sources and how different types of content flow into Foundry IQ. It explores how systems such as SharePoint, Fabric, OneLake, Azure Blob Storage, Azure AI Search, and the web contribute information that AI systems can later retrieve and use.
Foundry IQ: Querying the Multi-Source AI Knowledge Bases — April 1, 2026: Dives into the Knowledge Bases and how multiple knowledge sources can be organized behind a single endpoint. The episode demonstrates how AI systems query across these sources and synthesize information to answer complex questions.
Each episode includes a short executive introduction, a tech talk exploring the topic in depth, and a visual recap with doodle summaries of the key ideas. Alongside the episodes, the GitHub repository provides cookbooks with sample code, summaries of the episodes, and additional learning resources, so developers can explore the concepts and apply them in their own projects.

Explore the Repo
All episodes and supporting materials live in the IQ Series repository:
👉 https://aka.ms/iq-series
Inside the repository you'll find:
The Foundry IQ episode links
Cookbooks for each episode
Links to documentation and additional resources
If you're building AI agents or exploring how AI systems can work with knowledge, the IQ Series is a great place to start. Watch the episodes and explore the cookbooks!
We’re excited to see what you build and welcome your feedback & ideas as the series evolves.

Building High-Performance Agentic Systems
Most enterprise chatbots fail in the same quiet way. They answer questions. They impress in demos. And then they stall in production. Knowledge goes stale. Answers cannot be audited. The system cannot act beyond generating text. When workflows require coordination, execution, or accountability, the chatbot stops being useful. Agentic systems exist because that model is insufficient. Instead of treating the LLM as the product, agentic architecture embeds it inside a bounded control loop: plan → act (tools) → observe → refine. The model becomes one component in a runtime system with explicit state management, safety policies, identity enforcement, and operational telemetry. This shift is not speculative. A late-2025 MIT Sloan Management Review / BCG study reports that 35% of organizations have already adopted AI agents, with another 44% planning deployment. Microsoft is advancing open protocols for what it calls the “agentic web,” including Agent-to-Agent (A2A) interoperability and Model Context Protocol (MCP), with integration paths emerging across Copilot Studio and Azure AI Foundry. The real question is no longer whether agents are coming. It is whether enterprise architecture is ready for them. This article translates “agentic” into engineering reality: the runtime layers, latency and cost levers, orchestration patterns, and governance controls required for production deployment.

The Core Capabilities of Agentic AI
What makes an AI “agentic” is not a single feature—it’s the interaction of different capabilities. Together, they form the minimum set needed to move from “answering” to “operating”.

Autonomy – Goal-Driven Task Completion
Traditional bots are reactive: they wait for a prompt and produce output. Autonomy introduces a goal state and a control loop. The agent is given an objective (or a trigger) and it can decide the next step without being micromanaged. The critical engineering distinction is that autonomy must be bounded: in production, you implement it with explicit budgets and stop conditions—maximum tool calls, maximum retries, timeouts, and confidence thresholds. The typical execution shape is a loop: plan → act → observe → refine. A project-management agent, for example, doesn’t just answer “what’s the status?” It monitors signals (work items, commits, build health), detects a risk pattern (slippage, dependency blockage), and then either surfaces an alert or prepares a remediation action (re-plan milestones, notify owners). In high-stakes environments, autonomy is usually human-in-the-loop by design: the agent can draft changes, propose next actions, and only execute after approval. Over time, teams expand the autonomy envelope for low-risk actions while keeping approvals for irreversible or financially sensitive operations.

Tool Integration – Taking Action and Staying Current
A standalone LLM cannot fetch live enterprise state and cannot change it. Tool integration is how an agent becomes operational: it can query systems of record, call APIs, trigger workflows, and produce outputs that reflect the current world rather than the model’s pretraining snapshot. There are two classes of tools that matter in enterprise agents:
Retrieval tools (grounding / RAG)
When the agent needs facts, it retrieves them. This is the backbone of reducing hallucination: instead of guessing, the agent pulls authoritative content (SharePoint, Confluence, policy repositories, CRM records, Fabric datasets) and uses it as evidence.
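To make the bounded control loop described above concrete, here is a deliberately simplified sketch with explicit budgets and stop conditions. The planning, tool, and assessment callables are placeholders for model- and retrieval-backed implementations, not any specific framework's API:

from dataclasses import dataclass, field

MAX_STEPS = 5              # budget: maximum tool calls per run
MAX_RETRIES = 2            # budget: retries per failing tool call
CONFIDENCE_THRESHOLD = 0.8  # stop condition: when the agent is confident enough

@dataclass
class AgentState:
    goal: str
    evidence: list = field(default_factory=list)  # working memory for this run
    steps: int = 0

def run_agent(goal: str, plan_next, act, assess) -> str:
    """Bounded control loop: plan -> act (tool) -> observe -> refine."""
    state = AgentState(goal=goal)
    while state.steps < MAX_STEPS:
        action = plan_next(state)              # plan: decide the next tool call
        if action is None:                     # stop condition: nothing left to do
            break
        observation = None
        for _ in range(MAX_RETRIES + 1):
            try:
                observation = act(action)      # act: invoke the tool
                break
            except TimeoutError:
                continue                       # observe the failure, retry within budget
        if observation is not None:
            state.evidence.append(observation)  # observe: record the result
        state.steps += 1
        confidence, draft = assess(state)      # refine: is the goal met yet?
        if confidence >= CONFIDENCE_THRESHOLD:
            return draft                       # finish inside the autonomy envelope
    return "Escalating to a human: budget exhausted or confidence too low."

In a real system the act step would invoke the retrieval and action tools described in this section, and the escalation branch would hand off to a human reviewer.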
In practice, retrieval works best when it is engineered as a pipeline: query rewrite (optional) → hybrid search (keyword + vector) → filtering (metadata/ACL) → reranking → compact context injection. The point is not “stuff the prompt with documents,” but “inject only the minimum evidence required to answer accurately.” Action tools (function calling / connectors) These are the hands of the agent: update a CRM record, create a ticket, send an email, schedule a meeting, generate a report, run a pipeline. Tool integration shifts value from “advice” to “execution,” but also introduces risk—so action tools need guardrails: least-privilege permissions, input validation, idempotency keys, and post-condition checks (confirm the update actually happened). In Microsoft ecosystems, this tool plane often maps to Graph actions + business connectors (via Logic Apps/Power Automate) + custom APIs, with Copilot Studio (low code) or Foundry-style runtimes (pro code) orchestrating the calls. Memory (Context & Learning) – Context Awareness and Adaptation “Memory” is not just a long prompt. In agentic systems, memory is an explicit state strategy: Working memory: what the agent has learned during the current run (intermediate tool results, constraints, partial plans). Session memory: what should persist across turns (user preferences, ongoing tasks, summarized history). Long-term memory: enterprise knowledge the agent can retrieve (indexed documents, structured facts, embeddings + metadata). Short-term memory enables multi-step workflows without repeating questions. An HR onboarding agent can carry a new hire’s details from intake through provisioning without re-asking, because the workflow state is persisted and referenced. Long-term “learning” is typically implemented through feedback loops rather than real-time model weight updates: capturing corrections, storing validated outcomes, and periodically improving prompts, routing logic, retrieval configuration, or (where appropriate) fine-tuning. The key design rule is that memory must be policy-aware: retention rules, PII handling, and permission trimming apply to stored state as much as they apply to retrieved documents. Orchestration – Coordinating Multi-Agent Teams Complex enterprise work is rarely single-skill. Orchestration is how agentic systems scale capability without turning one agent into an unmaintainable monolith. The pattern is “manager + specialists”: an orchestrator decomposes the goal into subtasks, routes each to the best tool or sub-agent, and then composes a final response. This can be done sequentially or in parallel. Employee onboarding is a classic: HR intake, IT account creation, equipment provisioning, and training scheduling can run in parallel where dependencies allow. The engineering challenge is making orchestration reliable: defining strict input/output contracts between agents (often structured JSON), handling failures (timeouts, partial completion), and ensuring only one component has authority to send the final user-facing message to avoid conflicting outputs. In Microsoft terms, orchestration can be implemented as agentic flows in Copilot Studio, connected-agent patterns in Foundry, or explicit orchestrators in code using structured tool schemas and shared state. Strategic Impact – How Agentic AI Changes Knowledge Work Agentic AI is no longer an experimental overlay to enterprise systems. It is becoming an embedded operational layer inside core workflows. 
Unlike earlier chatbot deployments that answered isolated questions, modern enterprise agents execute end-to-end processes, interact with structured systems, maintain context, and operate within governed boundaries. The shift is not about conversational intelligence alone; it is about workflow execution at scale. The transformation becomes clearer when examining real implementations across industries. In legal services, agentic systems have moved beyond document summarization into operational case automation. Assembly Software’s NeosAI, built on Azure AI infrastructure, integrates directly into legal case management systems and automates document analysis, structured data extraction, and first-draft generation of legal correspondence. What makes this deployment impactful is not merely the generative drafting capability, but the integration architecture. NeosAI is not an isolated chatbot; it operates within the same document management systems, billing systems, and communication platforms lawyers already use. Firms report time savings of up to 25 hours per case, with document drafting cycles reduced from days to minutes for first-pass outputs. Importantly, the system runs within secure Azure environments with zero data retention policies, addressing one of the most sensitive concerns in legal AI adoption: client confidentiality. JPMorgan’s COiN platform represents another dimension of legal and financial automation. Instead of conversational assistance, COiN performs structured contract intelligence at production scale. It analyzes more than 12,000 commercial loan agreements annually, extracting over 150 clause attributes per document. Work that previously required approximately 360,000 human hours now executes in seconds. The architecture emphasizes structured NLP pipelines, taxonomy-based clause classification, and private cloud deployment for regulatory compliance. Rather than replacing legal professionals, the system flags unusual clauses for human review, maintaining oversight while dramatically accelerating analysis. Over time, COiN has also served as a knowledge retention mechanism, preserving institutional contract intelligence that would otherwise be lost with employee turnover. In financial services, the impact is similarly structural. Morgan Stanley’s internal AI Assistant allows wealth advisors to query over 100,000 proprietary research documents using natural language. Adoption has reached nearly universal usage across advisor teams, not because it replaces expertise, but because it compresses research time and surfaces insights instantly. Building on this foundation, the firm introduced an AI meeting debrief agent that transcribes client conversations using speech-to-text models and generates CRM notes and follow-up drafts through GPT-based reasoning. Advisors review outputs before finalization, preserving human judgment. The result is faster client engagement and measurable productivity improvements. What differentiates Morgan Stanley’s approach is not only deployment scale, but disciplined evaluation before release. The firm established rigorous benchmarking frameworks to test model outputs against expert standards for accuracy, compliance, and clarity. Only after meeting defined thresholds were systems expanded firmwide. This pattern—evaluation before scale—is becoming a defining trait of successful enterprise agent deployment. Human Resources provides a different perspective on agentic AI. 
Johnson Controls deployed an AI HR assistant inside Slack to manage policy questions, payroll inquiries, and onboarding support across a global workforce exceeding 100,000 employees. By embedding the agent in a channel employees already use, adoption barriers were reduced significantly. The result was a 30–40% reduction in live HR call volume, allowing HR teams to redirect focus toward strategic workforce initiatives. Similarly, Ciena integrated an AI assistant directly into Microsoft Teams, unifying HR and IT support through a single conversational interface. Employees no longer navigate separate portals; the agent orchestrates requests across backend systems such as Workday and ServiceNow. The technical lesson here is clear: integration breadth drives usability, and usability drives adoption. Engineering and IT operations reveal perhaps the most technically sophisticated application of agentic AI: multi-agent orchestration. In a proof-of-concept developed through collaboration between Microsoft and ServiceNow, an AI-driven incident response system coordinates multiple agents during high-priority outages. Microsoft 365 Copilot transcribes live war-room discussions and extracts action items, while ServiceNow’s Now Assist executes operational updates within IT service management systems. A Semantic Kernel–based manager agent maintains shared context and synchronizes activity across platforms. This eliminates the longstanding gap between real-time discussion and structured documentation, automatically generating incident reports while freeing engineers to focus on remediation rather than clerical tasks. The system demonstrates that orchestration is not conceptual—it is operational. Across these examples, the pattern is consistent. Agentic AI changes knowledge work by absorbing structured cognitive labor: document parsing, compliance classification, research synthesis, workflow routing, transcription, and task coordination. Humans remain essential for judgment, ethics, and accountability, but the operational layer increasingly runs through AI-mediated execution. The result is not incremental productivity improvement; it is structural acceleration of knowledge processes. Design and Governance Challenges – Managing the Risks As agentic AI shifts from answering questions to executing workflows, governance must mature accordingly. These systems retrieve enterprise data, invoke APIs, update records, and coordinate across platforms. That makes them operational actors inside your architecture—not just assistants. The primary shift is this: autonomy increases responsibility. Agents must be observable. Every retrieval, reasoning step, and tool invocation should be traceable. Without structured telemetry and audit trails, enterprises lose visibility into why an agent acted the way it did. Agents must also operate within scoped authority. Least-privilege access, role-based identity, and bounded credentials are essential. An HR agent should not access finance systems. A finance agent should not modify compliance data without policy constraints. Autonomy only works when it is deliberately constrained. Execution boundaries are equally critical. High-risk actions—financial approvals, legal submissions, production changes—should include embedded thresholds or human approval gates. Autonomy should be progressive, not absolute. Cost and performance must be governed just like cloud infrastructure. Agentic systems can trigger recursive calls and model loops. 
Without usage monitoring, rate limits, and model-tier routing, compute consumption can escalate unpredictably. Finally, agentic systems require continuous evaluation. Real-world testing, live monitoring, and drift detection ensure the system remains aligned with business rules and compliance requirements. These are not “set and forget” deployments. In short, agentic AI becomes sustainable only when autonomy is paired with observability, scoped authority, embedded guardrails, cost control, and structured oversight.

Conclusion – Towards the Agentic Enterprise
The organizations achieving meaningful returns from agentic AI share a common pattern. They do not treat AI agents as experimental tools. They design them as production systems with defined roles, scoped authority, measurable KPIs, embedded observability, and formal governance layers. When autonomy is paired with integration, memory, orchestration, and governance discipline, agentic AI becomes more than automation—it becomes an operational architecture. Enterprises that master this architecture are not merely reducing costs; they are redefining how knowledge work is executed. In this emerging model, human professionals focus on strategic judgment and innovation, while AI agents manage structured cognitive execution at scale. The competitive advantage will not belong to those who deploy the most AI, but to those who deploy it with architectural rigor and governance maturity. Before we rush to deploy more agents, a few questions are worth asking:
If an AI agent executes a workflow in your enterprise today, can you trace every reasoning step and tool invocation behind that decision?
Does your architecture treat AI as a conversational layer - or as an operational actor with scoped identity, cost controls, and policy enforcement?
Where should autonomy stop in your organization - and who defines that boundary?
Agentic AI is not just a capability shift. It is an architectural decision. Curious to hear how others are designing their control planes and orchestration layers.

References
- MIT Sloan – “Agentic AI, Explained” by Beth Stackpole: A foundational overview of agentic AI, its distinction from traditional generative AI, and its implications for enterprise workflows, governance, and strategy.
- Microsoft TechCommunity – “Introducing Multi-Agent Orchestration in Foundry Agent Service”: Details Microsoft’s multi-agent orchestration capabilities, including Connected Agents, Multi-Agent Workflows, and integration with A2A and MCP protocols.
- Microsoft Learn – “Extend the Capabilities of Your Agent – Copilot Studio”: Explains how to build and extend custom agents in Microsoft Copilot Studio using tools, connectors, and enterprise data sources.
- Assembly Software’s NeosAI case – Microsoft Customer Stories
- JPMorgan COiN platform – GreenData Case Study
- HR support AI (Johnson Controls, Ciena, Databricks) – Moveworks case studies
- ServiceNow & Semantic Kernel multi-agent P1 Incident – Microsoft Semantic Kernel Blog

The JavaScript AI Build-a-thon Season 2 starts March 2!
The JavaScript AI Build-a-thon is a free, hands-on program designed to close that gap. Over the course of four weeks (March 2 - March 31, 2026), you'll move from running AI 100% on-device (Local AI) to designing multi-service, multi-agentic systems, all in JavaScript/TypeScript and using tools you are already familiar with. The series will culminate in a hackathon, where you will create, compete, and turn what you've learned into working projects you can point to, talk about, and extend.

Azure Skilling at Microsoft Ignite 2025
The energy at Microsoft Ignite was unmistakable. Developers, architects, and technical decision-makers converged in San Francisco to explore the latest innovations in cloud technology, AI applications, and data platforms. Beyond the keynotes and product announcements was something even more valuable: an integrated skilling ecosystem designed to transform how you build with Azure. This year Azure Skilling at Microsoft Ignite 2025 brought together distinct learning experiences, 150+ hands-on labs, and multiple pathways to industry-recognized credentials—all designed to help you master skills that matter most in today's AI-driven cloud landscape.

Just Launched at Ignite
Microsoft Ignite 2025 offered an exceptional array of learning opportunities, each designed to meet developers anywhere on the skilling journey. Whether you joined us in-person or on-demand in the virtual experience, multiple touchpoints are available to deepen your Azure expertise. Ignite 2025 is in the books, but you can still engage with the latest Microsoft skilling opportunities, including:
The Azure Skills Challenge provides a gamified learning experience that lets you compete while completing task-based achievements across Azure's most critical technologies. These challenges aren't just about badges and bragging rights—they're carefully designed to help you advance technical skills and prepare for Microsoft role-based certifications. The competitive element adds urgency and motivation, turning learning into an engaging race against the clock and your peers.
For those seeking structured guidance, Plans on Learn offer curated sets of content designed to help you achieve specific learning outcomes. These carefully assembled learning journeys include built-in milestones, progress tracking, and optional email reminders to keep you on track. Each plan represents 12-15 hours of focused learning, taking you from concept to capability in areas like AI application development, data platform modernization, or infrastructure optimization.
The Microsoft Reactor Azure Skilling Series, running December 3-11, brings skilling to life through engaging video content, mixing regular programming with special Ignite-specific episodes. This series will deliver technical readiness and programming guidance in a livestream presentation that's more digestible than traditional documentation. Whether you're catching episodes live with interactive Q&A or watching on-demand later, you'll get world-class instruction that makes complex topics approachable.

Beyond Ignite: Your Continuous Learning Journey
Here's the critical insight that separates Ignite attendees who transform their careers from those who simply collect swag: the real learning begins after the event ends. Microsoft Ignite is your launchpad, not your destination. Every module you start, every lab you complete, and every challenge you tackle connects to a comprehensive learning ecosystem on Microsoft Learn that's available 24/7, 365 days a year. Think of Ignite as your intensive immersion experience—the moment when you gain context, build momentum, and identify the skills that will have the biggest impact on your work. What you do in the weeks and months following determines whether that momentum compounds into career-defining expertise or dissipates into business as usual. For those targeting career advancement through formal credentials, Microsoft Certifications, Applied Skills, and AI Skills Navigator provide globally recognized validation of your expertise.
Applied Skills focus on scenario-based competencies, demonstrating that you can build and deploy solutions, not simply answer theoretical questions. Certifications cover role-based scenarios for developers, data engineers, AI engineers, and solution architects. The assessment experiences include performance-based testing in dedicated Azure tenants where you complete real configuration and development tasks. And finally, the new AI Skills Navigator is an agentic learning space that brings together AI-powered skilling experiences and credentials from Microsoft, LinkedIn Learning, and GitHub in a single, unified experience.

Why This Matters: The Competitive Context

The cloud skills race is intensifying. While our competitors offer robust training and content, Microsoft's differentiation comes not from having more content—though our 1.4 million module completions last fiscal year and 35,000+ certifications awarded speak to scale—but from the integration of services to orchestrate workflows. Only Microsoft offers a truly unified ecosystem where GitHub Copilot accelerates your development, Azure AI services power your applications, and Azure platform services deploy and scale your solutions—all backed by integrated skilling content that teaches you to maximize this connected experience. When you continue your learning journey after Ignite, you're not just accumulating technical knowledge. You're developing fluency in an integrated development environment that no competitor can replicate. You're learning to leverage AI-powered development tools, cloud-native architectures, and enterprise-grade security in ways that compound each other's value. This unified expertise is what transforms individual developers into force multipliers for their organizations.

Start Now, Build Momentum, Never Stop

Microsoft Ignite 2025 offered the chance to compress months of learning into days of intensive, hands-on experience, but you can still take part: the on-demand videos, the Global Ignite Skills Challenge, the GitHub repos for the /Ignite25 labs, the Reactor Azure Skilling Series, and the curated Plans on Learn provide multiple entry points regardless of your current skill level or preferred learning style. But remember: the developers who extract the most value from Ignite are those who treat the event as the beginning, not the culmination, of their learning journey. They join hackathons, contribute to GitHub repositories, and engage with the Azure community on Discord and technical forums. The question isn't whether you'll learn something valuable from Microsoft Ignite 2025; that's guaranteed. The question is whether you'll convert that learning into sustained momentum that compounds over months and years into career-defining expertise. The ecosystem is here. The content is ready. Your skilling journey doesn't end when Ignite does—it accelerates.
Level up your Python Gen AI Skills from our free nine-part YouTube series!

Want to learn how to use generative AI models in your Python applications? We're putting on a series of nine live streams, in both English and Spanish, all about generative AI. We'll cover large language models, embedding models, and vision models; introduce techniques like RAG, function calling, and structured outputs; and show you how to build agents and MCP servers. Plus we'll talk about AI safety and evaluations, to make sure all your models and applications are producing safe outputs. 🔗 Register for the entire series. In addition to the live streams, you can also join weekly office hours in our AI Discord to ask any questions that don't get answered in the chat. You can also scroll down to learn about each live stream and register for individual sessions. See you in the streams! 👋🏻

Large Language Models
7 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
Join us for the first session in our Python + AI series! In this session, we'll talk about Large Language Models (LLMs), the models that power ChatGPT and GitHub Copilot. We'll use Python to interact with LLMs using popular packages like the OpenAI SDK and Langchain. We'll experiment with prompt engineering and few-shot examples to improve our outputs. We'll also show how to build a full stack app powered by LLMs, and explain the importance of concurrency and streaming for user-facing AI apps.

Vector embeddings
8 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
In our second session of the Python + AI series, we'll dive into a different kind of model: the vector embedding model. A vector embedding is a way to encode a text or image as an array of floating point numbers. Vector embeddings make it possible to perform similarity search on many kinds of content. In this session, we'll explore different vector embedding models, like the OpenAI text-embedding-3 series, with both visualizations and Python code. We'll compare distance metrics, use quantization to reduce vector size, and try out multimodal embedding models.

Retrieval Augmented Generation
9 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
In our third Python + AI session, we'll explore one of the most popular techniques used with LLMs: Retrieval Augmented Generation. RAG is an approach that sends context to the LLM so that it can provide well-grounded answers for a particular domain. The RAG approach can be used with many kinds of data sources, like CSVs, webpages, documents, and databases. In this session, we'll walk through RAG flows in Python, starting with a simple flow and culminating in a full-stack RAG application based on Azure AI Search.
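To give a taste of what the RAG session covers, here is a minimal sketch of that flow: embed a handful of documents, retrieve the ones most similar to a question, and pass them to the chat model as grounding context. This is an illustrative sketch rather than code from the series: the tiny in-memory document list is made up, the model names (text-embedding-3-small, gpt-4o-mini) are common defaults you may want to change, and it assumes the openai and numpy packages are installed and an OPENAI_API_KEY environment variable is set.

```python
# Minimal RAG sketch (illustrative only): embed, retrieve by cosine similarity,
# then answer with the retrieved documents as context.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A tiny in-memory "knowledge base" standing in for your CSVs, webpages, or database rows.
documents = [
    "Azure AI Search supports hybrid retrieval combining vector and keyword search.",
    "Vector embeddings encode text as arrays of floating point numbers for similarity search.",
    "Retrieval Augmented Generation grounds LLM answers in retrieved domain documents.",
]

def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of texts with an OpenAI embedding model."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    query_vector = embed([query])[0]
    scores = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str) -> str:
    """Send the retrieved context plus the question to the chat model."""
    context = "\n".join(retrieve(query))
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only from these sources:\n" + context},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content

print(answer("How does RAG keep answers grounded?"))
```

A production flow would swap the in-memory list for a real index such as Azure AI Search (and a different client setup for Azure OpenAI), which is where the full-stack example in the session ends up.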
Vision models
14 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
Our fourth stream in the Python + AI series is all about vision models! Vision models are LLMs that can accept both text and images, like GPT-4o and GPT-4o mini. You can use those models for image captioning, data extraction, question-answering, classification, and more! We'll use Python to send images to vision models, build a basic chat-on-images app, and build a multimodal search engine.

Structured outputs
15 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
In our fifth stream of the Python + AI series, we'll discover how to get LLMs to output structured responses that adhere to a schema. In Python, all we need to do is define a @dataclass or a Pydantic BaseModel, and we get validated output that meets our needs. We'll focus on the structured outputs mode available in OpenAI models, but you can use similar techniques with other model providers. Our examples will demonstrate the many ways you can use structured responses, like entity extraction, classification, and agentic workflows.

Quality and safety
16 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
Now that we're more than halfway through our Python + AI series, we're covering a crucial topic: how to use AI safely, and how to evaluate the quality of AI outputs. There are multiple mitigation layers when working with LLMs: the model itself, a safety system on top, the prompting and context, and the application user experience. Our focus will be on Azure tools that make it easier to put safe AI systems into production. We'll show how to configure the Azure AI Content Safety system when working with Azure AI models, and how to handle those errors in Python code. Then we'll use the Azure AI Evaluation SDK to evaluate the safety and quality of the output from our LLM.

Tool calling
21 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
In our seventh session of the Python + AI series, we'll cover tool calling (also known as function calling), the technique that lets an LLM decide when to call Python functions you define and describe to it. Tool calling is the building block behind agentic workflows, so this session sets the stage for the AI agents session that follows.

AI agents
22 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
For the penultimate session of our Python + AI series, we're building AI agents! We'll use many of the most popular Python AI agent frameworks: Langgraph, Semantic Kernel, Autogen, Pydantic AI, and more. Our agents will start simple and then ramp up in complexity, demonstrating different architectures like hand-offs, round-robin, supervisor, graphs, and ReAct.

Model Context Protocol
23 October, 2025 | 5:00 PM - 6:00 PM (UTC)
Register for the stream on Reactor
In the final session of our Python + AI series, we're diving into the hottest technology of 2025: MCP, Model Context Protocol. This open protocol makes it easy to extend AI agents and chatbots with custom functionality, to make them more powerful and flexible. We'll show how to use the official Python FastMCP SDK to build an MCP server running locally and consume that server from chatbots like GitHub Copilot. Then we'll build our own MCP client to consume the server. Finally, we'll discover how easy it is to point popular AI agent frameworks like Langgraph, Pydantic AI, and Semantic Kernel at MCP servers. With great power comes great responsibility, so we will briefly discuss the many security risks that come with MCP, both as a user and a developer.
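To give a flavour of that final session, here is a minimal sketch of an MCP server using the FastMCP class from the official MCP Python SDK (the "mcp" package). The session_lookup tool and its data are made-up examples that only illustrate the shape of an MCP tool; treat this as a sketch under those assumptions, not the code from the stream, and check the SDK documentation for the exact installation and client-side wiring.

```python
# Minimal MCP server sketch (illustrative only), assuming the official MCP
# Python SDK ("mcp" package) is installed. The session_lookup tool is hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("python-ai-demo")

# A made-up lookup table standing in for whatever data your server exposes.
SESSIONS = {
    "rag": "Retrieval Augmented Generation, 9 October 2025",
    "mcp": "Model Context Protocol, 23 October 2025",
}

@mcp.tool()
def session_lookup(topic: str) -> str:
    """Return the date of a Python + AI session for a given topic keyword."""
    return SESSIONS.get(topic.lower(), f"No session found for {topic!r}.")

if __name__ == "__main__":
    # Runs the server over stdio so an MCP client (for example GitHub Copilot)
    # can launch it as a local process and call its tools.
    mcp.run()
```

Once running, a server like this can be registered in a chatbot's MCP configuration, which is the consumption path the session walks through with GitHub Copilot and agent frameworks like Langgraph, Pydantic AI, and Semantic Kernel.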
Essential Microsoft Resources for MVPs & the Tech Community from the AI Tour

Unlock the power of Microsoft AI with redeliverable technical presentations, hands-on workshops, and open-source curriculum from the Microsoft AI Tour! Whether you’re a Microsoft MVP, Developer, or IT Professional, these expertly crafted resources empower you to teach, train, and lead AI adoption in your community. Explore top breakout sessions covering GitHub Copilot, Azure AI, Generative AI, and security best practices—designed to simplify AI integration and accelerate digital transformation. Dive into interactive workshops that provide real-world applications of AI technologies. Take it a step further with Microsoft’s Open-Source AI Curriculum, offering beginner-friendly courses on AI, Machine Learning, Data Science, Cybersecurity, and GitHub Copilot—perfect for upskilling teams and fostering innovation. Don’t just learn—lead. Access these resources, host impactful training sessions, and drive AI adoption in your organization. Start sharing today! Explore now: Microsoft AI Tour Resources.