Microsoft Developer Community Blog

RAG Deep Dive: Watch all the recordings!

Pamela_Fox
Feb 19, 2025

Our most popular RAG solution for Azure has now been deployed thousands of times by developers. The solution includes many optional features that make it even more powerful: support for multiple document types, chat history with Cosmos DB, user accounts and login, data access control, multimodal media ingestion, private deployment, and more.

Early this year, we ran a YouTube series showcasing the RAG solution and its many features, to provide more guidance both to developers building on our RAG solution and to developers building their own custom RAG solutions.

If you missed the series live, you can catch up with the videos and slides, linked below or in this YouTube playlist. Start at the beginning or jump to the session that interests you most!

Have a follow-up question after watching? Join our weekly Python AI office hours in Discord.

RAGChat: The RAG solution for Azure

šŸ“ŗ Watch YouTube recording

For our kickoff session, we walk through a live demo of the RAG solution and explain how it all works. Then we step through the RAG flow from Azure AI Search to Azure OpenAI, deploy the app to Azure, and discuss the Azure architecture.
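
To make that flow concrete, here is a minimal sketch of a single RAG turn in Python. The endpoint, index name, field names, and deployment name are illustrative assumptions, not the solution's actual configuration:

```python
# Minimal RAG turn: retrieve matching chunks from Azure AI Search, then ask
# Azure OpenAI to answer using only those sources. Names below are assumptions.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from openai import AzureOpenAI

credential = DefaultAzureCredential()
search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="gptkbindex",  # assumed index name
    credential=credential,
)
openai_client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    azure_ad_token_provider=get_bearer_token_provider(
        credential, "https://cognitiveservices.azure.com/.default"
    ),
    api_version="2024-06-01",
)

question = "What is included in my health plan?"
results = search_client.search(search_text=question, top=3)
sources = "\n".join(f"{doc['sourcepage']}: {doc['content']}" for doc in results)  # assumed fields

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # your chat deployment name
    messages=[
        {"role": "system", "content": "Answer using only the provided sources, and cite the source page."},
        {"role": "user", "content": f"{question}\n\nSources:\n{sources}"},
    ],
)
print(response.choices[0].message.content)
```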

šŸ”— View the slides for the session

RAGChat: Customizing our RAG solution

šŸ“ŗ Watch YouTube recording

In our second session, we show you how to customize the RAG solution for your own domain - adding your own data, modifying the prompts, and personalizing the UI. Plus, we give you tips for local development for faster feature iteration.

šŸ”— View the slides for the session

RAGChat: Optimal retrieval with Azure AI Search

šŸ“ŗ Watch YouTube recording

Our RAG solution uses Azure AI Search to find matching documents with state-of-the-art retrieval mechanisms. In this session, we dive into the mechanics of vector embeddings, hybrid search with RRF, and semantic ranking. We also discuss the data ingestion process, highlighting the differences between manual ingestion and integrated vectorization.
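
As a rough illustration of what such a hybrid query looks like with the azure-search-documents SDK (the index name, vector field, and semantic configuration name are assumptions):

```python
# Hybrid retrieval sketch: a keyword (BM25) query and a vector query are fused
# with Reciprocal Rank Fusion (RRF), then reranked by the semantic ranker.
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="gptkbindex",  # assumed index name
    credential=DefaultAzureCredential(),
)
openai_client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-06-01",
)

query = "What is included in the health plan?"
query_vector = openai_client.embeddings.create(
    model="text-embedding-3-large",  # your embedding deployment name
    input=query,
).data[0].embedding

results = search_client.search(
    search_text=query,  # keyword side of the hybrid query
    vector_queries=[
        VectorizedQuery(vector=query_vector, k_nearest_neighbors=50, fields="embedding")
    ],
    query_type="semantic",  # enable the semantic reranker
    semantic_configuration_name="default",
    top=5,
)
for doc in results:
    print(doc["sourcepage"], doc["@search.reranker_score"])
```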

šŸ”— View the slides for the session

RAGChat: Multimedia data ingestion

šŸ“ŗ Watch YouTube recording

Do your documents contain images or charts? Our RAG solution has two different approaches to handling multimedia documents, and we dive into both in this session. The first approach works entirely at ingestion time, replacing media in the documents with LLM-generated descriptions. The second approach stores images of the media alongside vector embeddings of the images, and sends both text and images to a multimodal LLM for question answering. Watch this session to decide which approach to use for your app.
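
Here's a rough sketch of the first approach, assuming a figure has already been extracted from a document; the deployment name, prompt, and file path are placeholders:

```python
# Ingestion-time sketch: ask a multimodal chat model to describe an extracted
# figure, then index that description as ordinary searchable text.
import base64
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-06-01",
)

with open("figure.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = openai_client.chat.completions.create(
    model="gpt-4o",  # a multimodal chat deployment
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart in detail so it can be searched as text."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)
description = response.choices[0].message.content  # stored in the chunk in place of the image
```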

šŸ”— View the slides for the session

RAGChat: User login and data access control

šŸ“ŗ Watch YouTube recording

In our RAG flow, the app first searches a knowledge base for relevant matches to a user's query, then sends the results to the LLM along with the original question. What if you have documents that should only be accessed by a subset of your users, like a group or a single user? Then you need data access controls to ensure that document visibility is respected during the RAG flow. In this session, we show an approach that uses Azure AI Search data access controls to search only the documents that the logged-in user is allowed to see. We also demonstrate a feature for user-uploaded documents that pairs those data access controls with Azure Data Lake Storage Gen2.
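
Conceptually, the search-time side of that looks something like the sketch below, where each indexed chunk carries the user object IDs and groups allowed to see it (the field names and IDs are assumptions):

```python
# Security-trimmed retrieval sketch: an OData filter restricts every query to
# chunks whose oids/groups fields include the current user or their groups.
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

def build_security_filter(user_oid: str, group_ids: list[str]) -> str:
    # search.in matches a collection element against a comma-delimited list
    groups = ", ".join(group_ids)
    return (
        f"oids/any(g: search.in(g, '{user_oid}')) "
        f"or groups/any(g: search.in(g, '{groups}'))"
    )

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="gptkbindex",
    credential=DefaultAzureCredential(),
)
results = search_client.search(
    search_text="What is our parental leave policy?",
    filter=build_security_filter("user-object-id", ["hr-group-id"]),
    top=5,
)
```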

šŸ”— View the slides for the session

RAGChat: Storing chat history

šŸ“ŗ Watch YouTube recording

Learn how we store chat history using either IndexedDB for client-side storage or Azure Cosmos DB for persistent storage. We discuss the API architecture and data schema choices, doing both a live demo of the app and a walkthrough of the code.
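
For a feel of the Cosmos DB path, here is a sketch of storing and listing one user's sessions; the database, container, and item schema are illustrative assumptions rather than the exact schema covered in the session:

```python
# Chat history sketch: one item per chat session, partitioned by user ID so
# each user only ever queries their own history.
from azure.cosmos import CosmosClient
from azure.identity import DefaultAzureCredential

client = CosmosClient(
    "https://<your-account>.documents.azure.com", credential=DefaultAzureCredential()
)
container = client.get_database_client("chat-database").get_container_client("chat-history")

container.upsert_item({
    "id": "session-123",
    "userId": "user-object-id",  # partition key
    "title": "Benefits questions",
    "messages": [
        {"role": "user", "content": "What is included in the health plan?"},
        {"role": "assistant", "content": "The Northwind Health Plus plan includes..."},
    ],
})

# List this user's sessions, most recent first (_ts is the built-in timestamp)
sessions = container.query_items(
    query="SELECT c.id, c.title FROM c WHERE c.userId = @userId ORDER BY c._ts DESC",
    parameters=[{"name": "@userId", "value": "user-object-id"}],
    partition_key="user-object-id",
)
for session in sessions:
    print(session["id"], session["title"])
```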

šŸ”— View the slides for the session

RAGChat: Adding speech input and output

šŸ“ŗ Watch YouTube recording

Our RAG solution includes optional features for speech input and output, powered either by the free browser SDKs or by the powerful Azure Speech API. We also offer a tight integration with the VoiceRAG solution, for those of you who want a real-time voice interface. Learn about all the ways you can add speech to your RAG chat in this session!
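
As a rough sketch of the Azure Speech route, here's how a backend might turn an answer into audio with the Speech SDK (the key, region, and voice name are placeholders); the browser-based option needs no server-side code at all:

```python
# Speech output sketch: synthesize an answer into audio bytes with the Azure
# Speech SDK. Key, region, and voice name are placeholder assumptions.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<your-speech-key>", region="eastus")
speech_config.speech_synthesis_voice_name = "en-US-AvaMultilingualNeural"

# audio_config=None returns the audio in memory instead of playing it locally
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
result = synthesizer.speak_text_async("The Northwind Health Plus plan covers emergency care.").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(f"Synthesized {len(result.audio_data)} bytes of audio")
```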

šŸ”— View the slides for the session

RAGChat: Private deployment

šŸ“ŗ Watch YouTube recording

To ensure that the RAG app can only be accessed from within your enterprise network, you can deploy it to an Azure virtual network with private endpoints for each Azure service used. In this session, we show how to deploy the app into a virtual network with private endpoints for AI Search, OpenAI, Document Intelligence, and Blob Storage.

šŸ”— View the slides for the session

RAGChat: Evaluating RAG answer quality

šŸ“ŗ Watch YouTube recording

How can you be sure that the RAG chat app's answers are accurate, clear, and well formatted? Evaluation! In this session, we show you how to generate synthetic data and run bulk evaluations on your RAG app using the azure-ai-evaluation SDK. Learn about GPT metrics like groundedness and fluency, and custom metrics like citation matching. Plus, discover how you can run evaluations in CI/CD to easily verify that new changes don't introduce quality regressions.
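
To give a flavor of the SDK, here is a sketch of scoring a single answer for groundedness; the judge-model endpoint, key, and deployment are assumptions, and the same evaluators can also be run in bulk over a JSONL file of synthetic questions with the SDK's evaluate() function:

```python
# Evaluation sketch: a GPT-based "judge" scores whether the answer is grounded
# in the retrieved context. Endpoint, key, and deployment are assumptions.
from azure.ai.evaluation import GroundednessEvaluator

model_config = {
    "azure_endpoint": "https://<your-openai-resource>.openai.azure.com",
    "api_key": "<your-key>",
    "azure_deployment": "gpt-4o",  # the judge model deployment
}

groundedness = GroundednessEvaluator(model_config)
score = groundedness(
    query="What is included in the health plan?",
    context="Northwind Health Plus covers emergency care, hospital stays, and prescriptions.",
    response="The plan covers emergency care and hospital stays.",
)
print(score)  # e.g. {"groundedness": 5.0, "groundedness_reason": "..."}
```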

šŸ”— View the slides for the session

RAGChat: Monitoring and tracing LLM calls

šŸ“ŗ Watch YouTube recording

When your RAG app is in production, observability is crucial. You need to know about performance issues, runtime errors, and LLM-specific issues like Content Safety filter violations. In this session, learn how to use Azure Monitor along with OpenTelemetry SDKs to monitor the RAG application.
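
A minimal sketch of that wiring in Python (the connection string would normally come from app settings, and the span name and attribute are just examples):

```python
# Observability sketch: export OpenTelemetry traces, metrics, and logs to
# Application Insights, and wrap each RAG request in a custom span.
from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

configure_azure_monitor(connection_string="InstrumentationKey=<your-app-insights-key>")
tracer = trace.get_tracer(__name__)

def answer_question(question: str) -> str:
    with tracer.start_as_current_span("rag.answer_question") as span:
        span.set_attribute("rag.question_length", len(question))
        # ... retrieval and chat completion calls go here ...
        return "..."
```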

šŸ”— View the slides for the session

RAGChat: Extending RAG with function calling

šŸ“ŗ Watch YouTube recording

For this session, we explore how we can use OpenAI function calling to extend the functionality of the RAG application. We can use function calling to retrieve data from more sources (like the GitHub API or Bing API), to handle different kinds of user requests (like summarization instead of search), and even to escalate conversations to a human. With function calling, our RAG app can handle multiple data sources or even become agentic.
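
For example, here's a sketch of offering the model a single hypothetical tool; the tool name, schema, and deployment are assumptions, and a real app would execute the tool call and send its result back to the model in a follow-up message:

```python
# Function calling sketch: the model decides whether to call a hypothetical
# search_github_issues tool for a given user question.
import json
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-06-01",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_github_issues",
            "description": "Search GitHub issues in the project repository",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string", "description": "Search terms"}},
                "required": ["query"],
            },
        },
    }
]

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # your chat deployment name
    messages=[{"role": "user", "content": "Are there any open issues about login errors?"}],
    tools=tools,
)

tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    args = json.loads(tool_calls[0].function.arguments)
    print("Model wants to call:", tool_calls[0].function.name, args)
```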

šŸ”— View the slides for the session

Updated Feb 19, 2025
Version 1.0