Event details
Retrieval-augmented generation (RAG) allows you to build GenAI applications that use your own data, to optimize LLM performance.
Join our AMA to ask us about RAG, vector databases, running RAG with Azure AI, information retrieval best practices, and Azure AI Search's latest releases.
Key topics:
- Vector databases and vector search
- Hybrid search and re-ranking
- Retrieval-augmented generation (RAG)
- Document processing and chunking
- Azure AI Search's latest product releases: vector search, semantic ranker, integrated vectorization
Latest from Azure AI Search
- General availability of vector search and semantic ranker in Azure AI Search, formerly Azure Cognitive Search (microsoft.com)
- Announcing the Public Preview of Integrated Vectorization in Azure AI Search - Microsoft Community Hub
- Azure AI Search: Outperforming vector search with hybrid retrieval and ranking capabilities - Microsoft Community Hub
An AMA is a live text-based online event similar to an “Ask Me Anything” on Reddit. This AMA gives you the opportunity to connect with Microsoft product experts who will be on hand to answer your questions and listen to feedback.
Feel free to post your questions in the comments below beforehand if that better fits your schedule or time zone; questions will not be answered until the live hour.
- EricStarker (Former Employee): Thanks for joining us for this "RAG with Azure AI: why your retrieval strategy matters" AMA!
The event is now over, but we'll be posting a summary of the questions and answers here soon.
- EricStarker (Former Employee): Just nine minutes to go! Get your questions in!
- Cacrowley (Occasional Reader): I have all languages in my data base, it up to Microsoft how they want it distributed, politically correct then continue on I didn't ask prepare or educate for it anyway, nor been compensated. Help me I help you. Fixing to delete all data. It's up to Microsoft
- EricStarker (Former Employee): I'm sorry, but this doesn't sound like a relevant question or comment for the Azure AI Search team. Sorry about that.
- Leon Meijer (Copper Contributor): Is it possible to create a search engine/RAG that finds (almost) duplicate documents and presents the most relevant ones, e.g. with the latest revision first?
- gia_mondragon (Microsoft): Azure AI Studio (https://azure.microsoft.com/en-gb/products/ai-studio/) keeps track of all document versions for RAG, if that is what you’re looking for.
- gia_mondragon (Microsoft): If you would like to handle the freshness state of a document so that the latest version is the one that comes up from the engine itself (in AI Search), we're currently working on a feature improvement that takes care of this, so that it carries through to the answers returned to the LLM when performing RAG.
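To make the near-duplicate part concrete, one way to approximate it outside the service is to compare document embedding vectors directly. This is an illustrative sketch, not an Azure AI Search API: it assumes you already have an embedding (e.g. from an embedding model) and a revision number per document, and the `near_duplicates` helper, its field names, and the similarity threshold are all hypothetical choices.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def near_duplicates(docs, threshold=0.95):
    """Group documents whose embeddings are nearly identical.

    `docs` is a list of dicts with 'id', 'embedding', and 'revision'
    (higher revision = newer).  Because docs are visited newest-first,
    each group lists the latest revision first.
    """
    groups = []
    for doc in sorted(docs, key=lambda d: d["revision"], reverse=True):
        for group in groups:
            if cosine(doc["embedding"], group[0]["embedding"]) >= threshold:
                group.append(doc)
                break
        else:
            groups.append([doc])
    return groups

docs = [
    {"id": "report-v2", "embedding": [0.9, 0.1, 0.0], "revision": 2},
    {"id": "report-v1", "embedding": [0.89, 0.11, 0.01], "revision": 1},
    {"id": "other", "embedding": [0.0, 0.2, 0.9], "revision": 1},
]
for group in near_duplicates(docs):
    print([d["id"] for d in group])
```

Presenting only the first element of each group then gives you the latest revision of each distinct document; a real implementation would use an ANN index rather than this quadratic scan.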
- Kumar Chinnakali (Copper Contributor): In our previous conversations with Microsoft, they said that while you can edit and add to Copilot for some Microsoft 365 apps (like Teams), Copilot for Word can only be used out of the box (no changes made). Is the document compare feature we are looking to use already a part of this, or have there been new announcements indicating that Copilot for Word is adaptable?
- mike_carter_msft (Microsoft): Hi Kumar! We are from the Azure AI Search team and can’t speak to any future plans for M365 Copilot extensibility. As you stated, during the preview, extensibility is only supported for Teams.
- gyangupta (Copper Contributor): Is it common to also store all user questions in the vector DB for future model tuning, or is that automatically taken care of by RAG?
- fsunavala-msft (Microsoft): It’s important to understand the difference between Prompt-Engineering, Fine-Tuning, and Retrieval-Augmentation, as they are all great methods to incorporate domain knowledge into your Generative AI application.
Prompt-Engineering is all about in-context learning. This is particularly good when you have a static situation and you can iterate on it until you get the answer you want. https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering
Fine-Tuning is good for teaching skills, new jargon, or ways of producing responses, and the changes are permanent to your LLM. To Fine Tune or Not Fine Tune? That is the question (youtube.com)
Retrieval-Augmentation is good for learning new facts by grounding your prompt with the relevant information it needs on the fly to answer a question. RAG and generative AI - Azure AI Search | Microsoft Learn
To your question, storing user queries in a vector database is a strategy that can significantly enhance the model's performance over time if they are relevant to the task the LLM is trying to solve. If the user queries are NOT relevant, and you’re leveraging them as history to maintain in your user prompt, you’re likely just providing the LLM noise.
Hope this helps!
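To illustrate the "store relevant user queries" idea, here is a minimal in-memory sketch. Everything here is hypothetical: `embed` is a toy character-frequency stand-in for a real embedding model (e.g. an Azure OpenAI embedding deployment), and `QueryStore` stands in for a vector DB collection of past questions that you could mine later or pull into a prompt as history.

```python
import math

def embed(text):
    """Toy stand-in for a real embedding model: a normalized
    character-frequency vector over a-z."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class QueryStore:
    """Minimal in-memory stand-in for a vector DB collection of queries."""

    def __init__(self):
        self.entries = []  # (query_text, embedding) pairs

    def add(self, query):
        self.entries.append((query, embed(query)))

    def similar(self, query, k=3):
        """Return up to k stored queries most similar to `query`
        (dot product of unit vectors = cosine similarity)."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, e)), text)
            for text, e in self.entries
        ]
        scored.sort(reverse=True)
        return [text for _, text in scored[:k]]

store = QueryStore()
store.add("How do I enable vector search in Azure AI Search?")
store.add("What is semantic ranking?")
print(store.similar("enable vector search", k=1))
```

As noted above, only pull stored queries into the prompt when they are actually relevant to the current task; otherwise they are just noise.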
- Eric Jones (Copper Contributor): Is there recommended guidance for how and where [relative to RAG/search/re-ranking, etc.] to best incorporate small language models -- where we tailor them for specific purposes -- in concert with an LLM?
- fsunavala-msft (Microsoft): This is a fantastic question! Small language models (SLMs) are very new, and there is a lot of research going on to find out more about their use cases. Here are a few articles from Microsoft that talk about them in more depth: Phi-2: The surprising power of small language models - Microsoft Research and Orca 2: Teaching Small Language Models How to Reason - Microsoft Research.
My takeaways from the research are the following:
- Specialized Task Handling: SLMs like Phi-2 can be tailored for specific domains or tasks such as legal document analysis or technical support queries. In a RAG setup, these models can act as specialized agents that handle particular types of queries with high precision, complementing the broader knowledge base of LLMs.
- Efficient Re-Ranking: SLMs can be used to quickly re-rank the results retrieved by an initial search query based on more nuanced criteria or domain-specific knowledge. An SLM trained on specific criteria can refine and re-rank results before presenting them to the user, improving relevance and accuracy. Note that Azure AI Search already has a state-of-the-art re-ranker 🙂 Semantic ranking - Azure AI Search | Microsoft Learn
- Augmenting Search Queries: Before executing a search query, an SLM could preprocess the query to better capture the user’s intent or expand it with additional relevant terms.
- Post-Search Query Analysis: After initial results are retrieved and presented, SLMs can offer an additional layer of interaction, where they analyze user feedback or follow-up questions to refine the search results further.
I think the integration of SLMs with LLMs in a RAG architecture offers a pretty compelling approach to creating more efficient, accurate, and context-aware search and information retrieval systems. However, this space is pretty new, and we're all continuously experimenting to see how SLMs work best in the context of RAG. Hopefully, more to come soon!
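The re-ranking pattern above can be sketched in a few lines. The `slm_score` function here is a stand-in for a real small model (e.g. a fine-tuned cross-encoder that scores a query/passage pair jointly); it is approximated with simple term overlap so the example runs without any model, and all the names are illustrative.

```python
def slm_score(query, passage):
    """Stand-in for an SLM relevance score in [0, 1].
    A real SLM would score the (query, passage) pair jointly;
    this toy version uses the fraction of query terms present."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)

def rerank(query, retrieved, top_k=3):
    """Re-rank first-stage retrieval results with the (stubbed) SLM
    and keep only the top_k most relevant passages."""
    scored = sorted(retrieved, key=lambda p: slm_score(query, p), reverse=True)
    return scored[:top_k]

retrieved = [
    "Pricing tiers for Azure subscriptions",
    "Hybrid search combines vector search and keyword search",
    "Vector search finds semantically similar documents",
]
print(rerank("how does vector search work", retrieved, top_k=2))
```

In a production pipeline, this step would sit between the first-stage retriever (keyword/vector/hybrid) and the LLM prompt assembly, exactly as the "Efficient Re-Ranking" bullet describes.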
- Eric Jones (Copper Contributor): Thank you so much for the in-depth insights! 🙂
- VBasu (Copper Contributor):
Is there some high-level guidance on which Microsoft AI solution/tool to use, under which circumstances/requirements, for building a RAG based Chatbot?
I found a few different options in the Microsoft documentation and understand some are low- and no-code options, but what is and is not possible with each is hard to understand from the documentation.
- Co-pilot Studio (PVA) based (No-code)
- Azure OpenAI Studio --> Chat completion --> Deploy default App (Low-code)
- Azure OpenAI Studio --> Chat completion --> Deploy to PVA (No/Low-code)
- Azure AI Studio --> Build your own Copilot
- Azure Solution Accelerator
- From scratch using Python / .NET / etc.
- allisonsparrow (Microsoft): Hi Vivek! It really depends on how much control and customization you want. The specific technology powering the platforms is generally the same.
The intent for Azure AI Studio is that it will reach feature parity with Azure OpenAI Studio and, at some point, replace Azure OpenAI Studio.
The intent is for Copilot Studio to replace Azure OpenAI Studio --> chat completion --> deploy to PVA.
I believe features and capabilities are already at parity; Copilot Studio may have more capabilities available now.
From a RAG perspective, Azure AI Studio, Azure OpenAI Studio and Copilot Studio all use the same feature: Azure OpenAI on your data. This makes it easy to quickly ground your app with your own data.
If you require multitenancy, you should go with AI Studio. If you're building in a single tenant environment, you should go with Copilot Studio.
If you are building a custom copilot for internal users (e.g. a human resources knowledge base) or business operations, or creating a customer engagement solution, you should probably start with Copilot Studio.
If your chatbot will be using significant M365 data sources like SharePoint and OneDrive, you should start with Copilot Studio. If you will be connecting to OneLake, Blob Storage, or other databases, you should start with AI Studio.
You technically can connect any data source for both Copilot and AI Studio, but one studio has an easier experience for certain data.
Copilot Studio, as you mentioned, is a low-code platform and provides out-of-the-box chatbot interfaces/apps to use.
Azure AI Studio is in preview, as is its SDK, so you may want to start with a solution accelerator if you prefer to code.
Hopefully this helps - let me know which tech specifically you are interested in, and I can respond.
- VBasu (Copper Contributor): Thank you very much Allison! It is really helpful. :) At this point in time, we are in an exploratory mode and trying out a couple of use cases (interestingly, an HR knowledge base chatbot is actually one of them), internally focused, where the data is predominantly in our SharePoint Online and multi-tenancy is not a requirement yet. So, Copilot Studio seems to be a good starting point. As an EU-based company, we have some strict data residency requirements and need the solution stack (app, model, datastore, etc.) to be in the EU. Since in the Copilot Studio option most of the stack is pre-built, I will take a quick look at the MSFT documentation to understand if/how I can choose the model / model location, and if I have any remaining specific questions I will trouble you in this thread again 🙂 Thanks again!
- Senthil Gopal (Copper Contributor): What are some best practices for optimizing the performance of large language models (LLMs) using retrieval-augmented generation (RAG) with vector databases, particularly when running them with Azure AI?
- gia_mondragon (Microsoft): If you're referring to retrieval optimization, please take a look at this Azure AI Search post: https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167
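The hybrid retrieval described in that post merges a keyword result list and a vector result list; Azure AI Search uses Reciprocal Rank Fusion (RRF) for this merge step. Here is a minimal, illustrative sketch of RRF itself, assuming you already have the two ranked lists of document IDs (the IDs and k=60 constant are just conventional examples):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank
    Fusion: each document scores sum(1 / (k + rank)) over every list
    in which it appears (rank is 1-based).  k=60 is the commonly
    used smoothing constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc_b", "doc_a", "doc_d"]   # e.g. BM25 order
vector_results = ["doc_a", "doc_c", "doc_b"]    # e.g. vector-similarity order
print(reciprocal_rank_fusion([keyword_results, vector_results]))
# → ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Documents that rank well in both lists (doc_a, doc_b) float to the top; the semantic ranker can then re-rank this fused list, which is the combination the linked post benchmarks.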
- Kumar Chinnakali (Copper Contributor): Could you please provide/direct us to the technical documentation for integrating directly with Microsoft 365 Copilot? We're looking to invoke the Document Compare JavaScript API without using the Word Task Pane Add-in route, as we see some session management issues.
- mike_carter_msft (Microsoft): Hi Kumar! You can find documentation for M365 Copilot extensibility here: https://learn.microsoft.com/en-us/microsoft-365-copilot/extensibility/ There are several samples on GitHub that should help you get started: https://github.com/OfficeDev/Copilot-for-M365-Plugins-Samples?tab=readme-ov-file Our particular favorite is this sample, which shows how to integrate Azure AI Search with M365 Copilot: https://github.com/OfficeDev/Copilot-for-M365-Plugins-Samples/tree/main/samples/msgext-doc-search-csharp
- Kumar Chinnakali (Copper Contributor): Great, Mike. Thanks a ton.
- Stawsh (Brass Contributor): We want to create a service that will use Azure AI to summarize a report contained in a PDF, DOCX, or similar document. My question is pretty basic: which Azure AI / OpenAI service or API should we use to build this service, and do you have an outline of the steps to get it done?
- danquirk (Former Employee): You have options across Azure to accomplish report summarization. Within the context of RAG and Azure AI Search, you can use prompt engineering techniques to accomplish your desired outcome by giving the LLM specific instructions to summarize retrieved documents. Relevant documentation on prompt engineering for Azure OpenAI is here. Within Azure AI Services, the Language service also has an API with native document support for summarization (please see this announcement). The right detailed steps will depend on exactly what you are trying to accomplish in terms of summarization (extractive, generative, etc.) and where you are starting from (whether you already have the report document or must retrieve it first, etc.).
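For summarizing a long report with an LLM, a common pattern is map-reduce summarization: chunk the document, summarize each chunk, then summarize the combined chunk summaries. This is an illustrative sketch only; `call_llm` is a stand-in for a real chat-completion call (e.g. to an Azure OpenAI deployment) and here just echoes a truncated prompt so the example runs without any service, and the chunk size and prompts are arbitrary choices.

```python
def call_llm(prompt):
    """Stand-in for a chat-completion call to an LLM deployment;
    truncates the prompt so the example runs offline."""
    return prompt[:80]

def chunk_text(text, max_chars=1000):
    """Split a long report into roughly max_chars-sized chunks on
    paragraph boundaries so each fits in the model's context window."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = (current + "\n\n" + para) if current else para
    if current:
        chunks.append(current)
    return chunks

def summarize_report(text):
    """Map-reduce summarization: summarize each chunk, then summarize
    the concatenated chunk summaries into one final short summary."""
    partials = [
        call_llm(f"Summarize this section in 2-3 sentences:\n{c}")
        for c in chunk_text(text)
    ]
    combined = "\n".join(partials)
    return call_llm(f"Combine into a 3-paragraph summary:\n{combined}")
```

A short report that fits in the context window can skip the map step and be summarized with a single prompt; the extracted text from the PDF/DOCX (e.g. via a document-cracking step) is what you would pass in as `text`.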
- Stawsh (Brass Contributor): Our users will already have the PDF/DOCX/etc. file containing the report for which they want a summary. I'm not sure I know LLMs well enough yet to answer "extractive, generative, etc.", but imagine a 20-30 page report for which you want a 3-paragraph summary. What specifically are "retrieved documents" where you write "giving the LLM specific instructions to summarize retrieved documents"?