Event details
Retrieval-augmented generation (RAG) allows you to build GenAI applications that use your own data, to optimize LLM performance.
Join our AMA to ask us about RAG, vector databases, running RAG with Azure AI, information retrieval best practices, and Azure AI Search's latest releases.
Key topics:
- Vector databases and vector search
- Hybrid search and re-ranking
- Retrieval-augmented generation (RAG)
- Document processing and chunking
- Azure AI Search's latest product releases: vector search, semantic ranker, integrated vectorization
Latest from Azure AI Search
- General availability of vector search and semantic ranker in Azure AI Search, formerly Azure Cognitive Search (microsoft.com)
- Announcing the Public Preview of Integrated Vectorization in Azure AI Search - Microsoft Community Hub
- Azure AI Search: Outperforming vector search with hybrid retrieval and ranking capabilities - Microsoft Community Hub
An AMA is a live text-based online event similar to an “Ask Me Anything” on Reddit. This AMA gives you the opportunity to connect with Microsoft product experts who will be on hand to answer your questions and listen to feedback.
Feel free to post your questions in the comments below beforehand if that better fits your schedule or time zone; questions will not be answered until the live hour.
- EricStarker (Former Employee): Thanks for joining us for this "RAG with Azure AI: why your retrieval strategy matters" AMA!
The event is now over, but we'll be posting a summary of the questions and answers here soon.
- EricStarker (Former Employee): Just nine minutes to go! Get your questions in!
- Cacrowley (Occasional Reader): I have all languages in my data base, it up to Microsoft how they want it distributed, politically correct then continue on I didn't ask prepare or educate for it anyway, nor been compensated. Help me I help you. Fixing to delete all data. It's up to Microsoft
- EricStarker (Former Employee): I'm sorry, but this doesn't sound like a relevant question or comment for the Azure AI Search team. Sorry about that.
- Leon Meijer (Copper Contributor): Is it possible to create a search engine/RAG that finds (almost) duplicate documents and presents the most relevant ones, e.g. with the latest revision first?
- gia_mondragon (Microsoft): Azure AI Studio (https://azure.microsoft.com/en-gb/products/ai-studio/) keeps track of all document versions for RAG, if that is what you’re looking for.
- gia_mondragon (Microsoft): If you would like to handle the freshness state of a document so that the latest version is the one that comes up from the engine itself (in AI Search), we're currently working on a feature improvement that takes care of this, so that it carries through to the answers returned to the LLM when performing RAG.
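To make the near-duplicate part concrete, one way to approximate it outside the service is to compare document embedding vectors directly. This is an illustrative sketch, not an Azure AI Search API: it assumes you already have an embedding (e.g. from an embedding model) and a revision number per document, and the `near_duplicates` helper, its field names, and the similarity threshold are all hypothetical choices.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def near_duplicates(docs, threshold=0.95):
    """Group documents whose embeddings are nearly identical.

    `docs` is a list of dicts with 'id', 'embedding', and 'revision'
    (higher revision = newer).  Because docs are visited newest-first,
    each group lists the latest revision first.
    """
    groups = []
    for doc in sorted(docs, key=lambda d: d["revision"], reverse=True):
        for group in groups:
            if cosine(doc["embedding"], group[0]["embedding"]) >= threshold:
                group.append(doc)
                break
        else:
            groups.append([doc])
    return groups

docs = [
    {"id": "report-v2", "embedding": [0.9, 0.1, 0.0], "revision": 2},
    {"id": "report-v1", "embedding": [0.89, 0.11, 0.01], "revision": 1},
    {"id": "other", "embedding": [0.0, 0.2, 0.9], "revision": 1},
]
for group in near_duplicates(docs):
    print([d["id"] for d in group])
```

Presenting only the first element of each group then gives you the latest revision of each distinct document; a real implementation would use an ANN index rather than this quadratic scan.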
- Kumar Chinnakali (Copper Contributor): In our previous conversations with Microsoft, they said that while you can edit and add to Copilot for some Microsoft 365 apps (like Teams), Copilot for Word can only be used out of the box (no changes made). Is the document compare feature we are looking to use already a part of this, or have there been new announcements indicating that Copilot for Word is adaptable?
- mike_carter_msft (Microsoft): Hi Kumar! We are from the Azure AI Search team and can’t speak to any future plans for M365 Copilot extensibility. As you stated, during the preview, extensibility is only supported for Teams.
- gyangupta (Copper Contributor): Is it common to also store all user questions in the vector DB for future model tuning, or is that automatically taken care of by RAG?
- fsunavala-msft (Microsoft): It’s important to understand the difference between Prompt-Engineering, Fine-Tuning, and Retrieval-Augmentation, as they are all great methods to incorporate domain knowledge into your Generative AI application.
Prompt-Engineering is all about in-context learning. This is particularly good when you have a static situation and you can iterate on it until you get the answer you want. https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/advanced-prompt-engineering
Fine-Tuning is good for teaching skills, new jargon, or ways of producing responses, and the changes are permanent to your LLM. To Fine Tune or Not Fine Tune? That is the question (youtube.com)
Retrieval-Augmentation is good for learning new facts by grounding your prompt with the relevant information it needs on the fly to answer a question. RAG and generative AI - Azure AI Search | Microsoft Learn
To your question, storing user queries in a vector database is a strategy that can significantly enhance the model's performance over time if they are relevant to the task the LLM is trying to solve. If the user queries are NOT relevant, and you’re leveraging them as history to maintain in your user prompt, you’re likely just providing the LLM noise.
Hope this helps!
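To illustrate the "store relevant user queries" idea, here is a minimal in-memory sketch. Everything here is hypothetical: `embed` is a toy character-frequency stand-in for a real embedding model (e.g. an Azure OpenAI embedding deployment), and `QueryStore` stands in for a vector DB collection of past questions that you could mine later or pull into a prompt as history.

```python
import math

def embed(text):
    """Toy stand-in for a real embedding model: a normalized
    character-frequency vector over a-z."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class QueryStore:
    """Minimal in-memory stand-in for a vector DB collection of queries."""

    def __init__(self):
        self.entries = []  # (query_text, embedding) pairs

    def add(self, query):
        self.entries.append((query, embed(query)))

    def similar(self, query, k=3):
        """Return up to k stored queries most similar to `query`
        (dot product of unit vectors = cosine similarity)."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, e)), text)
            for text, e in self.entries
        ]
        scored.sort(reverse=True)
        return [text for _, text in scored[:k]]

store = QueryStore()
store.add("How do I enable vector search in Azure AI Search?")
store.add("What is semantic ranking?")
print(store.similar("enable vector search", k=1))
```

As noted above, only pull stored queries into the prompt when they are actually relevant to the current task; otherwise they are just noise.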
- Eric Jones (Copper Contributor): Is there recommended guidance for how and where [relative to RAG/search/re-ranking, etc.] to best incorporate small language models -- where we tailor them for specific purposes -- in concert with an LLM?
- fsunavala-msft (Microsoft): This is a fantastic question! Small language models (SLMs) are very new, and there is a lot of research going on to find out more about their use cases. Here are a few articles from Microsoft that talk about them in more depth: Phi-2: The surprising power of small language models - Microsoft Research and Orca 2: Teaching Small Language Models How to Reason - Microsoft Research.
My takeaways from the research are the following:
- Specialized Task Handling: SLMs like Phi-2 can be tailored for specific domains or tasks such as legal document analysis or technical support queries. In a RAG setup, these models can act as specialized agents that handle particular types of queries with high precision, complementing the broader knowledge base of LLMs.
- Efficient Re-Ranking: SLMs can be used to quickly re-rank the results retrieved by an initial search query based on more nuanced criteria or domain-specific knowledge. An SLM trained on specific criteria can refine and re-rank results before presenting them to the user, improving relevance and accuracy. Note that Azure AI Search already has a state-of-the-art re-ranker 🙂 Semantic ranking - Azure AI Search | Microsoft Learn
- Augmenting Search Queries: Before executing a search query, an SLM could preprocess the query to better capture the user’s intent or expand it with additional relevant terms.
- Post-Search Query Analysis: After initial results are retrieved and presented, SLMs can offer an additional layer of interaction, where they analyze user feedback or follow-up questions to refine the search results further.
I think the integration of SLMs with LLMs in a RAG architecture offers a pretty compelling approach to creating more efficient, accurate, and context-aware search and information retrieval systems. However, this space is pretty new, and we're all continuously experimenting to see how SLMs work best in the context of RAG. Hopefully, more to come soon!
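The re-ranking pattern above can be sketched in a few lines. The `slm_score` function here is a stand-in for a real small model (e.g. a fine-tuned cross-encoder that scores a query/passage pair jointly); it is approximated with simple term overlap so the example runs without any model, and all the names are illustrative.

```python
def slm_score(query, passage):
    """Stand-in for an SLM relevance score in [0, 1].
    A real SLM would score the (query, passage) pair jointly;
    this toy version uses the fraction of query terms present."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / max(len(q_terms), 1)

def rerank(query, retrieved, top_k=3):
    """Re-rank first-stage retrieval results with the (stubbed) SLM
    and keep only the top_k most relevant passages."""
    scored = sorted(retrieved, key=lambda p: slm_score(query, p), reverse=True)
    return scored[:top_k]

retrieved = [
    "Pricing tiers for Azure subscriptions",
    "Hybrid search combines vector search and keyword search",
    "Vector search finds semantically similar documents",
]
print(rerank("how does vector search work", retrieved, top_k=2))
```

In a production pipeline, this step would sit between the first-stage retriever (keyword/vector/hybrid) and the LLM prompt assembly, exactly as the "Efficient Re-Ranking" bullet describes.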
- Eric Jones (Copper Contributor): Thank you so much for the in-depth insights! 🙂
- VBasu (Copper Contributor):
Is there some high-level guidance on which Microsoft AI solution/tool to use, under which circumstances/requirements, for building a RAG based Chatbot?
I found a few different options in the Microsoft documentation and understand some are low- and no-code options, but what is and is not possible with each is hard to understand from the documentation.
- Co-pilot Studio (PVA) based (No-code)
- Azure OpenAI Studio --> Chat completion --> Deploy default App (Low-code)
- Azure OpenAI Studio --> Chat completion --> Deploy to PVA (No/Low-code)
- Azure AI Studio --> Build your own Copilot
- Azure Solution Accelerator
- From scratch using Python / .NET / etc.
- allisonsparrow (Microsoft): Hi Vivek! It really depends on how much control and customization you want. The specific technology powering the platforms is generally the same.
The intent for Azure AI Studio is that it will reach feature parity with Azure OpenAI Studio and, at some point, replace Azure OpenAI Studio.
The intent is for Copilot Studio to replace Azure OpenAI Studio --> chat completion --> deploy to PVA.
I believe features and capabilities are already at parity; Copilot Studio may have more capabilities available now.
From a RAG perspective, Azure AI Studio, Azure OpenAI Studio and Copilot Studio all use the same feature: Azure OpenAI on your data. This makes it easy to quickly ground your app with your own data.
If you require multitenancy, you should go with AI Studio. If you're building in a single tenant environment, you should go with Copilot Studio.
If you are building a custom copilot for internal users (e.g. a human resources knowledge base) or business operations, or creating a customer engagement solution, you should probably start with Copilot Studio.
If your chatbot will be using significant M365 data sources like SharePoint and OneDrive, you should start with Copilot Studio. If you will be connecting to OneLake, Blob Storage, or other databases, you should start with AI Studio.
You technically can connect any data source for both Copilot and AI Studio, but one studio has an easier experience for certain data.
Copilot Studio, as you mentioned, is a low-code platform and provides out-of-the-box chatbot interfaces/apps to use.
Azure AI Studio is in preview, as is its SDK, so you may want to start with a solution accelerator if you prefer to code.
Hopefully this helps - let me know which tech specifically you are interested in, and I can respond.
- VBasu (Copper Contributor): Thank you very much Allison! It is really helpful. :) At this point in time, we are in an exploratory mode and trying out a couple of use cases (interestingly, an HR knowledge base chatbot is actually one of them), internally focused, where the data is predominantly in our SharePoint Online and multi-tenancy is not a requirement yet. So, Copilot Studio seems to be a good starting point. As an EU-based company, we have some strict data residency requirements and need the solution stack (app, model, datastore, etc.) to be in the EU. Since in the Copilot Studio option most of the stack is pre-built, I will take a quick look at the MSFT documentation to understand if/how I can choose the model / model location, and if I have any remaining specific questions I will trouble you in this thread again 🙂 Thanks again!
- Senthil Gopal (Copper Contributor): What are some best practices for optimizing the performance of large language models (LLMs) using retrieval-augmented generation (RAG) with vector databases, particularly when running them with Azure AI?
- gia_mondragon (Microsoft): If you're referring to retrieval optimization, please take a look at this Azure AI Search post: https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167
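The hybrid retrieval described in that post merges a keyword result list and a vector result list; Azure AI Search uses Reciprocal Rank Fusion (RRF) for this merge step. Here is a minimal, illustrative sketch of RRF itself, assuming you already have the two ranked lists of document IDs (the IDs and k=60 constant are just conventional examples):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank
    Fusion: each document scores sum(1 / (k + rank)) over every list
    in which it appears (rank is 1-based).  k=60 is the commonly
    used smoothing constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc_b", "doc_a", "doc_d"]   # e.g. BM25 order
vector_results = ["doc_a", "doc_c", "doc_b"]    # e.g. vector-similarity order
print(reciprocal_rank_fusion([keyword_results, vector_results]))
# → ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Documents that rank well in both lists (doc_a, doc_b) float to the top; the semantic ranker can then re-rank this fused list, which is the combination the linked post benchmarks.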
- Kumar Chinnakali (Copper Contributor): Could you please provide/direct us to the technical documentation for integrating directly with Microsoft 365 Copilot? We're looking to invoke the Document Compare JavaScript API without using the Word Task Pane Add-in route, as we see some session management issues.
- mike_carter_msft (Microsoft): Hi Kumar! You can find documentation for M365 Copilot extensibility here: https://learn.microsoft.com/en-us/microsoft-365-copilot/extensibility/ There are several samples on GitHub that should help you get started: https://github.com/OfficeDev/Copilot-for-M365-Plugins-Samples?tab=readme-ov-file Our particular favorite is this sample, which shows how to integrate Azure AI Search with M365 Copilot: https://github.com/OfficeDev/Copilot-for-M365-Plugins-Samples/tree/main/samples/msgext-doc-search-csharp
- Kumar Chinnakali (Copper Contributor): Great, Mike. Thanks a ton.
- Stawsh (Brass Contributor): We want to create a service that will use Azure AI to summarize a report contained in a PDF, DOCX, or similar document. My question is pretty basic: which Azure AI / OpenAI service or API should we use to build this service, and do you have an outline of the steps to get it done?
- danquirk (Former Employee): You have options across Azure to accomplish report summarization. Within the context of RAG and Azure AI Search, you can use prompt engineering techniques to accomplish your desired outcome by giving the LLM specific instructions to summarize retrieved documents. Relevant documentation on prompt engineering for Azure OpenAI is here. Within Azure AI Services, the Language service also has an API with native document support for summarization (please see this announcement). The right detailed steps will depend on exactly what you are trying to accomplish in terms of summarization (extractive, generative, etc.) and where you are starting from (whether you already have the report document or must retrieve it first, etc.).
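For summarizing a long report with an LLM, a common pattern is map-reduce summarization: chunk the document, summarize each chunk, then summarize the combined chunk summaries. This is an illustrative sketch only; `call_llm` is a stand-in for a real chat-completion call (e.g. to an Azure OpenAI deployment) and here just echoes a truncated prompt so the example runs without any service, and the chunk size and prompts are arbitrary choices.

```python
def call_llm(prompt):
    """Stand-in for a chat-completion call to an LLM deployment;
    truncates the prompt so the example runs offline."""
    return prompt[:80]

def chunk_text(text, max_chars=1000):
    """Split a long report into roughly max_chars-sized chunks on
    paragraph boundaries so each fits in the model's context window."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = (current + "\n\n" + para) if current else para
    if current:
        chunks.append(current)
    return chunks

def summarize_report(text):
    """Map-reduce summarization: summarize each chunk, then summarize
    the concatenated chunk summaries into one final short summary."""
    partials = [
        call_llm(f"Summarize this section in 2-3 sentences:\n{c}")
        for c in chunk_text(text)
    ]
    combined = "\n".join(partials)
    return call_llm(f"Combine into a 3-paragraph summary:\n{combined}")
```

A short report that fits in the context window can skip the map step and be summarized with a single prompt; the extracted text from the PDF/DOCX (e.g. via a document-cracking step) is what you would pass in as `text`.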
- Stawsh (Brass Contributor): Our users will already have the PDF/DOCX/etc. file containing the report for which they want a summary. I'm not sure I know LLMs well enough yet to answer "extractive, generative, etc.", but imagine a 20-30 page report for which you want a 3-paragraph summary. What specifically are "retrieved documents" where you write "giving the LLM specific instructions to summarize retrieved documents"?