Forum Discussion
Set up a Teams Toolkit RAG bot with Azure AI Search indexes
Hello, I'm new to the forum and I'm trying to learn how to create a Teams RAG bot that uses Azure AI Search indexers as well as the default OpenAI LLM.
I have an Azure OpenAI resource created and working.
I have created an Azure Storage account where I have uploaded a bunch of PDFs, which I have connected as a data source in Azure AI Search.
I have also created an indexer that indexes the files in Azure Storage.
This all works without any problems.
But when I debug my bot locally in Teams Toolkit, the bot has no information from the PDFs I've uploaded and indexed. I have already edited these files:
src/indexers/setup.py line 62 " index = 'myindexname' "
as well as
src/bot.py line 40 " indexName= 'myindexname' "
The only working solution I've found is to put my PDFs in src/indexers/data/.
I have edited my src/indexers/get_data.py to look like this:
import os
import PyPDF2

async def get_doc_data(embeddings):
    docs = []
    data_dir = os.path.join(os.getcwd(), 'src/indexers/data/')
    pdf_files = [f for f in os.listdir(data_dir) if f.endswith('.pdf')]
    for idx, file_name in enumerate(pdf_files):
        file_path = os.path.join(data_dir, file_name)
        with open(file_path, 'rb') as file:  # PDFs must be opened in binary mode
            reader = PyPDF2.PdfReader(file)
            raw_description = ""
            for page in reader.pages:
                # extract_text() can return None for pages without text
                raw_description += page.extract_text() or ""
        doc = {
            "docId": str(idx + 1),
            "docTitle": file_name,
            "description": raw_description,
            "descriptionVector": await get_embedding_vector(raw_description, embeddings=embeddings),
        }
        docs.append(doc)
    return docs

async def get_embedding_vector(text: str, embeddings):
    result = await embeddings.create_embeddings(text)
    if result.status != 'success' or not result.output:
        if result.status == 'error':
            raise Exception(f"Failed to generate embeddings for description: <{text[:200]+'...'}>\n\nError: {result.output}")
        raise Exception(f"Failed to generate embeddings for description: <{text[:200]+'...'}>")
    return result.output[0]
When I run the command "python src/indexers/setup.py", it uploads my PDFs to an index and the Teams Toolkit bot has my PDF data. I can now chat over my PDF data.
But I don't want to be forced to upload to my index like this every time. I want to use an index that already exists and already has data sources and indexers connected, so that I don't have to update it manually or run that command.
This bot, when finished, will be uploaded to Teams and will only be available to our users through Teams.
Does anybody know how I can actually achieve this, or have a guide?
- balasubramanim, Iron Contributor
Jamie-Bech,
To achieve this, you can try the following.
Use the Azure AI Search SDK in your bot code to query the existing index and retrieve the PDF data. This way, you won't need to upload PDFs manually.
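Configure the bot to use the existing index by specifying the search service endpoint, the index name, and an API key in your bot configuration.
Use the azure-search-documents package (pip install azure-search-documents) in your bot code to interact with the Azure AI Search service and retrieve the PDF data.
Sample code to start:
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Configure the search client (replace the placeholders with your own values)
search_client = SearchClient(
    endpoint="https://<your-service-name>.search.windows.net",
    index_name="your_index_name",
    credential=AzureKeyCredential("your_api_key"),
)
# Query the index
results = search_client.search(search_text="your_query")
# Process the search results; each result behaves like a dict.
# The field names below come from the question's schema; an index built
# by an indexer may use different names, so check your index definition.
for result in results:
    print(result["docTitle"])
    print(result["description"])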
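To pass the retrieved documents to the model, the hits usually need to be flattened into a block of text for the prompt. Below is a minimal sketch of one way to do that. The helper name build_context, the max_chars_per_doc limit, and the default field names (taken from the question's schema) are assumptions for illustration, not part of Teams Toolkit or the Azure SDK.

```python
from typing import Iterable, Mapping


def build_context(
    hits: Iterable[Mapping[str, object]],
    title_field: str = "docTitle",       # assumed field name; match your index schema
    content_field: str = "description",  # assumed field name; match your index schema
    max_chars_per_doc: int = 1500,       # hypothetical limit to keep the prompt small
) -> str:
    """Format dict-like search hits into one text block for the LLM prompt."""
    sections = []
    for hit in hits:
        title = str(hit.get(title_field) or "untitled")
        content = str(hit.get(content_field) or "").strip()
        if not content:
            continue  # skip hits that carry no usable text
        # Truncate each document so a single long PDF cannot fill the prompt
        sections.append(f"[{title}]\n{content[:max_chars_per_doc]}")
    return "\n\n".join(sections)
```

With the search client above, this could be called as build_context(search_client.search(search_text=question, top=3)), where question is the user's message and top limits the number of hits returned.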
This should help you use the existing Azure AI Search index with your Teams RAG bot.
- Jamie-Bech, Copper Contributor
Hello balasubramanim!
Thank you for the guidance. Got it to work after some tweaks, and help from your guide. Thank you once again!