Retrieval Augmented Generation (RAG) lets a chat application retrieve a relevant subset of data from your data store and pass it to a large language model (LLM) as contextual knowledge, so the model answers a user's prompt grounded in your specific use case. Azure Cosmos DB for MongoDB vCore is one of the few Azure databases that provides built-in vector search at any scale, which makes it easy to store your semi-structured data and query it in the same place, with the speed and scalability Azure Cosmos DB gives you. You no longer need to store your data in one service and search over it in another: Azure Cosmos DB was built for AI-driven applications to do everything you need in one place.
Imagine you are experimenting with Azure OpenAI large language models to develop your company's RAG chat application, and you want to integrate cutting-edge LLM technology into your apps quickly and easily. You heard that the Semantic Kernel can orchestrate your AI-driven chat flow in a few simple steps, so you built a powerful chat application with it; now you want to ground it in your company's data. You may have heard about RAG and about generating vector embeddings, numerical representations of text whose similarity you can compare, which let you give the LLM a specific source of information to respond from. You also heard that not all databases support vector search, so you'll need a way around that; luckily, Azure Cosmos DB for MongoDB vCore is one of the few databases with built-in vector search. You decided to convert your company's data to JSON, store it in the database, and deploy a simple chat application on Azure App Service so you can compare vector search alone against the full RAG flow and share it with others to test.
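To make the embedding idea concrete, here's a minimal sketch of how similarity between two embeddings is typically measured. It uses numpy and made-up three-dimensional vectors; real models such as text-embedding-ada-002 return 1,536-dimensional vectors:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score how similar two embedding vectors are (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors; real embeddings have 1536 dimensions.
query = np.array([0.1, 0.9, 0.2])
doc_a = np.array([0.1, 0.8, 0.3])   # close in meaning to the query
doc_b = np.array([0.9, 0.1, 0.1])   # unrelated to the query

print(cosine_similarity(query, doc_a))  # higher score -> more relevant
print(cosine_similarity(query, doc_b))  # lower score -> less relevant
```

Vector search automates exactly this comparison at scale: it finds the stored documents whose embeddings score highest against the embedding of the user's question.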
In this blog, you'll learn to build a RAG chat web application using the Semantic Kernel, Azure OpenAI, and Azure Cosmos DB for MongoDB vCore. Here's the step-by-step guide:
Step 1: Create an Azure Cosmos DB for MongoDB vCore Cluster
Step 2: Create an Azure OpenAI resource and Deploy chat and embedding Models
Step 3: Create an Azure App Service and Deploy the RAG Chat Application
In this step, you'll:
- Sign in to the Azure portal.
- Create an Azure Cosmos DB for MongoDB vCore cluster.
1. Visit the Azure Portal https://portal.azure.com in your browser and sign in.
Now you are inside the Azure portal!
In this step, you create an Azure Cosmos DB for MongoDB vCore cluster to store your data and vector embeddings, and to perform vector search.
1. Type mongodb vcore in the search bar at the top of the portal page and select Azure Cosmos DB for MongoDB (vCore) from the available options.
2. Select Create from the toolbar to start provisioning your new cluster.
3. Add the following information to create a resource:
| What | Value |
| --- | --- |
| Subscription | Use your preferred subscription. It's advised to use the same subscription across all the resources that communicate with each other on Azure. |
| Resource group | Select Create new to create a new resource group. Enter a unique name for the resource group. |
| Cluster name | Enter a globally unique name. |
| Location | Select a region close to you for the best response time. For example, select UK South. |
| MongoDB version | Select the latest available version of MongoDB. |
4. Select Configure to configure your cluster tier.
5. Add the following information to configure the cluster tier. You can scale it up later:
| What | Value |
| --- | --- |
| Cluster tier | Select M25 tier, 2 (Burstable) vCores. |
| Storage | Select 32 GiB. |
6. Select Save.
7. Enter the cluster Admin Username and Password and store them in a secure location.
8. Select Next to configure the networking settings.
9. Select Allow public access from Azure services and resources within Azure to this cluster.
10. Select Add current IP address to the firewall rules to allow local access to the cluster.
11. Select Review + create.
12. Confirm your configuration settings and select Create to start provisioning the resource.
Note: The cluster creation can take up to 10 minutes. It's recommended to move on with the rest of the steps and get back to it later.
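As a preview of what the cluster enables: once it's provisioned, vector search works through a special index and a $search aggregation stage. Below is a minimal sketch using pymongo; the connection string placeholder, database and container names, and the field name vector are illustrative, and it assumes your documents store a 1,536-dimensional embedding (the size text-embedding-ada-002 produces) in that field:

```python
from pymongo import MongoClient

# Placeholders are illustrative; use your own connection string and names.
client = MongoClient("<yourMongoVcoreConnectionString>")
db = client["<cosmosDatabaseName>"]
collection = db["<cosmosContainerName>"]

# Create an IVF vector index over the "vector" field
# (1536 dimensions matches text-embedding-ada-002; COS = cosine similarity).
db.command({
    "createIndexes": "<cosmosContainerName>",
    "indexes": [
        {
            "name": "vectorSearchIndex",
            "key": {"vector": "cosmosSearch"},
            "cosmosSearchOptions": {
                "kind": "vector-ivf",
                "numLists": 1,
                "similarity": "COS",
                "dimensions": 1536,
            },
        }
    ],
})

# Find the 3 documents whose embeddings are closest to a query embedding.
query_embedding = [0.0] * 1536  # replace with a real embedding of the question
for doc in collection.aggregate(
    [{"$search": {"cosmosSearch": {"vector": query_embedding, "path": "vector", "k": 3}}}]
):
    print(doc)
```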
In this step, you'll:
- Create an Azure OpenAI resource.
- Deploy a chat model and an embedding model.
In this step, you create an Azure OpenAI Service resource that enables you to interact with different large language models (LLMs).
1. Type openai in the search bar at the top of the portal page and select Azure OpenAI from the available options.
2. Select Create from the toolbar to provision a new OpenAI resource.
3. Add the following information to create a resource:
| What | Value |
| --- | --- |
| Subscription | Use the same subscription you used to apply for Azure OpenAI access. |
| Resource group | Use the resource group you created in the previous step. |
| Region | Select a region close to you for the best response time. For example, select UK South. |
| Name | Enter a globally unique name. |
| Pricing tier | Select S0. Currently, this is the only available pricing tier. |
4. Now that the basic information is added, select Next to confirm your details and proceed to the next page.
5. Select Next to confirm your network details.
6. Select Next to confirm your tag details.
7. Confirm your configuration settings and select Create to start provisioning the resource. Wait for the deployment to finish.
8. After the deployment finishes, select Go to resource to inspect your created resource. Here, you can manage your resource and find important information like the endpoint URL and API keys.
In this step, you create an Azure OpenAI embedding model deployment and a chat model deployment. Creating a deployment on your previously provisioned resource allows you to generate text embeddings (that is, numerical representations of text) and have a natural language conversation with your data.
1. Select Go to Azure OpenAI Studio from the toolbar to open the studio.
2. Select Create new deployment to go to the deployments tab.
3. Select + Create new deployment from the toolbar. A Deploy model window opens.
4. Add the following information to create a chat model deployment:
| What | Value |
| --- | --- |
| Select a model | Select gpt-35-turbo. |
| Model version | Select 0301. |
| Deployment name | Add a name that's unique for this cloud instance. For example, chat-model, because this model type is optimized for having conversations. |
5. Select Create.
6. Select + Create new deployment from the toolbar. A Deploy model window opens.
7. Add the following information to create an embedding model deployment:
| What | Value |
| --- | --- |
| Select a model | Select text-embedding-ada-002. |
| Model version | Select 2. |
| Deployment name | Add a name that's unique for this cloud instance. For example, embedding-model, because this model type is optimized for creating embeddings. |
8. Select Create.
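If you'd like to verify both deployments from code before wiring up the app, here's a minimal sketch using the openai Python package (version 1.x). The endpoint, key, and API version are placeholders (find the first two on your resource's Keys and Endpoint page), and the deployment names match the examples above:

```python
from openai import AzureOpenAI

# Placeholders: substitute your resource's endpoint and key.
client = AzureOpenAI(
    azure_endpoint="https://<azureOpenAiResourceName>.openai.azure.com/",
    api_key="<azureOpenAiResourceKey>",
    api_version="2024-02-01",  # assumed API version
)

# Chat completion against the "chat-model" deployment.
chat = client.chat.completions.create(
    model="chat-model",
    messages=[{"role": "user", "content": "Say hello!"}],
)
print(chat.choices[0].message.content)

# Embedding from the "embedding-model" deployment (a 1536-dimensional vector).
embedding = client.embeddings.create(model="embedding-model", input="Azure Cosmos DB")
print(len(embedding.data[0].embedding))
```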
In this step, you'll:
- Fork the sample code to your GitHub account.
- Create an Azure App Service resource with continuous deployment from GitHub.
- Configure the application settings and startup command.
- Modify the GitHub Actions workflow file and test the live application.
In this step, you create a copy of the source code in your GitHub account so you can edit it and use it later.
1. Visit the sample repository github.com/john0isaac/rag-semantic-kernel-mongodb-vcore in your browser and sign in.
2. Select Fork from the top of the sample page.
3. Select an owner for the fork, then select Create fork.
In this step, you create an Azure App service resource and connect it with your GitHub account to deploy a Python application.
1. Type app service in the search bar at the top of the portal page and select App Services from the available options.
2. Select Create Web App from the toolbar to start provisioning a new web application.
3. Add the following information to fill in the basic configuration of the application:
| What | Value |
| --- | --- |
| Subscription | Use the same subscription you used to apply for Azure OpenAI access. |
| Resource group | Use the same resource group you created before. |
| Name | Enter a unique name for your website. For example, rag-mongodb-demo. |
| Publish | Select Code. This option specifies whether your deployment consists of code or a container. |
| Runtime stack | Select Python 3.10. |
| Operating System | Select Linux. |
| Region | Select UK South. This is the region where the rest of the resources you created reside. |
4. Add the following information to create the app service plan. You can scale it up later:
| What | Value |
| --- | --- |
| Linux Plan | Select a pre-existing plan or create a new plan. |
| Pricing Plan | Select Basic B1. |
5. Select Deployment from the toolbar to move to the deployment configuration tab.
6. Add the following information to enable continuous deployment from GitHub:
| What | Value |
| --- | --- |
| Continuous deployment | Select Enable. |
| GitHub account | Select your GitHub account. |
| Organization | Select your organization. If you are using your personal account, select it. |
| Repository | Select rag-semantic-kernel-mongodb-vcore. |
| Branch | Select main. |
7. Select Review + create.
8. Confirm your configuration settings and select Create to start provisioning the resource. Wait for the deployment to finish.
9. After the deployment finishes, select Go to resource to inspect your created resource. Here, you can manage your resource and find important information like the application settings and logs.
1. Select Configuration from the left navigation of your App Service resource page.
2. Select + New application setting for each setting you want to add.
3. Add the following application settings:

| What | Value |
| --- | --- |
| AZURE_OPENAI_CHAT_DEPLOYMENT_NAME | <chatModelDeploymentName> |
| AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME | <embeddingModelDeploymentName> |
| AZURE_OPENAI_DEPLOYMENT_NAME | <azureOpenAiResourceName> |
| AZURE_OPENAI_ENDPOINT | https://<azureOpenAiResourceName>.openai.azure.com/ |
| AZURE_OPENAI_API_KEY | <azureOpenAiResourceKey> |
| AZCOSMOS_API | mongo-vcore |
| AZCOSMOS_CONNSTR | mongodb+srv://<mongoAdminUser>:<mongoAdminPassword>@<mongoClusterName>.global.mongocluster.cosmos.azure.com/?tls=true&authMechanism=SCRAM-SHA-256&retrywrites=false&maxIdleTimeMS=120000 |
| AZCOSMOS_DATABASE_NAME | <cosmosDatabaseName> |
| AZCOSMOS_CONTAINER_NAME | <cosmosContainerName> |

Note: Any value should work for AZCOSMOS_DATABASE_NAME and AZCOSMOS_CONTAINER_NAME.
4. Select Save.
5. Select General settings to edit the application startup command.
6. Type entrypoint.sh in the startup command field and select Save.
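For context, the application picks these settings up as environment variables at startup. Here's a minimal sketch of that pattern; the client construction below is illustrative, not the app's exact code:

```python
import os

from openai import AzureOpenAI
from pymongo import MongoClient

# Each value comes from the App Service application settings configured above.
openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # assumed API version, not from the settings table
)

mongo_client = MongoClient(os.environ["AZCOSMOS_CONNSTR"])
collection = mongo_client[os.environ["AZCOSMOS_DATABASE_NAME"]][
    os.environ["AZCOSMOS_CONTAINER_NAME"]
]
```

Because the settings live in App Service rather than in the code, you can rotate keys or point at a different cluster without redeploying the application.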
In this step, you modify the GitHub Actions workflow file that Azure App Service added to your forked repository so that the application builds and deploys from the src folder.
1. Navigate to your forked repository on GitHub.
2. Go to the .github/workflows folder and locate the workflow file that Azure App Service created.
3. Open the file and select the pen icon to edit it.
4. Modify lines 31 and 36 to the following:

| Line | Change to |
| --- | --- |
| 31 | run: cd src && pip install -r ./requirements.txt |
| 36 | run: cd src && zip ../release.zip ./* -r |
5. Select Commit changes, review your commit message and description, then select Commit changes again.
6. Select Actions to review the workflow run status.
In this step, you add the sample data to your database and test the live application.
1. Navigate to your live website using the URL on your App Service overview page.
2. Type in the chat message What is Azure Functions? and it should respond with I don't know.
3. Navigate to your Azure App service resource page and select SSH.
4. Select Go to open a new SSH page.
5. In the SSH terminal, run the data-loading script that ships with the repository to add the sample data to your database (see the repository's README for the exact command).
6. Navigate back to the live website and type in the chat message What is Azure Functions? and it should respond with the correct answer now.
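Behind the scenes, the answer changes because the app now grounds the chat model in retrieved documents. Here's a hedged sketch of that RAG flow; the actual app orchestrates it through the Semantic Kernel, and the vector and text field names and prompt wording below are illustrative:

```python
import os

from openai import AzureOpenAI
from pymongo import MongoClient

openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # assumed API version
)
collection = MongoClient(os.environ["AZCOSMOS_CONNSTR"])[
    os.environ["AZCOSMOS_DATABASE_NAME"]
][os.environ["AZCOSMOS_CONTAINER_NAME"]]


def rag_answer(question: str) -> str:
    # 1. Embed the user's question with the embedding deployment.
    q_vec = openai_client.embeddings.create(
        model=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"], input=question
    ).data[0].embedding

    # 2. Retrieve the closest documents from Cosmos DB with vector search.
    hits = collection.aggregate(
        [{"$search": {"cosmosSearch": {"vector": q_vec, "path": "vector", "k": 3}}}]
    )
    context = "\n".join(hit["text"] for hit in hits)  # assumes a "text" field

    # 3. Ask the chat deployment, grounded in the retrieved context.
    response = openai_client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"],
        messages=[
            {
                "role": "system",
                "content": "Answer only from this context:\n"
                + context
                + "\nIf the answer isn't there, say: I don't know.",
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


print(rag_answer("What is Azure Functions?"))
```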
Congratulations! You successfully built the full application.
If you want to learn how to add your own data, see the guide in the repository's main README.
Once you finish experimenting on Microsoft Azure, you might want to delete the resources so they don't consume any more money from your subscription. You can delete the resource group, which deletes everything inside it, or delete the resources one by one; that's totally up to you.
Congratulations! You've learned how to create an Azure Cosmos DB for MongoDB vCore cluster, create an Azure OpenAI resource, deploy an embedding model and a chat model from Azure OpenAI Studio, create an Azure App Service with continuous deployment from GitHub, and modify application settings to enable communication across Azure resources. With these technologies, you can build a RAG chat application over your own data, with the option to perform vector search on its own, and provide grounded (relevant) responses.
Found this useful? Share it with others and follow me to get updates.
Feel free to share your comments and/or inquiries in the comment section below. See you in future demos!