Businesses are increasingly relying on artificial intelligence (AI) and machine learning (ML) to realize the full potential of their business data, building smart applications. The appetite for such intelligent applications is surely fueling lots of investments around conversational AI way of finding insights from your own data through chatbot style applications implemented using large language model (LLM) ever since OpenAI LLM model came into life.
Now that we have the recent announcement of the launch of Azure OpenAI Service on your data in public preview, I thought I would share a walkthrough on how to work with your own data, by bringing the data into Azure AI Studio. While the blog post, Introducing Azure OpenAI Service On Your Data in Public Preview - Microsoft Community Hub has the details on the announcement, my effort here is to share how simple it is to interact with your own data, indexed in Azure Search Service, through the power if Azure OpenAI Service.
In a real-world scenario, we work with varieties of enterprise data sources, some are unstructured, some are semi-structures, and some are structured. We work with documents data stored in PDF files, image data, no-SQL Data stored represented in JSON or relational data stored in relational engines like Azure SQL DB. For our scenario, we will use structured data, stored in Azure SQLDB. I picked up this data source, as we have a QuickStart example, built on data stored in Azure SQL DB, which could be easily indexed through Azure Search. Please refer to article Quickstart: Search explorer query tool - Azure Cognitive Search | Microsoft Learn more details on how the indexes could be created with the data stored in SQL data sources.
Harnessing the power of large language models like GPT-4, businesses are building intelligent enterprise applications. When we want to build an elegant Chat Solution on our own data, leveraging Azure OpenAI Service, the Azure Cognitive Search can help indexing our data, understand and retrieve the right pieces of your own data across large knowledge bases, based on the questions we ask. Once retrieved, we can then send the question and the contextual data we received from Azure Search for our question, to GPT 4 or ChatGPT, to take advantage of the LLM’s impressive capability for interacting in natural language to answer questions or take turns in a conversation. Answering a question is then a two-step process: first retrieve the sources of truth, and then summarize them into a response
To be able to bring the data (search documents for our example) from Azure Search, first let us create the index. Following the steps from https://learn.microsoft.com/en-us/azure/search/search-explorer, I have created the realestate-us-sample-index index for my walk through. To be able to leverage the power of semantic search, I have also created the semantic configuration for my index, as shown below:
There are few considerations we must keep in mind, while we create the semantic configuration. Configure semantic search - Azure Cognitive Search | Microsoft Learn has more details on how to select the fields used in semantic ranking.
As the index is created, I am now ready to bring the indexed data into AI studio. Through the launch of “Azure OpenAI Service on your data”, we can now easily bring the data in the Azure AI Studio (formerly Azure OpenAI Studio), using the “Bring your own data” capability.
Please note that this capability is in public preview.
I followed the steps below to get my indexed data into the AI studio, navigating to the “Bring your own data” link. The “Chat” option will be the default selection, when working with your own data. If you close the “Add data” dialog, you will notice that “Add your data(preview)” tab will be available, with the option to “+ Add a data source”, which you can use to start the “Add data” dialog.
Our enterprise data sources will be used to help ground the GPT-4 model with our business specific data. In the public preview, we have thee data sources we can ground the model with, one of the options is the Azure Cognitive Search. Since my scenario is to leverage Azure OpenAI Service with Azure Cognitive Search documents, I chose this data source, from the drop down, the “Add data” dialog box.
Once the target data source is selected, the “Add data” dialog will now show the data source details option. For my scenario, I populated the details with the Azure Search Service I provisioned, the index I created for the realestate-us-sample SQL data.
In the “Next” step, I populate the content data option. Please note that the content data is the data that would be sent to Azure OpenAI Service, as part of the prompt (the question I ask + the content) to the ChatGpt, to answer the questions I ask during the interaction. For simplicity (and because other properties are optional), I only filled out the “Content data” field, through the drop-down selection.
I selected the “description” and “postcode” fields, off the available fields for my documents, as these two fields would contain the information, on the questions I am planning to ask. An example question, based on the data I have is “apartment residence at postcode 98133”. Please note that familiarity with my data to ask the questions was a key to find the relevant information when I interact with ChatGPT.
In the final step of “Add data”, I have also selected the semantic search configurated that I created for my index in Azure Search. The idea is to pass along hints about the index fields that are important for semantic ranking or answering questions by the underlying model powering Azure Cognitive Search Service.
After adding the “semantic configuration”, I reviewed the “Add data” steps and clicked “Save and close” so my data is now available to the AI data from which I will be able to gain insights, interacting with my data directly.
Note that the “Chat playground” now will have the data source information, as below:
Please also note the checkbox “Limit responses to your data content”. This will be the key of interacting with our own data, through the ChatGPT interface.
We have indexed the data, creating documents, within Azure Cognitive Search. We have enabled Semantic Search and now we have grounded the index in Azure AI Studio. Let us see how it responds to my questions in finding some information in the real estate data.
My first question in the chat session was: “apartment residence at postCode 98133”. Here is the response I received:
The first interaction gives a hint on how to interact with the data. Please note that my content field includes the description and postcode fields from my indexes. The response generated by the ChatGpt, will leverage these two fields, as it summarizes the responses. Here is another example:
Please note that it also generated the citations, clicking one of these citations, will also generate a response from ChatGPT, showing the citation about the search, on the side, making the chat interactive:
As we see, the availability of the feature in the Azure AI Studio, allowing us to bring our data right into the AI studio to directly interact with the data to get insights, with the power of Azure OpenAI Service is an impressive milestone. For now, the data source is on a per session basis. If you leave the AI studio and come back, you will lose the data source. However, this walkthrough shows us how we can easily bring our own data in Azure AI studio, and then leverage LLM models (I am using gpt-4 model) to put together an interaction chat experience for end user. The pattern of retrieving the data first, based on the query, from already indexed data in Azure Search and then using the data as the context for answering the questions, by GPT – 4, known as Retrieval Augmented Generation (RAG) pattern, is one learning that we can take from this quick walkthrough. We can then create elegant solution, similar to one discussed in the blog Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search ...
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.