Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search
Published Mar 09 2023

It took less than a week for OpenAI’s ChatGPT to reach a million users, and it crossed the 100 million user mark in under two months. The interest and excitement around this technology have been remarkable. Users around the world are seeing potential for applying these large language models to a broad range of scenarios.

 

In the context of enterprise applications, the question we hear most often is “how do I build something like ChatGPT that uses my own data as the basis for its responses?”

 

The combination of Azure Cognitive Search and Azure OpenAI Service yields an effective solution for this scenario. It integrates the enterprise-grade characteristics of Azure, the ability of Cognitive Search to index, understand and retrieve the right pieces of your own data across large knowledge bases, and ChatGPT’s impressive capability for interacting in natural language to answer questions or take turns in a conversation.

 

[Diagram: the solution pattern, combining ChatGPT from Azure OpenAI Service with Azure Cognitive Search as the knowledge base and retrieval layer]

In this blog post we’ll describe the above solution pattern, from the internals of orchestrating conversations and knowledge bases to the user experience considerations necessary to help end users judge responses and their supporting facts appropriately. Our goal is to give you the tools necessary to build ChatGPT-powered applications starting today, using the "gpt-35-turbo" model that's now in preview. We’re also releasing a GitHub repo with examples, including UX, orchestration, prompts, etc., that you can use to learn more or as a starting point for your own application.

 

Chatting with your own data

You interact with large language models like ChatGPT using natural language: you give the model a “prompt” and ask it to complete it. This could be a question, a conversation turn, a pattern to extend, etc. When used this way, the responses you get are based on what the model learned during training, which can be useful for general knowledge questions or an informal chat, but it’s not what you want if you’re building an application where users should see responses based on your own data. In this example (asking about employee healthcare plans) using the base ChatGPT model, you can see that the response (the final assistant message) is not useful, since it doesn’t account for the company’s own benefits policy:

 

<|im_start|>system
Assistant helps Contoso Inc employees with their internal healthcare plan questions. 
<|im_end|>
<|im_start|>user
Does my plan cover annual eye exams?
<|im_end|>
<|im_start|>assistant
I'm sorry, I'm not able to access your specific healthcare plan information. However, many healthcare plans do cover annual eye exams. I recommend checking your plan's summary of benefits or contacting your plan administrator for more information.

 

One approach to having ChatGPT generate responses based on your own data is simple: inject that data into the prompt. ChatGPT can read the information along with any instructions, context or questions, and respond accordingly. This approach doesn’t require retraining or fine-tuning the model, and responses can reflect any changes in the underlying data immediately.

 

This presents a new challenge, though: these models have a limit on the “context length” they support (the current ChatGPT model can take up to 4,000 tokens in a prompt), and even without those limits it wouldn’t be practical to inject gigabytes’ worth of data into a text prompt in each interaction. The alternative is to keep all the data in an external knowledge base that can retrieve pieces quickly and with good relevance, which is exactly what Cognitive Search was designed for. Here’s what a starting-point prompt and response could look like:

 

<|im_start|>system
Assistant helps Contoso Inc employees with their internal healthcare plan questions. Only answer questions based on the facts listed below. If the facts below don't answer the question, say you don't know. Each fact sentence is numbered. Always include a reference to the fact number in the form [number] when you answer.
1. Employees have a choice of two plans, PlanA and PlanB, and they have different benefits.
2. PlanA includes coverage for annual eye and hearing exams.
3. Those with PlanB have a co-payment for routine eye exams and vision-related purchases such as glasses.
4. Both PlanA and PlanB cover preventive care visits.
<|im_end|>
<|im_start|>user
Does my plan cover annual eye exams?
<|im_end|>
<|im_start|>assistant
PlanA includes coverage for annual eye and hearing exams. [2]

 

In the example, the numbered facts were retrieved from Cognitive Search based on the user’s question and any accumulated context that might be applicable, the final assistant message is the response generated by the model, and the rest is the prompt template we used to give the model instructions.
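To make this flow concrete, here’s a minimal retrieve-then-read sketch in Python. It assumes a Cognitive Search index with a "content" field and a gpt-35-turbo deployment named "chat"; the endpoints, keys, index and field names are placeholders, and the accompanying GitHub repo contains a more complete implementation:

# Minimal retrieve-then-read sketch. Endpoints, keys, index and field names
# are placeholders; see the accompanying GitHub repo for a fuller version.
import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

openai.api_type = "azure"
openai.api_base = "https://YOUR-AOAI-RESOURCE.openai.azure.com"
openai.api_version = "2022-12-01"
openai.api_key = "YOUR-AOAI-KEY"

search_client = SearchClient(
    endpoint="https://YOUR-SEARCH-SERVICE.search.windows.net",
    index_name="healthplan-index",
    credential=AzureKeyCredential("YOUR-SEARCH-KEY"))

question = "Does my plan cover annual eye exams?"

# 1. Retrieve the top candidate passages for the user question.
results = search_client.search(search_text=question, top=3)
facts = [doc["content"] for doc in results]  # assumes a "content" field

# 2. Inject the retrieved facts into the prompt template shown above.
prompt = (
    "<|im_start|>system\n"
    "Assistant helps Contoso Inc employees with their internal healthcare plan "
    "questions. Only answer questions based on the facts listed below. If the "
    "facts below don't answer the question, say you don't know. Each fact "
    "sentence is numbered. Always include a reference to the fact number in "
    "the form [number] when you answer.\n"
    + "\n".join(f"{i + 1}. {fact}" for i, fact in enumerate(facts))
    + "\n<|im_end|>\n<|im_start|>user\n" + question
    + "\n<|im_end|>\n<|im_start|>assistant\n")

# 3. Have the ChatGPT (gpt-35-turbo) deployment complete the prompt.
completion = openai.Completion.create(
    engine="chat",  # name of your gpt-35-turbo deployment
    prompt=prompt,
    temperature=0.3,
    max_tokens=512,
    stop=["<|im_end|>", "<|im_start|>"])

print(completion.choices[0].text)

The samples in the repo wrap this same loop with conversation handling, orchestration and the UX elements described below.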

 

This retrieval-augmented generation approach opens the door for starting simple and getting more sophisticated as needed. There are many options for how to construct prompts, how to formulate queries for effective retrieval from the knowledge base, and how to orchestrate back-and-forth interaction between ChatGPT and the knowledge base. Before we dig into those, let’s talk about one more requirement: helping users validate that responses are trustworthy.

 

Generating trustworthy responses

We assume these large language models, prompts, and orchestration systems aren’t perfect, so we treat every generated response as a candidate that should include the right information for an end user to validate. As part of exploring this topic we implemented three simple experiences as starting points. These certainly aren’t the only options; we welcome ideas and feedback on the best ways to give users better tools to validate that results from the system are factually correct.

 

As you can see in the picture below, when we produce a response in our examples, we also offer the user three “drill-down” tools:

[Screenshot: a generated response in the sample UX, showing inline citations plus the supporting content and orchestration process drill-down options]

  1. Citations: Each statement in the response includes a citation with a link to the source content. The citations appear in context (the superscript numbers) as well as in a list of links at the bottom; clicking one displays the original content so the user can inspect it. A minimal sketch of how this wiring can work follows this list.
  2. Supporting content: Each response or chat bubble generated by ChatGPT has an option (notebook icon) for displaying all the original content that was fed into the prompt as facts.
  3. Orchestration process: Also present in each response or chat bubble, an option (lightbulb icon) shows the entire interaction process, including intermediate results and generated prompts.
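The citation mechanism itself takes only a small amount of code. Here’s an illustrative sketch (not the exact implementation in the repo): the retrieved sources are numbered before they go into the prompt, and the [number] markers in the model’s answer are then mapped back to the original documents so the UI can render them as links. The document titles and contents below are made up for the example:

import re

def number_sources(docs):
    # docs: list of (title, content) tuples retrieved from Cognitive Search.
    return "\n".join(f"{i + 1}. {content}" for i, (title, content) in enumerate(docs))

def extract_citations(answer, docs):
    # Return the (title, content) pairs the answer actually cited via [n].
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [docs[n - 1] for n in sorted(cited) if 1 <= n <= len(docs)]

docs = [("PlanA_overview.pdf", "PlanA includes coverage for annual eye and hearing exams."),
        ("PlanB_overview.pdf", "Those with PlanB have a co-payment for routine eye exams.")]
answer = "PlanA includes coverage for annual eye and hearing exams. [1]"
print(extract_citations(answer, docs))  # -> [("PlanA_overview.pdf", ...)]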

Each of these options may or may not be useful depending on the audience. There are other ways to offer transparency and validation tools so users can have confidence in responses. In particular, neither this blog post nor the initial version of the example code tackles the critical topic of evaluating response quality within the application and rejecting or retrying cases that don’t meet certain criteria. We encourage application developers to explicitly explore this topic in the context of each application experience.

 

Emerging interaction patterns

Approaches for more effective prompt design, retrieval query construction, and interaction models between components are emerging quickly. This is a nascent space where we expect to see lots of rapid progress. Here’s a small sampling of starting points for prompt and query generation, with references to literature for those interested in more detail:

  • Retrieve-then-read: a simple starting point for single-shot Q&A scenarios, where the user question contains enough information to retrieve candidates from the search index. This approach simply uses the question to retrieve from the index, takes the top few candidates, and inlines them in a prompt along with instructions and the question itself.
  • Read content and context before retrieving: in many cases a user question alone is not good enough for retrieval. For example, in conversational settings, the last user turn may be just a few words representing a follow-up point or question and cannot be used on its own to retrieve related knowledge effectively. Even in single-shot interactions, context needs to be accounted for. In these cases, an interesting approach is to use ChatGPT for search query generation, asking it to summarize the conversation for retrieval purposes while accounting for any context you want to inject (a sketch of this query-generation step follows this list).
  • Actions, tools, and more: often a single interaction between the user input, the prompt instructions, and the knowledge base is not enough. For example, it’s been shown that asking large language models to decompose responses into small steps increases the quality of responses and avoids certain error patterns. Once a question is decomposed, smaller and more pointed questions can be posed to external sources, either as unstructured searches as we’ve done so far, as factoid question-answering steps (e.g. as supported by Semantic Answers in Cognitive Search), or as lookups in external systems (e.g. an employee table in an internal application, or an incident table in a customer support application). This is a broad space for exploration, and lots of early experimental work is being done. Some interesting reads include the introduction of CoT (chain of thought) prompting and subsequent work, the ReAct approach to combine CoT with tools, and the Toolformer approach to teach models how to use multiple tools to produce a response.
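As a rough illustration of the second pattern, the snippet below asks the same gpt-35-turbo deployment to compress the conversation so far into a standalone search query before retrieval. It reuses the Azure OpenAI configuration from the earlier sketch; the prompt wording and deployment name are illustrative, not the exact ones used in the repo:

# Sketch of search query generation from conversation context.
import openai  # assumes the Azure OpenAI configuration shown earlier

def history_as_chatml(turns):
    # turns: list of (role, text) tuples, e.g. [("user", "..."), ("assistant", "...")]
    return "".join(f"<|im_start|>{role}\n{text}\n<|im_end|>\n" for role, text in turns)

turns = [("user", "Does my plan cover annual eye exams?"),
         ("assistant", "PlanA covers annual eye and hearing exams; PlanB has a co-payment. [2][3]"),
         ("user", "I have the plus plan")]

query_prompt = (
    "<|im_start|>system\n"
    "Generate a search query for a healthcare plan knowledge base based on the "
    "conversation below. Return only the query text.\n<|im_end|>\n"
    + history_as_chatml(turns)
    + "<|im_start|>user\nGenerate the search query.\n<|im_end|>\n"
    + "<|im_start|>assistant\n")

completion = openai.Completion.create(
    engine="chat",  # gpt-35-turbo deployment name (placeholder)
    prompt=query_prompt,
    temperature=0.0,
    max_tokens=32,
    stop=["<|im_end|>", "<|im_start|>"])

search_query = completion.choices[0].text.strip()
# The generated query (something like "plus plan annual eye exam coverage")
# carries the conversation context, so it retrieves much better candidates
# than the bare follow-up turn "I have the plus plan" would.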

The samples that accompany this blog post implement some of these, either directly or through open-source libraries such as Langchain. To cherry-pick a particular example, the user chat turn “I have the plus plan” in the screenshot below wouldn’t yield a good answer using a naïve retrieve-then-read approach, but works well with a slightly more sophisticated implementation that carries the context of the conversation:

 

[Screenshot: a multi-turn conversation in the sample UX where the follow-up turn “I have the plus plan” is answered correctly because the retrieval step uses the accumulated conversation context]

Improving knowledge base retrieval

Since responses will ultimately be based on what we’re able to retrieve from the knowledge base, quality of retrieval becomes a significant aspect of these solutions. Here are a few considerations:

  1. Semantic ranking: by default, Cognitive Search uses keyword search combined with a simple probabilistic model for scoring. You can choose to enable Semantic Ranking, which applies a sophisticated deep learning secondary ranking layer for improved precision (options 1 and 3 are illustrated in a retrieval sketch a bit further below).
  2. Document chunking: when indexing content in Cognitive Search for the specific purpose of powering ChatGPT scenarios, you want content of the right length. If each document is too short, it will lack context. If it’s too long, it’s hard to locate the right parts for ChatGPT to “read.” We recommend targeting a few sentences (e.g. ~1/4 to 1/3 of a page) with a sliding window of text as a starting point if your data allows (a minimal chunking sketch follows this list). In some cases, such as parts catalogs, it’s reasonable not to chunk up the data and have each document contain the full description of a single part.
  3. Summarization: even after chunking, sometimes you’ll want to fit more candidates in a prompt by making each candidate shorter. You can achieve this with a summarization step. A few options include using Semantic Captions (a query-contextualized summarization step supported directly in Cognitive Search), using hit highlighting (a more lexical, rather than semantic, mechanism to extract snippets), or post-processing the search results with an external summarization model.
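Here’s the chunking sketch referenced in item 2: a simple character-based sliding window that prefers sentence boundaries. The sizes are illustrative; tune chunk length and overlap for your own content:

def chunk_text(text, max_chars=1000, overlap=200):
    # Split text into overlapping chunks, preferring sentence boundaries.
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        boundary = text.rfind(". ", start, end)
        if end < len(text) and boundary > start:
            end = boundary + 1  # cut at the end of a sentence when possible
        chunks.append(text[start:end].strip())
        if end >= len(text):
            break
        start = max(end - overlap, start + 1)  # slide back to keep overlap
    return chunks

# Each chunk then becomes its own search document, e.g.:
# documents = [{"id": f"benefits-{i}", "content": c, "sourcefile": "benefits.pdf"}
#              for i, c in enumerate(chunk_text(full_text))]
# search_client.upload_documents(documents=documents)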

The accompanying sample code includes functionality to easily experiment with some of the options above (click the settings icon at the top right of the window).
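As an illustration of options 1 and 3 above, here’s roughly what the search call looks like with semantic ranking and captions turned on. It reuses the search_client from the retrieve-then-read sketch, assumes the index defines a semantic configuration named "default", and the parameter names follow the preview azure-search-documents SDK; double-check them against the SDK version you install:

from azure.search.documents.models import QueryType

# Reuses the search_client created in the retrieve-then-read sketch above.
results = search_client.search(
    search_text="does my plan cover annual eye exams",
    query_type=QueryType.SEMANTIC,          # deep learning re-ranking layer
    query_language="en-us",
    semantic_configuration_name="default",  # assumed to exist on the index
    query_caption="extractive",             # query-contextualized snippets
    top=3)

for doc in results:
    captions = doc.get("@search.captions") or []
    # Use the caption (a short, query-focused summary) instead of the full
    # chunk so more candidates fit in the prompt.
    snippet = " ".join(c.text for c in captions) or doc["content"]
    print(snippet)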

 

More scenarios

In this blog post we focused on conversation and question answering scenarios that combine ChatGPT from Azure OpenAI with Azure Cognitive Search as a knowledge base and retrieval system. There are other ways in which Azure OpenAI Service and Cognitive Search can be combined to improve existing scenarios or enable new ones. Examples include using natural language for query formulation, powering catalog browsing experiences, and using Azure OpenAI at indexing time to enrich data. We plan on continuing to publish guidance and examples to illustrate how to accomplish many of these.

 

Try this out today, on your own data or ours

We posted a few examples, including the complete UX shown in this blog post, in this GitHub repo. We plan on continuously expanding that repo with a focus on covering more scenarios.

 

You can clone this repo and either use the included sample data or adapt it to use your own. We encourage you to take an iterative approach. Data preparation will take a few tries. Start by uploading what you have and try out the experience.

 

We’re excited about the prospect of improved and brand-new scenarios powered by the availability of large language models combined with information retrieval technology. We look forward to seeing what you will build with Azure OpenAI and Azure Cognitive Search.

 
