Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search
Published Mar 09 2023 07:55 AM
Microsoft

It took less than a week for OpenAI’s ChatGPT to reach a million users, and it crossed the 100 million user mark in under two months. The interest and excitement around this technology have been remarkable. Users around the world are seeing potential for applying these large language models to a broad range of scenarios.

 

In the context of enterprise applications, the question we hear most often is “how do I build something like ChatGPT that uses my own data as the basis for its responses?”

 

The combination of Azure Cognitive Search and Azure OpenAI Service yields an effective solution for this scenario. It integrates the enterprise-grade characteristics of Azure, the ability of Cognitive Search to index, understand and retrieve the right pieces of your own data across large knowledge bases, and ChatGPT’s impressive capability for interacting in natural language to answer questions or take turns in a conversation.

 

[Figure: overview of the solution pattern, with an app experience backed by Azure Cognitive Search for retrieval and Azure OpenAI's ChatGPT for response generation]

In this blog post we’ll describe the above solution pattern, from the internals of orchestrating conversation and knowledge bases to the user experience considerations necessary to help end users judge responses and their supporting facts appropriately. Our goal is to give you the tools necessary to build ChatGPT-powered applications starting today, using the "gpt-35-turbo" model that's now in preview. We’re also releasing a GitHub repo with examples, including UX, orchestration, prompts, etc., that you can use to learn more or as a starting point for your own application.

 

Chatting with your own data

You interact with large language models like ChatGPT using natural language: you give the model a “prompt” and ask it to complete it. This could be a question, a conversation turn, a pattern to extend, and so on. Used this way, the responses you get are based on what the model learned during training, which can be useful for general knowledge questions or an informal chat, but not what you want if you’re building an application where users should see responses based on your own data. You can see in this example (asking about employee healthcare plans) using the base ChatGPT model that the response (the assistant turn) is not useful, since it doesn’t account for the company’s own benefits policy:

 

<|im_start|>system
Assistant helps Contoso Inc employees with their internal healthcare plan questions. 
<|im_end|>
<|im_start|>user
Does my plan cover annual eye exams?
<|im_end|>
<|im_start|>assistant
I'm sorry, I'm not able to access your specific healthcare plan information. However, many healthcare plans do cover annual eye exams. I recommend checking your plan's summary of benefits or contacting your plan administrator for more information.

 

One approach to have ChatGPT generate responses based on your own data is simple: inject this information into the prompt. ChatGPT can read the information along with any instructions, context or questions, and respond accordingly. This approach doesn’t need retraining or fine-tuning of the model, and the responses can reflect any changes in the underlying data immediately.

 

This presents a new challenge, though: these models have a limit on the “context length” they support (the current ChatGPT model can take up to 4,000 tokens in a prompt), and even if they didn’t have those limits, it wouldn’t be practical to inject gigabytes of data into a text prompt in each interaction. The alternative is to keep all the data in an external knowledge base that can retrieve pieces quickly and with good relevance, which is exactly what Cognitive Search was designed for. Here’s what a starting-point prompt and response could look like:

 

<|im_start|>system
Assistant helps Contoso Inc employees with their internal healthcare plan questions. Only answer questions based on the facts listed below. If the facts below don't answer the question, say you don't know. Each fact sentence is numbered. Always include a reference to the fact number in the form [number] when you answer. 
1. Employees have a choice of two plans, PlanA and PlanB, and they have different benefits.
2. PlanA includes coverage for annual eye and hearing exams.
3. Those with PlanB have a co-payment for routine eye exams and vision-related purchases such as glasses.
4. Both PlanA and PlanB cover preventive care visits.
<|im_end|>
<|im_start|>user
Does my plan cover annual eye exams?
<|im_end|>
<|im_start|>assistant
PlanA includes coverage for annual eye and hearing exams. [2]

 

In the example, the numbered facts (shown in blue in the original post) were retrieved from Cognitive Search based on the user’s question and any accumulated context that might be applicable, the final answer (in green) was generated by the model, and the rest is the prompt template we used to give the model instructions.
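To make this concrete, here is a minimal retrieve-then-read sketch in Python. It is a sketch under stated assumptions, not the sample's exact code: it assumes a Cognitive Search index with a "content" field, uses the preview-era ChatML completions API, and the service names, keys, and deployment names are placeholders you would substitute with your own.

import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholders below -- substitute your own services, keys, and index schema.
search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="gptkbindex",
    credential=AzureKeyCredential("<search-query-key>"),
)

openai.api_type = "azure"
openai.api_base = "https://<your-openai-resource>.openai.azure.com"
openai.api_version = "2022-12-01"
openai.api_key = "<openai-key>"

def answer(question: str) -> str:
    # Retrieve a few relevant chunks from the knowledge base.
    results = search_client.search(search_text=question, top=3)
    facts = "\n".join(f"{i + 1}. {doc['content']}" for i, doc in enumerate(results))

    # Inline the retrieved facts into the prompt template shown above.
    prompt = (
        "<|im_start|>system\n"
        "Assistant helps employees with their internal healthcare plan questions. "
        "Only answer questions based on the facts listed below. If the facts below "
        "don't answer the question, say you don't know. Always include a reference "
        "to the fact number in the form [number] when you answer.\n"
        f"{facts}\n"
        "<|im_end|>\n"
        f"<|im_start|>user\n{question}\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    completion = openai.Completion.create(
        engine="<gpt-35-turbo-deployment>",  # your Azure OpenAI deployment name
        prompt=prompt,
        temperature=0.3,
        max_tokens=300,
        stop=["<|im_end|>"],
    )
    return completion.choices[0].text

print(answer("Does my plan cover annual eye exams?"))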

 

This retrieval-augmented generation approach opens the door for starting simple and getting more sophisticated as needed. There are many options for how to construct prompts, how to formulate queries for effective retrieval from the knowledge base, and how to orchestrate back-and-forth interaction between ChatGPT and the knowledge base. Before we dig into those, let’s talk about one more requirement: helping users validate that responses are trustworthy.

 

Generating trustworthy responses

We assume these large language models, prompts, and orchestration systems aren’t perfect, and we treat each response they generate as a candidate that should include the right information for an end user to validate. As part of exploring this topic we implemented three simple experiences as starting points. That’s not to say these are the only ones; we welcome ideas and feedback on the best ways to give users better tools to validate that results from the system are factually correct.

 

As you can see in the picture below, when we produce a response in our examples, we also offer the user three “drill down” tools:

[Screenshot: a chat response showing the citation, supporting content, and orchestration drill-down options]

  1. Citations: Each statement in the response includes a citation with a link to the source content. You can see the citations in context (the superscript numbers) as well as the links at the bottom. When you click on one, we display the original content so the user can inspect it (a sketch of the citation-marker parsing follows this list).
  2. Supporting content: Each response or chat bubble generated by ChatGPT has an option (notebook icon) for displaying all the original content that was fed into the prompt as facts.
  3. Orchestration process: Also present in each response or chat bubble, we include an option (lightbulb icon) to see the entire interaction process, including intermediate results and generated prompts.
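
To illustrate the citation mechanics, here is a small sketch (a hypothetical helper, not the exact code in the repo) that splits a generated answer on [number] markers so a UI layer can render them as superscript links:

import re

def split_citations(answer: str):
    # Split an answer into plain-text segments and ("citation", n) markers.
    parts = []
    last = 0
    for m in re.finditer(r"\[(\d+)\]", answer):
        if m.start() > last:
            parts.append(answer[last:m.start()])
        parts.append(("citation", int(m.group(1))))
        last = m.end()
    if last < len(answer):
        parts.append(answer[last:])
    return parts

print(split_citations("PlanA includes coverage for annual eye and hearing exams. [2]"))
# ['PlanA includes coverage for annual eye and hearing exams. ', ('citation', 2)]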

Each of these options may be more or less useful depending on the audience. There are other ways to offer transparency and validation tools that give users confidence in responses. In particular, in this blog post and the initial version of the example code we don’t tackle the critical topic of methods that can be implemented within the application to evaluate the quality of responses and possibly reject or retry cases that don’t meet certain criteria. We encourage application developers to explicitly explore this topic in the context of each application experience.

 

Emerging interaction patterns

Approaches for more effective prompt design, retrieval query construction, and interaction models between components are emerging quickly. This is a nascent space where we expect to see lots of rapid progress. Here’s a small sampling of starting points for prompt and query generation, with references to literature for those interested in more detail:

  • Retrieve-then-read: a simple starting point for single-shot Q&A scenarios, where the user question contains enough information to retrieve candidates from the search index. This approach simply uses the question to retrieve from the index, takes the top few candidates, and inlines them in a prompt along with instructions and the question itself.
  • Read content and context before retrieving: in many cases a user question alone is not good enough for retrieval. For example, in conversational settings, the last user turn may be just a few words representing a follow-up point or question, and cannot be used on its own to retrieve related knowledge effectively. Even in single-shot interactions, context needs to be accounted for. In these cases, an interesting approach is to use ChatGPT for search query generation, by asking the model to create a summary of the conversation for retrieval purposes, accounting for any context you want to inject (see the sketch after this list).
  • Actions, tools, and more: often a single interaction between the user input, the prompt instructions, and the knowledge base is not enough. For example, it’s been shown that asking large language models to decompose responses into small steps increases the quality of responses and avoids certain error patterns. Once a question is decomposed, smaller and more pointed questions can be asked of external sources, either as unstructured searches as we’ve done so far, or as factoid question-answering steps (e.g. as supported by Semantic Answers in Cognitive Search), or as lookups in external systems (e.g. an employee table in an internal application, or an incident table in a customer support application). This is a broad space for exploration, and lots of early experimental work is being done. Some interesting reads include the introduction of CoT (chain of thought) prompting and subsequent work, the ReAct approach to combine CoT with tools, and the Toolformer approach to teach models how to use multiple tools to produce a response.
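
As a sketch of the second pattern above, the snippet below asks the model to turn a conversational follow-up into a standalone search query. The prompt wording and deployment name are illustrative assumptions (the sample repo uses its own prompts), and it reuses the openai configuration from the earlier sketch:

def generate_search_query(history: list[tuple[str, str]], follow_up: str) -> str:
    # Ask the model to rewrite the last user turn as a standalone search query.
    chat = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = (
        "<|im_start|>system\n"
        "Generate a search query for a knowledge base that captures the intent of "
        "the last user message, taking the conversation so far into account. "
        "Return only the query text.\n<|im_end|>\n"
        f"<|im_start|>user\nConversation so far:\n{chat}\nuser: {follow_up}\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    completion = openai.Completion.create(
        engine="<gpt-35-turbo-deployment>",  # illustrative deployment name
        prompt=prompt,
        temperature=0.0,
        max_tokens=32,
        stop=["<|im_end|>"],
    )
    return completion.choices[0].text.strip()

# A follow-up like "I have the plus plan" can now be combined with the earlier
# turns to produce a retrievable query such as "plus plan eye exam coverage".
query = generate_search_query(
    [("user", "Does my plan cover annual eye exams?"),
     ("assistant", "PlanA includes coverage for annual eye and hearing exams. [2]")],
    "I have the plus plan",
)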

The samples that accompany this blog post implement some of these, either directly or through open-source libraries such as LangChain. To cherry-pick a particular example, the user chat turn “I have the plus plan” in the screenshot below wouldn’t yield a good answer using a naïve retrieve-then-read approach, but works well with a slightly more sophisticated implementation that carries the context of the conversation:

 

[Screenshot: a multi-turn conversation where the follow-up “I have the plus plan” is answered correctly using the accumulated conversation context]

Improving knowledge base retrieval

Since responses will ultimately be based on what we’re able to retrieve from the knowledge base, quality of retrieval becomes a significant aspect of these solutions. Here are a few considerations:

  1. Semantic ranking: by default, Cognitive Search will use keyword search combined with a simple probabilistic model for scoring. You can choose to enable Semantic Ranking, which will use a sophisticated deep learning secondary ranking layer for improved precision.
  2. Document chunking: when indexing content in Cognitive Search for the specific purpose of powering ChatGPT scenarios, you want content of the right length. If each document is too short, it will lack context. If it’s too long, it’s hard to locate the right parts for ChatGPT to “read.” We recommend targeting a few sentences (e.g. ~1/4 to 1/3 of a page) with a sliding window of text as a starting point if your data allows (see the sketch after this list). In some cases, such as parts catalogs, it’s reasonable not to chunk up the data and to have each document contain the full description of a single part.
  3. Summarization: even after chunking, sometimes you’ll want to fit more candidates in a prompt, by making each candidate shorter. You can achieve this by using a summarization step. A few options for this include using Semantic Captions (a query-contextualized summarization step supported directly in Cognitive Search), using hit highlighting (a more lexical, instead of semantic, mechanism to extract snippets), or post-processing the search results with an external summarization model.
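
As a rough illustration of the sliding-window chunking mentioned in point 2, the sketch below splits text into overlapping fixed-size sections. The sizes are placeholder values to tune against your own data, and the real prepdocs.py in the sample repo can be smarter about where it breaks:

def chunk_text(text: str, section_chars: int = 1000, overlap_chars: int = 100):
    # Slide a fixed-size window over the text, overlapping consecutive sections
    # so that context isn't lost at section boundaries.
    sections = []
    start = 0
    while start < len(text):
        end = min(start + section_chars, len(text))
        sections.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap_chars
    return sections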

The accompanying sample code includes functionality to easily experiment with some of the options above (click the settings icon at the top-right of the window).

 

More scenarios

In this blog post we focused on conversation and question answering scenarios that combine ChatGPT from Azure OpenAI with Azure Cognitive Search as a knowledge base and retrieval system. There are other ways in which Azure OpenAI Service and Cognitive Search can be combined to improve existing scenarios or enable new ones. Examples include using natural language for query formulation, powering catalog browsing experiences, and using Azure OpenAI at indexing time to enrich data. We plan on continuing to publish guidance and examples to illustrate how to accomplish many of these.

 

Try this out today, on your own data or ours

We posted a few examples, including the complete UX shown in this blog post, in this GitHub repo. We plan on continuously expanding that repo with a focus on covering more scenarios.

 

You can clone this repo and either use the included sample data or adapt it to use your own. We encourage you to take an iterative approach. Data preparation will take a few tries. Start by uploading what you have and try out the experience.

 

We’re excited about the prospect of improved and brand-new scenarios powered by the availability of large language models combined with information retrieval technology. We look forward to seeing what you will build with Azure OpenAI and Azure Cognitive Search.

 

119 Comments
Copper Contributor

This sounds great but:
Deploying the repo returns this error when deploying the OpenAI service:

 

 
"status": "Failed",
"error": {
"code": "InvalidTemplateDeployment",
"message": "The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '33b766c8-74f7-45a9-958a-c3af9a81d4d5'. See inner errors for details.",
"details": [
{
"code": "DeploymentModelNotSupported",
"message": "The model 'Format: OpenAI, Name: gpt-35-turbo, Version: 0301' of account deployment is not supported."
}
]
}
}
Copper Contributor

Hello,

My bad, gpt-35-turbo is only available in US regions, as explained in the README.

Will try to deploy there.

Cheers!

 

Brass Contributor

Thanks for the article! I created an Azure Cognitive Search resource and a Cognitive Services resource. I pushed data to an index programmatically and can use Search Explorer on the index. I understand I can get the results of the search and include them in the "system" message in ChatGPT to use my own data as the basis for its responses. The results may still exceed 4000 tokens, though. Are there other methods than this?

 

Also, just wondering if one could do this in the ChatGPT Playground, i.e. read data from an index. Thanks in advance.

 

 

Brass Contributor

EDIT #2:  In less than 24h after applying I was approved, now time to get to work. Sorry about the noise.

EDIT:  Sorry, I tried to deploy before even reading the readme where it clearly states that one needs to request access via the exact same form I describe below. So, fingers crossed.


So, I tried to deploy this thing and ran into a wall.  Automated deployment failed with

 

The subscription does not have QuotaId/Feature required by SKU 'S0' from kind 'OpenAI'


 

So, I tried to deploy Azure OpenAI to my account via the portal (here: https://portal.azure.com/#create/Microsoft.CognitiveServicesOpenAI) and ran into this message: "Azure OpenAI Service is currently available to customers via an application form. The selected subscription has not been enabled for use of the service and does not have quota for any pricing tiers."

 

It points to a form on which one requests access and by the types of questions asked it seems to me that it is not easy to be allowed access, so what then?   I did fill out the form with accurate answers but I'm not holding my breath on being approved. 

 

So, how can we, mere mortals, get access to this?

 

Thanks!

 

 

 

Copper Contributor

This looks very useful. Busy porting it to C# so I can read the code atm. I appreciate that this is in the AI space, so...PYTHON ALL THE THINGS!!!!

But any chance you could provide this in a form a bit more generally accessible to regular enterprise developers? As it's Microsoft, that would be C#.

I'm just getting a little bored of porting everything and then having to maintain it in parallel. I imagine few enterprises have standardized on Python for general back-end development (which is all this is; it just happens to call AI REST services), so almost everyone is going to be in the same place.

 

In the meantime, it's off to get ChatGPT to help me port this code...

Microsoft

@aymiee prompts will (for now) remain limited token-wise.

If you want the model to be able to access large(r) amounts of text you would need to embed that knowledge base. One way of looking into how to do that would be https://github.com/ruoccofabrizio/azure-open-ai-embeddings-qna. The other approach I am aware of would be to copy how Colin did it (using Azure OpenAI) in a one-hour live demo session last Thursday at Ignite in Zurich, materials available here: https://www.linkedin.com/posts/colin-jarvis-50019658_public-powering-your-products-with-chatgpt-acti...

Microsoft

@BenPatt Acknowledged on your point about C#. In the meantime, if you have questions while you port this, I'm happy to help.

Microsoft

@aymiee the pattern to make input fit within the prompt length limit is to split input content into smaller parts. There are various ways to do this depending on the type of data. A simple example is included in the GitHub repo that goes with this blog post, where we split on a sliding window of text; you can see the code for this particular part here: azure-search-openai-demo/prepdocs.py at main · Azure-Samples/azure-search-openai-demo · GitHub

Brass Contributor

@S_Rappen - Thank you! will take a look at both!!!  

 

@pablocastro - Thanks for this link. I just attempted to run this and it's taking over an hour to create. The Overview page shows that it is still in progress, but all statuses are OK and there are no outputs to address. After an hour, it did finally return with "refresh token has expired". Will try again.

 

Tried again and kept getting "DefaultAzureCredential failed to retrieve a token from the included credentials".  It attempted: EnvironmentCredential, ManagedIdentityCredential, SharedTokenCacheCredential.  Turns out I had to add the role of "Cognitive Services User" to the main app Resource Group.  No more errors except for a warning "Some chunks are larger than 500 kbs after minification".  I think it's referring to this file: backend/static/assets/index-41d57639.js 

 

When I browsed to https://app-backend-blankblank.azurewebsites.net/, I can see the UI, but got an error that the index was not found. There was no data loaded in the table. Might have to adjust the chunk size?

I've tried to run, but I got this message:

 

ERROR: deployment failed: error deploying infrastructure: failed deploying: deploying to subscription:

Deployment Error Details:
InvalidTemplateDeployment: The template deployment 'openai' is not valid according to the validation procedure. The tracking id is '9b2a8e36-80ce-464f-8d2d-5bfc8de225b1'. See inner errors for details.
SpecialFeatureOrQuotaIdRequired: The subscription does not have QuotaId/Feature required by SKU 'S0' from kind 'OpenAI' .

 

But in fact I have had access to OpenAI in my subscription since 23 January.

Microsoft

@Ernesto Cardenas Sorry this is giving you trouble. This might be caused by not having access in that subscription, or deploying to a region that doesn't have the service available (which region did you use?), or that you exceeded the number of OpenAI resources you can have in your subscription (your subscription is likely limited to 2 OpenAI instances). 

Microsoft

@aymiee looks like you're almost there getting this to work :) Since everything seems to be working but you see no data, you probably need to run the data prep/loading script again. It's scripts/prepdocs.py, though if you run the ps1 wrapper you'll get all the names of resources wired up for you.

 

Since it looks like permissions weren't assigned correctly (maybe that step is what timed out?), you may need to give your user Blob Contributor and Search Contributor roles for the storage and search services respectively.

 

btw- the chunk size warning is coming from the React build step, safe to ignore.

@pablocastro hi, I'm trying with Central US, in fact I only created one resource in my subscription. The command only creates the Storage account, App Service and the Search Service. In any case I'll delete the old one and try again.

Microsoft

@Ernesto Cardenas The region is the issue. The readme lists the regions where the gpt-35-turbo model is available, South Central US and East US right now.

Now the deploy is complete, but every time I ask a question I get "Error: () The index 'gptkbindex' for service 'gptkb-a3ehmsx7epo5k' was not found. Code: Message: The index 'gptkbindex' for service 'gptkb-a3ehmsx7epo5k' was not found."

Brass Contributor

@Ernesto Cardenas same here. I did notice that after the whole deployment to Azure finished, there were some errors; unfortunately I already closed all windows for the day, but it was related to the script (data prep, I believe) not being able to retrieve credentials for connecting to Azure.

 

Reading the errors that came afterwards, it seems to me that the script simply was not able to finish the setup. At the end of the script there were no indexes on the Search service. So, I believe the deployment itself was successful, but the post-deployment tasks were not.

Brass Contributor

In this comment I had suggested an approach to use either the Free or Basic service tiers of Search Services.  It is not possible, because Semantic Search is required and it is not part of the Basic or Free tiers, so I'm rolling back this comment to avoid having incorrect information around.

Hello @Andrés S  yes, I was figuring it was something related to the post-deploy phases. Am I right to understand that you updated the Bicep files in order to make it work? Guess you could make a pull request to the main repo.

Microsoft

@Ernesto Cardenas @Andrés S if azd up failed due to permissions or timed out, it didn't run the data prep script. You can find the script in scripts/prepdocs.py, though if you run it through prepdocs.ps1 you get all the parameters wired up to your environment for you. You'll want to double check in the Azure portal that your user has the roles of blob contributor and search contributor (for the storage account and search service respectively) before you run the script.

Brass Contributor

@pablocastro, I added the Storage Blob Data Contributor to the Storage Account and the Search Service Contributor to the Search Service, but still the index and tables are not being created. Are there any other roles that I am missing? I did re-run azd login and then scripts/prepdocs.ps1. The error I am still getting is:

 

Ensuring search index gptkbindex exists
DefaultAzureCredential failed to retrieve a token from the included credentials.

azure.core.exceptions.ClientAuthenticationError: DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
SharedTokenCacheCredential: Azure Active Directory error '(invalid_grant) AADSTS700082: The refresh token has expired due to inactivity. The token was issued on 2021-10-12T21:55:34.6692598Z and was inactive for 90.00:00:00

 

These are the permissions under the Storage Account:

  • Cognitive Services OpenAI User
  • Cognitive Services User
  • Contributor
  • Log Analytics Contributor
  • Microsoft.Insights
  • Search Index Data Contributor
  • Search Index Data Reader
  • Storage Blob Data Contributor
  • Storage Blob Data Reader

 

These are the permissions under the Search Service:

  • Cognitive Services OpenAI User
  • Cognitive Services User
  • Contributor
  • Log Analytics Contributor
  • Microsoft.Insights
  • Search Index Data Contributor
  • Search Index Data Reader
  • Search Service Contributor

 

 

 

 

Brass Contributor

@pablocastro - I saw a question posed on GitHub with a similar situation. He added --searchkey to prepdocs.ps1

 

python ./scripts/prepdocs.py '.\data\*' --searchkey xxx -storageaccount  $env:AZURE_STORAGE_ACCOUNT  ....

and by doing so, the search index was created. So I am getting a little bit further. But then it errored while processing the PDFs; it looks like I also had to set --storagekey. I passed it in and everything is finally working. For this demo I can do this, but in production I will look into Key Vault. Now time to explore. Just in case others have the same issue!

Brass Contributor

@Ernesto Cardenas I don't think my change to a lower Azure tier should become permanent as I am breaking specs by doing so.

 

You can check the changes on commit #1 on my fork at swarchitex/azure-search-openai-demo (github.com) but until you get it running 100% I wouldn't make those changes (b/c semantic search is being turned off).  

Brass Contributor

I have a new one 🙁

 

After recreating everything from scratch based on pablocastro's comment above, the index has been successfully created.

 

Now the script begins to upload the data (PDF files). Looks to me like it fails while creating the HTTP request header (to the Storage service), specifically the key? Not related to the PDF files at all.

 

Ensuring search index gptkbindex exists
Search index gptkbindex already exists
Processing files...
Processing './data\a northwind traders business plan.pdf'
Traceback (most recent call last):
  File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_shared\authentication.py", line 119, in _add_authorization_header
    signature = sign_string(self.account_key, string_to_sign)
  File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_shared\__init__.py", line 47, in sign_string
    key = decode_base64_to_bytes(key)
  File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_shared\__init__.py", line 37, in decode_base64_to_bytes
    return base64.b64decode(data)
  File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\base64.py", line 88, in b64decode
    return binascii.a2b_base64(s, strict_mode=validate)
binascii.Error: Incorrect padding

 

This cascades as shown below, but really the problem is in binascii.a2b_base64. It is being called with strict=false already (I tried hardcoding it to false; same thing).

 

 

(continues from above)
binascii.Error: Incorrect padding
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Desarrollo\VSCode\ACS\T1\azure-search-openai-demo\scripts\prepdocs.py", line 223, in <module>
upload_blobs(pages)
File "C:\Desarrollo\VSCode\ACS\T1\azure-search-openai-demo\scripts\prepdocs.py", line 49, in upload_blobs
if not blob_container.exists():
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\tracing\decorator.py", line 78, in wrapper_use_tracer
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_container_client.py", line 538, in exists
process_storage_error(error)
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_shared\response_handlers.py", line 93, in process_storage_error
raise storage_error
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_container_client.py", line 534, in exists
self._client.container.get_properties(**kwargs)
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\tracing\decorator.py", line 78, in wrapper_use_tracer
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_generated\operations\_container_operations.py", line 1055, in get_properties
pipeline_response = self._client._pipeline.run( # type: ignore # pylint: disable=protected-access
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_base.py", line 205, in run
return first_node.send(pipeline_request) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_base.py", line 69, in send
response = self.next.send(request)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_base.py", line 69, in send
response = self.next.send(request)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_base.py", line 69, in send
response = self.next.send(request)
^^^^^^^^^^^^^^^^^^^^^^^
[Previous line repeated 2 more times]
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\policies\_redirect.py", line 160, in send
response = self.next.send(request)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_base.py", line 69, in send
response = self.next.send(request)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_shared\policies.py", line 546, in send
raise err
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_shared\policies.py", line 520, in send
response = self.next.send(request)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_base.py", line 69, in send
response = self.next.send(request)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_base.py", line 69, in send
response = self.next.send(request)
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_base.py", line 67, in send
_await_result(self._policy.on_request, request)
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\core\pipeline\_tools.py", line 35, in await_result
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_shared\authentication.py", line 142, in on_request
self._add_authorization_header(request, string_to_sign)
File "C:\Users\Andres\AppData\Local\Programs\Python\Python311\Lib\site-packages\azure\storage\blob\_shared\authentication.py", line 125, in _add_authorization_header
raise _wrap_exception(ex, AzureSigningError)
azure.storage.blob._shared.authentication.AzureSigningError: Incorrect padding
PS C:\Desarrollo\VSCode\ACS\T1\azure-search-openai-demo>

 

Any help will be appreciated.

Copper Contributor

Hi @pablocastro great article. In your diagram you show an App UX component all the way to the left. I'm wondering if this Cognitive Search + Azure Open AI orchestration can be integrated with Bot Framework.

We are currently building a chatbot using Azure Bot Framework (DirectLine) based on Language Studio Knowledge Base / Cognitive Search and would like to integrate Open AI Service into the mix. For our current implementation we have a C# backend exported from Azure Web Bot and for frontend we're using Angular with Bot Framework V4 SDK. 

Is it possible to still rely on our Angular Bot Framework UI + C# Backend and Open AI service? 

Thanks!

Brass Contributor

@pablocastro Some of our files are in a Sharepoint document library and some will be in Azure Storage.  When creating a Sharepoint index, would I need to create the new index in the same Search Service?  Or do I add to the existing index  (gptkbindex) in the Search Service? 

Thanks again! It is working pretty well with our own data just by adding the pdf to the /data folder.  

Microsoft

@pablocastro ping me if you want support on setting this up as a .Net / C# project. I'd be happy to help :) 

Copper Contributor

@pablocastro great article and sample code. I'm wondering how to get around the prompt size limitation. I know the size will increase significantly with GPT-4, but it's still limited; i.e. if the enterprise has 10 years' worth of data in some storage, it's not feasible to feed it all to the model with the prompt. The ideas you pointed out could help, but they still somehow might not represent the data in its entirety.

 

As an alternative, if absolute real-time isn't necessary, I was wondering if scheduled training of the model would work. For example, export the enterprise data every night in the format that's used for training and train the model anew. This way there would be a one-day delay in regard to the knowledge the model has, but it would know basically everything (and as much) as we want.

 

Thoughts on this approach? If you think it's feasible, is there a way to incrementally train the model, e.g. train once with the data of the past 10 years, then every day just the new data from yesterday in addition to what it already knows?

Copper Contributor

Following up on my previous question: it appears that the gpt-35-turbo model, and indeed the davinci model, doesn't support training with custom data right now. The best model that can be customized is curie, and it won't work properly in a chat-like scenario. @pablocastro are there any plans to allow customization/training of the GPT model(s)?

Microsoft

@aymiee glad to see you got everything working. As for this question:

 

When creating a Sharepoint index, would I need to create the new index in the same Search Service?  Or do I add to the existing index  (gptkbindex) in the Search Service? 

It depends on your scenario. If you want to always ask questions across the content coming from all these sources, and the schema reasonably aligns (maybe with just a few fields specific to each source), it might be a good idea to put everything in a single index. You could keep a filterable field that has the name of the original source, so you can filter which source you want data from, or drop the filter in the query and search across all sources.

 

I would only do separate indexes if the schema was completely different, or if you want to make sure you never mix content in search results from different sources.
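
For example, with a filterable field (here an illustrative "sourcesystem" field, not part of the sample's schema, and assuming a SearchClient like the one in the article's sketch), the query could look like this:

# Restrict retrieval to one source, or drop the filter to search across all sources.
results = search_client.search(
    search_text="annual eye exams",
    filter="sourcesystem eq 'sharepoint'",
    top=3,
)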

Microsoft

@Andrés S first time I see this padding error. Are you running on Windows, Linux or MacOS? Which version of Python?

Microsoft

@Adolfo Perez you can definitely front this with a Bot interface, and you could use all C# in the backend. We used Python in the initial app just as an example. For simple orchestration cases you can just call Azure OpenAI directly, sort of the equivalent of the retrieve-then-read approach you'll see in the sample code. If you're looking for a more sophisticated framework that can do LLM orchestration and that's in C#, check out microsoft/semantic-kernel: Integrate cutting-edge LLM technology quickly and easily into your apps (...

Microsoft

@adrianhara the main point of the Retrieval-Augmented Generation pattern discussed in this post and implemented in the linked sample is to work around the context length limits. Instead of fine-tuning models, we combine the model with retrieval augmentation, where we pull a tiny subset of the knowledge using a retriever and then only feed that into the prompt. This allows us to have an arbitrarily large knowledge base and still use the model to answer questions. Of course the quality of the answer now also depends on the quality of the retriever and its ranking steps. I don't think fine-tuning is a practical approach to this, since data changes often and you'll want to see changes quickly, and it would be hard to enforce other constraints such as not everyone being allowed to see all the documents (i.e. doc-level access control). Fine-tuning is useful in different scenarios, such as when you want to teach the model certain interaction patterns or want to specialize on a very specific domain.

Copper Contributor

@pablocastro thanks for the answer. This "want to specialize on a very specific domain" is exactly what we are looking for. As mentioned, if the enterprise data goes 10 years back, even with the best quality retriever we won't be able to feed any significant percentage of that data into the model.

 

You are right about the security constraints, but in our case we don't have them, the stored knowledge is available to everyone. Currently people can search for this knowledge using "classical" methods, it would be amazing to be able to offer them the possibility to search using natural language and chatgpt.

 

Also, real-time or near real-time answers aren't an issue in our case. It's good enough if we can somehow train the model once a day or even once every few days. However, at the moment it's not possible to train gpt-35-turbo or GPT-4 at all. Is this something you are looking to add as a possibility in the near future?

Hello guys, since I noticed that the GitHub code has been updated I tried to deploy again (deleting my old deployments). Everything looked fine while running, but I got the same error:

 

Error: () The index 'gptkbindex' for service 'gptkb-fpdd6ep5smnfs' was not found. Code: Message: The index 'gptkbindex' for service 'gptkb-fpdd6ep5smnfs' was not found.

 

It would be helpful to update the instructions (in the GitHub repo) explaining how to solve this situation: details of how to ingest the docs AFTER the initial deployment if, as in my case, something went wrong.

 

Also, the instructions rely on azd, but I believe guidance is also needed for automating the deployment using Azure DevOps or GitHub Actions.

Copper Contributor

@Ernesto Cardenas  I am stuck in the same situation. Let me know if you have found the solution and I will do the same! 


Brass Contributor

@pablocastro Couple of questions:

1) I am noticing that sometimes the response includes the citation with a link to the source content, and it is also stated in the response, but the actual superscript and links at the bottom are missing. Is this a bug?

 

2) I viewed the "thought process" for one of our questions. The "Sources" included 3 chunks of data and its total was roughly 606 tokens. Do "Sources" contribute to the token calculation? The sources might exceed 4000 tokens. How can we avoid this?

 

Strangely, just two of the chunks/sections mentioned in the PDF contained the solution to the query, while the third chunk/section seemed unnecessary and wasted tokens. How can we avoid such a situation?

Microsoft

@pablocastro, the azd way of installing the project is not working for me, since for some reason azd cannot find Python, even though both the 32-bit and 64-bit versions are installed on Windows. Is there any alternative you can recommend so I can test the application?

Brass Contributor

@pablocastro I also noticed that sometimes it answers the question incorrectly (i.e. "I don't have that information...") but still displays the citations.

Today it took a while to come back with a response to a question that was easily answered earlier.

Error: The operation was timeout. { "error": { "code": "Timeout", "message": "The operation was timeout." } } 408 {'error': {'code': 'Timeout', 'message': 'The operation was timeout.'}} {'Content-Length': '75', 'Content-Type': 'application/json', 'apim-request-id': '####', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'x-content-type-options': 'nosniff', 'x-ms-region': 'South Central US', 'Date': 'Mon, 27 Mar 2023 18:41:33 GMT'}

 

 

Microsoft

@Ernesto Cardenas @YashoaSingh that missing index error often happens when the prepdocs.py script failed to run (that script creates the index and preprocesses/uploads the data). You can run it manually (prepdocs.ps1 will pick up your environment details), but first you'll want to make sure your account has the blob contributor and search contributor roles for the storage/search services respectively.

Microsoft

@aymiee 

 

>>1) I am noticing that sometimes the response includes the citation with a link to the source content, and it is also stated in the response, but the actual superscript and links at the bottom are missing. Is this a bug?

 

Maybe the generated reference had the wrong format? If you have an example (both UX and the output from "Thought process") I'm happy to take a look.

 

>> 2) I viewed the "thought process" for one of our questions. The "Sources" included 3 chunks of data and its total was roughly 606 tokens. Do "Sources" contribute to the token calculation? The sources might exceed 4000 tokens. How can we avoid this?

 

Yes, sources contribute to the token count since they end up being part of the prompt. The retriever is good enough to find candidates, but ultimately we need GPT to tell us which documents are usable; until we send them to GPT we don't know which ones we need, which means we do need to include a few of the documents. We chunk documents to a relatively small size to minimize the chances of exceeding the context length limit.
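
If you want to enforce a budget yourself, a rough sketch (assuming the tiktoken library; the 1,024-token budget is an arbitrary illustration) would be to keep top-ranked chunks until the running token count exceeds the budget:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer matching gpt-35-turbo

def cap_sources(chunks: list[str], budget: int = 1024) -> list[str]:
    # chunks arrive in relevance order from the retriever; keep the best ones
    # until adding another would exceed the token budget.
    kept, used = [], 0
    for chunk in chunks:
        tokens = len(enc.encode(chunk))
        if used + tokens > budget:
            break
        kept.append(chunk)
        used += tokens
    return kept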

Microsoft

@Meer Alam 

 

>>even though both 32 and 64 versions are installed in Windows

 

Did you check that Python is on the path? The Python installer for Windows doesn't add it to the path by default. 

Microsoft

@aymiee 

>>I also noticed that sometimes, it answers the question incorrectly  (i.e. "I don't have that information...") but yet displays the citations. 

 

Citations show that the retriever found some candidate documents that are related, but even after reading through those, GPT wasn't able to find an answer. If you want to avoid this, perhaps you can instruct the model (in the prompt) not to produce citations when it doesn't know.

@pablocastro  I gave myself the following permissions:

 

Search Index Data Contributor

Storage Blob Data Contributor (but this was already inherited from the Resource Group)

 

But when running the script I got the following messages:

 

File "C:\Users\ecard\openaidemo03\scripts\.venv\Lib\site-packages\azure\core\tracing\decorator.py", line 78, in wrapper_use_tracer
  return func(*args, **kwargs)
File "C:\Users\ecard\openaidemo03\scripts\.venv\Lib\site-packages\azure\search\documents\indexes\_generated\operations\_indexes_operations.py", line 413, in create
  raise HttpResponseError(response=response, model=error)
azure.core.exceptions.HttpResponseError: () Authorization failed.
Code:
Message: Authorization failed.

 

The point is that the messages are not clear regarding which permissions were the wrong ones.

 

Thanks in advance

Microsoft

@Ernesto Cardenas that's odd, is the Search index Data Contributor applied to the Cognitive Search service that's provisioned as part of this sample? That's the only permission needed on the search side.

 

Alternatively, you can run the prepdocs script with --searchkey <apikey>, and use a key obtained from the Azure portal for the search service.

@pablocastro no, no, what is part of the sample is "Storage Blob Data Contributor" over the Storage Account. 

 

I ran using --searchkey, same error message. I'll delete everything again and try again, but as I mentioned, this workaround must be documented on the GitHub page to make the deployment straightforward.

Microsoft

@Ernesto Cardenas the sample assigns a number of permissions to different principals. For search, it assigns search index contributor to the logged-in user, you can see it here: https://github.com/Azure-Samples/azure-search-openai-demo/blob/fc78449194f6f9ba48631591cc2e1b5e1dc81...

 

Once we understand what went wrong, we can document it, I'm just not sure what failed yet. The --searchkey option makes the script completely not dependent on the AAD identity for creating the search index and indexing documents, it's surprising that it resulted in the same error. Did you pass --searchkey to prepdocs.py directly?

Copper Contributor

We get the API timeout error "The operation was timeout". It's over 120 seconds. The same request works fine. How can we troubleshoot this?

 

Copper Contributor

@pablocastro @BenPatt - I am also trying to get this done in C# and it is quite tiresome converting the Python code. Do you guys have a sample demo of this use case I can use as a C# base?

@pablocastro Same error even with searchkey as a parameter. Have these samples been tested from Windows environments?

@pablocastro now it seems to work; I had to edit the ps1 file and put in the admin key, not the search one. I'll continue with it.
