Hi @pablocastro, this is an inspiring demo. Like WJK-DEV, I would like to use URLs as data sources, specifically from the company's internal Confluence space, but also other enterprise sources like SharePoint. As I understand it, I would need to
- regularly crawl those pages that are accessible to everyone.
- cut them up into smaller chunks, similar to what prepdocs.py does.
- store those chunks in Azure with a field equivalent to `sourcepage` that links to the Confluence article.
- create an index on those Confluence chunks (or use a suitable existing index).
- The demo app's ask and chat functionality should then work more or less out of the box, though I'll need to make citations open external links.
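For reference, the chunking/indexing step I have in mind would look roughly like this — a minimal sketch in the spirit of prepdocs.py, where the chunk size, overlap, `id` scheme, and every field name other than `sourcepage` are my own assumptions rather than the demo's actual code:

```python
from typing import List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> List[str]:
    """Split page text into chunks of at most max_chars, overlapping by `overlap`
    characters so context isn't lost at chunk boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks


def to_index_documents(page_url: str, title: str, text: str) -> List[dict]:
    """Build index documents with a sourcepage-style field that links
    each chunk back to the originating Confluence article."""
    return [
        {
            "id": f"{title}-{i}",          # assumed id scheme
            "content": chunk,
            "sourcepage": page_url,        # citation target: the Confluence URL
        }
        for i, chunk in enumerate(chunk_text(text))
    ]
```

The idea being that a scheduled crawler fetches each accessible page, runs it through something like the above, and upserts the resulting documents into the index — does that match the intent of prepdocs.py?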
Is the above correct?
I have a few other questions:
- How can I enable access-based search? That is, if users A and B have access to different sets of documents, search should only retrieve results from each user's respective set.
- Can you refer me to any sample code that regularly crawls external data sources and cuts them into chunks like your demo app does?
- Would it be practical to store the document chunks in a non-Azure data store, such as Amazon S3, or is Azure Blob Storage adding some magic here?
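On the access question, my current understanding is that Azure Cognitive Search supports security trimming by storing allowed group IDs in a filterable collection field and applying an OData `search.in` filter at query time. A minimal sketch of building such a filter (the `group_ids` field name is my assumption; the filter syntax follows the documented security-trimming pattern) — is this the right direction?

```python
from typing import List


def security_filter(user_group_ids: List[str]) -> str:
    """Build an OData filter restricting results to documents whose
    (assumed) `group_ids` collection field intersects the user's groups."""
    groups = ", ".join(user_group_ids)
    return f"group_ids/any(g: search.in(g, '{groups}'))"
```

The resulting string would then be passed as the `filter` parameter on each search request, with the user's group memberships resolved from the identity provider at request time.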