Build a chatbot to query your documentation using Langchain and Azure OpenAI
Published May 30, 2023

In this article, I will introduce LangChain and explore its capabilities by building a simple question-answering app that queries a PDF that is part of the Azure Functions documentation.

LangChain

Harrison Chase's LangChain is a powerful Python library that simplifies the process of building NLP applications using large language models. Its primary goal is to create intelligent agents that can understand and execute human language instructions. With LangChain, you can connect to a variety of data and computation sources and build applications that perform NLP tasks on domain-specific data sources, private repositories, and more.

As of May 2023, the LangChain GitHub repository has garnered over 42,000 stars and has received contributions from more than 270 developers worldwide.


The LangChain library comprises several modules:

  • LLMs and Prompts

This includes prompt management, prompt optimization, a generic interface for all LLMs, and common utilities for working with LLMs like Azure OpenAI. It supports a variety of LLMs, including OpenAI, LLaMA, and GPT4All.
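
For example, here is a minimal sketch of using a prompt template with an Azure OpenAI chat model. The deployment name, endpoint, key, and API version are placeholders; use the same values you put in the .env file described later in this article.

from langchain.chat_models import AzureChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import HumanMessage

# placeholder Azure OpenAI settings - replace with your own (see the .env file below)
llm = AzureChatOpenAI(deployment_name="<your gpt35 deployment name>",
                      openai_api_base="<your openai endpoint>",
                      openai_api_key="<your openai api key>",
                      openai_api_version="<gpt35 api version>",
                      openai_api_type="azure")

# prompt management: a reusable template with a single input variable
prompt = PromptTemplate.from_template("Explain {topic} in one sentence.")
response = llm([HumanMessage(content=prompt.format(topic="Azure Functions"))])
print(response.content)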

  • Chains

Chains in LangChain involve sequences of calls that can be chained together to perform specific tasks. For instance, you may need to retrieve data from a particular URL, summarize the returned text, and answer questions using the generated summary. Chains can also be simple, such as reading user input, constructing a prompt, and generating a response.
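
As a minimal sketch of such a simple chain (it reuses the llm object from the previous snippet):

from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# reuses the llm object defined in the previous snippet
prompt = PromptTemplate.from_template("You are a helpful assistant. Answer this question: {question}")
chain = LLMChain(llm=llm, prompt=prompt)

# read user input, construct the prompt, generate a response
user_question = input("Ask a question: ")
print(chain.run(question=user_question))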

  • Data Augmented Generation

Data Augmented Generation involves specific types of chains that first interact with an external data source to fetch data for use in the generation step. Examples include summarization of long pieces of text and question answering over specific data sources. LangChain's Document Loaders and Utils modules facilitate connecting to sources of data and computation. If you have a mix of text files, PDF documents, HTML web pages, etc., you can use the document loaders in LangChain.
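
For example, a small sketch that loads a PDF and a plain-text file with LangChain's document loaders (the file names are placeholders):

from langchain.document_loaders import PyPDFLoader, TextLoader

# placeholder file names - point these at your own files
pdf_docs = PyPDFLoader("./data/example.pdf").load()
txt_docs = TextLoader("./data/example.txt").load()

# every loader returns a list of Document objects with page_content and metadata
all_docs = pdf_docs + txt_docs
print(len(all_docs), all_docs[0].metadata)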

  • Agents

Agents involve an LLM making decisions about which actions to take, taking that action, seeing an observation, and repeating that until done.

As we explained before, chains let you connect a sequence of LLM calls. In some tasks, however, the sequence of calls is not deterministic: the next step depends on the user input and on the responses from previous steps.

Agents can take actions based on inputs along the way instead of following a hardcoded deterministic sequence.
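
A minimal agent sketch, reusing the llm object from the earlier snippets together with LangChain's built-in llm-math tool:

from langchain.agents import AgentType, initialize_agent, load_tools

# give the agent a calculator tool; it decides at runtime whether to use it
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

print(agent.run("What is 12 raised to the power of 3?"))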

  • Memory

Memory refers to persisting state between calls of a chain or agent, for example using VectorStores. Vector databases are optimized for doing quick searches in high-dimensional spaces. LangChain makes this effortless.
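
LangChain also ships simpler in-process memory classes; here is a minimal sketch using ConversationBufferMemory (the vector-store-backed memories follow the same pattern), again reusing the llm object from the snippets above:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# the memory object stores the running chat history and injects it into every prompt
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())
conversation.predict(input="Hi, my name is Denise.")
print(conversation.predict(input="What is my name?"))  # answered from the stored history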

Embeddings

An embedding is a mapping of a discrete, categorical variable to a vector of continuous numbers. In the context of neural networks, embeddings are low-dimensional, learned continuous vector representations of discrete variables. Neural network embeddings are useful because they can reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space.

Neural network embeddings have 3 primary purposes:

  • Finding nearest neighbours in the embedding space. These can be used to make recommendations based on user interests or cluster categories.
  • As input to a machine learning model for a supervised task.
  • For visualization of concepts and relations between categories.

Example of clustering of vector values for sentences

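
To make the nearest-neighbour idea concrete, here is a small sketch that embeds two sentences and compares them with cosine similarity. It mirrors the OpenAIEmbeddings setup used later in app_indexer.py; the deployment name, endpoint, and key are placeholders.

from langchain.embeddings.openai import OpenAIEmbeddings
import numpy as np

# placeholder Azure OpenAI settings - same as in app_indexer.py below
embeddings = OpenAIEmbeddings(deployment="<your text embedding ada deployment name>",
                              model="text-embedding-ada-002",
                              openai_api_base="<your openai endpoint>",
                              openai_api_key="<your openai api key>",
                              openai_api_type="azure",
                              chunk_size=1)

v1 = np.array(embeddings.embed_query("How do I create an Azure Function?"))
v2 = np.array(embeddings.embed_query("Creating a new function app in Azure"))

# cosine similarity is close to 1.0 for semantically similar sentences
print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))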

Vector Stores or Vector Databases

A vector database is a specialized type of database that stores data as high-dimensional vectors. These vectors are mathematical representations of the features or attributes of the data being stored. The number of dimensions in each vector can vary widely, ranging from tens to thousands, depending on the complexity and granularity of the data.
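
The idea is easy to see with FAISS directly; here is a minimal sketch with random vectors standing in for real embeddings:

import numpy as np
import faiss

dimension = 1536  # same dimensionality as text-embedding-ada-002 vectors
vectors = np.random.random((100, dimension)).astype("float32")  # stand-ins for real embeddings

index = faiss.IndexFlatL2(dimension)  # exact (brute-force) L2 search
index.add(vectors)

query = np.random.random((1, dimension)).astype("float32")
distances, ids = index.search(query, 3)  # the 3 nearest neighbours of the query vector
print(ids, distances)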

Let’s build the Application


Let’s build a tool that can read developer documentation – in this case, the Azure Functions documentation as a PDF – and then answer arbitrary questions by referencing the documentation text.

We will follow these steps:

One-time procedure:

  • Index the PDF document (the Azure Functions documentation): split it into chunks and create embeddings for all of the text.
  • Store all of the embeddings in a vector store (FAISS in our case) that can be searched by the application.

The application:

  • When a user asks a question, we will use the FAISS vector index to find the closest matching text.
  • Feed that text into GPT-3.5 as context in the prompt.
  • GPT-3.5 will generate an answer that accurately answers the question.


Steps

  • Download the documents to search. In our case, we can download the Azure Functions documentation from here and save it in the data/documentation folder.
  • In Azure OpenAI, deploy the following models:
    • text-embedding-ada-002 (Ada)
    • gpt-35-turbo (GPT-3.5)


Get your Azure OpenAI endpoint and key and add them to a file called .env as follows:

OPENAI_DEPLOYMENT_ENDPOINT="<your openai endpoint>"
OPENAI_API_KEY="<your openai api key>"
OPENAI_DEPLOYMENT_NAME="<your gpt35 deployment name>"
OPENAI_DEPLOYMENT_VERSION="<gpt35 api version>"
OPENAI_MODEL_NAME="<gpt35 model name>"

OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME="<your text embedding ada deployment name>"
OPENAI_ADA_EMBEDDING_MODEL_NAME="<your text embedding ada model name>"

Creating the embeddings

The flow of app_indexer.py is:

  • Load the PDF
  • Split up all of the text into chunks.
  • Send those chunks to the OpenAI Embeddings API, which returns a 1536-dimensional vector for each chunk.
  • Index all of the vectors into a FAISS index.
  • Save the FAISS index to a .faiss and .pkl file.

Note: As you probably know, LLMs cannot accept very long inputs because of token limits, so we will split the document into chunks, as shown below.

Running this code takes time since we need to read and split the whole document and send the chunks to the Ada model to get the embeddings.
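
The code below relies on loader.load_and_split(), which uses LangChain's default text splitter. If you want explicit control over chunk size and overlap, here is a hedged alternative sketch (the chunk values are illustrative, not the settings used in this article):

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("./data/documentation/azure-azure-functions.pdf")
documents = loader.load()

# illustrative values - tune chunk_size/chunk_overlap for your documents and token limits
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
pages = splitter.split_documents(documents)
print(f"Split into {len(pages)} chunks")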

 

Here is the code for app_indexer.py

from langchain.document_loaders import PyPDFLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from dotenv import load_dotenv
import openai
import os

#load environment variables
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") 
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("OPENAI_DEPLOYMENT_ENDPOINT")
OPENAI_DEPLOYMENT_NAME = os.getenv("OPENAI_DEPLOYMENT_NAME")
OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
OPENAI_DEPLOYMENT_VERSION = os.getenv("OPENAI_DEPLOYMENT_VERSION")

OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME")
OPENAI_ADA_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_ADA_EMBEDDING_MODEL_NAME")

#init Azure OpenAI
openai.api_type = "azure"
openai.api_version = OPENAI_DEPLOYMENT_VERSION
openai.api_base = OPENAI_DEPLOYMENT_ENDPOINT
openai.api_key = OPENAI_API_KEY

if __name__ == "__main__":
    embeddings=OpenAIEmbeddings(deployment=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME,
                                model=OPENAI_ADA_EMBEDDING_MODEL_NAME,
                                openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
                                openai_api_type="azure",
                                chunk_size=1)
    dataPath = "./data/documentation/"
    fileName = dataPath + "azure-azure-functions.pdf"

    #use langchain PDF loader
    loader = PyPDFLoader(fileName)

    #split the document into chunks
    pages = loader.load_and_split()

    #Use Langchain to create the embeddings using text-embedding-ada-002
    db = FAISS.from_documents(documents=pages, embedding=embeddings)

    #save the embeddings into FAISS vector store
    db.save_local("./dbs/documentation/faiss_index")

Creating the Application

The flow of app_chatbot.py is as follows:

  • The FAISS index is loaded into RAM.
  • The user asks a question.
  • The user's question is sent to the OpenAI Embeddings API, which returns a 1536-dimensional vector.
  • The FAISS index is queried for the closest matching vector.
  • The closest matching vector is returned, along with the text it was generated from.
  • The returned text is fed into GPT-3.5 as context in the prompt.
  • GPT-3.5 generates a response, which is returned to the user.

Note: LangChain does most of the heavy lifting for us here; all of this happens behind the scenes.

Here is the code for app_chatbot.py

from dotenv import load_dotenv
import os
import openai
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

#load environment variables
load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_DEPLOYMENT_ENDPOINT = os.getenv("OPENAI_DEPLOYMENT_ENDPOINT")
OPENAI_DEPLOYMENT_NAME = os.getenv("OPENAI_DEPLOYMENT_NAME")
OPENAI_MODEL_NAME = os.getenv("OPENAI_MODEL_NAME")
OPENAI_DEPLOYMENT_VERSION = os.getenv("OPENAI_DEPLOYMENT_VERSION")

OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME = os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME")
OPENAI_ADA_EMBEDDING_MODEL_NAME = os.getenv("OPENAI_ADA_EMBEDDING_MODEL_NAME")



def ask_question(qa, question):
    result = qa({"query": question})
    print("Question:", question)
    print("Answer:", result["result"])


def ask_question_with_context(qa, question, chat_history):
    result = qa({"question": question, "chat_history": chat_history})
    print("answer:", result["answer"])
    # keep only the latest exchange as context for the next question
    chat_history = [(question, result["answer"])]
    return chat_history


if __name__ == "__main__":
    # Configure the OpenAI API for Azure using the values loaded from .env
    openai.api_type = "azure"
    openai.api_base = OPENAI_DEPLOYMENT_ENDPOINT
    openai.api_key = OPENAI_API_KEY
    openai.api_version = OPENAI_DEPLOYMENT_VERSION
    llm = AzureChatOpenAI(deployment_name=OPENAI_DEPLOYMENT_NAME,
                      model_name=OPENAI_MODEL_NAME,
                      openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
                      openai_api_version=OPENAI_DEPLOYMENT_VERSION,
                      openai_api_key=OPENAI_API_KEY,
                      openai_api_type="azure")
    
    embeddings=OpenAIEmbeddings(deployment=OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME,
                                model=OPENAI_ADA_EMBEDDING_MODEL_NAME,
                                openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
                                openai_api_type="azure",
                                chunk_size=1)


    #load the faiss vector store we saved into memory
    vectorStore = FAISS.load_local("./dbs/documentation/faiss_index", embeddings)

    #use the faiss vector store we saved to search the local document
    retriever = vectorStore.as_retriever(search_type="similarity", search_kwargs={"k":2})

    QUESTION_PROMPT = PromptTemplate.from_template("""Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

    Chat History:
    {chat_history}
    Follow Up Input: {question}
    Standalone question:""")

    qa = ConversationalRetrievalChain.from_llm(llm=llm,
                                            retriever=retriever,
                                            condense_question_prompt=QUESTION_PROMPT,
                                            return_source_documents=True,
                                            verbose=False)


    chat_history = []
    while True:
        query = input('you: ')
        if query == 'q':
            break
        chat_history = ask_question_with_context(qa, query, chat_history)

Now we can run app_chatbot.py and start asking questions (type 'q' to quit):

you: what are azure functions?
answer:  Azure Functions is a cloud service available on-demand that provides all the continually updated infrastructure and resources needed to run your applications. You focus on the code that matters most to you, in the most productive language for you, and Functions handles the rest. Functions provides serverless compute for Azure. You can use Functions to build web APIs, respond to database changes, process IoT streams, manage message queues, and more.

you: can I use events hub as a trigger for an azure function?
answer:  Yes, you can use events hub as a trigger for an Azure Function.
`Azure Functions supports trigger and output bindings for Event Hubs. Use the function trigger to respond to an event sent to an event hub event stream. You must have read access to the underlying event hub to set up the trigger. When the function is triggered, the message passed to the function is typed as a string.`

you: can I deploy azure functions in multi-region?
answer:  Yes, you can deploy Azure Functions in multi-region. There are two patterns to consider: Active/Active which is used for HTTP trigger functions and Active/Passive which is used for event-driven, non-HTTP triggered functions. Azure Front Door needs to be used to coordinate requests between both regions when using the active/active pattern for HTTP trigger functions. When using the active/passive pattern, the second region is activated when failover is required and takes over processing. To learn more about multi-region deployments, see the guidance in Highly available multi-region web application.

A fully functioning example of this can be found in my GitHub repo:

The indexer:

azure-data-and-ai-examples/openai/app_indexer.py at master · denisa-ms/azure-data-and-ai-examples (g...

The chatbot:

azure-data-and-ai-examples/openai/app_chatbot.py at master · denisa-ms/azure-data-and-ai-examples (g...

Thanks for reading, hope you enjoyed it.

Denise
