Add a context-grounded AI chatbot to your Azure Static Web Apps with streaming responses

thomasgauvinmsft · ‎Mar 26 2024

Azure Static Web Apps is the best service to host websites with a lot of static content, such as documentation, marketing, or educational material. These websites can be quite large, however, and with recent developments in large language models, we can provide our users with AI-generated answers to their questions within seconds, grounded in the content of our website.

In this article, we demonstrate how to add a chatbot that lets our users browse the content of our website quickly and easily. We’ll follow the RAG pattern (retrieval-augmented generation) to ground our responses in the static content of our website, and use OpenAI’s APIs to generate the responses. We’ll also feature Azure Functions’ Node.js v4 programming model with streaming support to stream the responses from the API.

If you’re looking to jump right into the code, it’s available in this GitHub repository, and a live demo is available here: https://black-rock-0fa0bcd1e.5.azurestaticapps.net/

How it works

We first start off with a static website, in this case a Next.js static export site pulling content from a WordPress instance, and deployed to Static Web Apps. This starting point is the website created in Integrating WordPress on App Service with Azure Static Web Apps - Microsoft Community Hub, but this could also work for any website made up of static HTML files. With a static website deployed to Static Web Apps, we can build out the architecture that will enable our chatbot experience.

Architecture diagram of the RAG chatbot with Static Web Apps

To set up the required architecture, we’ll use our static HTML files as the grounding material for our AI chatbot. To do so, we’ll store these files within an Azure Storage account, and directly index them with AI Search’s Azure Storage integration. This is the fastest way to get an index of documents, which will provide us with an API that takes a user question and returns the most relevant chunks of documents. This could be further automated such that our CI/CD task outputs our built HTML files to the storage account and triggers the indexing of our documents by AI Search, but we’ll keep this step manual for now.
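As a rough illustration of the automation mentioned above, the upload step could be scripted along these lines. This is only a sketch and not part of the sample repository: the function names `listHtmlFiles` and `uploadHtmlFiles` are made up for this example, and `containerClient` is assumed to behave like a `ContainerClient` from the `@azure/storage-blob` package. Triggering the indexer afterwards is not shown.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively collect every .html file under `dir`, as paths relative to `rootDir`.
function listHtmlFiles(dir, rootDir = dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listHtmlFiles(fullPath, rootDir));
    } else if (entry.name.endsWith(".html")) {
      // Blob names use forward slashes, even when the build runs on Windows.
      files.push(path.relative(rootDir, fullPath).split(path.sep).join("/"));
    }
  }
  return files.sort();
}

// Upload each HTML file to the container and return the blob names that were written.
// Assumption: `containerClient` exposes getBlockBlobClient(name).uploadFile(path, options),
// as the ContainerClient from @azure/storage-blob does.
async function uploadHtmlFiles(outDir, containerClient) {
  const blobNames = listHtmlFiles(outDir);
  for (const blobName of blobNames) {
    const blob = containerClient.getBlockBlobClient(blobName);
    await blob.uploadFile(path.join(outDir, blobName), {
      blobHTTPHeaders: { blobContentType: "text/html" },
    });
  }
  return blobNames;
}
```

With the SDK installed, the container client could be obtained with something like `BlobServiceClient.fromConnectionString(process.env.STORAGE_CONNECTION_STRING).getContainerClient("website-content")`, where `STORAGE_CONNECTION_STRING` is a hypothetical setting name.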

Then, we’ll add an API to our Static Web App using Static Web Apps' managed functions. We’ll call this API from the chatbot UI we’ll be building into our application. This API will provide an endpoint that does the following:

The API will first invoke the AI Search service to get the 3 most relevant sources for the user’s query. We’ll use the chunks of these 3 most relevant sources to generate a response.
Then, we invoke the OpenAI API with the user’s question, along with the 3 most relevant chunks from our static pages. This enables us to provide a customized and highly specific answer from the OpenAI API, and ensures that the response is grounded in the content of the static pages of our website. The response is then returned to our chatbot interface.

Finally, our chatbot will provide an input for our user’s questions and show the AI-generated responses.
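The two steps the endpoint performs can be summarized in a few lines. The sketch below only shows the data flow: `answerWithRag`, `retrieve`, and `generate` are hypothetical names, and the latter two stand in for the AI Search and OpenAI calls that are implemented later in this article.

```javascript
// Orchestrates retrieval followed by generation.
// `retrieve(question, topN)` should resolve to the most relevant chunks;
// `generate(question, chunks)` should resolve to (or stream) the grounded answer.
async function answerWithRag(question, { retrieve, generate, topN = 3 }) {
  // Step 1: fetch the chunks most relevant to the question.
  const chunks = await retrieve(question, topN);
  // Step 2: ask the model to answer using those chunks as context.
  return generate(question, chunks);
}
```

Keeping the two steps behind plain functions like this also makes the flow easy to exercise with in-memory fakes before any Azure resources exist.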

Prerequisites

To follow along with this article, you will need Node.js, the SWA CLI, the Azure Functions Core Tools, an Azure subscription, OpenAI keys (in addition to Azure OpenAI access), and a static content site.

Setting up our AI Search Index with Azure OpenAI

We’ll start by creating a new Azure Storage account. From the Portal, we can then create a new container `website-content` and upload all the HTML files of our statically generated website. In this case, our Next.js project is configured to statically generate the website into the output folder `out`. We can then upload the `out` folder to our storage account via the storage browser of the Azure Portal.

Add HTML output files to storage account

With our HTML files now in our storage account and ready to be ingested, we’ll start creating an index for our documents. First, as a pre-requisite to our Azure AI Search resource, we’ll create a new Azure OpenAI resource, and create a deployment for the text-embedding-ada-002 model. This will be used by Azure AI Search for indexing. Then, we can create an Azure AI Search resource. Once the AI Search resource is created, we can use the quick option to Import and Vectorize data from the AI Search Service overview. In this step, we’ll configure our AI Search quickstart to use the deployment of our Azure OpenAI text embedding model.

Select Import and Vectorize data quickstart from the Azure AI Search resource

Select the storage account containing the HTML files to index

Select the Azure OpenAI model deployment to vectorize your content

This quickstart will set up an index, an indexer, and a data source on our behalf. Clicking into the index, we can run searches from the Azure Portal to see what the results will look like.

We can query the index to retrieve the most relevant documents for the query

We can now invoke this resource via API or via the AI Search SDK by specifying our AI Search index, which we’ll do in the next step to provide grounding context for our LLM-generated response as per the RAG pattern.
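For reference, querying the index over REST could look roughly like the sketch below. It assumes the `2023-11-01` API version of the Search Documents endpoint and the `chunk` and `title` fields used later in this article; `buildSearchRequest` and `searchIndex` are names invented for this example. The article itself uses the SDK rather than raw REST calls.

```javascript
// Build the URL and fetch options for a query against an Azure AI Search index.
function buildSearchRequest({ endpoint, indexName, apiKey, query, top = 3 }) {
  const base = endpoint.replace(/\/+$/, ""); // tolerate a trailing slash
  const url = `${base}/indexes/${encodeURIComponent(indexName)}/docs/search?api-version=2023-11-01`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "api-key": apiKey },
      // In the REST API, `select` is a comma-separated list of field names.
      body: JSON.stringify({ search: query, top, select: "chunk,title" }),
    },
  };
}

// Send the request and return the matching documents.
async function searchIndex(options) {
  const { url, init } = buildSearchRequest(options);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Search request failed with status ${response.status}`);
  }
  const { value } = await response.json(); // matches are returned in `value`
  return value;
}
```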

Create the API for our chatbot

With our AI Search properly configured, we can create a managed functions API for our Static Web App. We can create a Node.js v4 Azure Functions project using the quickstart from the Azure Functions docs. Essentially, we’ll create an Azure Functions project with a single function using the following commands:

mkdir api
cd api
func init --javascript
func new --name chat --template "HTTP trigger" --authlevel "anonymous" 
func start

We now have our Azure Functions development server running. We’ll make sure to provide the correct environment variables that we’ll be requiring in the code, so in /api/local.settings.json, add the following API keys from the required services.

{
  "IsEncrypted": false,
  "Values": {
    "FUNCTIONS_WORKER_RUNTIME": "node",
    "AzureWebJobsFeatureFlags": "EnableWorkerIndexing",
    "AzureWebJobsStorage": "UseDevelopmentStorage=true",
    "AI_SEARCH_KEY": "<ENTER AI SEARCH KEY HERE>",
    "AI_SEARCH_ENDPOINT": "<ENTER AI SEARCH ENDPOINT HERE>",
    "AI_SEARCH_INDEX": "<ENTER AI SEARCH INDEX ID HERE>",
    "OPENAI_KEY": "<ENTER KEY HERE>"
  }
}
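Because a missing or placeholder setting only surfaces as a failed request at runtime, it can help to check the settings up front. The following is a minimal sketch, not part of the sample project: `readSettings` is a hypothetical helper, and the check for values starting with `<ENTER` simply mirrors the placeholders shown above.

```javascript
const REQUIRED_SETTINGS = [
  "AI_SEARCH_KEY",
  "AI_SEARCH_ENDPOINT",
  "AI_SEARCH_INDEX",
  "OPENAI_KEY",
];

// Return the required settings, or throw an error listing the ones that are
// absent or still contain a placeholder value.
function readSettings(env = process.env) {
  const missing = REQUIRED_SETTINGS.filter(
    (name) => !env[name] || env[name].startsWith("<ENTER")
  );
  if (missing.length > 0) {
    throw new Error(`Missing application settings: ${missing.join(", ")}`);
  }
  return Object.fromEntries(REQUIRED_SETTINGS.map((name) => [name, env[name]]));
}
```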

Within this newly created chat file, we’ll include the following contents in order to provide a /api/chat GET endpoint that fetches the most relevant documents to answer the user’s question, and then includes these documents when invoking OpenAI to obtain the natural language chat response.

const { app } = require('@azure/functions');
const { SearchClient, AzureKeyCredential } = require("@azure/search-documents");
const OpenAI = require("openai");

app.setup({ enableHttpStream: true });

app.http('chat', {
    methods: ['GET'],
    authLevel: 'anonymous',
    handler: async (request, context) => {
        context.log(`Http function processed request for url "${request.url}"`);
        
        const question = request.query.get('question');
        const topNDocs = await retrieveTopNDocuments(question);
        const streamingOpenAIResponse = await getChatResponseOpenAI(question, topNDocs);
    
        return {
            body: convertOpenAIStreamToExtractedContentStream(streamingOpenAIResponse),
            headers: {
                'Content-Type': 'text/event-stream'
            }
        }
    }
});

async function retrieveTopNDocuments(query, n = 3){
    const client = new SearchClient(process.env["AI_SEARCH_ENDPOINT"], process.env["AI_SEARCH_INDEX"], new AzureKeyCredential(process.env["AI_SEARCH_KEY"]));

    const searchResults = await client.search(query, {
        top: n, // use the requested number of documents rather than a hard-coded 3
        select:  ["chunk", "title"]
    });

    const resultsArray = [];
    for await (const result of searchResults.results) {
        resultsArray.push(result)
    }

    return resultsArray;
}

async function getChatResponseOpenAI(query, topNDocs){
    //adapted from https://github.com/Azure-Samples/azure-search-openai-javascript

    const openai = new OpenAI({
        apiKey: process.env["OPENAI_KEY"]
      });

    const SYSTEM_CHAT_TEMPLATE = `//VIEW PROMPT IN SOURCE CODE, OMITTED DUE TO LENGTH `;

    const SAMPLE_QUESTION =`//VIEW SAMPLE IN SOURCE CODE, OMITTED DUE TO LENGTH`;
    const SAMPLE_ANSWER = `//VIEW SAMPLE IN SOURCE CODE, OMITTED DUE TO LENGTH`;

    const QUESTION = `${query}\nSources:${topNDocsToString(topNDocs)}`;

    const streamingChatCompletion = await openai.beta.chat.completions.stream({
        messages: [{ role: 'system', content: `${SYSTEM_CHAT_TEMPLATE} ${SAMPLE_QUESTION} ${SAMPLE_ANSWER}`},
            {role: 'user', content: QUESTION}
        ],
        model: 'gpt-3.5-turbo',
        stream: true
    });
    return streamingChatCompletion;
}

Taking a look at the handler for the /api/chat endpoint, we can see a simple flow. First, we retrieve the 3 most relevant documents for the question with retrieveTopNDocuments. Then, we obtain the OpenAI natural language response to the question in getChatResponseOpenAI, passing along the relevant documents so that the OpenAI API can provide a response grounded in the provided context, following the RAG pattern. Finally, we return the stream from the API, setting the Content-Type header to ‘text/event-stream’ so that the response is processed as a stream by Static Web Apps. The helper functions topNDocsToString and convertOpenAIStreamToExtractedContentStream are omitted due to length, but are available in the thomasgauvin/swa-with-rag-chat-functions repository.
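To give an idea of what the two omitted helpers do, here is one possible shape for them. These are illustrative stand-ins and not the code from the repository: the "title: chunk" prompt format is an assumption, as is the choice to return a Node.js `Readable` (the real helper may build a web `ReadableStream` instead). The sketch relies on each search result exposing its fields under `document`, and on each streamed chunk carrying its text in `choices[0].delta.content`.

```javascript
const { Readable } = require("node:stream");

// Format the retrieved search results as one "title: chunk" line per document.
// Assumption: the prompt expects this layout; whitespace is collapsed to save tokens.
function topNDocsToString(topNDocs) {
  return topNDocs
    .map(({ document }) => `${document.title}: ${document.chunk.replace(/\s+/g, " ").trim()}`)
    .join("\n");
}

// Yield only the text fragments from an async-iterable of chat completion chunks,
// skipping chunks that carry no content (for example the initial role-only chunk).
async function* extractContent(chunkStream) {
  for await (const chunk of chunkStream) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (content) yield content;
  }
}

// Wrap the extracted text in a byte stream that can be returned as the response body.
function convertOpenAIStreamToExtractedContentStream(chunkStream) {
  return Readable.from(extractContent(chunkStream), { objectMode: false });
}
```

The effect is that the client receives plain text fragments as they are generated, rather than the raw chunk objects emitted by the OpenAI client.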

Now, we can access http://localhost:7071/api/chat?question=what+does+snippets+do and we can see a streaming response from our Azure Functions API.

Gif of our AI-generated API response being streamed

With our API created to provide a context-grounded, streamed response for a question, we can now call this API from our frontend code using JavaScript.

Call the API from our frontend client code

We can call this API via any JavaScript code in our frontend client code. Since our site has been created with Next.js, we’ll create a React chatbot component that fetches the response from the API we’ve created above. Note that this could be accomplished with JavaScript without additional libraries as well.

Before diving into the code, we’ll configure our Next.js server and Functions API to run locally as they would in Azure Static Web Apps. With the Next.js development server started (npm run dev from /nextjs-site) and the Functions development server started (func start from /api), you can run the following command to set up local development for your Static Web Apps project:

swa start --api-devserver-url http://localhost:7071 --app-devserver-url http://localhost:3000

Now, you can access http://localhost:4280/ and see your frontend application, along with proxying of route /api to your Azure Functions development server, as it would be deployed to Static Web Apps.

Within our Next.js code, we’ll create a new chatbot component, /nextjs-site/app/chatbot.tsx, and add the following content:

"use client";
import React, { FormEvent, FormEventHandler, useEffect } from "react";

export const Chatbot = () => {
  const [message, setMessage] = React.useState("");
  const [chatHistory, setChatHistory] = React.useState([]);
  const [loading, setLoading] = React.useState(false);
  const [suggestedQuestions] = React.useState([
    "What is snippets?",
    "What are community standups?",
  ]);
  const [chatOpen, setChatOpen] = React.useState(false);
  const [streamedResponse, setStreamedResponse] = React.useState('');

  const submitQuestion = async (question?: string) => {
    let questionToSubmit = message;
    if (question) {
      questionToSubmit = question;
    }

    setLoading(true);

    // Send API call to /api/chat with the question
    const response = await fetch(
      `/api/chat?question=${encodeURIComponent(questionToSubmit)}`
    );
    const body = response.body; // response.body is a stream, not a promise
    const reader = body?.pipeThrough(new TextDecoderStream()).getReader();

    // @ts-ignore
    setChatHistory((prev) => [
      ...prev,
      { user: questionToSubmit },
    ]);

    let result = '';
    while(true){
      const { value, done } = await reader!.read();
      if (done) {
        break;
      }
      result += value;
      setStreamedResponse((prev) => prev + value);
    }
    
    setStreamedResponse('');

    // Add user message and assistant response to chat history
    // @ts-ignore
    setChatHistory((prev) => [
      ...prev,
      { assistant: result },
    ]);
  
    setMessage("");
    setLoading(false);

    // Scroll to the bottom of the chat
    scrollToBottom();
  };

  const handleSubmit = async (e: FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    if (!message.trim()) return;

    //@ts-expect-error
    submitQuestion(null);
  };

  return (
    <div>...</div>
  );
};

The above code snippet is responsible for calling our API and properly handling the state of our component. We’ve removed additional helpers to focus on the submitQuestion function, but these helpers are available in the thomasgauvin/swa-with-rag-chat-functions repository.

Diving into the submitQuestion function, we can see that it takes a question from the function arguments or from the component state and makes a fetch request to our API, encoding the question as a query parameter. Then, it reads the stream from the response body and appends each chunk to the streamedResponse state variable (streamedResponse is rendered in our component to show the response as it is generated by the OpenAI API). Once the stream ends, the function clears streamedResponse and adds the complete answer to the chat history.
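The same stream-reading loop works without React or any other library. Here is a minimal sketch, assuming a runtime that provides `fetch`, `Response`, and `TextDecoderStream` (modern browsers and Node.js 18+); `readTextStream` is a name chosen for this example.

```javascript
// Read a streamed fetch Response chunk by chunk. `onChunk` is called with each
// decoded piece of text, and the function resolves with the complete text.
async function readTextStream(response, onChunk) {
  if (!response.body) return "";
  const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
  let result = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    result += value;
    onChunk(value);
  }
  return result;
}

// Example usage in a page, where `output` is some existing DOM element:
//   const response = await fetch(`/api/chat?question=${encodeURIComponent("What is snippets?")}`);
//   const answer = await readTextStream(response, (text) => { output.textContent += text; });
```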

The full code for the chatbot component is included in the source code accompanying this blog post, and I highly recommend checking it out if you are interested in the details of how this component is rendered. For instance, in the following JSX, we render the chat history and show the streamedResponse for the last entry in the chat history.

[…]

export const Chatbot = () => {
[…]
  return (
	[…]
              {chatHistory.map((msg, index) => {
                const { text, source } = extractTextAndSource(msg["assistant"]);

                return (
                  <div key={index} className="mb-2">
                    {msg["user"] &&
                      <p className="text-gray-600">
                        <strong>You:</strong> {msg["user"]}
                      </p>
                    }
                    {
                      (index == chatHistory.length - 1) &&  streamedResponse &&
                      <p className="text-gray-600">
                        <strong>AI:</strong> {streamedResponse}
                      </p>
                    }
                    {msg["assistant"] &&
                      <p className="text-gray-600">
                        <strong>AI:</strong> {text}
                        <span className="text-sm underline text-blue-500">
                          {source}
                        </span>
                      </p>
                    }
                  </div>
                );
              })}
	[…]
	)
}
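The extractTextAndSource helper used in the JSX above is not shown in this article. A hypothetical version is sketched below, assuming the prompt instructs the model to end its answer with the source file name in square brackets (for example `[snippets.html]`); the actual helper in the repository may parse the answer differently.

```javascript
// Split an assistant answer into its text and a trailing "[source]" citation.
// Returns empty strings when there is no answer, as is the case for chat
// history entries that only hold a user message.
function extractTextAndSource(answer) {
  if (!answer) return { text: "", source: "" };
  const match = answer.match(/\[([^\]]+)\]\s*$/);
  if (!match) return { text: answer.trim(), source: "" };
  return { text: answer.slice(0, match.index).trim(), source: match[1] };
}
```

Handling the undefined case matters here because the map callback calls the helper for every entry in chatHistory, including those with only a `user` field.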

We can now import the chatbot component into the page of our website where we want our chatbot to exist. We have now successfully added a chatbot to our website, which provides streamed, natural language responses based on the HTML content of our website!

Gif of the chatbot providing streamed context-grounded AI generated responses to questions

Deploy to Azure Static Web Apps

The last step will be to deploy this to Azure Static Web Apps. We’ll start by creating a GitHub repository for our project which contains our Next.js/static site within the nextjs-site folder, and our Functions project contained within the api folder (take a look at the sample code for reference).

Once the GitHub repository has been created, we can navigate to the Azure Portal to create a Static Web Apps resource. We’ll indicate a Custom deployment, configured with an app location of nextjs-site, an api location of api, and an output location of out. These are the values required by the application code in the sample repository, but if you’ve changed your project configuration, they may be different (refer to the SWA docs for build configuration). This will create the necessary GitHub Actions workflow to deploy your project.

Finally, we’ll set environment variables for our Azure Static Web Apps resource. From the Static Web Apps resource in the Azure Portal, navigate to the environment variables tab. Then set the values for AI_SEARCH_KEY, AI_SEARCH_ENDPOINT, AI_SEARCH_INDEX, and OPENAI_KEY.

Accessing our deployed Static Web Apps, we can now see our streamed natural language responses again, now deployed and accessible to our site visitors!

Conclusion

This article demonstrated how we can use recent developments in Static Web Apps to build AI experiences into our websites. By leveraging Static Web Apps’ managed functions, Azure Functions Node.js v4 support for streaming, OpenAI streamed responses, and Azure AI Search, we can build a chatbot grounded in the HTML content of our website. Best of all, this can be easily added to existing static web apps by leveraging Static Web Apps’ built-in managed functions.

If you're interested in trying this out on your own, check out the reference GitHub repository for the complete project and try out the demo as well: https://black-rock-0fa0bcd1e.5.azurestaticapps.net/

Get started building AI applications with Static Web Apps!
