How ChatGPT Plugins (could) work
Published Mar 27 2023 05:16 AM 21.8K Views
Microsoft

OpenAI just announced ChatGPT Plugins - a way to have ChatGPT execute actions on the web. This doesn't just mean it can go to the internet and browse up-to-date content and news - it can also execute actions for us, like buying groceries, booking flights, and so much more!

 

The implementation is, at least conceptually, very simple:

  • The plugin provider writes a specification for an API using the OpenAPI standard. This is a standard that's been around for a while, and is what powers tools like Swagger for API documentation.
  • Then, this specification is compiled into a prompt, which explains to ChatGPT how it may use the API to enhance its answers. Think of a detailed prompt, including a description of each endpoint that's available.
  • Finally, the user asks questions. If ChatGPT decides it should fetch information from the API, it makes the request and adds the result to its context before attempting to respond.

This process is already documented in the official OpenAI documentation, though it has restricted access at the time of writing.

Instead of waiting until I get access, I decided to implement my own mechanism based on the above. So, below is my (somewhat successful) attempt at implementing my own ChatGPT Plugin mechanism.

 

Quick disclaimer: I have no additional information outside of what has been made public of the current implementation of ChatGPT Plugins. This exercise was done to illustrate a concept of how it could be done, and is not intended to imply what the actual implementation looks like. 

 

Choosing an API Specification

 

The first step was to understand how to specify an API. OpenAI has made a few sample API specifications available, so I decided to use the same input for my own solution, and drafted a simple specification for a single endpoint.

 

I used DummyJSON, a simple API designed for testing, specifically the "Get all todos" endpoint. And I wrote the following YAML file as a specification:

 

openapi: 3.0.1
info:
  title: TODO Plugin
  description: A plugin that allows the user to create and manage a TODO list using ChatGPT. 
  version: 'v1'
servers:
  - url: https://dummyjson.com/todos
paths:
  /todos:
    get:
      operationId: getTodos
      summary: Get the list of todos
      parameters:
      - in: query
        name: limit
        schema:
          type: integer
        description: Number of todos to return
      - in: query
        name: skip
        schema:
          type: integer
        description: Number of todos to skip from the beginning of the list
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/getTodosResponse'
components:
  schemas:
    getTodosResponse:
      type: object
      properties:
        todos:
          type: array
          items:
            type: object
            properties:
              id:
                type: integer
              todo:
                type: string
              completed:
                type: boolean
              userId:
                type: string
          description: The list of todos.

 

One endpoint with two parameters: "limit" and "skip". This will serve nicely as an example.
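To make this concrete, here is the kind of request URL ChatGPT would need to produce for this endpoint. The helper below is my own illustration (not part of the plugin mechanism itself), using only the Python standard library:

```python
from urllib.parse import urlencode

BASE_URL = "https://dummyjson.com/todos"

def build_todos_url(limit=None, skip=None):
    """Build the 'Get all todos' request URL with optional query parameters."""
    # Drop any parameter the caller did not provide.
    params = {k: v for k, v in {"limit": limit, "skip": skip}.items() if v is not None}
    return BASE_URL + ("?" + urlencode(params) if params else "")
```

For example, `build_todos_url(limit=5)` yields `https://dummyjson.com/todos?limit=5`, exactly the shape of request the specification describes.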

 

Prompt engineering

 

Now, I needed to convert the YAML file above into a prompt, explaining to ChatGPT how to actually structure and execute this request. Luckily, ChatGPT is already great at writing code! So all I had to do was get it to output something in a specific format, so that I could identify it, run the request for it, and provide it the result.

 

After going back and forth a couple of times, this was the final result:

 

You are a virtual assistant that helps users with their questions by relying on
information from HTTP APIs. When the user asks a question, you should determine whether
you need to fetch information from the API to properly answer it. If so, you will
request the user to provide all the parameters you need, and then ask them to run the
request for you. When you are ready to ask for a request, you should specify it using
the following syntax:

<http_request>{
"url": "<request URL>",
"method": "<method>",
"body": {<json request body>},
"headers": {<json request headers>}
}</http_request>

Fill in all the necessary values the user provides during the interaction, and do not
use placeholders. The user will then provide the response body, which you may use to
formulate your answer. You should not respond with code, but rather provide an answer
directly.

The following APIs are available to you:

---
<OpenAPI Specification goes here>
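Assembling the full system prompt is then just a matter of appending the YAML specification to this instruction text. A minimal sketch of that step (my own approach, with the instruction text abbreviated):

```python
# Abbreviated instruction text; the full version is the prompt shown above.
INSTRUCTIONS = (
    "You are a virtual assistant that helps users with their questions by "
    "relying on information from HTTP APIs. [...]\n\n"
    "The following APIs are available to you:\n\n---\n"
)

def compile_prompt(spec_yaml: str) -> str:
    """Append the OpenAPI YAML to the instruction text to form the system prompt."""
    return INSTRUCTIONS + spec_yaml
```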

 

Notice I'm asking ChatGPT to respond with a specific syntax, and telling it the user will give it the response. This happens because the AI model itself does not really execute any API calls - it must delegate that action to a different system. Since we don't have access to the inner components of ChatGPT, we are asking it to delegate HTTP requests back to the user. This will not be a problem, as long as we hide these conversation turns from the end user, who might not even know what an HTTP request is.
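Detecting that syntax in a model reply is straightforward. Here is a sketch of the parsing step, using a regular expression (my own choice for this exercise, not any official implementation):

```python
import json
import re

# Matches the <http_request>...</http_request> syntax from the prompt above.
HTTP_REQUEST_RE = re.compile(r"<http_request>(.*?)</http_request>", re.DOTALL)

def extract_http_request(reply: str):
    """Return the parsed request dict if the reply contains one, else None."""
    match = HTTP_REQUEST_RE.search(reply)
    if match is None:
        return None
    return json.loads(match.group(1))
```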

 

Which brings us to the next section!

 

Orchestration

 

ChatGPT is an AI model, exposed through a REST API. Making the request to an OpenAI model is just one step in an end-to-end chatbot experience. This means we can manipulate what information goes to the model, and what information is displayed to the end user.

 

To implement a virtual assistant with ChatGPT, I used Bot Framework Composer, a UI-based tool that lets us build conversation experiences and publish them to many different channels. Below is the solution architecture, at a high level:

 

[Image: high-level solution architecture diagram]

 

I used Bot Framework Composer for its ability to quickly deploy to multiple end-user channels, with minimal code required. If you're looking to replicate this solution, you may also want to consider using Power Virtual Agents, especially in production.

 

This is how the flow of the conversation was built:

  1. User asks a question - for example: "What are my top 5 todos?"
  2. ChatGPT responds with a preformatted message: 

    <http_request>{
    "url": "https://dummyjson.com/todos?limit=5",
    "method": "GET",
    "body": "",
    "headers": {}
    }</http_request>

  3. Azure Bot detects the format, and submits the request to the DummyJSON API, without looping in the end user.
  4. Azure Bot makes a new request to ChatGPT, on behalf of the user, with the response body.
  5. ChatGPT formats the response: "Here are your top 5 todos: ..."
  6. Azure Bot responds to the user.
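The six steps above can be sketched as a single orchestration function. Everything here is a stand-in for the real Azure Bot logic - hypothetical names, with the ChatGPT call and the HTTP client passed in as stubs:

```python
import json
import re

REQUEST_RE = re.compile(r"<http_request>(.*?)</http_request>", re.DOTALL)

def handle_turn(user_message, chat, execute_request, history):
    """Run one visible conversation turn, hiding any HTTP round-trips.

    `chat` and `execute_request` stand in for the ChatGPT API and the bot's
    HTTP client; only the final reply is shown to the end user.
    """
    history.append({"role": "user", "content": user_message})
    reply = chat(history)                        # step 2: model may ask for a request
    while (m := REQUEST_RE.search(reply)):       # steps 3-4: hidden turns
        request = json.loads(m.group(1))
        response_body = execute_request(request) # the bot runs the call, not the user
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "user", "content": response_body})
        reply = chat(history)                    # step 5: model formats the answer
    history.append({"role": "assistant", "content": reply})
    return reply                                 # step 6: shown to the user
```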

 

One thing immediately caught my attention - ChatGPT can now make requests to any server on the internet. We only "explained" to it how to use one endpoint, but there is nothing stopping it from generating requests that go somewhere else! For that reason, I applied a simple domain allowlist, so that requests could only go to the DummyJSON API, and only one at a time - it never hurts to be too safe.
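The allowlist check itself is a one-liner on the parsed URL. This is the kind of guard I mean (a sketch, assuming the orchestrator runs Python):

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"dummyjson.com"}  # the only API the bot may call

def is_allowed(url: str) -> bool:
    """Permit a request only if it targets an allowlisted HTTPS host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
```

Any `<http_request>` whose URL fails this check is simply refused, so a prompt that tricks the model into targeting another server goes nowhere.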

 

That concludes the design section. Now, on to implementing!

 

Final result

 

I will spare the readers the details of the (many) iterations until the experience was just right. This being a statistical tool, some trial and error is expected until the right prompt is reached. But finally, here's a conversation I had with the final version of the bot:

 

[Image: screenshot of a conversation with the final version of the bot]

 

You can go and check out https://dummyjson.com/todos and you will see these are, in fact, the first todos listed!

I also wanted to share the following conversation from a previous test, where I had an incorrect parameter name in my specification. Realizing my mistake, I explained what was wrong, and ChatGPT gave me the right answer. 

 

[Image: screenshot of the earlier test conversation]

 

Conclusion

 

It is very likely that ChatGPT Plugins have more to them than the quick demo above. The purpose of this exercise was to show how this integration could be done - and trust me, I'm just as curious as you might be as to what the actual implementation is. Giving ChatGPT the ability to integrate over HTTP opens up a whole new set of possibilities, and I can't wait to see what the community can come up with.

 

At the same time, our responsibilities as users of this technology also increase: what happens if an ill-intended prompt gets Azure Bot to make a request to an unknown server? What new attack vectors are now available? In my bot, I applied a simple domain allowlist - will that be enough as new use cases start to appear? I also managed to rewrite the API specification in a later prompt - are there risks associated with that possibility? There are many security and Responsible AI concerns at stake - and OpenAI is certainly aware of this.

 

Overall, I am very impressed. The possibilities are truly endless, and I'll certainly keep an eye on this functionality as it evolves over the coming weeks and months. And I hope to see it in Azure OpenAI soon!
