OpenAI just announced ChatGPT Plugins - a way to have ChatGPT execute actions on the web. This doesn't just mean it can browse up-to-date content and news, but also that it can execute actions for us, like buying groceries, booking flights, and so much more!
The implementation is, at least conceptually, very simple: describe the available API in a machine-readable specification, have the model decide when it needs to call it and emit the request in a structured format, execute that request on its behalf, and hand back the response so it can formulate the final answer.
This process is already documented in the official OpenAI documentation, though it has restricted access at the time of writing.
Instead of waiting until I get access, I decided to implement my own mechanism based on the above. So, below is my (somewhat successful) attempt at implementing my own ChatGPT Plugin mechanism.
Quick disclaimer: I have no information beyond what has been made public about the current implementation of ChatGPT Plugins. This exercise was done to illustrate how such a mechanism could work, and is not intended to imply what the actual implementation looks like.
The first step was to understand how to specify an API. OpenAI has made a few sample API specifications available, so I decided to use the same input for my own solution, and drafted a simple specification for a single endpoint.
I used DummyJSON, a simple API designed for testing, specifically the "Get all todos" endpoint. And I wrote the following YAML file as a specification:
openapi: 3.0.1
info:
  title: TODO Plugin
  description: A plugin that allows the user to create and manage a TODO list using ChatGPT.
  version: 'v1'
servers:
  - url: https://dummyjson.com
paths:
  /todos:
    get:
      operationId: getTodos
      summary: Get the list of todos
      parameters:
        - in: query
          name: limit
          schema:
            type: integer
          description: Number of todos to return
        - in: query
          name: skip
          schema:
            type: integer
          description: Number of todos to skip from the beginning of the list
      responses:
        "200":
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/getTodosResponse'
components:
  schemas:
    getTodosResponse:
      type: object
      properties:
        todos:
          type: array
          items:
            type: object
            properties:
              id:
                type: integer
              todo:
                type: string
              completed:
                type: boolean
              userId:
                type: integer
          description: The list of todos.
One endpoint with two parameters: "limit" and "skip". This will serve nicely as an example.
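To make the idea concrete, here is a minimal sketch of how a system could walk the parsed specification and enumerate the operations it offers. It assumes the YAML above has already been loaded into a dictionary (for instance with PyYAML's yaml.safe_load); the helper name and output shape are my own illustration, not anything from the actual ChatGPT Plugins implementation.

```python
# Parsed form of (a subset of) the YAML spec above, as PyYAML would produce it.
SPEC = {
    "openapi": "3.0.1",
    "servers": [{"url": "https://dummyjson.com"}],
    "paths": {
        "/todos": {
            "get": {
                "operationId": "getTodos",
                "summary": "Get the list of todos",
            }
        }
    },
}

def list_operations(spec: dict) -> list[dict]:
    """Collect method, path, and operationId for every operation in the spec."""
    ops = []
    for path, methods in spec.get("paths", {}).items():
        for method, details in methods.items():
            ops.append({
                "method": method.upper(),
                "path": path,
                "operationId": details.get("operationId"),
            })
    return ops

print(list_operations(SPEC))
# [{'method': 'GET', 'path': '/todos', 'operationId': 'getTodos'}]
```

A real system would also extract the parameters and response schemas, since those are what the model needs to build valid requests.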
Now, I needed to convert the YAML file above into a prompt, explaining to ChatGPT how to actually structure and execute this request. Luckily, ChatGPT is already great at writing code! So all I had to do was get it to output something in a specific format, so that I could identify it, run the request for it, and provide it the result.
After going back and forth a couple of times, this was the final result:
You are a virtual assistant that helps users with their questions by relying on
information from HTTP APIs. When the user asks a question, you should determine whether
you need to fetch information from the API to properly answer it. If so, you will
request the user to provide all the parameters you need, and then ask them to run the
request for you. When you are ready to ask for a request, you should specify it using
the following syntax:
<http_request>{
"url": "<request URL>",
"method": "<method>",
"body": {<json request body>},
"headers": {<json request headers>}
}</http_request>
Fill in all the necessary values the user provides during the interaction, and do not
use placeholders. The user will then provide the response body, which you may use to
formulate your answer. You should not respond with code, but rather provide an answer
directly.
The following APIs are available to you:
---
<OpenAPI Specification goes here>
Notice I'm asking ChatGPT to respond with a specific syntax, and telling it the user will give it the response. This happens because the AI model itself does not really execute any API calls - it must delegate that action to a different system. Since we don't have access to the inner components of ChatGPT, we are asking it to delegate HTTP requests back to the user. This will not be a problem, as long as we hide these conversation turns from the end user, who might not even know what an HTTP request is.
Which brings us to the next section!
ChatGPT is an AI model, exposed through a REST API. Making the request to an OpenAI model is just one step in an end-to-end chatbot experience. This means we can manipulate what information goes to the model, and what information is displayed to the end user.
To implement a virtual assistant with ChatGPT, I used Bot Framework Composer, a UI-based tool that allows us to build conversation experiences and publish them to many different channels. Below is the solution architecture, at a high level:
I used Bot Framework Composer for its ability to quickly deploy to multiple end-user channels, with minimal code required. If you're looking to replicate this solution, you may also want to consider using Power Virtual Agents, especially in production.
This is how the flow of the conversation was built:
<http_request>{
"url": "https://dummyjson.com/todos?limit=5",
"method": "GET",
"body": "",
"headers": {}
}</http_request>
One thing that immediately caught my attention - ChatGPT can now make requests to any server on the internet. We only "explained" to it how to use one endpoint, but there is nothing stopping it from generating code that goes somewhere else! For that reason, I applied a simple domain allowlist, so that requests could only go to the DummyJSON API, and allowed only one request at a time - it never hurts to be too safe.
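The allowlist check itself can be very small. This is a sketch of the kind of guard I mean, where the allowed host set and function name are my own choices:

```python
from urllib.parse import urlparse

# Only the API described in the spec is permitted.
ALLOWED_HOSTS = {"dummyjson.com"}

def is_allowed(url: str) -> bool:
    """Accept only https requests to explicitly allowlisted hosts."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS

print(is_allowed("https://dummyjson.com/todos?limit=5"))  # True
print(is_allowed("https://evil.example.com/steal"))       # False
```

Checking the parsed hostname (rather than doing a substring match on the URL) matters here, since a generated URL like https://dummyjson.com.evil.example.com would pass a naive check.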
That concludes the design section. Now, on to implementing!
I will spare the readers the details of the (many) iterations until the experience was just right. This being a statistical tool, some trial and error is expected until the right prompt is reached. But finally, here's a conversation I had with the final version of the bot:
You can go and check out https://dummyjson.com/todos and you will see these are, in fact, the first todos listed!
I also wanted to share the following conversation from a previous test, where I had an incorrect parameter name in my specification. Realizing my mistake, I explained what was wrong, and ChatGPT gave me the right answer.
It is very likely that ChatGPT Plugins have more to them than the quick demo above. The purpose of this exercise was to show how this integration could be done - and trust me, I'm just as curious as you might be as to what the actual implementation is. Giving ChatGPT the ability to integrate over HTTP opens up a whole new set of possibilities, and I can't wait to see what the community can come up with.
At the same time, our responsibility as users of this technology also increases: what happens if an ill-intended prompt gets Azure Bot to make a request to an unknown server? What new attack vectors are now available? In my bot, I applied a simple domain allowlist - will that be enough as new use cases start to appear? I also managed to rewrite the API specification in a later prompt - are there risks associated with that possibility? There are many security and Responsible AI concerns at stake - and OpenAI is certainly aware of this.
Overall, I am very impressed. The possibilities are truly endless, and I'll certainly keep an eye on this functionality as it evolves in the next weeks and months. And I hope to see it in Azure OpenAI soon!