GitHub Models - Limited Public Beta SIGNUP TODAY
Welcome to GitHub Models! We've got everything fired up and ready for you to explore AI models hosted on Azure AI. As a student developer, you already have access to GitHub resources such as Codespaces and Copilot through http://education.github.com. Now you can get started developing with generative AI and language models using the Model Catalog.
Access and onboarding
GitHub Models provides free access to a set of AI models for anyone with a GitHub account.
This makes it significantly easier to get familiar with AI models without having to create Azure resources or download models from Hugging Face.
GitHub Models is your opportunity to test out these models for free.
Key features of GitHub Models
- Seamless integration with Codespaces allows for quick learning and engagement.
- The ease of local use, for free, in code you may have already written.
- The ability to switch between model providers using the same API call via the Azure AI inference API, eliminating the need to change code between providers.
For more information about the models available on GitHub Models, check out the GitHub Model Marketplace.
Each model has a dedicated playground and sample code available in a dedicated Codespaces environment. Every model is called through the Azure AI Inference API, so swapping models is simply a matter of changing the model name.
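To make the "only the model name changes" point concrete, here is a minimal sketch that builds two request bodies and compares them. The helper `build_chat_body` is hypothetical (it is not part of any SDK), and `<another-model-name>` is a placeholder you would replace with a name from the catalog. Nothing is sent over the network.

```python
import json


def build_chat_body(model_name: str, question: str) -> dict:
    """Build the JSON body for a chat completion request (hypothetical helper)."""
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "model": model_name,
    }


# "Phi-3-small-8k-instruct" is the model used in this guide's samples; the
# second name is a placeholder, not a real model identifier.
first = build_chat_body("Phi-3-small-8k-instruct", "What is the capital of France?")
second = build_chat_body("<another-model-name>", "What is the capital of France?")

# Collect the top-level fields that differ between the two bodies.
changed = [key for key in first if first[key] != second[key]]
print(changed)  # ['model']
print(json.dumps(second, indent=2))
```

The messages, endpoint, and authentication stay the same between the two requests, which is why the samples in this guide keep the model name in a single variable.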
There are a few basic examples that are ready for you to run. You can find them in the samples directory within the Codespaces environment.
If you want to jump straight to your favorite language, you can find the examples in the following languages:
- Python
- JavaScript
- cURL
The dedicated Codespaces Environment is an excellent way to get started running the samples and models.
Below are example code snippets for a few use cases. For additional information about the Azure AI Inference SDK, see the full documentation and samples.
- Create a personal access token. You do not need to give the token any permissions. Note that the token will be sent to a Microsoft service.
To use the code snippets below, create an environment variable to set your token as the key for the client code.
If you're using bash:
export GITHUB_TOKEN="<your-github-token-goes-here>"
If you're using PowerShell:
$Env:GITHUB_TOKEN="<your-github-token-goes-here>"
If you're using the Windows command prompt:
set GITHUB_TOKEN=<your-github-token-goes-here>
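Before running the samples, it can help to confirm that the variable is actually visible to Python. The sketch below is one possible check; the helper name `get_github_token` is hypothetical, and it deliberately never prints the token itself.

```python
import os
from typing import Mapping


def get_github_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the token from GITHUB_TOKEN, or raise a readable error."""
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set. Set it in your shell before running the samples."
        )
    return token


if __name__ == "__main__":
    try:
        get_github_token()
        print("GITHUB_TOKEN is set.")
    except RuntimeError as err:
        print(err)
```

The samples below read the variable with `os.environ["GITHUB_TOKEN"]`, which raises a bare `KeyError` when it is missing; a helper like this only swaps that for a clearer message.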
Install the Azure AI Inference SDK using pip (Requires: Python >=3.8):
pip install azure-ai-inference
This sample demonstrates a basic call to the chat completion API. It uses the GitHub AI model inference endpoint and your GitHub token. The call is synchronous.
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
endpoint = "https://models.inference.ai.azure.com"
# Replace Model_Name
model_name = "Phi-3-small-8k-instruct"
token = os.environ["GITHUB_TOKEN"]
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)
response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What is the capital of France?"),
    ],
    model=model_name,
    temperature=1.0,
    max_tokens=1000,
    top_p=1.0,
)
print(response.choices[0].message.content)
This sample demonstrates a multi-turn conversation with the chat completion API. When using the model for a chat application, you'll need to manage the history of that conversation and send the latest messages to the model.
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
# Replace Model_Name
model_name = "Phi-3-small-8k-instruct"
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)
messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="What is the capital of France?"),
    AssistantMessage(content="The capital of France is Paris."),
    UserMessage(content="What about Spain?"),
]
response = client.complete(messages=messages, model=model_name)
print(response.choices[0].message.content)
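Managing the history mostly means appending each user message and each model reply to one list, then sending that list on the next call. Here is a minimal sketch of that bookkeeping using plain dictionaries. The `Conversation` class is hypothetical and not part of the Azure AI Inference SDK, and the assistant text is hard-coded where a real application would store the model's reply.

```python
class Conversation:
    """Keeps the running message list for one chat session (hypothetical helper)."""

    def __init__(self, system_prompt: str) -> None:
        # The system message always comes first in the history.
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})


chat = Conversation("You are a helpful assistant.")
chat.add_user("What is the capital of France?")
# In a real application this text would come from the model's response.
chat.add_assistant("The capital of France is Paris.")
chat.add_user("What about Spain?")

for message in chat.messages:
    print(f'{message["role"]}: {message["content"]}')
```

The resulting list has the same role and content layout as the `messages` array in the cURL samples in this guide. With the SDK, you would build the equivalent `SystemMessage`, `UserMessage`, and `AssistantMessage` objects as in the sample above.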
For a better user experience, you will want to stream the response of the model so that the first token shows up early and you avoid waiting for long responses.
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
# Replace Model_Name
model_name = "Phi-3-small-8k-instruct"
client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)
response = client.complete(
    stream=True,
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Give me 5 good reasons why I should exercise every day."),
    ],
    model=model_name,
)
# Print each streamed fragment as soon as it arrives.
for update in response:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")
client.close()
Install Node.js.
Copy the following lines of text and save them as a file package.json inside your folder.
{
  "type": "module",
  "dependencies": {
    "@azure-rest/ai-inference": "latest",
    "@azure/core-auth": "latest",
    "@azure/core-sse": "latest"
  }
}
Note: @azure/core-sse is only needed when you stream the chat completions response.
Open a terminal window in this folder and run npm install.
For each of the code snippets below, copy the content into a file sample.js and run with node sample.js.
This sample demonstrates a basic call to the chat completion API. It uses the GitHub AI model inference endpoint and your GitHub token. The call waits for the complete response rather than streaming it.
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";
const token = process.env["GITHUB_TOKEN"];
const endpoint = "https://models.inference.ai.azure.com";
// Update your modelname
const modelName = "Phi-3-small-8k-instruct";
export async function main() {
  const client = new ModelClient(endpoint, new AzureKeyCredential(token));
  const response = await client.path("/chat/completions").post({
    body: {
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "What is the capital of France?" }
      ],
      model: modelName,
      temperature: 1.0,
      max_tokens: 1000,
      top_p: 1.0
    }
  });
  if (response.status !== "200") {
    throw response.body.error;
  }
  console.log(response.body.choices[0].message.content);
}
main().catch((err) => {
  console.error("The sample encountered an error:", err);
});
This sample demonstrates a multi-turn conversation with the chat completion API. When using the model for a chat application, you'll need to manage the history of that conversation and send the latest messages to the model.
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";
const token = process.env["GITHUB_TOKEN"];
const endpoint = "https://models.inference.ai.azure.com";
// Update your modelname
const modelName = "Phi-3-small-8k-instruct";
export async function main() {
  const client = new ModelClient(endpoint, new AzureKeyCredential(token));
  const response = await client.path("/chat/completions").post({
    body: {
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "What is the capital of France?" },
        { role: "assistant", content: "The capital of France is Paris." },
        { role: "user", content: "What about Spain?" },
      ],
      model: modelName,
    }
  });
  if (response.status !== "200") {
    throw response.body.error;
  }
  for (const choice of response.body.choices) {
    console.log(choice.message.content);
  }
}
main().catch((err) => {
  console.error("The sample encountered an error:", err);
});
For a better user experience, you will want to stream the response of the model so that the first token shows up early and you avoid waiting for long responses.
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";
import { createSseStream } from "@azure/core-sse";
const token = process.env["GITHUB_TOKEN"];
const endpoint = "https://models.inference.ai.azure.com";
// Update your modelname
const modelName = "Phi-3-small-8k-instruct";
export async function main() {
  const client = new ModelClient(endpoint, new AzureKeyCredential(token));
  const response = await client.path("/chat/completions").post({
    body: {
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Give me 5 good reasons why I should exercise every day." },
      ],
      model: modelName,
      stream: true
    }
  }).asNodeStream();
  const stream = response.body;
  if (!stream) {
    throw new Error("The response stream is undefined");
  }
  if (response.status !== "200") {
    stream.destroy();
    throw new Error(`Failed to get chat completions, http operation failed with ${response.status} code`);
  }
  const sseStream = createSseStream(stream);
  for await (const event of sseStream) {
    if (event.data === "[DONE]") {
      return;
    }
    for (const choice of (JSON.parse(event.data)).choices) {
      process.stdout.write(choice.delta?.content ?? ``);
    }
  }
}
main().catch((err) => {
  console.error("The sample encountered an error:", err);
});
Paste the following into a shell:
curl -X POST "https://models.inference.ai.azure.com/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -d '{
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ],
        "model": "Phi-3-small-8k-instruct"
    }'
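If you would rather issue the same raw HTTP call from Python without installing the SDK, the request can be assembled with the standard library. This is a sketch, not an official client: `build_request` is a hypothetical helper, and the example only builds the request with a placeholder token so that it runs offline.

```python
import json
import urllib.request

# Same URL as the cURL command above.
ENDPOINT = "https://models.inference.ai.azure.com/chat/completions"


def build_request(token: str, model_name: str, question: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the cURL command."""
    body = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "model": model_name,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# A placeholder token keeps this example offline; use your real token to send it.
request = build_request(
    "<your-github-token-goes-here>",
    "Phi-3-small-8k-instruct",
    "What is the capital of France?",
)
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen` and parse the JSON response. The reply text sits under `choices[0].message.content`, as the Python and JavaScript samples in this guide show.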
Call the chat completion API and pass the chat history:
curl -X POST "https://models.inference.ai.azure.com/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -d '{
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "What is the capital of France?"
            },
            {
                "role": "assistant",
                "content": "The capital of France is Paris."
            },
            {
                "role": "user",
                "content": "What about Spain?"
            }
        ],
        "model": "Phi-3-small-8k-instruct"
    }'
This is an example of calling the endpoint and streaming the response.
curl -X POST "https://models.inference.ai.azure.com/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -d '{
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Give me 5 good reasons why I should exercise every day."
            }
        ],
        "stream": true,
        "model": "Phi-3-small-8k-instruct"
    }'
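A streamed response arrives as server-sent events rather than one JSON document. As a rough sketch of how a client might turn those events back into text, the function below reads `data:` lines, stops at the `[DONE]` marker, and joins each `choices[].delta.content` fragment, mirroring what the JavaScript streaming sample does. The function name `iter_stream_text` is hypothetical, and the sample lines are illustrative rather than captured from a real response.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the server-sent event lines of a streamed reply."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream marker, as checked in the JavaScript sample
        for choice in json.loads(data).get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Illustrative lines only, not captured from a real response.
sample_lines = [
    'data: {"choices": [{"delta": {"content": "Exercise "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "helps."}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_lines)))  # Exercise helps.
```

Printing each fragment as it is yielded, instead of joining them at the end, gives the early-first-token behavior that streaming is meant to provide.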
The rate limits for the playground and free API usage are intended to help you experiment with models and prototype your AI application. For use beyond those limits, and to bring your application to scale, you must provision resources from an Azure account and authenticate with that account instead of your GitHub personal access token. You don't need to change anything else in your code. For details on going beyond the free tier limits, see the Azure AI documentation.