We recently announced that we’re launching GitHub Models, enabling more than 100 million GitHub developers to become AI developers and build with industry-leading AI models. GitHub Models opens the door for you to rapidly go from idea to code to cloud, simplifying model experimentation and selection across the best of Azure AI catalog.
Today, we’re announcing Retrieval Augmented Generation (RAG), powered by Azure AI Search, for GitHub Models. Coming soon to public beta, GitHub Models RAG simplifies the development of user-friendly, high-quality RAG applications. With an intuitive interface and seamless integration within GitHub, you can effortlessly create applications grounded in your own data. Your RAG indexes automatically take advantage of Azure AI Search’s capabilities – hybrid text and vector retrieval, semantic ranking, integrated vectorization, and more – right out of the box.
Key features you’ll love:
- Playground for Experimentation: For the first time, you can easily ground your model with your own data by uploading files directly within the Playground. This intuitive setup lets you quickly experiment with RAG.
- Advanced Scenarios in Code: With only your GitHub personal access token (PAT), explore ready-to-use code samples and dive into more advanced RAG and retrieval scenarios, whether in Codespaces or your preferred development tools.
- Free, Full-Featured Azure AI Search Service: With only your GitHub credentials, get started with Azure AI Search at no cost. It comes auto-provisioned for you, with no Azure subscription required. This free tier provides a growth path for expanding into full production as your needs evolve.
- Full Azure AI Search Query API: Enjoy complete access to the Azure AI Search query API, including powerful features like vectors, hybrid, and semantic ranker – all within GitHub Models.
From Playground to Code: Frictionless Grounding with Your Data
Standard models can present significant challenges: they rely on static, outdated knowledge, which often results in inaccurate, generalized responses that lack the specificity your projects require. Customizing these models for your needs is both time-consuming and costly, and requires navigating multiple tools and workflows, which adds unnecessary complexity.
GitHub Models RAG transforms this experience by enabling your models to access up-to-date information, retrieving current, relevant documents as needed. This approach reduces reliance on outdated data, providing contextually accurate answers grounded in your data – without the need for frequent retraining. You can start in the GitHub Models playground at no cost and easily transition to code for more advanced scenarios. With only your GitHub PAT, you can experiment with RAG building blocks, tune retrieval and response accuracy with Azure AI Search features like vectors, hybrid search, and semantic ranker, and set up scalable solutions tailored to your domain.
import os
import json

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import *
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import QueryType, VectorizableTextQuery

# To authenticate with the model you will need to generate a personal access token (PAT) in your GitHub settings.
# Create your PAT by following the instructions here:
# https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
MODELS_ENDPOINT = "https://models.inference.ai.azure.com"

client = ChatCompletionsClient(
    endpoint=MODELS_ENDPOINT,
    credential=AzureKeyCredential(GITHUB_TOKEN),
)

# Your free Azure AI Search endpoint will be auto-provisioned
SEARCH_ENDPOINT = "https://[AzureAISearchServiceName].search.windows.net"

search_client = SearchClient(
    endpoint=SEARCH_ENDPOINT,
    index_name="default-vector-index",
    credential=AzureKeyCredential(GITHUB_TOKEN),
)


def search_documents(query: str):
    """Search documents matching a query given as a series of keywords."""
    # Search for documents that are semantically similar to the query
    r = search_client.search(
        search_text=query,
        vector_queries=[
            VectorizableTextQuery(
                text=query, k_nearest_neighbors=50, fields="text_vector"
            )
        ],
        query_type=QueryType.SEMANTIC,
        top=5,
        select=["chunk_id", "title", "chunk"],
    )
    # Flatten each hit into a "title / content" string for the tool output
    results = [
        f"title: {doc['title']}\n" + f"content: {doc['chunk']}".replace("\n", " ")
        for doc in r
    ]
    # Return the results as a JSON string
    return json.dumps(results)


search_tool = ChatCompletionsToolDefinition(
    function=FunctionDefinition(
        name=search_documents.__name__,
        description=search_documents.__doc__,
        parameters={
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "a series of keywords to use as query to search documents",
                }
            },
            "required": ["query"],
        },
    )
)

# Start a loop to interact with the model
agent_returned = False
messages = [
    SystemMessage(
        content="""You are an assistant that answers users' questions about documents. You are given a tool that can search within those documents. Use it systematically to provide the best answers to the user's questions. If you do not find the information, just say you do not know."""
    ),
    UserMessage(content="Can I claim an ambulance?"),
]
print(f"User> {messages[-1].content}")

while not agent_returned:
    # Ask the model for the next step: a final answer or a tool call
    response = client.complete(
        messages=messages,
        tools=[search_tool],
        model="gpt-4o",
        temperature=1,
        max_tokens=4096,
        top_p=1,
    )
    # We expect the model to ask for a tool call
    if response.choices[0].finish_reason == CompletionsFinishReason.TOOL_CALLS:
        # Append the model response to the chat history
        messages.append(
            AssistantMessage(tool_calls=response.choices[0].message.tool_calls)
        )
        # There might be multiple tool calls to run in parallel
        if response.choices[0].message.tool_calls:
            for tool_call in response.choices[0].message.tool_calls:
                # We expect the tool to be a function call
                if isinstance(tool_call, ChatCompletionsToolCall):
                    # Parse the function call arguments and call the function
                    function_args = json.loads(tool_call.function.arguments)
                    print(
                        f"System> Calling function `{tool_call.function.name}` with arguments {function_args}"
                    )
                    callable_func = locals()[tool_call.function.name]
                    function_return = callable_func(**function_args)
                    print("System> Function returned.")
                    # Append the function call result to the chat history
                    messages.append(
                        ToolMessage(tool_call_id=tool_call.id, content=function_return)
                    )
                else:
                    raise ValueError(
                        f"Expected a function call tool, instead got: {tool_call}"
                    )
    else:
        agent_returned = True
        print(f"Assistant> {response.choices[0].message.content}")
The code snippet can be accessed in the code tab of the GitHub Models playground. This provides a convenient way to start experimenting quickly, letting you obtain relevant responses tailored to your specific data. All you need is your GitHub PAT.
Enhance Retrieval with Azure AI Search’s Hybrid Search and Semantic Ranking
Azure AI Search enhances your experience with GitHub Models RAG by providing advanced retrieval capabilities through hybrid search and semantic ranking. With hybrid search, you can combine keyword and vector retrieval, using Reciprocal Rank Fusion (RRF) to merge and select the most relevant results from each method. This fusion step ensures that both precise keywords and contextual relevance play a role in the results you receive.
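The fusion step can be sketched in a few lines: RRF gives each document a score of 1/(k + rank) in every result list that contains it and sums those scores. The document IDs and the constant k = 60 below are illustrative; the real fusion runs inside the Azure AI Search service.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked result lists with Reciprocal Rank Fusion.

    Each ranking is an ordered list of document IDs, best first.
    A document's RRF score is the sum of 1 / (k + rank) over the
    lists that contain it, so items near the top of several lists win.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. BM25 keyword results
vector_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. vector similarity results
fused = rrf_fuse([keyword_hits, vector_hits])
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Note how doc_b, ranked highly by both methods, outscores doc_a, which tops only the keyword list: that is the fusion effect described above.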
Additionally, Semantic Ranker adds a reranking layer on top of your initial results, whether they are BM25-ranked or RRF-ranked. Leveraging deep learning models with multilingual support, Semantic Ranker promotes the results that are most semantically relevant to the query. Semantic Ranker runs natively within the Azure AI Search stack, and our data shows that combining semantic ranking with hybrid retrieval delivers the most effective and relevant retrieval experience right out of the box.
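The retrieve-then-rerank pattern can be illustrated with a toy second stage. The `overlap_scorer` below is only a stand-in for the service-side deep model (the real Semantic Ranker runs inside Azure AI Search and needs no code on your side), but the shape of the pipeline is the same: a fast first stage returns candidates, and a slower, smarter scorer reorders them.

```python
def rerank(query, first_stage_results, scorer):
    """Reorder first-stage results (BM25- or RRF-ranked) by a relevance score."""
    return sorted(first_stage_results, key=lambda doc: scorer(query, doc), reverse=True)


def overlap_scorer(query, doc):
    # Toy stand-in for a semantic relevance model: Jaccard word overlap
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


first_stage = [
    "the cat sat",
    "claims for ambulance transport",
    "ambulance cover claims",
]
top = rerank("ambulance claims", first_stage, overlap_scorer)
# → 'ambulance cover claims' moves to the front
```

In the real service, only the top first-stage candidates are passed to the reranker, keeping the expensive scoring confined to a small candidate set.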
Both hybrid search and semantic ranking are enabled by default in the GitHub Models playground. Query rewriting will come to GitHub Models soon, and as we introduce new Azure AI Search capabilities, they will be part of GitHub Models RAG for free, ensuring you always have access to the latest advancements.
What’s Next?
GitHub Models RAG enters public beta next month, giving you the perfect opportunity to dive in, experiment, and start building intelligent, data-grounded applications. Join the public beta to explore the ease of use of GitHub Models RAG firsthand.
Get Started and Next Steps
- Join the waitlist
- Explore and experiment with GitHub Models
- Learn more about GitHub Models
- Learn more about Azure AI Search