Last week, we announced our partnership with Cohere, enabling customers to easily leverage Cohere models via the Azure AI Studio Model Catalog, including Cohere's latest LLM, Command R+. Today, we are thrilled to announce that you can store and search over Cohere's latest Embed V3 Multilingual and Embed V3 English int8 embeddings using Azure AI Search. This capability offers significant memory cost reductions while often maintaining high search quality, making it an ideal solution for semantic search over large datasets powering your Generative AI applications.
"With int8 Cohere embeddings available in Azure AI Search, Cohere and Azure users alike can now run advanced RAG using a memory-optimized embedding model and a state-of-the-art retrieval system." - Nils Reimers, Cohere's Director of Machine Learning
With int8 embeddings, customers can achieve a 4x memory saving and about a 30% speed-up in search, while keeping 99.99% of the search quality. Read the full announcement from Cohere here: Cohere int8 & binary Embeddings - Scale Your Vector Database to Large Datasets.
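The 4x figure follows directly from the data types: a float32 component takes 4 bytes while an int8 component takes 1. A quick back-of-the-envelope calculation for the 1,024-dimensional embeddings used later in this guide:

```python
# Bytes per vector for 1,024-dimensional embeddings
dims = 1024
float32_bytes = dims * 4   # 4 bytes per float32 component
int8_bytes = dims * 1      # 1 byte per int8 component

print(float32_bytes // int8_bytes)  # 4x reduction per vector

# Storage for 100 million vectors, in GiB
print(f"{100_000_000 * float32_bytes / 2**30:.0f} GiB vs "
      f"{100_000_000 * int8_bytes / 2**30:.0f} GiB")  # 381 GiB vs 95 GiB
```

At the scale of hundreds of millions of vectors, that difference often determines whether an index fits in a given service tier at all.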
Here's a step-by-step guide on how to use Cohere Embed V3 int8 embeddings with Azure AI Search:
Install required libraries
Install the necessary libraries, including the Azure AI Search Python SDK and the Cohere Python SDK. Note: this walkthrough uses the 2024-03-01-Preview API version, so install the preview release of the Azure Search SDK.
pip install --pre azure-search-documents
pip install azure-identity
pip install cohere
Set up Cohere and Azure AI Search credentials
Set up your credentials for both Cohere and Azure AI Search. You can find these in the Cohere Dashboard and on the Keys blade of the Azure Portal in your Azure AI Search service.
import os
import cohere
from azure.core.credentials import AzureKeyCredential

cohere_api_key = os.getenv("COHERE_API_KEY")
co = cohere.Client(cohere_api_key)

search_service_endpoint = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
search_service_api_key = os.getenv("AZURE_SEARCH_ADMIN_KEY")
index_name = "cohere-embed-v3-index"
credential = AzureKeyCredential(search_service_api_key)
Generate Embeddings Function
Use the Cohere Embed API to generate int8 embeddings for a list of documents.
def generate_embeddings(texts, input_type="search_document"):
    model = "embed-english-v3.0"
    response = co.embed(
        texts=texts,
        model=model,
        input_type=input_type,
        embedding_types=["int8"],
    )
    return [embedding for embedding in response.embeddings.int8]
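The int8 values returned by the API are produced by Cohere's own calibration, which is internal to the model. Purely as an illustration of what int8 quantization means, a naive symmetric scalar quantizer maps each float component into the -127..127 range by scaling against the largest magnitude (this is not Cohere's actual scheme):

```python
def quantize_int8(vector):
    """Illustrative symmetric scalar quantization to int8 (-127..127).

    Not Cohere's calibration; shown only to make the int8 idea concrete.
    """
    scale = max(abs(v) for v in vector) or 1.0
    return [round(v / scale * 127) for v in vector]

print(quantize_int8([0.5, -1.0, 0.25]))  # [64, -127, 32]
```

Because Cohere calibrates the quantization during training, you should always use the int8 values returned by the API rather than quantizing float embeddings yourself.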
Create Azure AI Search Index
Create or update an Azure AI Search index to include a vector field for storing the document embeddings.
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    VectorSearch,
    VectorSearchProfile,
    HnswAlgorithmConfiguration,
    VectorSearchAlgorithmKind,
)

def create_or_update_index(client, index_name):
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(
            name="text",
            type=SearchFieldDataType.String,
            searchable=True,
        ),
        SearchField(
            name="embedding",
            type="Collection(Edm.SByte)",
            vector_search_dimensions=1024,
            vector_search_profile_name="my-vector-config",
        ),
    ]
    vector_search = VectorSearch(
        profiles=[
            VectorSearchProfile(
                name="my-vector-config",
                algorithm_configuration_name="my-hnsw",
            )
        ],
        algorithms=[
            HnswAlgorithmConfiguration(
                name="my-hnsw",
                kind=VectorSearchAlgorithmKind.HNSW,
            )
        ],
    )
    index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
    client.create_or_update_index(index=index)
Index Documents and their Embeddings
Index the documents along with their int8 embeddings into Azure AI Search.
def index_documents(search_client, documents, embeddings):
    documents_to_index = [
        {"id": str(idx), "text": doc, "embedding": emb}
        for idx, (doc, emb) in enumerate(zip(documents, embeddings))
    ]
    search_client.upload_documents(documents=documents_to_index)
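The payload shape can be checked without a live search service. Each document becomes a dictionary whose keys match the index fields defined above (the helper below simply mirrors the list comprehension in index_documents):

```python
def build_payload(documents, embeddings):
    # Same shape as the list comprehension in index_documents:
    # one dict per document, keyed by the index field names.
    return [
        {"id": str(idx), "text": doc, "embedding": emb}
        for idx, (doc, emb) in enumerate(zip(documents, embeddings))
    ]

payload = build_payload(["doc one", "doc two"], [[1, -2], [3, 4]])
print(payload[0])  # {'id': '0', 'text': 'doc one', 'embedding': [1, -2]}
```

Note that the key field must be a string, which is why the integer index is converted with str(idx).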
Run the workflow
Run the above steps to generate the embeddings, create the search index, and upload the documents.
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient

documents = [
    "Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.",
    "Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.",
    "Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.",
    "Marie Curie was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.",
]

# Generate embeddings
embeddings = generate_embeddings(documents)

# Initialize the Azure Search index client (service-level; no index name needed)
search_index_client = SearchIndexClient(
    endpoint=search_service_endpoint,
    credential=credential,
)

# Create or update the search index to include the embedding field
create_or_update_index(search_index_client, index_name)

# Initialize the SearchClient
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)

# Index the documents and their embeddings
index_documents(search_client, documents, embeddings)
Perform a vector search
Use the Azure AI Search client to perform a vector search using the generated embeddings.
from azure.search.documents.models import VectorizedQuery

# Query for vector search
query = "foundational figures in computer science"

# Generate query embeddings
# Use input_type="search_query" for query embeddings; note the query is
# passed as a single-element list, since the Embed API expects a list of texts
query_embeddings = generate_embeddings([query], input_type="search_query")

search_client = SearchClient(search_service_endpoint, index_name, credential)
vector_query = VectorizedQuery(
    vector=query_embeddings[0], k_nearest_neighbors=3, fields="embedding"
)
results = search_client.search(
    search_text=None,  # No search text for pure vector search
    vector_queries=[vector_query],
)
for result in results:
    print(f"Text: {result['text']}")
    print(f"Score: {result['@search.score']}\n")
Text: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist.
Score: 0.62440896
Text: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.59141135
Text: Marie Curie was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity
Score: 0.57616836
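Under the hood, the service answers this query with the HNSW index configured earlier, an approximate nearest-neighbor structure. Conceptually, though, a k-nearest-neighbor vector search is just similarity ranking. The brute-force sketch below illustrates the idea with cosine similarity over small int8 vectors (illustrative only; Azure AI Search's HNSW traversal and scoring formula differ):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query_vec, doc_vecs, k=3):
    # Rank every document by similarity to the query; return top-k indices
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda iv: cosine(query_vec, iv[1]),
        reverse=True,
    )
    return [idx for idx, _ in scored[:k]]

docs = [[10, 2, -3], [-9, 4, 1], [11, 1, -2]]
print(knn([10, 1, -3], docs, k=2))  # [0, 2]: documents 0 and 2 are closest
```

HNSW trades this exhaustive scan for a graph traversal that visits only a small fraction of the vectors, which is what makes search over large datasets fast.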
Find the full notebook at azure-search-vector-samples/demo-python/code/community-integration/cohere/azure-search-cohere-embed-v3-sample.ipynb in the Azure/azure-search-vector-samples GitHub repository.
Getting started with Azure AI Search
- Read more about Command R+ in the Cohere blog.
- Learn how to install and use the Cohere Python SDK and how to deploy the Cohere Embed Model-As-A-Service with Azure AI Studio.
- Learn more about vector quantization and narrow data type enhancements
- Learn more about Azure AI Search and all the latest features
- Start creating a search service in the Azure Portal, Azure CLI, the Management REST API, ARM template, or a Bicep file.
- Go from zero to hero with our RAG Solution Accelerator
- Read the blog: Outperforming vector search with hybrid retrieval and ranking capabilities
- Watch a video on Microsoft Mechanics: How vector search and semantic ranking improve your AI prompts
Updated Apr 10, 2024
fsunavala-msft, Microsoft
AI - Azure AI services Blog