Startups at Microsoft

Azure AI Studio Prompt Flow with Azure Data Explorer for Vector Search

Denise_Schlesinger
Nov 19, 2024

In this article I will show you how to use Azure AI Studio Prompt Flow with Azure Data Explorer (ADX) to test your prompts when building a retrieval-augmented generation (RAG) application.

Introduction

First, let's look at the tools we use in this tutorial.

  • Azure AI Studio is a trusted platform that empowers developers to drive innovation and shape the future with AI in a safe, secure, and responsible way. The comprehensive platform accelerates the development of production-ready copilots to support enterprise chat, content generation, data analysis, and more. Developers can explore cutting-edge APIs and models for their use cases; build and test solutions with collaborative and responsible AI tools, safeguards, and best practices; deploy AI innovations for use in websites, applications, and other production environments; and manage solutions with continuous monitoring and governance in production.

 

  • Prompt flow is a development tool designed to streamline the entire development cycle of AI applications powered by Large Language Models (LLMs). It simplifies prototyping, experimenting, iterating on, and deploying your AI applications. Prompt flow is available independently as an open-source project on GitHub, with its own SDK and VS Code extension. It is also available, and recommended for use, as a feature within both Azure AI Studio and Azure Machine Learning studio. This tutorial focuses on prompt flow in Azure AI Studio.

  • Azure Data Explorer (ADX) as a Vector DB - At the core of vector similarity search is the ability to store, index, and query vector data. ADX is a cloud-based data analytics service that enables users to perform advanced analytics on large datasets in real time. It is particularly well suited to handling large volumes of data, making it an excellent choice for storing and searching vectors. ADX supports a special data type called dynamic, which can store unstructured data such as arrays and property bags. The dynamic data type is perfect for storing vector values. You can further augment the vector value by storing metadata related to the original object as separate columns in your table. ADX also provides the Vector16 encoding type, designed for storing vectors of floating-point numbers in 16-bit precision (using Bfloat16 instead of the default 64 bits). It is highly recommended for storing ML vector embeddings because it reduces storage requirements by a factor of 4 and accelerates vector processing functions such as series_dot_product() and series_cosine_similarity() by orders of magnitude.
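To make the last bullet more concrete, here is a minimal plain-Python sketch of the standard cosine similarity formula that a function like series_cosine_similarity() is built around, together with the storage arithmetic behind the "factor of 4" figure. The embedding size of 1536 is an assumption chosen for illustration and does not come from this tutorial. Raising an error for zero vectors is a choice made for this sketch, not a statement about how ADX behaves.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Sketch-only choice: refuse zero vectors instead of returning a value.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def storage_bytes(dimensions, bits_per_value):
    """Raw bytes needed for one vector, ignoring any storage overhead."""
    return dimensions * bits_per_value // 8


EMBEDDING_DIMENSIONS = 1536  # assumption: illustrative embedding size

bytes_at_64_bits = storage_bytes(EMBEDDING_DIMENSIONS, 64)
bytes_at_16_bits = storage_bytes(EMBEDDING_DIMENSIONS, 16)
print(bytes_at_64_bits // bytes_at_16_bits)  # ratio between 64-bit and 16-bit storage
```

Vectors pointing in the same direction score 1, and orthogonal vectors score 0, which is why the score works as a ranking key for similarity search.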

This tutorial

This tutorial has two parts:

  • Generate the embeddings and ingest them into our database in the Azure Data Explorer cluster.
    • A notebook for the generation and ingestion
    • A notebook to search the Vector DB
  • Use Azure AI Studio Prompt Flow with a custom connection to ADX to test our prompts.

The code can be found here.

Creating the embeddings in Azure Data Explorer

Follow the instructions in the README file to create a DB and ingest the embeddings into our Vector DB (Azure Data Explorer).

Then run the “RAG - Azure Data Explorer - search your data” notebook.

The notebook prints the kcsb (Kusto connection string builder) value in the following cell:

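from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# KUSTO_CLUSTER, KUSTO_DATABASE, and the credential variables must already be defined
cluster = KUSTO_CLUSTER
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    cluster, KUSTO_MANAGED_IDENTITY_APP_ID, KUSTO_MANAGED_IDENTITY_SECRET, AAD_TENANT_ID
)
print(kcsb)  # this is the value to copy for the Prompt Flow custom connection
client = KustoClient(kcsb)
kusto_db = KUSTO_DATABASE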

Copy the kcsb value and set it aside – we will use it later. We will need the ADX connection string to create a custom connection in Prompt Flow.
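A custom connection is made up of key/value pairs, so it can help to see which parts the connection string contains before pasting it anywhere. The sketch below is a hypothetical helper, not part of the tutorial's code. It splits a semicolon-separated string into a dictionary. The keyword names and placeholder values in the sample are assumptions based on the usual Kusto connection string layout, so check them against the string your notebook actually prints. Depending on the SDK version, the printed string may also mask the application key.

```python
def parse_connection_string(conn_str):
    """Split a 'Key=Value;Key=Value' string into a dict.

    Only the first '=' in each segment separates key from value, so values
    that contain '=' survive. Values containing ';' are not supported.
    """
    pairs = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate a trailing semicolon
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {segment!r}")
        pairs[key.strip()] = value.strip()
    return pairs


# Placeholder values only; keyword names are assumptions for illustration.
sample = (
    "Data Source=https://<cluster>.<region>.kusto.windows.net;"
    "AAD Federated Security=True;"
    "Application Client Id=<app-id>;"
    "Application Key=<secret>;"
    "Authority Id=<tenant-id>"
)

parts = parse_connection_string(sample)
for key in parts:
    print(key)  # print key names only, never the secret values
```

Treat the application key as a secret: keep it out of notebooks you share and out of source control.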

Prompt Flow with Azure Data Explorer as our Vector DB

Go to Azure AI Studio (ai.azure.com) and create a new project called “pf-rag”.

Click on “all hubs and projects”.

Click on “Connected Resources”.

Create a new connection.

Click on “custom keys” and use the kcsb value you copied earlier to create the connection.

Now you should see a new connection named “adx_aidemox_conn_string”.

Click on “all hubs + projects”.

Click on the project “pf-rag-proj” and click “Go to project”.

Click on Prompt flow.

Create a prompt flow.

Let’s use a predefined template for our prompt flow and clone it.

Change the input question to “nonstick grills”.

Delete the “lookup” block.

Copy the code in promptflow - search.py and paste it into the Python block in Prompt Flow.
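The contents of search.py are not reproduced in this post, so the following is only a rough sketch of the kind of query a vector search node could send to ADX, not the author's code. It builds a KQL string that ranks rows by series_cosine_similarity() against the embedded question. The function name build_search_query and the table and column names are hypothetical.

```python
import json


def build_search_query(table, embedding_column, query_vector, top_k=3):
    """Return a KQL query that ranks rows by cosine similarity to query_vector."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    for name in (table, embedding_column):
        # Reject anything that is not a simple identifier, since the names
        # are interpolated directly into the query text.
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    # Serialize the vector as a JSON array so it can sit inside dynamic(...).
    vector_literal = json.dumps([float(x) for x in query_vector])
    return (
        f"{table}\n"
        f"| extend similarity = series_cosine_similarity("
        f"dynamic({vector_literal}), {embedding_column})\n"
        f"| top {top_k} by similarity desc"
    )


# Hypothetical table and column names, with a toy two-dimensional vector.
query = build_search_query("docs", "embedding", [1, 0.5], top_k=2)
print(query)
```

In a real node, a string like this would be passed to the Kusto client together with the database name, and the returned rows would be handed to the next node in the flow.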

 

 

Change the name of the Python block to “adx_vectordb_search”.

Change the name of the Prompt block to “Generate_NL_answer”.

Start a compute session, click on “more tools”, and select “embeddings”.

Call the new node “embed_the_query”.

Choose the OpenAI models as follows:

Change the node as follows:

The flow chart for your prompt flow should look like this:
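As a rough text stand-in for the chart, the node names suggest that the question is embedded first, the resulting vector is used to search ADX, and the retrieved documents are passed to the prompt that generates the answer. The sketch below assumes that ordering. It mirrors the wiring in plain Python, with hypothetical stand-in functions in place of the real nodes so that it runs without any Azure services.

```python
def run_flow(question, embed_the_query, adx_vectordb_search, generate_nl_answer):
    """Chain the three nodes in the order assumed above."""
    query_vector = embed_the_query(question)
    documents = adx_vectordb_search(query_vector)
    return generate_nl_answer(question, documents)


# Stand-in nodes: these only exercise the wiring, they do not call any service.
def fake_embed(question):
    return [float(len(question))]


def fake_search(query_vector):
    return [{"text": "placeholder document", "similarity": 1.0}]


def fake_answer(question, documents):
    context = "\n".join(doc["text"] for doc in documents)
    return f"Question: {question}\nContext:\n{context}"


result = run_flow("nonstick grills", fake_embed, fake_search, fake_answer)
print(result)
```

Each node only sees the output of the node before it, which is what makes it easy to swap the search step for a different vector store.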

 

Save the prompt flow.

Run the prompt flow.

Conclusion

  • We created an ADX Database and table.
  • We created embeddings and saved them to ADX so we could use it as a Vector DB.
  • We created a prompt flow and used ADX as our Vector DB to test our prompts.

I hope you enjoyed this tutorial!

Denise

Updated Nov 19, 2024
Version 1.0