With public preview of integrated vectorization, a ground-breaking capability of vector search in Azure AI Search (previously Azure Cognitive Search), you can do vector search with data stored in Azure SQL Database easily. This feature is designed to streamline the process of chunking, generating, storing, and querying vectors for vector search in Azure AI Search. This feature significantly speeds up the development of the vectorization pipeline and minimizes maintenance tasks during data ingestion and query time. It paves the way for seamless integration of vectors into your applications with traditional search demands and Retrieval-Augmented Generation (RAG) applications.
Vector search: In Azure AI Search, this is a capability for indexing, storing, and retrieving vector embeddings from a search index. By representing text as vectors, vector search can identify the most similar documents based on their proximity in a vector space. In vector search, vectorization refers to the conversion of text data into vector embeddings.
Chunking: Process of dividing data into smaller manageable parts (chunks) that can be processed independently. Chunking is required if source documents are too large for the maximum input size of embedding and/or large language models.
Retrieval Augmented Generation (RAG): Architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system (i.e., Azure AI Search) that provides the data.
Example with data stored in Azure SQL Database
Let’s create an example of an AI application that responds to users' queries based on the Azure SQL DB table of Amazon product reviews. This example is using a Python notebook. The same operations can be done via the Azure portal or scripting in your favorite deployment option. You can find this example at: https://aka.ms/sql-ai/sql-acs/sample (Thanks to our SQL Data Science Team especially Jordan DuBeau).
The end behavior will be something like:
[User search]: Canned dog food
[AI Response]: After searching through our product database, I recommend <product ID> because...
Behind the scenes, we take the following steps:
Set up a sample table in SQL DB and upload data to it.
Set up an index in Azure AI Search to store the data we need, including vectorized versions of the text reviews.
Set up an indexer in Azure AI Search to pull data into the index.
Automatically chunks and vectorizes the data using an Azure OpenAI Embedding service.
Use Azure AI Search to process the user's query and search for the most relevant data.
Use an Azure OpenAI Completion service to respond to the user's query.
Setting up environment
You will need to set up the environment and add the environment information in example.env file.
Pre-requisites in this example:
An existing SQL Database with server name, DB name, username, and password copied into example.env
The user must have permission to create a new table and enable and view change tracking on the database
You must whitelist your IP to access your SQL server by opening the SQL server resource in the Azure portal, navigating to Security / Networking, and adding your IP.
An OpenAI resource with the endpoint and key copied into example.env
An Azure AI Search resource with the endpoint and key copied into example.env
The Python packages listed in requirements.txt
The Microsoft ODBC 18
This sample shows how to load data as well, you can skip that step and start with your own product data if needed.
Next, create a data source connection. This step creates a connection that will be used to pull data from our SQL table. Documentation can be found here.
Chunk, vectorize, configure, and store data
Create an index and configure integrated vectorization
Here are the steps:
Take the combined text (summary + review text) from each product review.
Split the combined text into chunks.
Embed each chunk as a vector.
(Later) search for the most relevant chunk based on the incoming query.
To enable this, the search index will store all of the following data, for each chunk of text:
Id of chunk
Vector version of chunk text
Id of parent row
Product Id from parent row
Review text from parent row
Summary text from parent row
Score from parent row
All of these values will be stored in SearchFields specified in the code.
In this step we also configure the search algorithm(s), and the vectorizer that will automatically vectorize the incoming query.
Documentation about creating indexes can be found here.
Create skillset and indexer
We use two built-in skills provided by Azure AI Search: 1. The Split Skill takes the review text and divides it into chunks (to stay within the token limits for the OpenAI embedding service). 2. The Azure Open AI Embedding Skill takes the outputs of the Split Skill and vectorizes them individually. Then we create an indexer that uses our skillset to pull data from the SQL Database, separate the text into chunks, vectorize each chunk, and store all the required data in the index we created above. Documentation is here to start learning more about indexers and skillsets.
Use vector search for sample application
user_query = "Canned dog food"
In the following output, we find the top 3 chunks that are most relevant to the user's query.
Search score: 0.88524085
Parent Id: 1 | Chunk id: f59640a3248d_1_pages_0
Product Id: B001E4KFG0
Text chunk: Summary: Good Quality Dog Food | Review: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better.
Review summary: Good Quality Dog Food
Review text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.
Review score: 5
Search score: 0.87025785
Parent Id: 94 | Chunk id: f327d3004d0c_94_pages_3
Product Id: B0019CW0HE
Text chunk: a couple of cans. I came home and to my surprise realized that I could save $20 each time I bought dog food if I just buy it off Amazon.<br /><br />All in all, I definitely recommend and give my stamp of approval to natural balance dog food. While I have never eaten it, my dog seems to love it.
Review summary: Great Dog Food!
Review text: My golden retriever is one of the most picky dogs I've ever met. After experimenting with various types of food, I have found she loves natural balance. What I really like about natural balance is the fact that it has multiple flavors in dry and wet varieties. I mix her dry food with a little wet food and my golden loves it. Furthermore, I do like mixing up the flavors each time as I think the same meal day over day might get a little boring, so I figured why not. I tend to stay away from the fish type though as it smells...<br /><br />Additionally, I started purchasing off Amazon because Petco didn't have the wet food box and only had a couple of cans. I came home and to my surprise realized that I could save $20 each time I bought dog food if I just buy it off Amazon.<br /><br />All in all, I definitely recommend and give my stamp of approval to natural balance dog food. While I have never eaten it, my dog seems to love it.
Review score: 5
Search score: 0.8640232
Parent Id: 98 | Chunk id: 0bc498329489_98_pages_0
Product Id: B0019CW0HE
Text chunk: Summary: Great allergy sensitive dog food, dogs love it | Review: Our pup has experienced allergies in forms of hotspots and itching from other dog foods. The cheap 'you can buy it anywhere' food not only have crazy preservatives in them but can cause health problems for your pets.
Review summary: Great allergy sensitive dog food, dogs love it
Review text: Our pup has experienced allergies in forms of hotspots and itching from other dog foods. The cheap 'you can buy it anywhere' food not only have crazy preservatives in them but can cause health problems for your pets. This food works wonders on reducing allergies and our dog loves the food.<br />This message is RAMSEY FrAnkenSteiN approved.
Result: After searching through our product database, we recommend the Vitality canned dog food (B001E4KFG0). This product looks more like a stew than a processed meat, smells better, and was given a 5-star review by a finicky Labrador.
Clean up resources
After finishing the sample, remember to delete unneeded resources:
Table created within existing SQL DB
Within the Search Service resource:
Data source connection
These resources can always be recreated by rerunning the notebook.