Embeddings
Configure Embedding Models on Azure AI Foundry with Open Web UI
Introduction

Let's take a closer look at an exciting development in the AI space. Embedding models are the key to transforming complex data into usable insights, driving innovations like smarter chatbots and tailored recommendations. With Azure AI Foundry, Microsoft's powerful platform, you've got the tools to build and scale these models effortlessly. Add in Open Web UI, an intuitive interface for engaging with AI systems, and you've got a winning combination. In this article, we'll explore how embedding models on Azure AI Foundry, paired with Open Web UI, pave the way for accessible and impactful AI solutions for developers and businesses. Let's dive in!

To configure an embedding model from Azure AI Foundry in Open Web UI, first make sure the requirements below are in place.

Requirements:
- Set up an Azure AI Foundry hub/project.
- Deploy Open Web UI – refer to my previous article on how to deploy Open Web UI on an Azure VM.
- Optional: Deploy LiteLLM with Azure AI Foundry models to work with Open Web UI – also covered in a previous article.

Deploying Embedding Models on Azure AI Foundry

Navigate to the Azure AI Foundry site and deploy an embedding model from the "Model + Endpoint" section. For the purpose of this demonstration, we will deploy the "text-embedding-3-large" model by OpenAI. You should receive a URL endpoint and an API key for the embedding model you just deployed. Take note of these credentials, because we will use them in Open Web UI.

Configuring the embedding models on Open Web UI

Now head to the Open Web UI Admin Settings page > Documents and select Azure OpenAI as the Embedding Model Engine. Copy and paste the Base URL, the API key, the embedding model deployed on Azure AI Foundry, and the API version (not the model version) into the corresponding fields, then click "Save" to apply the changes.
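Before relying on the new settings, it can help to confirm that the deployed endpoint and API key respond outside Open Web UI. The snippet below is a minimal sketch, not part of the original walkthrough: it assumes the openai Python package, and the endpoint, key, and API version are placeholders to replace with the values from your own Azure AI Foundry deployment.

```python
# Minimal sanity check for the embedding deployment (placeholder endpoint/key/API version).
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # Base URL from the deployment
    api_key="<your-api-key>",                                   # API key from the deployment
    api_version="2024-02-01",                                   # API version (not the model version)
)

# The model name must match the deployment name created in Azure AI Foundry.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello from Open Web UI",
)

vector = response.data[0].embedding
print(f"Received an embedding with {len(vector)} dimensions")
```

If this call returns a vector, the same Base URL, key, and API version should work in the Documents settings above.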
Expected Output

Now let us compare the two scenarios: without the embedding model configured on Open Web UI, and with it.

Without embedding models configured.

With Azure OpenAI embedding models configured.

Conclusion

And there you have it! Embedding models on Azure AI Foundry, combined with the seamless interaction offered by Open Web UI, are changing how we approach AI solutions. This pairing not only simplifies building and deploying intelligent systems but also makes cutting-edge technology more accessible to developers and businesses of all sizes. Integrations like this will continue to drive innovation, breaking down barriers and unlocking new possibilities in the AI landscape. So, whether you're a seasoned developer or just stepping into this exciting field, now is the time to explore what Azure AI Foundry and Open Web UI can do for you. Let's keep pushing the boundaries of what's possible!

Optimizing Vector Similarity Search on Azure Data Explorer – Performance Update

This post is co-authored by Anshul_Sharma (Senior Program Manager, Microsoft).

This blog is an update to Optimizing Vector Similarity Searches at Scale. We continue to improve the performance of vector similarity search in Azure Data Explorer (Kusto). Below we present the new functions and policies that maximize performance, and the resulting search times.

The following table presents the search time for the top 3 most similar vectors to a supplied vector:

# of vectors     Total time [sec.]
25,000           0.03
50,000           0.035
100,000          0.047
200,000          0.062
400,000          0.094
800,000          0.125
1,600,000        0.14
3,200,000        0.15
6,400,000        0.19
12,800,000       0.35
25,600,000       0.55
51,200,000       1.1
102,400,000      2.3
204,800,000      3.9
409,600,000      7.6

This benchmark was run on a medium-sized Kusto cluster (29 nodes), searching a table of Azure OpenAI embedding vectors. Each vector was generated using the 'text-embedding-ada-002' embedding model and contains 1536 coefficients.

These are the steps to achieve the best similarity-search performance:

- Use series_cosine_similarity(), the new optimized native function, to calculate cosine similarity.
- Set the encoding of the embeddings column to Vector16, the new 16-bit encoding of the vector coefficients (instead of the default 64-bit).
- Store the embedding vectors table on all nodes, with at least one shard per processor. This can be achieved by limiting the number of embedding vectors per shard: alter ShardEngineMaxRowCount in the sharding policy and RowCountUpperBoundForMerge in the merging policy. For example, if the table contains 1M vectors and the Kusto cluster has 20 nodes, each with 16 processors, the table's shards should contain at most 1,000,000 / (20 * 16) = 3,125 rows.

These are the KQL commands to create the empty table and set the required policies and encoding:

```
.create table embedding_vectors(vector_id:long, vector:dynamic)    // more columns can be added

.alter-merge table embedding_vectors policy sharding '{ "ShardEngineMaxRowCount" : 3125 }'

.alter-merge table embedding_vectors policy merge '{ "RowCountUpperBoundForMerge" : 3125 }'

.alter column embedding_vectors.vector policy encoding type = 'Vector16'
```

Now we can ingest the vectors into the table. Here is a typical search query:

```
let searched_vector = repeat(0, 1536);   // to be replaced with a real embedding vector
embedding_vectors
| extend similarity = series_cosine_similarity(vector, searched_vector)
| top 10 by similarity desc
```
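If you are driving these searches from application code rather than the Kusto Web UI, the same query can be issued through the azure-kusto-data Python client. This is only a rough sketch: the cluster URI, database name, and authentication method are placeholders, and it assumes the embedding_vectors table created above.

```python
# Rough sketch: run the similarity search from Python (placeholders for cluster and database).
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<your-cluster>.<region>.kusto.windows.net"
database = "<your-database>"

client = KustoClient(KustoConnectionStringBuilder.with_az_cli_authentication(cluster))

searched_vector = [0.0] * 1536  # replace with a real embedding vector

query = f"""
let searched_vector = dynamic({searched_vector});
embedding_vectors
| extend similarity = series_cosine_similarity(vector, searched_vector)
| top 10 by similarity desc
| project vector_id, similarity
"""

for row in client.execute(database, query).primary_results[0]:
    print(row["vector_id"], row["similarity"])
```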
The current search times enable the use of ADX as an embedding-vector storage platform for RAG (Retrieval Augmented Generation) scenarios and beyond. We continue to improve vector search performance – stay tuned!

Optimizing Vector Similarity Searches at Scale

This post is co-authored by @adieldar (Principal Data Scientist, Microsoft).

In a previous blog – Azure Data Explorer for Vector Similarity Search – we focused on how Azure Data Explorer (Kusto) is perfectly suited for storing and searching vector embeddings. In this blog, we focus on performance tuning and optimizations for running vector similarity searches at scale. We continue working with the Wikipedia scenario, in which we generate the embeddings of wiki pages using OpenAI and store them in Kusto. We then use the series_cosine_similarity_fl Kusto function to perform similarity searches.

Demo scenario

Optimizing for scale

To optimize the cosine similarity search, we need to split the vectors table into many extents that are evenly distributed among all cluster nodes. This can be done by setting a partitioning policy for the embedding table using the .alter-merge policy partitioning command:

````
.alter-merge table WikipediaEmbeddingsTitleD policy partitioning
```
{
  "PartitionKeys": [
    {
      "ColumnName": "vector_id_str",
      "Kind": "Hash",
      "Properties": {
        "Function": "XxHash64",
        "MaxPartitionCount": 2048,   // set to the maximum value to create smaller partitions, and thus a more balanced spread among all cluster nodes
        "Seed": 1,
        "PartitionAssignmentMode": "Uniform"
      }
    }
  ],
  "EffectiveDateTime": "2000-01-01"  // set to an old date in order to apply partitioning to existing data
}
```
````

In the example above we modified the partitioning policy of WikipediaEmbeddingsTitleD. This table was created from WikipediaEmbeddings by projecting the documents' titles and embeddings.

Notes:
- The partitioning process requires a string key with high cardinality, so we also projected the unique vector_id and converted it to a string.
- The best practice is to create an empty table, modify its partitioning policy, and only then ingest the data. In that case there is no need to set the old EffectiveDateTime as above. A scripted version of this sequence is sketched after the performance results below.
- It takes some time after data ingestion until the policy is applied.

To test the effect of partitioning, we created multiple tables in a similar manner, each containing up to 1M embedding vectors, and tested the cosine similarity performance on clusters with 1, 2, 4, 8 and 20 nodes (SKU Standard_E4d_v5). The following table compares search performance (in seconds) before and after partitioning:

                       Number of nodes
# of vectors           1* (no partitioning)    2       4       8       20
25,000 vectors         3.4                     0.95    0.67    0.57    0.51
50,000 vectors         6.2                     1.5     0.92    0.65    0.55
100,000 vectors        12.4                    2.6     1.55    1       0.57
200,000 vectors        24.2                    5.2     2.8     1.65    0.63
400,000 vectors        48.5                    10.3    5.4     2.95    0.87
800,000 vectors        96.5                    20.5    10.5    6       1.2
1,000,000 vectors      102                     26      13.3    7.2     1.4

* Note that this cluster has 2 nodes, but the tables are stored on a single node (this is our baseline before applying the partitioning policy).

You can see that even on the smallest 2-node cluster the search speed improves by more than a factor of 4, and in general the speed is inversely proportional to the number of nodes. The number of embedding vectors needed for common LLM scenarios (e.g. Retrieval Augmented Generation) rarely exceeds 100K, so with 8 nodes searching can be done in about 1 second.
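As noted above, the recommended order is to create the empty table and set its partitioning policy before ingesting any data. If you prefer to script that sequence, the sketch below shows one way to do it with the azure-kusto-data management-command API; it is illustrative only, and the cluster, database, and column names are placeholders rather than the exact schema used in the original demo.

```python
# Illustrative sketch: empty table first, then the partitioning policy, then ingestion.
# Assumes `pip install azure-kusto-data`; all names below are placeholders.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

cluster = "https://<your-cluster>.<region>.kusto.windows.net"
database = "<your-database>"
client = KustoClient(KustoConnectionStringBuilder.with_az_cli_authentication(cluster))

# 1. Create the empty table (hypothetical schema with a high-cardinality string key).
client.execute_mgmt(
    database,
    ".create table WikipediaEmbeddingsTitleD (title:string, vector_id_str:string, embedding:dynamic)",
)

# 2. Set the hash partitioning policy on the string key. Because the table is still empty,
#    no backdated EffectiveDateTime is needed.
partitioning_policy = (
    '{"PartitionKeys":[{"ColumnName":"vector_id_str","Kind":"Hash",'
    '"Properties":{"Function":"XxHash64","MaxPartitionCount":2048,'
    '"Seed":1,"PartitionAssignmentMode":"Uniform"}}]}'
)
client.execute_mgmt(
    database,
    f".alter-merge table WikipediaEmbeddingsTitleD policy partitioning '{partitioning_policy}'",
)

# 3. Ingest the embedding vectors only after the policy is in place.
```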
How can you get started?

If you would like to try this demo, head to the azure_kusto_vector GitHub repository and follow the instructions. The notebook in the repo will allow you to:

- Download precomputed embeddings created by the OpenAI API.
- Store the embeddings in ADX.
- Convert a raw text query to an embedding with the OpenAI API.
- Use ADX to perform a cosine similarity search over the stored embeddings.

A condensed code sketch of this flow appears at the end of the post.

You can start by:

- Using a KQL Database in Microsoft Fabric by signing up for a free trial – https://aka.ms/try-fabric
- Spinning up your own free Kusto cluster – https://aka.ms/kustofree

We look forward to your feedback and all the exciting things you build with Kusto & vectors!
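To give a feel for what the notebook does end to end, here is a condensed sketch that combines the embedding call with the ADX similarity search. It is not the notebook itself: the endpoint, key, cluster, database, table, and column names are placeholders, and it assumes the series_cosine_similarity_fl function from the Kusto functions library is available in your database.

```python
# Condensed sketch of the retrieval flow: embed a question, then rank stored vectors in ADX.
# All names below are placeholders for your own deployment.
import json

from openai import AzureOpenAI
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# 1. Convert the raw text query to an embedding (same model used to embed the wiki pages).
openai_client = AzureOpenAI(
    azure_endpoint="https://<your-openai-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)
question = "Who wrote War and Peace?"
query_vector = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input=question,
).data[0].embedding

# 2. Search the stored embeddings in ADX by cosine similarity.
kusto = KustoClient(
    KustoConnectionStringBuilder.with_az_cli_authentication(
        "https://<your-cluster>.<region>.kusto.windows.net"
    )
)
kql = f"""
let searched_vector = dynamic({json.dumps(query_vector)});
WikipediaEmbeddingsTitleD
| extend similarity = series_cosine_similarity_fl(embedding, searched_vector, 1, 1)
| top 3 by similarity desc
| project title, similarity
"""
for row in kusto.execute("<your-database>", kql).primary_results[0]:
    print(row["title"], round(row["similarity"], 4))
```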