vectors

7 Topics

Ollama on HTTPS for SQL Server
Here is a quick procedure to deploy an Ubuntu container with Ollama and expose its API over HTTPS. The goal is to allow a fast deployment, even for those unfamiliar with Docker or Language Models, making it easy to set up an offline platform for generating embeddings and using Small Language Models This is particularly useful when testing SQL Server 2025 for fully on-premises environment use cases, since SQL Server only allows access to HTTPS endpoints. However, HTTP remains open for testing purposes. Please note that this example is CPU-based, as deploying with (integrated) GPU support involves additional, less straightforward steps. This example is provided solely to illustrate the concept, is not intended for production use, and comes without any guarantee of performance or security. Prerequisites To continue, you need to have Docker Desktop, WSL and SQL Server 2025 (currently Release Candidate 1) Docker Desktop Install WSL | Microsoft Learn SQL Server 2025 Preview | Microsoft Evaluation Center Create a Dockerfile First, create a working directory. In this example, C:\Docker\Ollama will be used. Simply create a file named Dockerfile (without an extension) and paste the following content into it. FROM ubuntu:25.10 RUN apt update && apt install -y curl gnupg2 ca-certificates lsb-release apt-transport-https software-properties-common unzip nano openssl net-tools RUN curl -fsSL https://ollama.com/install.sh | bash RUN curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg RUN curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | tee /etc/apt/sources.list.d/caddy-stable.list RUN apt update && apt install -y caddy RUN mkdir -p /etc/caddy/certs RUN cat > /etc/caddy/certs/san.cnf <<EOF [req] default_bits = 2048 prompt = no default_md = sha256 req_extensions = req_ext distinguished_name = dn [dn] CN = 127.0.0.1 [req_ext] subjectAltName = @alt_names [alt_names] IP.1 = 127.0.0.1 DNS.1 = localhost EOF RUN openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/caddy/certs/localhost.key -out /etc/caddy/certs/localhost.crt -config /etc/caddy/certs/san.cnf -extensions req_ext RUN echo "https://:443 {\n tls /etc/caddy/certs/localhost.crt /etc/caddy/certs/localhost.key\n reverse_proxy localhost:11434\n}" >> /etc/caddy/Caddyfile RUN echo "#!/bin/bash" > /usr/local/bin/entrypoint.sh && \ echo "set -e" >> /usr/local/bin/entrypoint.sh && \ echo "OLLAMA_HOST=0.0.0.0 ollama serve >> /var/log/ollama.log 2>&1 &" >> /usr/local/bin/entrypoint.sh && \ echo "caddy run --config /etc/caddy/Caddyfile --adapter caddyfile >> /var/log/caddy.log 2>&1 &" >> /usr/local/bin/entrypoint.sh && \ echo "tail -f /var/log/ollama.log /var/log/caddy.log" >> /usr/local/bin/entrypoint.sh && \ chmod 755 /usr/local/bin/entrypoint.sh ENTRYPOINT ["/usr/local/bin/entrypoint.sh"] For your information, this file allows the creation of an image based on Ubuntu 25.10 and includes: Ollama, for running the models Caddy, for the reverse proxy Creation of a certificate for the HTTPS endpoint on localhost Create the container After opening a Powershell terminal, execute the following commands: cd C:\Docker\Ollama #Build the image from the Dockerfile. docker build -t ollama-https . #Create a container based on the image ollama-https docker run --name ollama-https -d -it -p 443:443 -p 11434:11434 ollama-https #Copy the certificate created into the current Windows directory docker cp ollama-https:/etc/caddy/certs/localhost.crt . # Install the certificate in Trusted Root Certification Authorities Import-Certificate -FilePath "localhost.crt" -CertStoreLocation "Cert:\LocalMachine\Root" #Check Https (wget https://localhost).Content #Check Http (wget http://localhost:11434).Content Ollama is now running With a browser, connect to https://localhost Retrieve Models No model is retrieved when the image is created, as this depends on each use case, and for some models, the size can be substantial. Here’s a quick example for pulling an embedding model, Nomic, and a small language model, Phi3. Ollama Search docker exec ollama-https ollama pull nomic-embed-text docker exec ollama-https ollama pull phi3:mini A quick example with SQL Server 2025 A quick demonstration using the WideWorldImporters database (Wide World Importers sample database) use [master] GO ALTER DATABASE WideWorldImporters SET COMPATIBILITY_LEVEL = 170 WITH ROLLBACK IMMEDIATE GO DBCC TRACEON(466, 474, 13981, -1) GO Note: With RC1, you can use the PREVIEW_FEATURES database-scoped configuration T-SQL Declare an external model for embeddings. use [WideWorldImporters] GO CREATE EXTERNAL MODEL NomicLocal AUTHORIZATION dbo WITH ( LOCATION = 'https://localhost/api/embed', API_FORMAT = 'ollama', MODEL_TYPE = EMBEDDINGS, MODEL = 'nomic-embed-text' ) to enable semantic search capabilities on StockItems, we will create a dedicated table to store embeddings (no chunking in this example) along with a vector index optimized for cosine similarity use [WideWorldImporters] GO CREATE TABLE [Warehouse].[StockItemsEmbedding](StockItemEmbeddingID int identity (1,1) PRIMARY KEY, StockItemId int, SearchDetails nvarchar(max), Embedding vector(768)) GO INSERT INTO [Warehouse].[StockItemsEmbedding] SELECT si.StockItemID, si.SearchDetails, AI_GENERATE_EMBEDDINGS(si.SearchDetails USE MODEL NomicLocal) /* Generate embeddings from declared external model */ FROM [Warehouse].[StockItems] si GO /* Check */ SELECT * FROM [Warehouse].[StockItemsEmbedding] GO CREATE VECTOR INDEX IXV_1 ON [Warehouse].[StockItemsEmbedding] (Embedding) WITH (METRIC = 'cosine', TYPE = 'DiskANN') GO /* User Input */ DECLARE @UserInput varchar(max) = 'Which product is best suited for shipping small items?' /* and Generate embeddings for user input */ DECLARE @UserInputV vector(768) = AI_GENERATE_EMBEDDINGS(@UserInput USE MODEL NomicLocal) DECLARE @ModelInput nvarchar(max) DECLARE Payload nvarchar(max) DECLARE Response nvarchar(max) /* Similarity Search on StockItems and Model Input creation*/ SELECT @ModelInput = STRING_AGG('ProductDetails: ' + sie.SearchDetails + 'UnitPrice: ' + CAST(si.UnitPrice AS nvarchar(max)), ' \n\n') FROM VECTOR_SEARCH( TABLE = [Warehouse].[StockItemsEmbedding] as sie, COLUMN = Embedding, SIMILAR_TO = @UserInputV, METRIC = 'cosine', TOP_N = 10 ) JOIN [Warehouse].[StockItems] si ON si.StockItemId = sie.StockItemId /* Generate payload for response generation */ SELECT = '{"model": "phi3:mini", "stream": false, "prompt":"You are acting as a customer advisor responsible for recommending the most suitable products based on customer needs, providing clear and personalized suggestions. Question : ' + @UserInput + '\n\nList of Items : ' + @ModelInput + '"}'; EXECUTE sp_invoke_external_rest_endpoint @url = 'https://localhost/api/generate', @method = 'POST', = , @timeout = 230, = OUTPUT; PRINT JSON_VALUE(@response, '$.result.response') LangChain You can also have a try with LangChain. Same demo with a small difference, there is no vector index created on the vector store table. The table has been modified, but only for demonstration purposes. Reference: SQLServer | 🦜️🔗 LangChain # PREREQ #sudo apt-get update && sudo apt-get install -y unixodbc # sudo apt-get update # sudo apt-get install -y curl gnupg2 # curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add - # curl https://packages.microsoft.com/config/debian/11/prod.list | sudo tee /etc/apt/sources.list.d/mssql-release.list # sudo apt-get update # sudo ACCEPT_EULA=Y apt-get install -y msodbcsql18 # pip3 install langchain langchain-sqlserver langchain-ollama langchain-community import pyodbc from langchain_sqlserver import SQLServer_VectorStore from langchain_ollama import OllamaEmbeddings from langchain_ollama import ChatOllama from langchain.schema import Document from langchain_community.vectorstores.utils import DistanceStrategy #Prompt for testing _USER_INPUT = 'Which product is best suited for shipping small items?' ############### Params ########################################## print("\033[93mSetting up variables...\033[0m") _SQL_DRIVER = "ODBC Driver 18 for SQL Server" _SQL_SERVER = "localhost\\SQL2K25" _SQL_DATABASE = "WideWorldImporters" _SQL_USERNAME = "lc" _SQL_PASSWORD = "lc" _SQL_TRUST_CERT = "yes" _SQL_VECTOR_STORE_TABLE = "StockItem_VectorStore" # Table name for vector storage _MODIFY_TABLE_TO_USE_SQL_VECTOR_INDEX = True #As vector index not considered currently in langchain and structure does not match vector index requirements _CONNECTION_STRING = f"Driver={{{_SQL_DRIVER}}};Server={_SQL_SERVER};Database={_SQL_DATABASE};UID={_SQL_USERNAME};PWD={_SQL_PASSWORD};TrustServerCertificate={_SQL_TRUST_CERT}" _OLLAMA_API_URL = "https://localhost" _OLLAMA_EMBEDDING_MODEL = "nomic-embed-text:latest" _OLLAMA_EMBEDDING_VECTOR_SIZE = 768 _OLLAMA_SLM_MODEL = "phi3:mini" # Model for SLM queries ################################################################### #Define Ollama embeddings embeddings = OllamaEmbeddings( model=_OLLAMA_EMBEDDING_MODEL, base_url=_OLLAMA_API_URL ) conn = pyodbc.connect(_CONNECTION_STRING) cursor = conn.cursor() #Drop embeddings table if it exists print("\033[93mDropping existing vector store table if it exists...\033[0m") cursor.execute(f"DROP TABLE IF EXISTS Warehouse.{_SQL_VECTOR_STORE_TABLE};") print("\033[93mConnecting to SQL Server and fetching data...\033[0m") cursor.execute("SELECT StockItemId, SearchDetails, UnitPrice FROM Warehouse.StockItems;") rows = cursor.fetchall() print(f"\033[93mFound {len(rows)} records to process\033[0m") # Create documents from the fetched data documents = [ Document( page_content=row.SearchDetails, metadata={ "StockItemId": row.StockItemId, "UnitPrice": float(row.UnitPrice) # Convert Decimal to float } ) for row in rows ] conn.commit() #Creating vector store print("\033[93mCreating vector store...\033[0m") vector_store = SQLServer_VectorStore( connection_string=_CONNECTION_STRING, distance_strategy=DistanceStrategy.COSINE, # If not provided, defaults to COSINE embedding_function=embeddings, embedding_length=_OLLAMA_EMBEDDING_VECTOR_SIZE, db_schema = "Warehouse", table_name=_SQL_VECTOR_STORE_TABLE ) print("\033[93mAdding to vector store...\033[0m") try: vector_store.add_documents(documents) print("\033[93mSuccessfully added to vector store!\033[0m") except Exception as e: print(f"\033[91mError adding documents: {e}\033[0m") #Vector index not yet integrated in SQL Server VectorStore (drop auto-created nonclustered PK and generating int clustered PK if (_MODIFY_TABLE_TO_USE_SQL_VECTOR_INDEX): print("\033[93mModifying structure to create vector index...\033[0m") cursor.execute("DECLARE @AutoCreatedPK sysname, @SQL nvarchar(max);" f"SELECT @AutoCreatedPK = name FROM sys.key_constraints WHERE type = 'PK' AND parent_object_id = object_id('Warehouse.{_SQL_VECTOR_STORE_TABLE}');" f"SELECT @SQL = 'ALTER TABLE Warehouse.{_SQL_VECTOR_STORE_TABLE} DROP CONSTRAINT ' + @AutoCreatedPK + ';'" "EXEC sp_executesql @SQL;" f"ALTER TABLE Warehouse.{_SQL_VECTOR_STORE_TABLE} ADD Alt_Id int identity(1,1);" f"ALTER TABLE Warehouse.{_SQL_VECTOR_STORE_TABLE} ADD CONSTRAINT PK_{_SQL_VECTOR_STORE_TABLE} PRIMARY KEY (Alt_Id);") conn.commit() print("\033[93mCreating vector index...\033[0m") cursor.execute(f"CREATE VECTOR INDEX IV_{_SQL_VECTOR_STORE_TABLE} ON [Warehouse].[{_SQL_VECTOR_STORE_TABLE}] (embeddings) WITH (METRIC = 'cosine', TYPE = 'DiskANN');") conn.commit() #Generate prompt then answer print(f"\033[92mUser Input: {_USER_INPUT}\033[0m") context = [ { "Item": doc.page_content, "UnitPrice": doc.metadata.get("UnitPrice", None) } for doc in vector_store.similarity_search(_USER_INPUT, k=3) ] llm = ChatOllama(model=_OLLAMA_SLM_MODEL,base_url=_OLLAMA_API_URL) prompt = ( f"You are acting as a customer advisor responsible for recommending the most suitable products based on customer needs, providing clear and personalized suggestions" f"Context: {context}\n\nQuestion: {_USER_INPUT}\n\n") response = llm.invoke(prompt) print(f"\033[36m{response.content}\033[0m") Note : If using devcontainer with VSCode add "runArgs": [ "--network=host" ] to devcontainer.json to allow connections to “localhost”. Import and install the previously created certificat docker cp C:\Docker\Ollama\localhost.crt <devcontainer name>:/usr/local/share/ca-certificates/localhost.crt docker exec <devcontainer name> "update-ca-certificates" Disclaimer The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.
RyadB
Nov 03, 2025 Place Core Infrastructure and Security Blog
346Views
0likes
0Comments
Optimizing Vector Similarity Search on Azure Data Explorer – Performance Update
This post is co-authored by Anshul_Sharma (Senior Program Manager, Microsoft). This blog is an update of Optimizing Vector Similarity Searches at Scale. We continue to improve the performance of vector similarity search in Azure Data Explorer (Kusto). We present the new functions and policies to maximize performance and the resulting search times. The following table and chart present the search time for the top 3 most similar vectors to a supplied vector: # of vectors Total time [sec.] 25,000 0.03 50,000 0.035 100,000 0.047 200,000 0.062 400,000 0.094 800,000 0.125 1,600,000 0.14 3,200,000 0.15 6,400,000 0.19 12,800,000 0.35 25,600,000 0.55 51,200,000 1.1 102,400,000 2.3 204,800,000 3.9 409,600,000 7.6 This benchmark was done on a medium size Kusto cluster (containing 29 nodes), searching for the most similar vectors in a table of Azure OpenAI embedding vectors. Each vector was generated using ‘text-embedding-ada-002’ embedding model and contains 1536 coefficients. These are the steps to achieve the best performance of similarity search: Use series_cosine_similarity(), the new optimized native function to calculate cosine similarity Set the encoding of the embeddings column to Vector16, the new 16 bit encoding of the vectors coefficients (instead of the default 64 bit) Store the embedding vectors table on all nodes with at least one shard per processor. This can be achieved by limiting the number of embedding vectors per shard by altering ShardEngineMaxRowCount of the sharding policy and RowCountUpperBoundForMerge of the merging policy. Suppose our table contains 1M vectors and our Kusto cluster has 20 nodes each has 16 processors. The table’s shards should contain at most 1000000/(20*16)=3125 rows. These are the KQL commands to create the empty table and set the required policies and encoding: .create table embedding_vectors(vector_id:long, vector:dynamic) // more columns can be added .alter-merge table embedding_vectors policy sharding '{ "ShardEngineMaxRowCount" : 3125 }' .alter-merge table embedding_vectors policy merge '{ "RowCountUpperBoundForMerge" : 3125 }' .alter column embedding_vectors.vector policy encoding type = 'Vector16' Now we can ingest the vectors into the table. And here is a typical search query: let searched_vector = repeat(1536, 0); // to be replaced with real embedding vector. embedding_vectors | extend similarity = series_cosine_similarity_fl(vector, searched_vector, 1, 1) | top 10 by similarity desc The current semantic search times enable usage of ADX as embedding vectors storage platform for RAG (Retrieval Augmented Generation) scenarios and beyond, We continue to improve vector search performance, stay tuned!
adieldar
Sep 29, 2024 Place Azure Data Explorer Blog
5.1KViews
4likes
2Comments
LangChain integration with Azure Database for PostgreSQL (Part 1)
Use LangChain to split documents into smaller chunks, generate embeddings for each chunk using Azure OpenAI, and store them in a PostgreSQL database via the pgvector extension. Then, we’ll perform a vector similarity search on the embedded documents.
JoshMSFT
Jun 11, 2024 Place Microsoft Blog for PostgreSQL
5.7KViews
2likes
1Comment
Optimizing Vector Similarity Searches at Scale
This post is co-authored by @adieldar (Principal Data Scientist, Microsoft) In a previous blog – Azure Data Explorer for Vector Similarity Search, we focused on how Azure Data Explorer (Kusto) is perfectly suited for storing and searching vector embeddings. In this blog, we will focus on performance tuning and optimizations for running vector similarity searches at scale. We will continue working on the Wikipedia scenario where we generate the embeddings of wiki pages using OpenAI and store them in kusto. We then use series_cosine_similarity_fl kusto function to perform similarity searches. Demo scenario Optimizing for scale To optimize the cosine similarity search we need to split the vectors table to many extents that are evenly distributed among all cluster nodes. This can be done by setting Partitioning Policy for the embedding table using the .alter-merge policy partitioning command: .alter-merge table WikipediaEmbeddingsTitleD policy partitioning ``` { "PartitionKeys": [ { "ColumnName": "vector_id_str", "Kind": "Hash", "Properties": { "Function": "XxHash64", "MaxPartitionCount": 2048, // set it to max value create smaller partitions thus more balanced spread among all cluster nodes "Seed": 1, "PartitionAssignmentMode": "Uniform" } } ], "EffectiveDateTime": "2000-01-01" // set it to old date in order to apply partitioning on existing data } ``` In the example above we modified the partitioning policy for WikipediaEmbeddingsTitleD. This table was created from WikipediaEmbeddings by projecting the documents’ title and embeddings. Notes: The partitioning process requires a string key with high cardinality, so we also projected the unique vector_id and converted it to string. The best practice is to create an empty table, modify its partition policy then ingest the data. In that case there is no need to define the old EffectiveDateTime as above. It takes some time after data ingestion until the policy is applied. To test the effect of partitioning we created in a similar manner multiple tables containing up to 1M embedding vectors and tested the cosine similarity performance on clusters with 1, 2, 4, 8 & 20 nodes (SKU Standard_E4d_v5). The following table and chart compare search performance (in seconds) before and after partitioning: Number of Nodes # of vectors 1* (no partitioning) 2 4 8 20 25,000 Vectors 3.4 0.95 0.67 0.57 0.51 50,000 Vectors 6.2 1.5 0.92 0.65 0.55 100,000 Vectors 12.4 2.6 1.55 1 0.57 200,000 Vectors 24.2 5.2 2.8 1.65 0.63 400,000 Vectors 48.5 10.3 5.4 2.95 0.87 800,000 Vectors 96.5 20.5 10.5 6 1.2 1,000,000 Vectors 102 26 13.3 7.2 1.4 * Note that the cluster has 2 nodes, but the tables are stored on a single node (this is our baseline before applying the partitioning policy) You can see that even on the smallest 2 nodes cluster the search speed is improved by more than x4 factor, and in general the speed is inversely proportional to the number of nodes. The number of embedding vectors that are needed for common LLM scenarios (e.g. Retrieval Augmented Generation) rarely exceeds 100K, thus by having 8 nodes searching can be done in 1 sec. How can you get started? If you would like to try this demo, head to the  azure_kusto_vector  GitHub repository and follow the instructions. The Notebook in the repo will allow you to -    Download precomputed embeddings created by OpenAI API.  Store the embeddings in ADX.  Convert raw text query to an embedding with OpenAI API.  Use ADX to perform cosine similarity search in the stored embeddings  You can start by - Using KQL Database in Microsoft Fabric by signing up for a free trial - https://aka.ms/try-fabric Spinning up your own free Kusto cluster -  https://aka.ms/kustofree We look forward to your feedback and all the exciting things you build with kusto & vectors!
Anshul_Sharma
Apr 09, 2024 Place Azure Data Explorer Blog
7.3KViews
3likes
4Comments
Build RAG Chat App using Azure Cosmos DB for MongoDB vCore and Azure OpenAI: Step-by-Step Guide
Build a RAG Chat Web Application using the Semantic Kernel, Azure OpenAI, and Azure Cosmos DB for MongoDB vCore: Step-by-Step Guide
JohnAziz
Mar 14, 2024 Place Educator Developer Blog
34KViews
3likes
0Comments
Azure Cognitive Search and LangChain: A Seamless Integration for Enhanced Vector Search Capabilities
Unveil the power of enhanced search capabilities with Azure Cognitive Search vector search and LangChain!
gia_mondragon
Feb 26, 2024 Place Microsoft Foundry Blog
58KViews
6likes
17Comments
Azure Data Explorer for Vector Similarity Search
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/series-cosine-similarity-function In the world of AI & data analytics, vector databases are emerging as a powerful tool for managing complex and high-dimensional data. In this article, we will explore the concept of vector databases, the need for vector databases in data analytics, and how Azure Data Explorer (ADX) aka Kusto can be used as a vector database.
Anshul_Sharma
Jan 08, 2024 Place Azure Data Explorer Blog
32KViews
13likes
5Comments