pgvector
Introducing DiskANN Vector Index in Azure Database for PostgreSQL
We're thrilled to announce the preview of DiskANN, a leading vector indexing algorithm, on Azure Database for PostgreSQL - Flexible Server! Developed by Microsoft Research and used extensively at Microsoft in global services such as Bing and Microsoft 365, DiskANN enables developers to build highly accurate, performant, and scalable Generative AI applications, surpassing pgvector's HNSW and IVFFlat in both latency and accuracy. DiskANN also overcomes a long-standing limitation of pgvector in filtered vector search, where it occasionally returns incorrect results.
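For readers new to the term, a filtered vector search combines an ordinary relational predicate with nearest-neighbor ordering. A generic illustration in Python with psycopg follows; the products table, its columns, and the connection string are hypothetical placeholders for illustration, not something from the announcement.

import psycopg

# Hypothetical schema: products(id, category, embedding vector(3)).
# A filtered vector search = a WHERE predicate + nearest-neighbor ordering.
with psycopg.connect("postgresql://<HOST>/<DATABASE>") as conn:
    rows = conn.execute(
        """
        SELECT id, category
        FROM products
        WHERE category = %s                 -- the filter
        ORDER BY embedding <=> %s::vector   -- cosine distance to the query vector
        LIMIT 10
        """,
        ("shoes", "[0.1, 0.2, 0.3]"),
    ).fetchall()
    print(rows)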
Fueling the Agentic Web Revolution with NLWeb and PostgreSQL

We're excited to announce that NLWeb (Natural Language Web), Microsoft's open project for natural language interfaces on websites, now supports PostgreSQL. With this enhancement, developers can leverage PostgreSQL and NLWeb to transform any website into an AI-powered application or Model Context Protocol (MCP) server. This integration allows organizations to use a familiar, robust database as the foundation for conversational AI experiences, streamlining deployment and maximizing data security and scalability.

Soon, autonomous agents, not just human users, will consume and interpret website content, transforming how information is accessed and used online. At Microsoft //Build 2025, Microsoft introduced the era of the open agentic web: a new paradigm in which autonomous agents seamlessly interact across individual, organizational, team, and end-to-end business contexts. To realize this future, Microsoft announced the NLWeb project. NLWeb transforms any website into an AI-powered application with just a few lines of code, by connecting it to an AI model and a knowledge base.

In this post, we'll cover:
- What NLWeb is and how it works with vector databases
- How pgvector enables vector similarity search in PostgreSQL for NLWeb
- How to get started using NLWeb with Postgres

Let's dive in and see how Postgres + NLWeb can redefine conversational web interfaces while keeping your data in a familiar, powerful database.

What is NLWeb? A Quick Overview of Conversational Web Interfaces

NLWeb is an open project developed by Microsoft to simplify adding conversational AI interfaces to websites. How NLWeb works under the hood:
- Processes existing website content in semi-structured formats like Schema.org, RSS, and other data that websites already publish
- Embeds and indexes all the content in a vector store (e.g., PostgreSQL with pgvector)
- Routes user queries through several processes that handle natural language understanding, reranking, and retrieval
- Answers queries with an LLM

The result is a high-quality natural language interface on top of web data, giving developers the ability to let users "talk to" web data. By default, every NLWeb instance is also a Model Context Protocol (MCP) server, allowing websites to make their content discoverable and accessible to agents and other participants in the MCP ecosystem if they choose. Importantly, NLWeb is platform-agnostic: it supports many major operating systems, AI models, and vector stores, and the project is modular by design, so developers can bring their own retrieval system and model APIs and define their own extensions.

NLWeb with PostgreSQL

PostgreSQL is now embedded into the NLWeb reference stack as a native retriever, creating a scalable and flexible path for deploying NLWeb instances on open-source infrastructure.

Retrieval Powered by pgvector

NLWeb leverages pgvector, a PostgreSQL extension for efficient vector similarity search, to handle natural language retrieval at scale. By integrating pgvector into the NLWeb stack, teams can eliminate the need for external vector databases: web data stored in PostgreSQL becomes immediately searchable and usable for NLWeb experiences, streamlining infrastructure and enhancing security. PostgreSQL's robust governance features and wide adoption align with NLWeb's mission to enable conversational AI for any website or content platform.
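To make that retrieval step concrete, here is a minimal sketch of the embed-and-search loop. This is not NLWeb's actual internals: it assumes an Azure OpenAI embedding deployment plus a simplified documents table with a pgvector embedding column, and the column names are illustrative.

import os
import psycopg
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def search(question: str, k: int = 5):
    # 1. Embed the user's query with the same model used at ingestion time.
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    # 2. Retrieve the k nearest chunks by cosine distance with pgvector.
    with psycopg.connect(os.environ["POSTGRES_CONNECTION_STRING"]) as conn:
        return conn.execute(
            "SELECT url, name FROM documents "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(emb), k),
        ).fetchall()

The retrieved chunks would then be passed to an LLM to generate the final answer, which is the last step in the pipeline described above.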
With pgvector retrieval built in, developers can confidently launch NLWeb instances on their own databases, with no additional infrastructure required.

Implementation example

We are going to use NLWeb and Postgres to create a conversational AI app and MCP server that lets us chat with content from the Talking Postgres with Claire Giordano podcast!

Prerequisites
- An active Azure account
- Enable and configure the pgvector extension
- Create an Azure AI Foundry project
- Deploy the models gpt-4.1, gpt-4.1-mini, and text-embedding-3-small
- Install Visual Studio Code
- Install the Python extension
- Install Python 3.11.x
- Install the Azure CLI (latest version)

Getting started

All the code and sample datasets are available in this GitHub repository.

Step 1: Set up the NLWeb server

1. Clone or download the code from the repo.

git clone https://github.com/microsoft/NLWeb
cd NLWeb

2. Open a terminal, create a virtual Python environment, and activate it.

python -m venv myenv
source myenv/bin/activate   # Or on Windows: myenv\Scripts\activate

3. Go to the 'code/python' folder in NLWeb to install the dependencies.

cd code/python
pip install -r requirements.txt

4. Go to the project root folder in NLWeb and copy the .env.template file to a new .env file.

cd ../../
cp .env.template .env

5. In the .env file, set the API key for your LLM endpoint of choice and update the Postgres connection string. For example:

AZURE_OPENAI_ENDPOINT="https://TODO.openai.azure.com/"
AZURE_OPENAI_API_KEY="<TODO>"

# If using Postgres connection string
POSTGRES_CONNECTION_STRING="postgresql://<HOST>:<PORT>/<DATABASE>?user=<USERNAME>&sslmode=require"
POSTGRES_PASSWORD="<PASSWORD>"

6. Update your config files (located in the config folder) so your preferred providers match your .env file. Three files may need changes:
- config_llm.yaml: Update the first line to the LLM provider you set in the .env file. By default it is Azure OpenAI. You can also adjust which models are called here; by default we assume gpt-4.1 and gpt-4.1-mini.
- config_embedding.yaml: Update the first line to your preferred embedding provider. By default it is Azure OpenAI, using text-embedding-3-small.
- config_retrieval.yaml: Update the first line to postgres. Also set write_endpoint to postgres, and set enabled to 'true' for the postgres retrieval endpoint in the list of possible endpoints.

Step 2: Initialize the Postgres server

Go to the 'code/python/misc' folder in NLWeb to run the Postgres initializer. NOTE: If you are using Azure Database for PostgreSQL flexible server, make sure you have the `vector` extension allow-listed and that the database has the vector extension enabled.

cd code/python/misc
python postgres_load.py

Step 3: Ingest data from the Talking Postgres podcast

Now we will load some data into our vector database to test with. We've listed a few RSS feeds you can choose from below. Go to the 'code/python' folder in NLWeb and run the following command (make sure you are still in the 'python' folder when you run it). The format of the command is:

python -m data_loading.db_load <RSS URL> <site-name>

For the Talking Postgres with Claire Giordano podcast:

python -m data_loading.db_load https://feeds.transistor.fm/talkingpostgres Talking-Postgres

(Optional) You can check the documents table in your Postgres database and verify that all the data from the website was uploaded.
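If you'd rather script this check than eyeball the table, here is a quick sketch. The documents table comes from NLWeb's Postgres setup, but the site column name is an assumption; adjust it to the actual schema.

import os
import psycopg

# Count the chunks ingested for the site name used during loading.
with psycopg.connect(os.environ["POSTGRES_CONNECTION_STRING"]) as conn:
    count = conn.execute(
        "SELECT count(*) FROM documents WHERE site = %s",  # 'site' is assumed
        ("Talking-Postgres",),
    ).fetchone()[0]
    print(f"Talking-Postgres rows in documents: {count}")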
Test the NLWeb server

Start your NLWeb server (again from the 'python' folder):

python app-file.py

Go to http://localhost:8000/ and start asking questions about the Talking Postgres with Claire Giordano podcast. You may try different modes. (A scripted alternative to the browser UI is sketched at the end of this post.)

Trying list mode, sample prompt: "I want to listen to something that talks about the advances in vector search such as DiskANN"

Trying generate mode, sample prompt: "What did Shireesh Thota say about the future of Postgres?"

Running NLWeb with MCP

1. If you do not already have it, install MCP in your venv:

pip install mcp

2. Next, configure your Claude MCP server. If you don't have the config file already, you can create it at the following locations:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json

The default MCP JSON file needs to be modified as shown below.

macOS example configuration:

{
  "mcpServers": {
    "ask_nlw": {
      "command": "/Users/yourname/NLWeb/myenv/bin/python",
      "args": [
        "/Users/yourname/NLWeb/code/chatbot_interface.py",
        "--server",
        "http://localhost:8000",
        "--endpoint",
        "/mcp"
      ],
      "cwd": "/Users/yourname/NLWeb/code"
    }
  }
}

Windows example configuration:

{
  "mcpServers": {
    "ask_nlw": {
      "command": "C:\\Users\\yourusername\\NLWeb\\myenv\\Scripts\\python",
      "args": [
        "C:\\Users\\yourusername\\NLWeb\\code\\chatbot_interface.py",
        "--server",
        "http://localhost:8000",
        "--endpoint",
        "/mcp"
      ],
      "cwd": "C:\\Users\\yourusername\\NLWeb\\code"
    }
  }
}

Note: For Windows paths, you need to use double backslashes (\\) to escape the backslash character in JSON.

3. Go to the 'code/python' folder in NLWeb, enter your virtual environment, and start your NLWeb local server. Make sure it is configured to access the data you would like to ask about from Claude.

# On macOS
source ../myenv/bin/activate
python app-file.py

# On Windows
..\myenv\Scripts\activate
python app-file.py

4. Open Claude Desktop. It should ask you to trust the 'ask_nlw' external connection if it is configured correctly. After you click yes and the welcome page appears, you should see 'ask_nlw' in the bottom-right '+' options. Select it to start a query.

5. To query NLWeb, just type 'ask_nlw' in your prompt to Claude. You'll notice that you also get the full JSON script for your results. Remember, you must have your local NLWeb server running to use this option.

Learn More
- Vector Store in Azure Postgres Flexible Server
- Generative AI in Azure Postgres Flexible Server
- NLWeb GitHub repo, which includes:
  - A reference server for handling natural language queries
  - pgvector integration
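As referenced above, you can also query the same local server over HTTP from code instead of the browser UI. This sketch uses only the Python standard library; the /ask path and its query parameters are assumptions based on the NLWeb reference server, so check the repo docs if they differ.

import json
import urllib.parse
import urllib.request

# Assumed endpoint and parameters; verify against the NLWeb repo docs.
params = urllib.parse.urlencode({
    "query": "What did Shireesh Thota say about the future of Postgres?",
    "site": "Talking-Postgres",  # the site name used at load time
    "mode": "generate",          # assumed modes: list | summarize | generate
})
with urllib.request.urlopen(f"http://localhost:8000/ask?{params}") as resp:
    print(json.loads(resp.read()))  # assumes a plain JSON (non-streaming) response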
Build AI-Ready Apps and Agents with PostgreSQL on Azure

As developers, we're constantly looking for ways to build smarter, faster, and more scalable applications. The Microsoft Reactor series, Build AI apps with Azure Database for PostgreSQL, is a four-part livestream experience designed to help you do just that, by combining the power of PostgreSQL with Azure's AI capabilities.

Dive into the world of AI apps and agents with Azure Database for PostgreSQL in this engaging video series: your ideal starting point for building intelligent solutions and improving your workflow. Get ready to explore the fundamentals of AI and discover how vector support in databases can elevate your applications. Uncover how innovative tools like the Visual Studio Code extension for PostgreSQL and GitHub Copilot can make your database work faster and more efficient. You'll also see how to create intelligent apps and AI agents using frameworks such as LangChain and Semantic Kernel.

Why This Series Matters

PostgreSQL is already a favorite among developers for its flexibility and open-source strength. But when paired with Azure's AI services, it becomes a launchpad for intelligent applications. This series walks you through how to:
- Orchestrate AI agents using PostgreSQL as a foundation
- Enhance semantic search with vector support and indexes like DiskANN
- Integrate Azure AI services to enrich your data and user experiences
- Boost productivity with tools like the Visual Studio Code PostgreSQL extension and GitHub Copilot

What You'll Learn

Each session is packed with practical insights:

Episode 1: Laying the foundation: AI-powered apps and agents with Azure Database for PostgreSQL

We introduce key AI concepts, setting the stage for a deeper understanding of Large Language Models (LLMs) and their applications. We explore the capabilities of Azure Database for PostgreSQL, focusing on how its vector support enables advanced semantic search through technologies like DiskANN indexes. We'll also discuss the Azure AI extension, which brings powerful AI features to your data projects, helping you enrich your applications with enhanced search relevance and intelligent insights, and providing a solid foundation for leveraging these tools in your own solutions. Register here.

Episode 2: Accelerate your data and AI tasks with the VS Code extension for PostgreSQL and GitHub Copilot

This talk will delve into how the Visual Studio Code extension for PostgreSQL can streamline your database management, while GitHub Copilot's AI-powered assistance can boost your productivity. Learn how to seamlessly integrate these tools to enhance your workflow, automate repetitive tasks, and write efficient code faster. Whether you're a developer, data scientist, or database administrator, this session will provide you with practical insights and techniques to elevate your data and AI projects. Join us to learn how to effectively use these advanced tools and take your data skills to the next level. Register here.

Episode 3: Build your own AI copilot for financial apps with PostgreSQL

Join us to discover how to transform traditional financial applications into intelligent, AI-powered solutions with Azure Database for PostgreSQL.
In this hands-on session, you'll learn to integrate generative AI for high-quality responses to financial queries using PDF-based statements of work and invoices, perform AI-driven data validation, apply the Azure AI extension, implement vector search with DiskANN indexes, enhance results with semantic re-ranking, use the LangChain framework, and leverage GraphRAG on Azure Database for PostgreSQL. By the end, you'll have gained practical skills to build end-to-end AI-driven applications using your own data and projects. Register here.

Episode 4: Build advanced AI agents with PostgreSQL

Using a sample dataset of legal cases, we'll show how AI technologies empower intelligent agents to provide high-quality answers to legal queries. In this session, you'll learn to build an advanced AI agent with Azure Database for PostgreSQL, integrating generative AI for enhanced data validation, retrieval-augmented generation (RAG), semantic re-ranking, Semantic Kernel, and GraphRAG via the Apache AGE graph extension. This practical demonstration offers insights into developing robust, intelligent solutions using your own data. Register here.

Join us for an inspiring and hands-on experience; don't miss out! Get the full series details and register now: https://aka.ms/postgres-ai-reactor-series
LangChain integration with Azure Database for PostgreSQL (Part 1)

In this post, we use LangChain to split documents into smaller chunks, generate embeddings for each chunk using Azure OpenAI, and store them in a PostgreSQL database via the pgvector extension. Then, we perform a vector similarity search on the embedded documents.
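A condensed sketch of that flow, assuming the langchain-openai and langchain-postgres packages, an Azure OpenAI deployment named text-embedding-3-small, and placeholder file and connection values:

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import AzureOpenAIEmbeddings
from langchain_postgres import PGVector

# 1. Split documents into smaller, overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.create_documents([open("sample_doc.txt").read()])

# 2. Generate embeddings for each chunk with Azure OpenAI
#    (endpoint and key are read from the standard AZURE_OPENAI_* env vars).
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-3-small",
    openai_api_version="2024-02-01",
)

# 3. Store chunks and embeddings in PostgreSQL via pgvector.
store = PGVector(
    embeddings=embeddings,
    collection_name="langchain_demo",
    connection="postgresql+psycopg://<USER>:<PASSWORD>@<HOST>:5432/<DATABASE>",
)
store.add_documents(chunks)

# 4. Vector similarity search over the embedded documents.
for doc in store.similarity_search("What is pgvector?", k=3):
    print(doc.page_content[:100])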
DiskANN on Azure Database for PostgreSQL – Now Generally Available

By Abe Omorogbe, Senior PM

We're thrilled to announce the General Availability (GA) of DiskANN for Azure Database for PostgreSQL, unlocking fast, scalable, and cost-effective vector search for production workloads. Building on momentum from our private and public previews, this release brings major upgrades that directly reflect customer feedback: better performance, lower memory usage, and greater flexibility for advanced GenAI applications. Whether you're working with massive datasets or deploying in resource-constrained environments, DiskANN now offers an index that scales effortlessly. DiskANN delivers up to 10x faster queries, 4x lower costs, and up to a 96x lower memory footprint compared to the industry-standard pgvector HNSW.

In this post, we'll cover:
- Common pain points in large-scale vector search
- New features in the GA release
- A dive into product quantization (PQ), the main optimization that powers DiskANN's performance
- Internal testing results that demonstrate how DiskANN stacks up against alternatives like HNSW

Read on to see why DiskANN is ready for your most demanding vector search workloads.

What is DiskANN?

Developed by Microsoft Research and battle-tested across global services like Bing and Microsoft 365, DiskANN is a high-performance approximate nearest neighbor (ANN) search algorithm built for scalable vector search. It delivers the high recall, high throughput, and low latency required by today's most demanding agentic AI and retrieval-augmented generation (RAG) workloads. DiskANN offers the following benefits:
- Low latency: Its graph-based index structure minimizes SSD reads during search, enabling high throughput and consistently low query latency.
- Cost efficiency: DiskANN's design reduces memory usage, up to 96x smaller than standard indexing methods, helping lower infrastructure costs.
- Scalability: Optimized for massive datasets, DiskANN is built to efficiently handle millions of vectors, making it ideal for production-scale applications.
- Accuracy: DiskANN delivers highly accurate results without sacrificing speed or precision.
- Integration: DiskANN works natively with Azure Database for PostgreSQL, leveraging the power and flexibility of PostgreSQL.

Breaking Through the Limits of Large-Scale Vector Search

Vector search has become essential for powering AI applications, from recommendation systems to agentic AI, but scaling it has been anything but easy. If you've worked with large vector datasets, you've likely run into the same roadblocks:
- Your data is too big to fit in memory, leading to slower searches.
- Building indexes takes forever and eats up your resources.
- You have no idea how long the indexing process will take or where it's stuck.
- Your embedding model outputs high-dimensional vectors, but your database can't handle them.
- Database bills spiral out of control due to the memory-intensive machines needed for efficient search on a large dataset.

Sound familiar? These are not edge cases; they're the standard challenges faced by anyone trying to scale Postgres's vector search capabilities into real-world production workloads. With the General Availability (GA) release of DiskANN for Azure Database for PostgreSQL, we're tackling these problems head-on, bringing production-ready scale, speed, and efficiency to vector search. Let's break down how.
Product Quantization (PQ) for Lower Memory and Storage Costs (preview)

One of the biggest blockers in vector search is fitting your data into memory. When using pgvector's HNSW, if your vector data doesn't fit in memory, searches can trigger compute-intensive I/O operations, degrading performance. With the GA release, DiskANN introduces a preview of Product Quantization (PQ), a powerful vector compression technique that makes it possible to store and search massive datasets with a dramatically smaller memory footprint. With PQ enabled, you get:
- Reduced memory usage, enabling datasets that previously couldn't fit in RAM.
- Lower memory costs: compressed vectors mean smaller indexes and cheaper monthly bills.
- Faster performance: less I/O pressure means lower latency and higher throughput.

Example results

In our internal testing, we used pg_diskann on Azure Database for PostgreSQL to build an index of 35 million 768-dimensional vectors and ran benchmarking queries on an 8-core, 32 GB machine. The results: a 32x lower memory footprint than pgvector's HNSW, and 4x lower cost due to the significantly smaller resources needed to run vector search queries effectively compared to HNSW. Also, compared to standard HNSW, pg_diskann delivers up to 10x lower latency at 95% recall, especially in large-scale scenarios with millions of vectors. When testing higher-quality embeddings such as OpenAI v3-large (3072 dimensions), we saw up to a 96x lower memory footprint thanks to extremely efficient compression: in this scenario, PQ compresses each vector from 12 KB (3072 dimensions × 4 bytes/dimension) to just 128 bytes per quantized vector. Sign up for the preview today to get access.

Go Big: Support for vectors up to 16,000 dimensions

Another big blocker for customers developing advanced GenAI applications with pgvector is that HNSW only supports indexing vectors up to 2,000 dimensions, a limit that constrains applications built on high-dimensional embedding models that deliver higher accuracy (e.g., text-embedding-3-large). With this release, DiskANN supports vectors of up to 16,000 dimensions when product quantization is enabled, unlocking popular embedding models with more than 2,000 dimensions such as text-embedding-3-large, E5-mistral-7b-instruct, and NV-embed-v2.

Faster Index Builds, Smarter Memory Usage

Index creation has historically been a pain point in previous versions of pg_diskann, especially for large datasets. In this GA release, we've significantly accelerated the build process through:
- Improved memory management, using `maintenance_work_mem` more efficiently.
- Optimized algorithms that reduce disk I/O and CPU usage during indexing.

We've also published detailed documentation to guide you through best practices for faster index builds. The result? Index builds that are not only faster but more predictable and resource-friendly. When indexing 1 million vectors, the DiskANN GA version is ~2x faster: it took 696.0630 seconds vs. 1172.3314 seconds for our DiskANN preview build.

Real-Time Index Progress Tracking

Previously, building large indexes with pg_diskann felt like working in the dark. Now, with improved progress reporting support, you can track exactly how far along your index build is, making it easier to monitor, plan, and troubleshoot during creation. Use the following command in psql (for example, from VS Code) to check pg_diskann index build progress:

SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS "%"
FROM pg_stat_progress_create_index;
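If you'd rather watch the build from a script than re-run that query by hand, here is a small sketch that polls the same view. The psycopg driver and the POSTGRES_CONNECTION_STRING environment variable are assumptions, not part of the pg_diskann feature.

import os
import time
import psycopg

# Poll pg_stat_progress_create_index while CREATE INDEX runs in another session.
with psycopg.connect(os.environ["POSTGRES_CONNECTION_STRING"], autocommit=True) as conn:
    while True:
        row = conn.execute(
            "SELECT phase, round(100.0 * blocks_done / nullif(blocks_total, 0), 1) "
            "FROM pg_stat_progress_create_index"
        ).fetchone()
        if row is None:  # no CREATE INDEX in flight: finished or not started
            print("no index build in progress")
            break
        print(f"phase={row[0]} blocks done={row[1]}%")
        time.sleep(5)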
Enable the pgvector & diskann Extension: Allowlist the pgvector and diskann extension within your server configuration. Activating DiskANN in Azure Database for PostgreSQL Create Extension in Postgres: Create the pg_diskann extension on your database along with any dependencies. CREATE EXTENSION IF NOT EXISTS pg_diskann CASCADE; Create a Vector Column: Define a table to store your vector data, including a column of type vector for the vector embeddings. CREATE TABLE demo ( id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, embedding public.vector(3) ); INSERT INTO demo (embedding) VALUES ('[1.0, 2.0, 3.0]'), ('[4.0, 5.0, 6.0]'), ('[7.0, 8.0, 9.0]'); Index the Vector Column: Create an index on the vector column to optimize search performance. The pg_diskann PostgreSQL extension is compatible with pgvector, it uses the same types, distance functions and syntactic style. To use Product Quanatization sign up for the preview today! CREATE INDEX demo_embedding_diskann_idx ON demo USING diskann (embedding vector_cosine_ops) Perform Vector Searches: Use SQL queries to search for similar vectors based on various distance metrics (cosine similarity in the example below). SELECT id, embedding FROM demo ORDER BY embedding <=> '[2.0, 3.0, 4.0]' LIMIT 5; Ready to Dive In? DiskANN’s GA release transforms PostgreSQL into a fully capable vector search platform for production AI workloads. It delivers: Support for millions of compressed vectors Compatibility with pgvector Reduced memory and storage costs Faster index creation Support for high-dimensional vectors Real-time indexing progress visibility Whether you’re building an enterprise-scale retrieval system or optimizing costs in a lean AI application, Use the DiskANN today and explore the future of AI-driven applications with the power of Azure Database for PostgreSQL! Run our end-to-end sample RAG app with DiskANN Learn More DiskANN on Azure Database for PostgreSQL is ready for production workloads. With Product Quantization, support for high-dimensional vectors, faster index creation, and clearer operational visibility, you can now scale your vector search applications even further — all while keeping costs low. To learn more, check out our documentation and start building today!