postgresql
208 TopicsGraphRAG and PostgreSQL integration in docker with Cypher query and AI agents (Version 2*)
This is update from previous blog (version 1): GraphRAG and PostgreSQL integration in docker with Cypher query and AI agents | Microsoft Community Hub Review the business needs of this solution from version 1 What's new in version 2? MCP tools for GraphRAG and PostgreSQL with Apache AGE This solution now includes MCP tools for GraphRAG and PostgreSQL. There are five MCP tools exposed: [graphrag_search] Used to run query (local or global) with runtime-tunable API parameters. One important aspect is that query behavior can be tuned at runtime, without changing the underlying index. [age_get_schema_cached] Used for schema inspection and diagnostics. It returns the graph schema (node labels and relationship types) from cache by default; and can optionally refresh the cache by re‑querying the database. This tool is typically used for introspection or debugging, not for answering user questions about data. [age_entity_lookup] Used for quick entity discovery and disambiguation. It performs a simple substring match on entity names or titles and is especially useful for questions like “Who is X?” or as a preliminary step before issuing more complex graph queries. [age_cypher_query] Executes a user‑provided Cypher query directly against the AGE graph. This is intended for advanced users who already know the graph structure and want full control over traversal logic and filters. [age_nl2cypher_query] Bridges natural language and Cypher. This tool converts a natural‑language question into a Cypher query (using only Entity nodes and RELATED_TO edges), executes it, and returns the results. It is most effective for multi‑hop or structurally complex questions where semantic interpretation is needed first, but execution must remain deterministic. Besides that, This solution now uses Microsoft agent framework. It enables clean orchestration over MCP tools, allowing the agent to dynamically select between GraphRAG and graph query capabilities at runtime, with a looser coupling and clearer execution model than traditional Semantic Kernel function plugins. The new Docker image includes graphRAG3.0.5. This version stabilizes the 3.x configuration‑driven, API‑based architecture and improves indexing reliability, making graph construction more predictable and easier to integrate into real workflows. New architecture Updated Step 7 - run query in Jupyter notebook This step runs Jupyter notebook in docker, which is the same as stated in previous blog. > docker compose up query-notebook After clicking the link highlighted in the above screen shot, you can explore all files within the project in the docker, then find the query-notebook.ipynb. https://github.com/Azure-Samples/postgreSQL-graphRAG-docker/blob/main/project_folder/query-notebook.ipynb But in this new version of notebook, the graphRAG3.0.5 uses different library for local Search and global Search. New Step 8 - run agent and MCP tools in Jupyter notebook This step runs Jupyter notebook in docker. > docker compose up mcp-agent Click on the highlighted URL, you can start working on agent-notebook.ipynb. https://github.com/Azure-Samples/postgreSQL-graphRAG-docker/blob/main/project_folder/agent-notebook.... Multiple scenarios of agents with MCP tools are included in the notebook: GraphRAG search: local search and global search examples with direct mcp call. GraphRAG search: local search and global search examples with agent and include mcp tools. Cypher query in direct mcp call. Agent to query in natural language, and mcp tool included to convert NL2Cypher. Agent with unified mcp (all five mcp tools), and based on the question route to the corresponding tool. ['graphrag_search', 'age_get_schema_cached', 'age_cypher_query', 'age_entity_lookup', 'age_nl2cypher_query'] Router agent: selecting the right MCP tool The notebook also includes a router agent that has access to all five MCP tools and decides which one to invoke based on the user’s question. Rather than hard‑coding execution paths, the agent reasons about intent and selects the most appropriate capability at runtime. General routing guidance used in this solution Use [graphrag_search] when the question requires: full dataset understanding, themes, patterns, or trends across documents, exploratory or open‑ended analysis, global understanding or evaluation where we have a corpus of many tokens. In these cases, GraphRAG’s semantic retrieval and aggregation are a better fit than explicit graph traversal. Use AGE‑based tools [age_get_schema_cached, age_entity_lookup, age_cypher_query, age_nl2cypher_query] when the question involves: specific entities or explicit relationships, deterministic graph traversal or filtering, questions that depend on graph structure rather than document semantics, complex graph queries involving multiple entities or multi‑hop paths. Within the AGE toolset: [age_entity_lookup] is typically used for quick entity discovery or disambiguation. [age_cypher_query] is used when a precise Cypher query is already known. [age_nl2cypher_query] is used when the question is expressed in natural language but requires a non‑trivial Cypher query to answer. [age_get_schema_cached] is reserved for schema inspection and diagnostics. The router agent dynamically selects between semantic search and deterministic graph tools based on question intent, keeping retrieval, graph execution, and orchestration clearly separated and extensible. Note: The repository also includes [age_get_schema] and [age_get_schema_details] MCP tools for debugging and development purposes. These are not exposed to agents by default and are superseded by [age_get_schema_cached] for normal use. Key takeaways GraphRAG and postgreSQL AGE querying serve different purposes and each has its advantages. MCP tools provide a uniform interface to both semantic search and deterministic graph operations. Microsoft Agent Framework enables tool‑centric orchestration, where agents select the right capability at runtime instead of hard‑coding logic in prompts. The Jupyter‑based agent workflow makes it easy to experiment with different interaction patterns, from direct tool calls to fully routed agent execution. What's next In this solution, the MCP server and agent runtime are architecturally separated but deployed together in a single Docker container to demonstrate how MCP tools work and to keep local experimentation simple. There are other deployment options, such as running MCP servers remotely, where tools can be hosted and operated independently of the agent runtime. Contributions and enhancements are welcome.132Views1like0CommentsUnderstanding Hash Join Memory Usage and OOM Risks in PostgreSQL
Background: Why Memory Usage May Exceed work_mem work_mem is commonly assumed to be a hard upper bound on per‑query memory usage. However, for Hash Join operations, memory consumption depends not only on this parameter but also on: ✅ Data cardinality ✅ Hash table internal bucket distribution ✅ Join column characteristics ✅ Number of batches created ✅ Parallel workers involved Under low‑cardinality conditions, a Hash Join may place an extremely large number of rows into very few buckets—sometimes a single bucket. This causes unexpectedly large memory allocations that exceed the nominal work_mem limit. Background: What work_mem really means for Hash Joins work_mem controls the amount of memory available per operation (e.g., a sort or a hash) per node (and per parallel worker) before spilling to disk. Hash operations can additionally use hash_mem_multiplier×work_mem for their hash tables. [postgresql.org], [postgresqlco.nf] The Hash Join algorithm builds a hash table for the “build/inner” side and probes it with the “outer” side. The table is split into buckets; if it doesn’t fit in memory, PostgreSQL partitions work into batches (spilling to temporary files). Skewed distributions (e.g., very few distinct join keys) pack many rows into the same bucket(s), exploding memory usage even when work_mem is small. [postgrespro.com], [interdb.jp] In EXPLAIN (ANALYZE) you’ll see Buckets:, Batches:, and Memory Usage: on the Hash node; Batches > 1 indicates spilling/partitioning. [postgresql.org], [thoughtbot.com] The default for hash_mem_multiplier is version‑dependent (introduced in PG13; 1.0 in early versions, later 2.0). Tune with care; it scales the memory that hash operations may consume relative to work_mem. [pgpedia.info] A safe, reproducible demo (containerized community PostgreSQL) The goal is to show that data distribution alone can drive order(s) of magnitude difference in hash table memory, using conservative settings. In order to simulate the behavior we´ll use pg_hint_plan extension to guide the execution plans and create some data distribution that may not have a good application logic, just to force and show the behavior. Start PostgreSQL 16 container docker run --name=postgresql16.8 -p 5414:5432 -e POSTGRES_PASSWORD=<password> -d postgres:16.8 docker exec -it postgresql16.8 /bin/bash -c "apt-get update -y;apt-get install procps -y;apt-get install postgresql-16-pg-hint-plan -y;apt-get install vim -y;apt-get install htop -y" docker exec -it postgresql16.8 /bin/bash vi /var/lib/postgresql/data/postgresql.conf -- Adding pg_hint_plan to shared_preload_libraries psql -h localhost -U postgres create extension pg_hint_plan; docker stop postgresql16.8 docker start postgresql16.8 To connect to our docker container we use: psql -h localhost -p 5414 -U postgres Connect and apply conservative session-level settings We’ll discourage other join methods so the planner prefers Hash Join—without needing any extension. set hash_mem_multiplier=1; set max_parallel_workers=0; set max_parallel_workers_per_gather=0; set enable_parallel_hash=off; set enable_material=off; set enable_sort=off; set pg_hint_plan.debug_print=verbose; set client_min_messages=notice; set pg_hint_plan.enable_hint_table=on; Create tables and load data We´ll create two tables for the join, table_1, with a single row, table_h initially with 10mill rows drop table table_s; create table table_s (column_a text); insert into table_s values ('30020'); vacuum full table_s; drop table table_h; create table table_h(column_a text,column_b text); INSERT INTO table_h(column_a,column_b) SELECT i::text, i::text FROM generate_series(1, 10000000) AS t(i); vacuum full table_h; Run Hash Join (high cardinality) We´ll run the join using column_a in both tables, that was created previously having high cardinality in table_h explain (analyze,buffers,costs,verbose) SELECT /*+ HashJoin(s h) Leading((s h)) */ COUNT(*) FROM table_s s JOIN table_h h ON s.column_a= h.column_a; You should see a Hash node with small Memory Usage (a few MB) and Batches: 256 or similar due to our tiny work_mem, but no ballooning. Exact numbers vary by hardware/version/stats. (EXPLAIN fields and interpretation are documented here.) [postgresql.org] QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=280930.01..280930.02 rows=1 width=8) (actual time=1902.965..1902.968 rows=1 loops=1) Output: count(*) Buffers: shared read=54055, temp read=135 written=34041 -> Hash Join (cost=279054.00..280805.01 rows=50000 width=0) (actual time=1900.539..1902.949 rows=1 loops=1) Hash Cond: (s.column_a = h.column_a) Buffers: shared read=54055, temp read=135 written=34041 -> Seq Scan on public.table_s s (cost=0.00..1.01 rows=1 width=32) (actual time=0.021..0.022 rows=1 loops=1) Output: s.column_a Buffers: shared read=1 -> Hash (cost=154054.00..154054.00 rows=10000000 width=32) (actual time=1896.895..1896.896 rows=10000000 loops=1) Output: h.column_a Buckets: 65536 Batches: 256 Memory Usage: 2031kB Buffers: shared read=54054, temp written=33785 -> Seq Scan on public.table_h h (cost=0.00..154054.00 rows=10000000 width=32) (actual time=2.538..638.830 rows=10000000 loops=1) Output: h.column_a Buffers: shared read=54054 Query Identifier: 334721522907995613 Planning: Buffers: shared hit=10 Planning Time: 0.302 ms JIT: Functions: 11 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 0.441 ms, Inlining 0.000 ms, Optimization 0.236 ms, Emission 2.339 ms, Total 3.017 ms Execution Time: 1903.472 ms (25 rows) Findings (1) When we have the data totally distributed with high cardinality it takes only 2031kB of memory usage (work_mem), shared hit/read=54055 Force low cardinality / skew and re‑run We´ll update table_h to have column_a all values to '30020', so, having only 1 distinct value for all the rows in the table update table_h set column_a='30020', column_b='30020'; vacuum full table_h; Checking execution plan: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=279056.04..279056.05 rows=1 width=8) (actual time=3568.936..3568.938 rows=1 loops=1) Output: count(*) Buffers: shared read=54056, temp read=63480 written=63480 -> Hash Join (cost=279055.00..279056.03 rows=1 width=0) (actual time=2650.696..3228.610 rows=10000000 loops=1) Hash Cond: (s.column_a = h.column_a) Buffers: shared read=54056, temp read=63480 written=63480 -> Seq Scan on public.table_s s (cost=0.00..1.01 rows=1 width=32) (actual time=0.007..0.008 rows=1 loops=1) Output: s.column_a Buffers: shared read=1 -> Hash (cost=154055.00..154055.00 rows=10000000 width=7) (actual time=1563.987..1563.989 rows=10000000 loops=1) Output: h.column_a Buckets: 131072 (originally 131072) Batches: 512 (originally 256) Memory Usage: 371094kB Buffers: shared read=54055, temp written=31738 -> Seq Scan on public.table_h h (cost=0.00..154055.00 rows=10000000 width=7) (actual time=2.458..606.422 rows=10000000 loops=1) Output: h.column_a Buffers: shared read=54055 Query Identifier: 334721522907995613 Planning: Buffers: shared hit=6 read=1 dirtied=1 Planning Time: 0.237 ms JIT: Functions: 11 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 0.330 ms, Inlining 0.000 ms, Optimization 0.203 ms, Emission 2.311 ms, Total 2.844 ms Execution Time: 3584.439 ms (25 rows) Now, the Hash node typically reports hundreds of MB of Memory Usage, with more/larger temp spills (higher Batches, more temp_blks_*). What changed? Only the distribution (cardinality). (Why buckets/batches behave this way is covered in the algorithm references.) [postgrespro.com], [interdb.jp] Findings (2) When we have the data distributed with LOW cardinality it takes 371094kB of memory usage (work_mem), shared hit/read=54056 So, same amount of data being handled by the query in terms of shared memory, but totally different work_mem usage pattern due to the low cardinality and the join method (Hash Join) that put most of those rows in a single bucket and that is not limited by default, so, it can cause OOM errors at any time. Scale up rows to observe linear growth We´ll add same rows in table_h repeat times so we can play with more data (low cardinality) insert into table_h select * from table_h; vacuum full table_h; You’ll see Memory Usage and temp I/O scale with rowcount under skew. (Beware: this can become I/O and RAM heavy—do this incrementally.) [thoughtbot.com] NumRows table_h Shared read/hit Dirtied Written Temp read/written Memory Usage (work_mem) 10M 54056 0 63480+63480 371094kB 20M 108110 0 126956+126956 742188kB 80M 432434 (1,64GB) 0 0 253908+253908 2968750kB (2,8GB) Observability: what you will (and won’t) see EXPLAIN is your friend EXPLAIN (ANALYZE, BUFFERS) exposes Memory Usage, Buckets:, Batches: in the Hash node and temp block I/O. Batches > 1 is a near‑certain sign of spilling. [postgresql.org], [thoughtbot.com] Query Store / pg_stat_statements limitations Azure Database for PostgreSQL – Flexible Server Query Store aggregates runtime and (optionally) wait stats over time windows and stores them in azure_sys, with views under query_store.*. It’s fantastic to find which queries chew CPU/I/O or wait, but it doesn’t report per‑query transient memory usage (e.g., “how many MB did that hash table peak at?”) you can estimate reviewing the temporary blocks. [learn.microsoft.com] Under the hood, what you do get—whether via Query Store or vanilla PostgreSQL pg_stat_statements—are cumulative counters like shared_blks_read, shared_blks_hit, temp_blks_read, temp_blks_written, timings, etc. Those help confirm buffer/temp activity, yet no direct hash table memory metric exists. Combine them with EXPLAIN and server metrics to triangulate. [postgresql.org] Tip (Azure Flexible Server) Enable Query Store in Server parameters via pg_qs.query_capture_mode and (optionally) wait sampling via pgms_wait_sampling.query_capture_mode, then use query_store.qs_view to correlate temp block usage and execution times across intervals. [learn.microsoft.com] Typical OOM symptom in logs In extreme skew with concurrent executions, you may encounter: ERROR: out of memory DETAIL: Failed on request of size 32800 in memory context "HashBatchContext". This is a classic signature of hash join memory pressure. [postgresql.org], [thisistheway.wiki] What to do about it (mitigations & best practices) Don’t force Hash Join unless required If you used planner hints (e.g., pg_hint_plan) or GUCs (Grand Unified Configuration) to force Hash Join, remove them and let the planner re‑evaluate. (If you must hint, be aware pg_hint_plan is a third‑party extension and not available in all environments.) [pg-hint-pl...thedocs.io], [pg-hint-pl...thedocs.io] Fix skew / cardinality at the source Re‑model data to avoid low‑NDV (Number of Distinct Values in a column) joins (e.g., pre‑aggregate, filter earlier, or exclude degenerate keys). Ensure statistics are current so the planner estimates are realistic. (Skew awareness is limited; poor estimates → risky sizing.) [postgresql.org] Pick a safer join strategy when appropriate If distribution is highly skewed, Merge Join (with supporting indexes/sort order) or Nested Loop (for selective probes) might be more memory‑predictable. Let the planner choose, or enable alternatives by undoing GUCs that disabled them. [postgresql.org] Bound memory consciously Keep work_mem modest for mixed/OLTP workloads; remember it’s per operation, per node, per worker. Adjust hash_mem_multiplier judiciously (introduced in PG13; default now commonly 2.0) if you understand the spill trade‑offs. [postgresqlco.nf], [pgpedia.info] Observe spills and tune iteratively Use EXPLAIN (ANALYZE, BUFFERS) to see Batches (spills) and Memory Usage; use Query Store/pg_stat_statements to find which queries generate the most temp I/O. Raise work_mem for a session only when justified. [postgresql.org], [postgresql.org] Parallelism awareness Each worker can perform its own memory‑using operations; parallel hash join has distinct behavior. If you aren’t sure, temporarily disable parallelism to simplify analysis, then re‑enable once you understand the footprint. [postgresql.org] Validating on Azure Database for PostgreSQL – Flexible Server The behavior is not Azure‑specific, but you can reproduce the same sequence on Flexible Server (e.g., General Purpose). A few notes: Confirm/adjust work_mem, hash_mem_multiplier, enable_* planner toggles as session settings. (Azure exposes standard PostgreSQL parameters.) [learn.microsoft.com] Use Query Store to confirm stable shared/temporary block patterns across executions, then use EXPLAIN (ANALYZE, BUFFERS) per query to spot hash table memory footprints. [learn.microsoft.com], [postgresql.org] Changing some default parameters We´ll repeat previous steps in Azure Database for PostgreSQL Flexible server: set hash_mem_multiplier=1; set max_parallel_workers=0; set max_parallel_workers_per_gather=0; set enable_parallel_hash=off; set enable_material=off; set enable_sort=off; set pg_hint_plan.debug_print=verbose; set client_min_messages=notice; set pg_hint_plan.enable_hint_table=on; Creating and populating tables drop table table_s; create table table_s (column_a text); insert into table_s values ('30020'); vacuum full table_s; drop table table_h; create table table_h(column_a text,column_b text); INSERT INTO table_h(column_a,column_b) SELECT i::text, i::text FROM generate_series(1, 10000000) AS t(i); vacuum full table_h; vacuum full table_s; Query & Execution plan explain (analyze,buffers,costs,verbose) SELECT /*+ HashJoin(s h) Leading((s h)) */ COUNT(*) FROM table_s s JOIN table_h h ON s.column_a= h.column_a; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=279052.88..279052.89 rows=1 width=8) (actual time=3171.186..3171.191 rows=1 loops=1) Output: count(*) Buffers: shared hit=33 read=54023, temp read=135 written=34042 I/O Timings: shared read=184.869, temp read=0.278 write=333.970 -> Hash Join (cost=279051.84..279052.88 rows=1 width=0) (actual time=3147.288..3171.182 rows=1 loops=1) Hash Cond: (s.column_a = h.column_a) Buffers: shared hit=33 read=54023, temp read=135 written=34042 I/O Timings: shared read=184.869, temp read=0.278 write=333.970 -> Seq Scan on public.table_s s (cost=0.00..1.01 rows=1 width=32) (actual time=0.315..0.316 rows=1 loops=1) Output: s.column_a Buffers: shared read=1 I/O Timings: shared read=0.018 -> Hash (cost=154053.04..154053.04 rows=9999904 width=7) (actual time=3109.278..3109.279 rows=10000000 loops=1) Output: h.column_a Buckets: 131072 Batches: 256 Memory Usage: 2551kB Buffers: shared hit=32 read=54022, temp written=33786 I/O Timings: shared read=184.851, temp write=332.059 -> Seq Scan on public.table_h h (cost=0.00..154053.04 rows=9999904 width=7) (actual time=0.019..1258.472 rows=10000000 loops=1) Output: h.column_a Buffers: shared hit=32 read=54022 I/O Timings: shared read=184.851 Query Identifier: 5636209387670245929 Planning: Buffers: shared hit=37 Planning Time: 0.575 ms Execution Time: 3171.375 ms (26 rows) Findings (3) In Azure Database for PostgreSQL Flexible server, when we have the data totally distributed with high cardinality it takes only 2551kB of memory usage (work_mem), shared hit/read=54056 Skew it to LOW cardinality As we did previously, we change column_a to having only one different value in all rows in table_h update table_h set column_a='30020', column_b='30020'; vacuum full table_h; In this case we force the join method with pg_hint_plan: explain (analyze,buffers,costs,verbose) SELECT /*+ HashJoin(s h) Leading((s h)) */ COUNT(*) FROM table_s s JOIN table_h h ON s.column_a= h.column_a; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=279056.04..279056.05 rows=1 width=8) (actual time=4397.556..4397.560 rows=1 loops=1) Output: count(*) Buffers: shared hit=2 read=54055, temp read=63480 written=63480 I/O Timings: shared read=89.396, temp read=90.377 write=300.290 -> Hash Join (cost=279055.00..279056.03 rows=1 width=0) (actual time=3271.145..3987.154 rows=10000000 loops=1) Hash Cond: (s.column_a = h.column_a) Buffers: shared hit=2 read=54055, temp read=63480 written=63480 I/O Timings: shared read=89.396, temp read=90.377 write=300.290 -> Seq Scan on public.table_s s (cost=0.00..1.01 rows=1 width=32) (actual time=0.006..0.008 rows=1 loops=1) Output: s.column_a Buffers: shared hit=1 -> Hash (cost=154055.00..154055.00 rows=10000000 width=7) (actual time=1958.729..1958.731 rows=10000000 loops=1) Output: h.column_a Buckets: 262144 (originally 262144) Batches: 256 (originally 128) Memory Usage: 371094kB Buffers: shared read=54055, temp written=31738 I/O Timings: shared read=89.396, temp write=149.076 -> Seq Scan on public.table_h h (cost=0.00..154055.00 rows=10000000 width=7) (actual time=0.159..789.449 rows=10000000 loops=1) Output: h.column_a Buffers: shared read=54055 I/O Timings: shared read=89.396 Query Identifier: 8893575855188549861 Planning: Buffers: shared hit=5 Planning Time: 0.157 ms Execution Time: 4414.268 ms (25 rows) NumRows table_h Shared read/hit Dirtied Written Temp read/written Memory Usage (work_mem) 10M 54056 0 63480+63480 371094kB 20M 108110 0 126956+126956 742188kB 80M 432434 0 0 253908+253908 2968750kB We observe the same numbers compared with our docker installation. See the extension docs for installation/usage and the hint table for cases where you want to force a specific join method. [pg-hint-pl...thedocs.io], [pg-hint-pl...thedocs.io] FAQ Q: I set work_mem = 4MB. Why did my Hash Join report ~371MB Memory Usage? A: Hash joins can use up to hash_mem_multiplier × work_mem per hash table, and skew can cause large per‑bucket chains. Multiple nodes/workers multiply usage. work_mem is not a global hard cap. [postgresqlco.nf], [pgpedia.info] Q: How do I know if a Hash Join spilled to disk? A: In EXPLAIN (ANALYZE), Hash node shows Batches: N. N > 1 indicates partitioning and temp I/O; you’ll also see temp_blks_read/written in buffers and Temp I/O timings. [postgresql.org], [thoughtbot.com] Q: Can Query Store tell me per‑query memory consumption? A: Not directly. It gives time‑sliced runtime and wait stats (plus temp/shared block counters via underlying stats), but no “peak MB used by this hash table” metric. Use EXPLAIN and server metrics. [learn.microsoft.com], [postgresql.org] Q: I hit “Failed on request … in HashBatchContext.” What’s that? A: That’s an OOM raised by the executor while allocating memory. Reduce skew, avoid forced hash joins, or review per‑query memory and concurrency. [postgresql.org] Further reading Server parameters & memory (official docs): guidance on work_mem, shared_buffers, parallelism. [postgresql.org] Hash joins under the hood: deep dives into buckets, batches, and memory footprints. [postgrespro.com], [pgcon.org] hash_mem_multiplier: history and defaults by version. [pgpedia.info] EXPLAIN primer: how to read Hash node details, Batches, Memory Usage. [postgresql.org], [thoughtbot.com] Query Store (Azure Flexible): enable, query, and interpret. [learn.microsoft.com] Ready‑to‑use mitigation checklist (DBA quick wins) Remove joins hints/GUC overrides that force Hash Join; re‑plan. [pg-hint-pl...thedocs.io] Refresh stats; confirm realistic rowcount/NDV estimates. [postgresql.org] Consider alternate join strategies (Merge/Index‑Nested‑Loop) when skew is high. [postgresql.org] Keep work_mem conservative for OLTP; consider session‑scoped bumps only for specific analytic queries. [postgresql.org] Tune hash_mem_multiplier carefully only after understanding spill patterns. [postgresqlco.nf] Use EXPLAIN (ANALYZE, BUFFERS) to verify Batches and Memory Usage. [postgresql.org] Use Query Store/pg_stat_statements to find heavy temp/shared I/O offenders over time.Nasdaq builds thoughtfully designed AI for board governance with PostgreSQL on Azure
Authored by: Charles Federssen, Partner Director of Product Management for PostgreSQL at Microsoft and Mohsin Shafqat, Senior Manager, Software Engineering at Nasdaq When people think of Nasdaq, they usually think of markets, trading floors, and financial data moving at extraordinary speed. But behind the scenes, Nasdaq also plays an equally critical role in how boards of directors govern, deliberate, and make decisions. Nasdaq Boardvantage® is the company’s governance platform, used by more than 4,400 organizations worldwide—including nearly half of the Fortune 100. It’s where directors review board books, collaborate in an environment designed with robust security, and prepare for meetings that often involve some of the most sensitive information a company has. In recent years, Nasdaq set out to modernize Nasdaq Boardvantage with AI, without compromising security and reliability. That journey was featured in a Microsoft Ignite session, “Nasdaq Boardvantage: AI-Driven Governance on PostgreSQL and Foundry.” It offers a practical look at how Azure Database for PostgreSQL can support AI-driven applications where precision, isolation, and data control are non-negotiable. Introducing AI where trust is everything Board governance isn’t a typical productivity workload. Board packets can run 400 to 600 pages, meeting minutes are legal records, and any AI-generated insight must be confined to a customer’s own data. “Our customers trust us with some of their most strategic, sensitive data,” said Mohsin Shafqat, Senior Manager of Software Development at Nasdaq. That trust meant tackling several core challenges upfront, including: How do you minimize AI hallucinations in a governance context? How do you guarantee tenant isolation at scale? How do you keep data regional across a global customer base? A cloud foundation built for governance Before adding intelligence, Nasdaq decided to re-architect Nasdaq Boardvantage on Microsoft Azure, using Azure Kubernetes Service (AKS) to run containerized, multi-tenant workloads with strong isolation boundaries. Microsoft Foundry provides the managed foundation for deploying, governing, and operating AI models across this architecture, adding consistency, security, and control as intelligence is introduced. At the data layer, Azure Database for PostgreSQL and Azure Database for MySQL became the backbone for governance data. PostgreSQL, in particular, plays a central role in managing structured governance information alongside vector embeddings that support AI-driven features. Together, these services give Nasdaq the performance, security, and operational control required for a highly regulated, multi-tenant environment, while still moving quickly. Key architectural choices included: Tenant isolation by design, with separate databases and storage Regional deployments to align with data residency requirements High availability and managed operations, so teams could focus on product innovation instead of infrastructure maintenance PostgreSQL and pgvector: Powering context-aware AI With that foundation in place, Nasdaq was ready to carefully introduce AI. One of the first AI capabilities was intelligent document summarization. Board materials that once took hours to review could now be condensed into concise, contextually accurate summaries. Under the hood, this required more than just calling an LLM. Nasdaq uses pgvector, natively supported in Azure Database for PostgreSQL, to store and query embeddings generated from board documents. This allows the platform to perform hybrid searches that combine traditional SQL queries with vector similarity to retrieve the most relevant context before sending anything to a language model. Instead of treating AI as a black box, the team built a pipeline where: Documents are processed with Azure Document Intelligence to preserve structure and meaning Content is chunked and embedded Embeddings are stored in PostgreSQL with pgvector Vector similarity searches retrieve precise context for each AI task Because this runs inside PostgreSQL, the same database benefits from Azure’s built-in high availability, security controls, and operational tooling – delivering tangible results, including a 25% reduction in overall board preparation time and internal testing shows 91–97% accuracy for AI-generated summaries and meeting minutes. From summaries to an AI Board Assistant With summarization working in production, Nasdaq expanded further. The team is now building an AI-powered Board Assistant that will help directors prepare for upcoming meetings by surfacing trends, risks, and insights from prior discussions. This introduces a new level of scale. Years of board data across thousands of customers translate into millions of embeddings. PostgreSQL continues to anchor this architecture, storing vectors for semantic retrieval while MySQL supports complementary non-vector workloads. Across Nasdaq Boardvantage, users are advised to always review AI outputs, and no customer data is shared or used to train external models. “We designed AI for governance, not the other way around,” Shafqat said. More importantly, customers trust the system because security, isolation, and data control were engineered in from day one. Looking ahead Nasdaq’s work shows how Azure Database for PostgreSQL can support AI workloads that demand both intelligence and integrity. With PostgreSQL at the core, Nasdaq has built a governance platform that scales globally, respects regulatory boundaries, and introduces AI in a way that feels dependable and not experimental. What started as a modernization of Nasdaq Boardvantage is now influencing how Nasdaq approaches AI across the enterprise. To dive deeper into the architecture and hear directly from the engineers behind it, watch the Ignite session and check out these resources: Watch the Ignite breakout session for a technical walkthrough of how Nasdaq Boardvantage is built, including PostgreSQL on Azure, pgvector, and Microsoft Foundry in production. Read the case study to see how Nasdaq introduced AI into board governance and what changed for directors, administrators, and decision-making. Watch the Ignite broadcast for a candid discussion on Azure Database for PostgreSQL, Azure HorizonDB, and what it takes to scale AI-driven governance.Microsoft at PGConf India 2026
I’m genuinely excited about PGConf India 2026. Over the past few editions, the conference has continued to grow year over year—both in size and in impact—and it has firmly established itself as one of the key events on the global PostgreSQL calendar. That momentum was very evident again in the depth, breadth, and overall quality of the program for PGConf India 2026. Microsoft is proud to be a diamond sponsor for the conference again this year. At Microsoft, we continue our contributions to the upstream PostgreSQL open-source project—as well as to serve our customers with our Postgres managed service offerings, both Azure Database for PostgreSQL and our newest Postgres offering, Azure HorizonDB . On the open-source front, Microsoft had 540 commits in PG18, including major features like Asynchronous IO. We’re also excited to grow our Postgres open-source contributors team, and so happy to welcome Noah Misch to our team. Noah is a Postgres committer who has deep expertise in PostgreSQL security and is focused on correctness and reliability in PostgreSQL’s core. Microsoft at PGConf India 2026: Highlights from Our Speakers PGConf India has several tracks, all of which have some great talks I am looking forward to. First, the plug. 😊 Microsoft has some amazing talks this year, and we have 8 different talks spread across all the tracks. Postgres on Azure : Scaling with Azure HorizonDB, AI, and Developer Workflows, by Aditya Duvuri & Divya Bhargov Resizing shared buffer pool in a running PostgreSQL server: important, yet impossible, by Ashutosh Bapat Ten Postgres Hacker Journeys—and what they teach us, by Claire Giordano How Postgres can leverage disk bandwidth for better TPS, by Nikhil Chawla AWSM FSM! Free Space Maps Decoded by Nikhil Sontakke Journey of developing a Performance Optimization Feature in PostgreSQL, by Rahila Syed Build Agentic AI with Semantic Kernel and Graph RAG on PostgreSQL, by Shriram Muthukrishnan & Palak Chaturvedi All things Postgres @ Microsoft (2026 edition) by Sumedh Pathak Claire is an amazing speaker and has done a lot of work over the last several years documenting and understanding PostgreSQL committers and hackers. Her talk will definitely have some key insights and nuggets of information. Rahila’s talk will go in depth on performance optimization features and how best to test and benchmark them, and all the tools and tricks she has used as part of the feature development. This should be a must-see talk for anyone doing performance work. Diving Deep: Case Studies & Technical Tracks One of the tracks I’m really excited about is the Case Study track. I see these as similar to ‘Experience’ papers in academia. An experience paper documents what actually happened when applying a technique or system in the real world, what worked, what didn’t, and why. One of the talks I’m looking forward to is ‘Operating Postgres Logical Replication at Massive Scale’ by Sai Srirampur from Clickhouse. Logical Replication is an extremely useful tool, and I’m curious to learn more about pitfalls and lessons learnt when running this at large scale. Another interesting one I’m curious to hear is ‘Understanding the importance of the commit log through a database corruption’ by Amit Kumar Singh from EDB. The Database Engine Developers track allows us to go deep into the PostgreSQL code base and get a better understanding of how PostgreSQL is built. Even if you are not a database developer, this track is useful to understand how and why PostgreSQL does things, helping you be a better user of the database. With the rise of larger machines and memory available in the Cloud, different and newer memory architectures/tiers and serverless product offerings, there is a lot of deep dive in PostgreSQL’s memory architecture. There are some great talks focused on this area, which should be must-see for anyone interested in this topic: Resizing shared buffer pool in a running PostgreSQL server: important, yet impossible by Ashutosh Bapat from Microsoft From Disk to Data: Exploring PostgreSQL's Buffer Management by Lalit Choudhary from PurnaBIT Beyond shared_buffers: On-Demand Memory in Modern PostgreSQL by Vaibhav Popat from Google Finally, the Database Administration and Application Developer tracks have some really great content as well. They cover a wide range of topics, from PII data, HA/DR, Query Tuning to connection pooling and understanding conflict detection and resolution. PostgreSQL in India: A Community Effort Worth Celebrating Conferences like these are a rich source of information, dramatically increasing my personal understanding of the product and the ecosystem. Separately, they are also a great way to meet other practitioners in the space and connect with people in the industry. For people in Bangalore, another great option is the PostgreSQL Bangalore Meetup, and I’m super happy that Microsoft was able to join the ranks of other companies to host the eighth iteration of this meetup. Finally, I would be remiss in not mentioning the hard work done by the PGConf India organizing team including Pavan Deolasse, Ashish Mehra, Nikhil Sontakke, Hari Kiran, and Rushabh Lathia who are making all of this happen. Also, a big shout out to the PGConf India Program Committee (Amul Sul, Dilip Kumar, Marc Linster, Thomas Munro, Vigneshwaran C) for putting together an amazing set of talks. I look forward to meeting all of you in Bangalore! Be sure to drop by the Microsoft booth to say hello (and to snag a free pair of our famous socks). I’d love to learn more about how you’re using Postgres.269Views3likes0CommentsAlphaLife Sciences powers regulatory-compliant AI workflows with PostgreSQL on Azure
by: Maxim Lukiyanov, PhD, Principal PM Manager and Sharon Chen, CEO and Founder at AlphaLife Sciences In life sciences, every document is deeply interconnected and highly regulated. Each clinical trial, regulatory submission, safety report, or protocol amendment is expected to stand up to rigorous audit. For AlphaLife Sciences, that challenge became an opportunity to rethink how AI could support expert human judgment. At Microsoft Ignite, AlphaLife Sciences CEO and Founder Sharon Chen shared how her team is building an AI-powered content authoring platform on top of Azure Database for PostgreSQL, designed specifically for the demands of regulated life sciences workflows. She also explained why the team is excited about Azure HorizonDB as a new PostgreSQL service that is built to meet the needs of modern enterprise workloads. This post explores how AlphaLife Sciences uses PostgreSQL as more than a data store. It’s a semantic foundation for compliant, auditable AI agents. Bringing AI into regulated workflows Life sciences organizations are under constant pressure. R&D pipelines are growing and patent windows are shrinking. A single clinical study report can take six months or more to complete, involving multiple teams and hundreds of source documents. Building efficiency into these processes is critical, but only if it doesn’t compromise accuracy, traceability, or compliance. That’s where many AI solutions fall short. Generating text is one thing, but generating verifiable, version-controlled, regulation-aware content is another. AlphaLife Sciences needed agents that could: Work across massive volumes of structured and unstructured data (Word, PDF, Excel, PowerPoint) Maintain full traceability from generated content back to source documents Support audits, amendments, and regulatory review Minimize hallucinations in a zero-tolerance environment Integrate naturally into the tools writers already use Bringing data, search, and AI together in one system At the core of AlphaLife Sciences’ platform is Azure Database for PostgreSQL. The team chose it for flexibility, extensibility, and for how well it supports modern AI workloads. Instead of stitching together separate systems for SQL queries, vector search, text indexing, and metadata tracking, AlphaLife Sciences consolidated everything into PostgreSQL. One of its flagship use cases is clinical trial protocol authoring, a process that typically involves: Designing trial objectives and endpoints Pulling references from previous studies Writing and revising hundreds of pages of structured content Managing multiple rounds of amendments and regulatory feedback With AI agents backed by PostgreSQL, that workflow changes dramatically. When a writer generates a protocol section, the system can automatically retrieve relevant references from a centralized document pool, using semantic search rather than manual lookup. Writers select the sources they want, apply rules or prompts, and let AI draft the section - complete with citations tied back to the original documents. Reviewers can inspect the source, adjust the output, or insert it directly into the document. For protocol amendments, the platform allows teams to upload inputs (Word or Excel), analyze which sections are affected, and generate structured suggestions. Changes are clearly highlighted, compared against previous versions, and summarized in amendment tables. AI agents that respect the rules A recurring theme in Chen’s talk was restraint. “We don’t just need AI that can write,” she said. “We need intelligent agents that understand data structures, follow regulatory laws, and manage version control.” This is where PostgreSQL-backed AI agents shine. By grounding AI behavior in structured schemas, controlled access, and auditable records, automation works hand-in-hand with human experts. AI accelerates first drafts, consistency checks, discrepancy detection, and cross-document analysis, but final accountability stays firmly with professionals. In some cases, the time to complete processes has been reduced by more than 50%. Azure Database for PostgreSQL has become more than a database for AlphaLife Sciences. It’s a semantic knowledge base that supports: Structured and unstructured data Vector similarity search Metadata-driven traceability Compliance, security, and auditability AI agents operating safely inside enterprise constraints By grounding AI agents directly in the database, reasoning, retrieval, and generation all operate against the same governed source of truth. “AI agents are not here to replace human beings,” said Chen. “They extend structured, compliant, and auditable thinking.” What’s next for AlphaLife Sciences with PostgreSQL on Azure Looking ahead, Chen shared her excitement about Azure HorizonDB and the capabilities it brings to PostgreSQL on Azure. Features like in-database AI model management, semantic operators for classification and summarization, and faster vector search with DiskANN align closely with AlphaLife Sciences’ needs as their platform continues to scale. “We’re extremely happy to see the launch of Azure HorizonDB and the more powerful tools coming with it,” Chen said. “By putting everything together in PostgreSQL, we don’t have to rely on different systems for vector search, text indexing, or SQL queries. Everything happens in one streamlined system. The code becomes cleaner, efficiency improves, and the AI agents perform much more elegantly.” Learn more AlphaLife Sciences’ journey was featured during the Microsoft Ignite session “The Blueprint for Intelligent AI Agents Backed by PostgreSQL.” Watch the session to learn more and see a demo of how Azure Database for PostgreSQL transforms the protocol and protocol amendment process. When AI is anchored in a strong PostgreSQL foundation, innovation and compliance don’t have to compete - they can reinforce each other.220Views4likes0CommentsJanuary 2026 Recap: Azure Database for PostgreSQL
We just dropped the 𝗝𝗮𝗻𝘂𝗮𝗿𝘆 𝟮𝟬𝟮𝟲 𝗿𝗲𝗰𝗮𝗽 for Azure Database for PostgreSQL and this one’s all about developer velocity, resiliency, and production-ready upgrades. January 2026 Recap: Azure Database for PostgreSQL • PostgreSQL 18 support via Terraform (create + upgrade) • Premium SSD v2 (Preview) with HA, replicas, Geo-DR & MVU • Latest PostgreSQL minor version releases • Ansible module GA with latest REST API features • Zone-redundant HA now configurable via Azure CLI • SDKs GA (Go, Java, JS, .NET, Python) on stable APIs Read the full January 2026 recap here and see what’s new (and what’s coming) - January 2026 Recap: Azure Database for PostgreSQLSupporting ChatGPT on PostgreSQL in Azure
Affan Dar, Vice President of Engineering, PostgreSQL at Microsoft Adam Prout, Partner Architect, PostgreSQL at Microsoft Panagiotis Antonopoulos, Distinguished Engineer, PostgreSQL at Microsoft The OpenAI engineering team recently published a blog post describing how they scaled their databases by 10x over the past year, to support 800 million monthly users. To do so, OpenAI relied on Azure Database for PostgreSQL to support important services like ChatGPT and the Developer API. Collaborating with a customer experiencing rapid user growth has been a remarkable journey. One key observation is that PostgreSQL works out of box for very large-scale points. As many in the public domain have noted, ChatGPT grew to 800M+ users before OpenAI started moving new and shardable workloads to Azure Cosmos DB. Nevertheless, supporting the growth of one of the largest Postgres deployments was a great learning experience for both of our teams. Our OpenAI friends did an incredible job at reacting fast and adjusting their systems to handle the growth. Similarly, the Postgres team at Azure worked to further tune the service to support the increasing OpenAI workload. The changes we made were not limited to OpenAI, hence all our Azure Database for PostgreSQL customers with demanding workloads have benefited. A few of the enhancements and the work that led to these are listed below. Changing the network congestion protocol to reduce replication lag Azure Database for PostgreSQL used the default CUBIC congestion control algorithm for replication traffic to replicas both within and outside the region. Leading up to one of the OpenAI launch events, we observed that several geo-distributed read replicas occasionally experienced replication lag. Replication from the primary server to the read replicas would typically operate without issues; however, at times, the replicas would unexpectedly begin falling behind the primary for reasons that were not immediately clear. This lag would not recover on its own and would grow to a point when, eventually, automation would restart the read replica. Once restarted, the read replica would once again catch up, only to repeat this cycle again within a day or less. After an extensive debugging effort, we traced the root cause to how the TCP congestion control algorithm handled a higher rate of packet drops. These drops were largely a result of high point-to-point traffic between the primary server and its replicas, compounded by the existing TCP window settings. Packet drops across regions are not unexpected; however, the default congestion control algorithm (CUBIC) treats packet loss as a sign of congestion and does an aggressive backoff. In comparison, the Bottleneck Bandwidth and Round-trip propagation time (BBR) congestion control algorithm is less sensitive to packet drops. Switching to BBR, adding SKU specific TCP window settings, and switching to fair queuing network discipline (which can control pacing of outgoing packets at hardware level) resolved this issue. We’ll also note that one of our seasoned PostgreSQL committers provided invaluable insights during this process, helping us pinpoint the issue more effectively. Scaling out with Read replicas PostgreSQL primaries, if configured properly, work amazingly well in supporting a large number of read replicas. In fact, as noted in the OpenAI engineering blog, a single primary has been able to power around 50+ replicas across multiple regions. However, going beyond this increases the chance of impacting the primary. For this reason, we added the cascading replica support to scale out reads even further. But this brings in a number of additional failure modes that need to be handled. The system must carefully orchestrate repairs around lagging and failing intermediary nodes, safely repointing replicas to new intermediary nodes while performing catch up or rewind in a mission critical setup. Furthermore, disaster recovery (DR) scenarios can require a fast rebuild of a replica and as data movement across regions is a costly and time-consuming operation, we developed the ability to create a geo replica from a snapshot of another replica in the same region. This feature avoids the traditional full data copy process, which may take hours or even days depending on the size of the data, by leveraging data for that cluster that already exists in that region. This feature will soon be available for all our customers as well. Scaling out Writes These improvements solved the read replica lag problems and read scale but did not help address the growing write scale for OpenAI. At some point, the balance tipped and it was obvious that the IOPs limits of a single PostgreSQL primary instance will not cut it anymore. As a result OpenAI decided to move new and shardable workloads to Azure Azure Cosmos DB, which is our default recommended NoSQL store for fully elastic workloads. However, some workloads, as noted in the OpenAI blog are much harder to shard. While OpenAI is using Azure Database for PostgreSQL flexible server, several of the write scaling requirements that came up have been baked into our new Azure HorizonDB offering, which entered private preview in November 2025. Some of the architectural innovations are described in the following sections. Azure HorizonDB scalability design To better support more demanding workloads, Azure HorizonDB introduces a new storage layer for Postgres that delivers significant performance and reliability enhancements: More efficient read scale out. Postgres read replicas no longer need to maintain their own copy of the data. They can read pages from the single copy maintained by the storage layer. Lower latency Write-Ahead Logging (WAL) writes and higher throughput page reads via two purpose-built storage services designed for WAL storage and Page storage. Durability and high availability responsibilities are shifted from the Postgres primary to the storage layer, allowing Postgres to dedicate more resources to executing transactions and queries. Postgres failovers are faster and more reliable. To understand how Azure HorizonDB delivers these capabilities, let’s look at its high‑level architecture as shown in Figure 1. It follows a log-centric storage model, where the PostgreSQL writeahead log (WAL) is the sole mechanism used to durably persist changes to storage. PostgreSQL compute nodes never write data pages to storage directly in Azure HorizonDB. Instead, pages and other on-disk structures are treated as derived state and are reconstructed and updated from WAL records by the data storage fleet. Azure HorizonDB storage uses two separate storage services for WAL and data pages. This separation allows each to be designed and optimized for the very different patterns of reads and writes PostgreSQL does against WAL files in contrast to data pages. The WAL server is optimized for very low latency writes to the tail of a sequential WAL stream and the Page server is designed for random reads and writes across potentially many terabytes of pages. These two separate services work together to enable Postgres to handle IO intensive OLTP workloads like OpenAI’s. The WAL server can durably write a transaction across 3 availability zones using a single network hop. The typical PostgreSQL replication setup with a hot standby (Figure 2) requires 4 hops to do the same work. Each hop is a component that can potentially fail or slow down and delay a commit. Azure HorizonDB page service can scale out page reads to many hundreds of thousands of IOPs for each Postgres instance. It does this by sharding the data in Postgres data files across a fleet of page servers. This spreads the reads across many high performance NVMe disks on each page server. 2 - WAL Writes in HorizonDB Another key design principle for Azure HorizonDB was to move durability and high availability related work off PostgreSQL compute allowing it to operate as a stateless compute engine for queries and transactions. This approach gives Postgres more CPU, disk and network to run your application’s business logic. Table 1 summarizes the different tasks that community PostgreSQL has to do, which Azure HorizonDB moves to its storage layer. Work like dirty page writing and checkpointing are no longer done by a Postgres primary. The work for sending WAL files to read replicas is also moved off the primary and into the storage layer – having many read replicas puts no load on the Postgres primary in Azure HorizonDB. Backups are handled by Azure Storage via snapshots, Postgres isn’t involved. Task Resource Savings Postgres Process Moved WAL sending to Postgres replicas Disk IO, Network IO Walsender WAL archiving to blob storage Disk IO, Network IO Archiver WAL filtering CPU, Network IO Shared Storage Specific (*) Dirty Page Writing Disk IO background writer Checkpointing Disk IO checkpointer PostgreSQL WAL recovery Disk IO, CPU startup recovering PostgreSQL read replica redo Disk IO, CPU startup recovering PostgreSQL read replica shared storage Disk IO background, checkpointer Backups Disk IO pg_dump, pg_basebackup, pg_backup_start, pg_backup_stop Full page writes Disk IO Backends doing WAL writing Hot standby feedback Vacuum accuracy walreceiver Table 1 - Summary of work that the Azure HorizonDB storage layer takes over from PostgreSQL The shared storage architecture of Azure HorizonDB is the fundamental building block for delivering exceptional read scalability and elasticity which are critical for many workloads. Users can spin up read replicas instantly without requiring any data copies. Page Servers are able to scale and serve requests from all replicas without any additional storage costs. Since WAL replication is entirely handled by the storage service, the primary’s performance is not impacted as the number of replicas changes. Each read replica can scale independently to serve different workloads, allowing for workload isolation. Finally, this architecture allows Azure HorizonDB to substantially improve the overall experience around high availability (HA). HA replicas can now be added without any data copying or storage costs. Since the data is shared between the replicas and continuously updated by Page Servers, secondary replicas only replay a portion of the WAL and can easily keep up with the primary, reducing failover times. The shared storage also guarantees that there is a single source of truth and the old primary never diverges after a failover. This prevents the need for expensive reconciliation, using pg_rewind, or other techniques and further improves availability. Azure HorizonDB was designed from the ground up with learnings from large scale customers, to meet the requirements of the most demanding workloads. The improved performance, scalability and availability of the Azure HorizonDB architecture make Azure a great destination for Postgres workloads.4.2KViews11likes0CommentsAzure PostgreSQL Lesson Learned #14: Hitting the Max Storage Limits Blocking Further Scale‑Up
Co‑authored with HaiderZ-MSFT Case Overview A customer attempted to increase storage for their Azure Database for PostgreSQL Flexible Server but was unable to exceed 32 TiB. Investigation confirmed that the server was deployed in a region where the maximum supported storage is capped at 32 TiB, meaning no additional scale‑up was possible. This limitation required exploring alternative storage options and potential redeployment strategies to support growing workload demands. Symptoms: How the Problem Appears Storage size maxes out at 32 TiB Portal does not allow selecting any higher value This behavior is expected because the platform enforces the maximum storage supported by the region and tier. Root Cause: Regional Storage Limit of 32 TiB Reached Azure Database for PostgreSQL Flexible Server supports up to 32 TiB of storage on Premium SSD within the customer’s region. Once this limit is reached: Further storage growth is not possible Storage cannot be downgraded or changed in-place The customer must migrate to a tier that supports larger disks (e.g., Premium SSD v2, where available) Premium SSD v2 supports: Up to 64 TiB 1 GiB granular sizing Higher throughput and IOPS However, its availability and supported capabilities can vary by region. Step‑By‑Step Troubleshooting & Migration Guidance STEP 1 — Validate Current Storage Azure Portal → Server → Compute + Storage Confirms storage is already at the maximum the region can offer STEP 2 — Confirm Tier Limitations Check documentation for storage caps by tier and region from Storage options | Microsoft Learn STEP 3 — Attempt a PITR‑Based Redeployment Migration from Premium SSD → Premium SSD v2 is possible via following these steps: https://learn.microsoft.com/en-us/azure/postgresql/compute-storage/concepts-storage-migrate-ssd-to-ssd-v2?tabs=portal-restore-custom-point ⚠ Important: The PITR wizard always deploys the restored server in the same region as the source server. There is no option to change regions during PITR. Therefore, customers must check if Premium SSD v2 becomes selectable during PITR within that same region. If SSD v2 is not listed in the dropdown, it means: The region does not support SSD v2 for PostgreSQL Flexible Server The customer cannot exceed 32 TiB on Flexible Server in that region Final Outcome The server reached the maximum supported storage (32 TiB) for its region and storage could not be increased further. To exceed this limit, the customer needs to move to Premium SSD v2 (if supported) Migration must be done via following these steps: https://learn.microsoft.com/en-us/azure/postgresql/compute-storage/concepts-storage-migrate-ssd-to-ssd-v2?tabs=portal-restore-custom-point Tip: Alternative Path — Use Dump & Restore to a New Server With Larger Storage If Premium SSD v2 does not appear as an available storage option during the Migration from Premium SSD → Premium SSD v2 workflow, our customer still has another viable path to exceed the 32 TiB limit: ➡ Option: Perform a pg_dump / pg_restore to a New Server with Higher Storage Capacity Data can be migrated by creating a brand‑new Flexible Server in a region that supports higher storage tiers (including Premium SSD v2), then migrating the data using standard PostgreSQL backup tools Best Practices Add proactive storage alerts (70%, 80%, 90%). Validate regional storage limits before provisioning servers. Architect for growth by selecting a region/tier that aligns with future capacity needs. Request Premium SSD v2 quota increases early when planning large workloads. Helpful References Premium SSD v2 for Azure PostgreSQL Flexible Server https://learn.microsoft.com/azure/postgresql/compute-storage/concepts-storage-premium-ssd-v2 How to Migrate from Premium SSD → Premium SSD v2 (PITR) https://learn.microsoft.com/en-us/azure/postgresql/compute-storage/concepts-storage-migrate-ssd-to-ssd-v2?tabs=portal-restore-custom-point205Views0likes0Comments