<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Microsoft Blog for PostgreSQL articles</title>
    <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/bg-p/ADforPostgreSQL</link>
    <description>Microsoft Blog for PostgreSQL articles</description>
    <pubDate>Sat, 02 May 2026 07:33:07 GMT</pubDate>
    <dc:creator>ADforPostgreSQL</dc:creator>
    <dc:date>2026-05-02T07:33:07Z</dc:date>
    <item>
      <title>Potential Consequences of Using Postgres as a Job Queue</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/potential-consequences-of-using-postgres-as-a-job-queue/ba-p/4514332</link>
      <description>&lt;H2 aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Introduction&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;At small&amp;nbsp;scale, using Postgres as a job queue is&amp;nbsp;totally fine, and&amp;nbsp;I’d&amp;nbsp;even say&amp;nbsp;it’s&amp;nbsp;the right call. Fewer moving parts, one less system to manage, ACID guarantees on your jobs.&amp;nbsp;What’s&amp;nbsp;not to love?&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The problem is that “small scale” has a ceiling, and the ceiling is lower than most people expect. When you’ve got thousands of concurrent workers hammering a jobs table with&lt;/SPAN&gt; &lt;SPAN data-contrast="auto"&gt;&lt;CODE&gt;SELECT ... FOR UPDATE SKIP LOCKED&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;, things start to behave in ways that aren’t obvious from the application layer: CPU usage creeps up, vacuum can’t keep up, and the wait event stats fill with ominous entries like &lt;CODE&gt;LWLock:MultiXactSLRU&lt;/CODE&gt; stacking up across many backends.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This pattern&amp;nbsp;has&amp;nbsp;tripped&amp;nbsp;up teams more than a few times, and it usually plays out the same way: everything works fine in dev and staging, then goes off a cliff in production once the concurrency gets real.&amp;nbsp;So&amp;nbsp;let’s&amp;nbsp;dig into why this happens, and what the alternatives look like.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;The Typical Pattern&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;When using Postgres as a job queue, the standard approach looks something like this:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE TABLE job_queue ( 
    id         bigserial PRIMARY KEY, 
    status     text NOT NULL DEFAULT 'pending', 
    payload    jsonb NOT NULL, 
    created_at timestamptz NOT NULL DEFAULT now(), 
    locked_by  text, 
    locked_at  timestamptz 
); 
 
CREATE INDEX idx_job_queue_status ON job_queue (status) WHERE status = 'pending'; &lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Workers grab jobs with:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;UPDATE job_queue 
   SET status = 'processing', 
       locked_by = 'worker-42', 
       locked_at = now() 
WHERE id = ( 
     SELECT id FROM job_queue 
      WHERE status = 'pending' 
      ORDER BY created_at 
      LIMIT 1 
        FOR UPDATE SKIP LOCKED 
) 
RETURNING *; &lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;And then mark them done:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;UPDATE job_queue SET status = 'completed' WHERE id = $1; &lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Some users may &lt;/SPAN&gt;&lt;CODE&gt;&lt;SPAN data-contrast="auto"&gt;DELETE&lt;/SPAN&gt;&lt;/CODE&gt; &lt;SPAN data-contrast="auto"&gt;the row entirely. Either way, the lifecycle is: insert, lock-and-update, update-or-delete. Repeated&amp;nbsp;thousands of times per second.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;At low concurrency, this works&amp;nbsp;very smoothly.&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; &lt;CODE&gt;SKIP LOCKED&lt;/CODE&gt; &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;means workers&amp;nbsp;don’t&amp;nbsp;block each other waiting for the same row. Postgres handles the locking, visibility, and ordering.&amp;nbsp;It’s&amp;nbsp;elegant.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;So where does it break?&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;The&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;MultiXact&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;&amp;nbsp;SLRU Problem&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;When multiple transactions hold locks on the same row, Postgres stores the set of lockers as a&amp;nbsp;MultiXact&amp;nbsp;ID – a pointer into a side structure under&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; &lt;CODE&gt;pg_multixact/&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;With &lt;/SPAN&gt;&lt;CODE&gt;&lt;SPAN data-contrast="auto"&gt;SELECT ... FOR UPDATE SKIP LOCKED&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN data-contrast="auto"&gt;, users might think MultiXacts aren’t involved – after all, &lt;/SPAN&gt;&lt;CODE&gt;&lt;SPAN data-contrast="auto"&gt;SKIP LOCKED&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN data-contrast="auto"&gt; is supposed to avoid contention. But in practice, with many concurrent workers all racing to lock rows, there are brief windows where multiple transactions reference the same row before one of them “wins” and the others skip. If you combine this with any &lt;/SPAN&gt;&lt;CODE&gt;&lt;SPAN data-contrast="auto"&gt;FOR SHARE&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN data-contrast="auto"&gt; or &lt;/SPAN&gt;&lt;CODE&gt;&lt;SPAN data-contrast="auto"&gt;FOR KEY SHARE&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;locks (which are commonly created implicitly by foreign key checks),&amp;nbsp;MultiXact&amp;nbsp;IDs start accumulating&amp;nbsp;quickly.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The&amp;nbsp;MultiXact&amp;nbsp;data lives in SLRU buffers (Simple Least Recently Used) – a small, fixed-size shared memory cache. When backends need to read or write&amp;nbsp;MultiXact&amp;nbsp;data, they&amp;nbsp;acquire&amp;nbsp;LWLocks&amp;nbsp;to access these buffers. Under high concurrency, this becomes a bottleneck:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt; wait_event_type | wait_event&lt;BR /&gt;-----------------+---------------------&lt;BR /&gt; LWLock          | MultiXactMemberSLRU&lt;BR /&gt; LWLock          | MultiXactOffsetSLRU&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;You’ll see dozens or hundreds of backends piled up on these waits. The SLRU cache is small (by design – it’s a fixed number of pages in shared memory), and when the working set of MultiXact lookups exceeds what fits in the cache, you get constant eviction and re-reads from disk. Every lock acquisition and release on a job row potentially triggers a MultiXact SLRU lookup, and at thousands of concurrent sessions, those lookups serialize on LWLocks.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
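&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;If you want to see whether your own system is heading this way, a quick (if coarse) check is to sample the wait events in &lt;CODE&gt;pg_stat_activity&lt;/CODE&gt; – something like:&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Sample the current wait events across all backends
SELECT wait_event_type, wait_event, count(*)
  FROM pg_stat_activity
 WHERE wait_event IS NOT NULL
 GROUP BY wait_event_type, wait_event
 ORDER BY count(*) DESC;&lt;/LI-CODE&gt;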
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The result: CPU gets pegged, throughput collapses, and latency spikes – not because the queries are expensive, but because the locking infrastructure itself is overwhelmed.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Heading 1 Char"&gt;Bloat: The Silent Killer&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The other side of this coin is table and index bloat. Every job row goes through multiple updates (and possibly a delete), and each of those operations creates a new tuple version in the heap. The old versions stick around until &lt;/SPAN&gt;&lt;CODE&gt;&lt;SPAN data-contrast="auto"&gt;VACUUM&lt;/SPAN&gt;&lt;/CODE&gt; &lt;SPAN data-contrast="auto"&gt;cleans them up.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;On a busy job queue table:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Dead tuples accumulate faster than&amp;nbsp;autovacuum&amp;nbsp;can clean them. By the time&amp;nbsp;autovacuum&amp;nbsp;finishes one pass, tens of thousands of new dead tuples have appeared. The table grows and grows.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Index bloat compounds the problem. Every index on the table also accumulates dead entries. The partial index on status = 'pending' gets thrashed especially hard, since rows constantly enter and leave that condition.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Sequential scans get slower. As the table bloats, even index scans start doing more I/O because the heap pages are sparsely populated. Vacuum reclaims space at the end of the&amp;nbsp;table, but&amp;nbsp;can’t&amp;nbsp;reclaim space in the middle (unless the pages are completely empty).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Job queue tables can grow to tens of gigabytes when the actual “live” data is only a few megabytes. That bloat makes everything slower: scans, vacuum, even pg_dump.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
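&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;You can watch this happening through the counters in &lt;CODE&gt;pg_stat_user_tables&lt;/CODE&gt; (a quick health check, not a precise bloat estimate):&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Dead-tuple counts and autovacuum activity for the queue table
SELECT relname, n_live_tup, n_dead_tup,
       last_autovacuum, autovacuum_count
  FROM pg_stat_user_tables
 WHERE relname = 'job_queue';&lt;/LI-CODE&gt;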
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;You can mitigate this by running vacuum more aggressively (lower &lt;CODE&gt;autovacuum_vacuum_scale_factor&lt;/CODE&gt;, higher &lt;CODE&gt;autovacuum_vacuum_cost_limit&lt;/CODE&gt;), or by partitioning the table and dropping old partitions. But at some point, you’re fighting the fundamental mismatch between MVCC’s design goals and the write pattern of a job queue.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
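&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The per-table knobs look like this – these are standard storage parameters, but the values below are only illustrative and need tuning against your workload:&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Vacuum the queue table much more aggressively than the global defaults
ALTER TABLE job_queue SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- trigger at ~1% dead tuples
    autovacuum_vacuum_cost_limit   = 2000,  -- allow more work per vacuum round
    autovacuum_vacuum_cost_delay   = 0      -- don't throttle vacuum on this table
);&lt;/LI-CODE&gt;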
&lt;H2&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Heading 1 Char"&gt;CPU and Lock Overhead&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Beyond the SLRU contention and bloat,&amp;nbsp;there’s&amp;nbsp;just the raw overhead of using Postgres’s full transactional machinery for what is&amp;nbsp;essentially a&amp;nbsp;FIFO dispatch operation:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Every lock/unlock is a full WAL-logged transaction. Grabbing a job writes WAL. Marking it complete writes WAL. Deleting it writes WAL. On a system processing thousands of jobs per second, the WAL volume from the job queue alone can saturate your&amp;nbsp;wal_writer&amp;nbsp;and checkpoint processes.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&lt;CODE&gt;SKIP LOCKED&lt;/CODE&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;still touches rows. The name suggests rows are skipped, but Postgres still&amp;nbsp;has to&amp;nbsp;find them, check their lock status, and move on. With high concurrency, many workers end up scanning past the same locked rows before finding one they can claim. This is&amp;nbsp;wasted&amp;nbsp;CPU.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Snapshot management overhead also becomes an issue. Each transaction needs a consistent snapshot, and with thousands of concurrent transactions, the ProcArray (the structure that tracks active transactions) becomes a contention point itself. You might see &lt;CODE&gt;LWLock:ProcArrayLock&lt;/CODE&gt; waits alongside the MultiXact ones.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Vacuum contends with the workers too. While vacuum is cleaning up dead tuples, it needs locks of its own, so on a table under constant write pressure, vacuum can interfere with the workers and vice versa. I’ve seen systems where disabling autovacuum on the job queue table improved throughput in the short term.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Better Alternatives&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;So&amp;nbsp;what should you use instead? It depends on your requirements, but there are several options that handle high-throughput job dispatch more gracefully than a Postgres table.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Heading 2 Char"&gt;Advisory Locks (Staying in Postgres)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;If you want to stay within Postgres and avoid adding infrastructure, advisory locks are worth&amp;nbsp;considering for&amp;nbsp;certain queue patterns. Instead of locking rows, you lock on an abstract numeric key:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Worker tries to acquire a lock on the job ID 
SELECT pg_try_advisory_lock(id) FROM job_queue 
WHERE status = 'pending' 
ORDER BY created_at 
LIMIT 1;&lt;/LI-CODE&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Advisory locks are lightweight – they don’t touch the heap, don’t create MultiXact entries, and don’t generate dead tuples. They live entirely in shared memory. The trade-off is that you lose the atomicity of &lt;/SPAN&gt;&lt;CODE&gt;&lt;SPAN data-contrast="auto"&gt;FOR UPDATE SKIP LOCKED&lt;/SPAN&gt;&lt;/CODE&gt;&lt;SPAN data-contrast="auto"&gt;: you need to handle the case where a lock is&amp;nbsp;acquired&amp;nbsp;but the job processing fails, and you need to release the lock explicitly (or rely on session-end cleanup).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
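&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;On the happy path, that means something like the following once the job finishes (a minimal sketch, with &lt;CODE&gt;:job_id&lt;/CODE&gt; being the ID claimed above):&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Mark the job done, then release the advisory lock explicitly
UPDATE job_queue SET status = 'completed' WHERE id = :job_id;
SELECT pg_advisory_unlock(:job_id);&lt;/LI-CODE&gt;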
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This approach works well when the queue depth is&amp;nbsp;manageable&amp;nbsp;and you want to avoid the MVCC overhead. But&amp;nbsp;it’s&amp;nbsp;still Postgres, so&amp;nbsp;you’re&amp;nbsp;still subject to connection limits,&amp;nbsp;ProcArray&amp;nbsp;overhead, and general resource contention at&amp;nbsp;very high&amp;nbsp;session counts.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;pgq&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;&amp;nbsp;(&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Skytools&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;pgq&amp;nbsp;is purpose-built for exactly this problem.&amp;nbsp;It’s&amp;nbsp;a queue implementation that sits inside Postgres but uses a batching model that avoids most of the row-level locking and MVCC pitfalls. Events are written to a queue table, but consumers read them in&amp;nbsp;batches&amp;nbsp;and the queue maintenance is done via a ticker process that manages rotation.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The key advantages:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;No row-level contention. Consumers don’t lock individual rows.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Built-in batch processing. Events are consumed in chunks, reducing transaction overhead.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Efficient cleanup. Old events are rotated out rather than vacuumed row-by-row.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The downside is that&amp;nbsp;pgq&amp;nbsp;is not as actively&amp;nbsp;maintained&amp;nbsp;as it once was, and it adds operational complexity (the ticker daemon, consumer registration, etc.). But for teams already deep in the Postgres ecosystem,&amp;nbsp;it’s&amp;nbsp;a battle-tested&amp;nbsp;option.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
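&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;To give a flavor of the model, the core API looks roughly like this (function names from the pgq extension – check the documentation for the version you install):&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Producer side: create a queue and insert an event
SELECT pgq.create_queue('jobs');
SELECT pgq.insert_event('jobs', 'email', '{"to": "user@example.com"}');

-- Consumer side: register once, then poll in batches
SELECT pgq.register_consumer('jobs', 'worker_1');
SELECT pgq.next_batch('jobs', 'worker_1');      -- returns a batch ID, or NULL if nothing to do
SELECT * FROM pgq.get_batch_events(:batch_id);  -- fetch the events in the batch
SELECT pgq.finish_batch(:batch_id);             -- acknowledge the whole batch&lt;/LI-CODE&gt;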
&lt;H3 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Redis&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;For many teams, Redis is the natural choice for job queues. Using Redis lists (&lt;CODE&gt;BRPOPLPUSH&lt;/CODE&gt;) or the Streams API, you get:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Sub-millisecond dispatch latency. No disk I/O, no MVCC, no vacuum.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Atomic pop operations. Workers grab jobs without any locking protocol.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Simple scaling. Redis handles thousands of concurrent consumers trivially.&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The trade-off is durability. Redis can persist&amp;nbsp;to&amp;nbsp;disk, but&amp;nbsp;it’s&amp;nbsp;not ACID. If Redis crashes between a pop and the job completing, you might lose or duplicate work (though Redis Streams with consumer groups mitigate this significantly). For most job queue use cases, at-least-once delivery is acceptable, and Redis does that well.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
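&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The list-based pattern, sketched with redis-cli (key names and payloads are illustrative):&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Producer pushes a job onto the pending list
redis-cli LPUSH jobs:pending '{"id": 42, "type": "email"}'

# Worker atomically moves the oldest job to an in-flight list (5s timeout)
redis-cli BRPOPLPUSH jobs:pending jobs:processing 5

# On success, remove the job from the in-flight list
redis-cli LREM jobs:processing 1 '{"id": 42, "type": "email"}'&lt;/LI-CODE&gt;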
&lt;H3 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Kafka&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;For truly high-throughput, distributed workloads, Apache Kafka is the heavyweight&amp;nbsp;option. Kafka partitions give you parallel consumption with ordering guarantees per partition, durable storage, and replay capability.&amp;nbsp;It’s&amp;nbsp;the right tool when:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;You need to process&amp;nbsp;thousands&amp;nbsp;of events per second&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Multiple consumers need to read the same events&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;You want event replay or audit trails&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Your architecture is already event-driven&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The operational overhead is nontrivial –&amp;nbsp;ZooKeeper&amp;nbsp;(or&amp;nbsp;KRaft), brokers, topic management, consumer&amp;nbsp;group coordination. But for teams already running Kafka for other reasons, adding a job queue topic is&amp;nbsp;practically free.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Choosing the Right Tool&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Here’s&amp;nbsp;a rough decision guide:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Under 100 concurrent workers, simple jobs: Postgres with &lt;CODE&gt;SKIP LOCKED&lt;/CODE&gt; is fine&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Moderate concurrency, want to stay in Postgres: advisory locks or pgq&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;High throughput, low-latency dispatch: Redis (Lists or Streams)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Massive scale, distributed, event replay: Kafka&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Many teams that start with Postgres (reasonably) hit scaling problems and then try to fix Postgres rather than recognizing that the workload has outgrown the tool. They throw more autovacuum workers at it, increase &lt;CODE&gt;max_connections&lt;/CODE&gt;, add connection poolers – all of which help at the margins, but don’t address the fundamental issue: Postgres’s MVCC and locking machinery wasn’t designed for this access pattern at high concurrency.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 aria-level="1"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 1"&gt;Conclusion&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Postgres is great, but it&amp;nbsp;can’t&amp;nbsp;be the best tool for every job. Using it as a job queue is a perfectly valid choice when your scale is modest. But when&amp;nbsp;you’re&amp;nbsp;running thousands of concurrent workers, the combination of&amp;nbsp;MultiXact&amp;nbsp;SLRU contention, heap bloat, vacuum pressure, and raw locking overhead will eventually push you toward a purpose-built solution.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The good news is that you&amp;nbsp;don’t&amp;nbsp;have to rip out everything. Advisory locks can buy&amp;nbsp;you&amp;nbsp;headroom without adding infrastructure. Redis can handle dispatch while Postgres keeps owning the data. And if&amp;nbsp;you’re&amp;nbsp;already&amp;nbsp;using&amp;nbsp;Kafka, a job topic is a natural fit.&amp;nbsp;&amp;nbsp;Take your pick – there are many&amp;nbsp;queueing&amp;nbsp;options out there!&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Apr 2026 15:10:50 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/potential-consequences-of-using-postgres-as-a-job-queue/ba-p/4514332</guid>
      <dc:creator>richyen</dc:creator>
      <dc:date>2026-04-30T15:10:50Z</dc:date>
    </item>
    <item>
      <title>Connection Scaling in Elastic Clusters</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/connection-scaling-in-elastic-clusters/ba-p/4509624</link>
      <description>&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-charstyle="Normal"&gt;As your applications grow, you need to manage database connections carefully to keep performance predictable and reliable. Azure Database for PostgreSQL Elastic Clusters, powered by Citus, support horizontal scaling by distributing data and queries across multiple nodes. &lt;/SPAN&gt;&lt;/SPAN&gt;This raises a key question: how do connections behave as you add more clients, grow the cluster, or upgrade node specifications?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this post, you’ll see how connection handling behaves in Elastic Clusters &lt;SPAN data-contrast="auto"&gt;through controlled benchmarks across a small set of configurations.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335551550&amp;quot;:1,&amp;quot;335551620&amp;quot;:1,&amp;quot;335559685&amp;quot;:0,&amp;quot;335559737&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Core count: 2 and 4&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Cluster node count: 4 and 8&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Using these setups, we measured how throughput, latency, and resource usage change as connection counts increase for different workloads. The goal is not just to show numbers, but to explain why those numbers behave the way they do. We deliberately chose small SKUs (2 cores/8 GB and 4 cores/32 GB) because it is easier to observe how a smaller compute responds as connection counts grow.&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;These results help you understand:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&lt;BR /&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;What influences connection capacity&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;When scaling up helps more than scaling out&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;How to tune your cluster to match real workloads&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Future posts will expand this analysis to more configurations.&amp;nbsp;For now, these insights&amp;nbsp;can&amp;nbsp;help you make informed decisions about cluster sizing and connection management.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335551550&amp;quot;:1,&amp;quot;335551620&amp;quot;:1,&amp;quot;335559685&amp;quot;:0,&amp;quot;335559737&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;What&amp;nbsp;you'll&amp;nbsp;learn:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;How &lt;STRONG&gt;single-shard&lt;/STRONG&gt; and &lt;STRONG&gt;multi-shard&lt;/STRONG&gt; queries behave under increasing connection loads&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;The impact of &lt;STRONG&gt;scaling up&lt;/STRONG&gt; (more cores per node) vs. &lt;STRONG&gt;scaling out&lt;/STRONG&gt; (more nodes)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;When&amp;nbsp;&lt;STRONG&gt;PgBouncer&lt;/STRONG&gt;&amp;nbsp;helps—and when it hurts&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Practical limits imposed by memory, CPU, and connection parameters&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Configuration guidelines for different workload patterns&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;How Elastic Clusters Handle Client and Internal Connections&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Before we dive into the performance results,&amp;nbsp;let’s&amp;nbsp;first look at how Elastic Clusters handle connections and execute queries.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3 aria-level="3"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Normal"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Cluster Architecture: Where Connections Go&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In Elastic Clusters, data is distributed across multiple nodes using&amp;nbsp;Citus. Each node can accept client connections and execute queries. This is true&amp;nbsp;even&amp;nbsp;when&amp;nbsp;the data lives on another node. This design allows you to scale horizontally while staying compatible with PostgreSQL.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;When your application opens a connection, the flow looks like this:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;The client connects through &lt;STRONG&gt;a load balancer&lt;/STRONG&gt; on port &lt;STRONG&gt;7432&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;The load balancer routes the connection to &lt;STRONG&gt;any available node&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;That node executes the query and talks to other nodes if needed&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;From the client’s point of view, this looks like a single server. &lt;/SPAN&gt;Behind the scenes, however, the node can open additional internal connections to efficiently fetch distributed data.&lt;/P&gt;
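&lt;P&gt;Connecting is just standard PostgreSQL against the cluster endpoint on port 7432 – for example (hostname, database, and user below are placeholders):&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Connect to the cluster's load-balanced endpoint
psql "host=mycluster.postgres.database.azure.com port=7432 dbname=postgres user=admin_user sslmode=require"&lt;/LI-CODE&gt;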
&lt;H3 aria-level="3"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Query Types&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Determine&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;&amp;nbsp;Connection&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Use&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;How many connections a query consumes depends on what kind of query you run.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Single-Shard Queries&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Single-shard queries target &lt;STRONG&gt;data that lives on exactly one node&lt;/STRONG&gt;. A typical example is a lookup&amp;nbsp;by&amp;nbsp;the distribution key.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;If you connect to the node that owns the data, the query runs locally&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;If you connect to a different node, Citus uses an internal connection to fetch the data&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-clear-both"&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;Multi-Shard (Fan-Out) Queries&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Multi-shard queries need &lt;STRONG&gt;data from multiple nodes&lt;/STRONG&gt;. Aggregations and analytical queries often fall into this category. For these queries:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;The connected node has internal connections to many nodes&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Each node processes its local shards in parallel&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Results are sent back and combined&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;As a result,&amp;nbsp;one&amp;nbsp;client connection can &lt;STRONG&gt;fan out into many internal connections.&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In both cases, when&amp;nbsp;Citus&amp;nbsp;needs to fetch data from another node, it first&amp;nbsp;attempts&amp;nbsp;to reuse a &lt;STRONG&gt;cached internal&amp;nbsp;connection&lt;/STRONG&gt;. If&amp;nbsp;none is available,&amp;nbsp;it creates&amp;nbsp;a&amp;nbsp;new connection&amp;nbsp;and caches it.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Important note:&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt; Explicit transactions change the behavior of single-shard queries. When a query runs inside a BEGIN … END block, Citus sends extra commands to the worker node alongside the actual query: a BEGIN, a distributed transaction ID assignment, and a COMMIT. &lt;/SPAN&gt;These extra network roundtrips add coordination overhead compared to auto‑commit mode.&lt;/P&gt;
&lt;H4&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Key Configuration Parameters&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&lt;SPAN data-contrast="auto"&gt;Elastic Clusters expose configuration parameters that control how connections are created, cached, and limited. To understand connection scaling—or to tune it effectively—you need to know what these parameters do and when they matter. This section focuses on a subset; see the Citus blog posts and documentation for more details.&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; height: 250px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr style="height: 34.8px;"&gt;&lt;td style="height: 34.8px;"&gt;&lt;STRONG&gt;Parameter&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;&lt;STRONG&gt;Scope&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;&lt;STRONG&gt;Description&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;&lt;STRONG&gt;Typical use case&lt;/STRONG&gt;&lt;/td&gt;&lt;td style="height: 34.8px;"&gt;&lt;STRONG&gt;Default&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 58.8px;"&gt;&lt;td style="height: 58.8px;"&gt;max_connections&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Cluster wide&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Total connections allowed&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Set based on resources&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Varies by SKU&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 38.8px;"&gt;&lt;td style="height: 38.8px;"&gt;
&lt;P&gt;citus.max_client_connections&lt;/P&gt;
&lt;/td&gt;&lt;td style="height: 38.8px;"&gt;Per-node&lt;/td&gt;&lt;td style="height: 38.8px;"&gt;Connections allowed from clients&lt;/td&gt;&lt;td style="height: 38.8px;"&gt;Limit client load per node&lt;/td&gt;&lt;td style="height: 38.8px;"&gt;Varies by SKU&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 58.8px;"&gt;&lt;td style="height: 58.8px;"&gt;&lt;SPAN data-contrast="auto"&gt;citus.max_cached_conns_per_worker&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Per client connection&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Cached internal connections&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Control parallelism in fan-out queries&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 58.8px;"&gt;&lt;td style="height: 58.8px;"&gt;&lt;SPAN data-contrast="auto"&gt;citus.max_adaptive_executor_pool_size&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Per client connection&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Max connections for parallel multi-shard execution&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Control parallelism in fan-out queries&lt;/td&gt;&lt;td style="height: 58.8px;"&gt;Varies by SKU, 1 or 16&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;col style="width: 20.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
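&lt;P&gt;You can inspect and experiment with these settings like any other PostgreSQL parameter – for example, at session level (in a managed cluster, persistent changes go through the server parameters configuration):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Inspect the current values
SHOW citus.max_cached_conns_per_worker;
SHOW citus.max_adaptive_executor_pool_size;

-- Allow more cached internal connections for this session only
SET citus.max_cached_conns_per_worker = 2;&lt;/LI-CODE&gt;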
&lt;H5&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-contrast="none"&gt;In PostgreSQL, Memory Plays a Key Role in Connection Scaling&lt;/SPAN&gt;:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H5&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Idle connections typically use 2–5 MB&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Active connections often use 10–20 MB, depending on&amp;nbsp;work_mem&amp;nbsp;and query behavior&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H5&gt;&lt;SPAN data-contrast="auto"&gt;Some Parameters Only Affect Certain Query Types:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H5&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Parameters like&amp;nbsp;&lt;EM&gt;citus.max_adaptive_executor_pool_size&lt;/EM&gt;&amp;nbsp;only affect multi-shard queries.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H5&gt;&lt;SPAN data-contrast="auto"&gt;Internal Connection Caching Is Critical:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H5&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Internal connection caching has&amp;nbsp;a large impact&amp;nbsp;on performance.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Setting&lt;EM&gt;&amp;nbsp;citus.max_cached_conns_per_worker&lt;/EM&gt;&amp;nbsp;to 0 forces&amp;nbsp;Citus&amp;nbsp;to open new connections repeatedly&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;This leads to additional connection setup work and reduces overall throughput&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Values greater than 1 only help multi-shard workloads by allowing more parallelism&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In practice, disabling internal caching is&amp;nbsp;usually a&amp;nbsp;bad&amp;nbsp;idea.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335551550&amp;quot;:1,&amp;quot;335551620&amp;quot;:1,&amp;quot;335559685&amp;quot;:0,&amp;quot;335559737&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H5&gt;&lt;SPAN data-contrast="auto"&gt;SKU Choice Changes Which&amp;nbsp;Limits&amp;nbsp;You Hit First&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H5&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;Smaller nodes tend to reach memory thresholds sooner&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Larger nodes may hit connection limits before CPU or memory saturation&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;How We Benchmarked&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;To understand how connections scale in Elastic Clusters, we ran a set of controlled benchmarks using standard PostgreSQL tooling.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3 aria-level="3"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Test Environment&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Tool&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;:&amp;nbsp;&lt;/STRONG&gt;pgbench (PostgreSQL's standard benchmarking utility), running on a VM in the same region and zone as the cluster&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Dataset&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;Distributed pgbench_accounts table with a scale factor of 3000 (~38 GB) and distribution column aid (account ID).&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Setup commands:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# Create table structure pgbench -i -I "dt" # Distribute the accounts table psql -c "SELECT create_distributed_table('pgbench_accounts', 'aid');" # Populate with data pgbench -i -I "gvp" -s 3000 # Add index for multi-shard queries CREATE INDEX index_bid ON pgbench_accounts(bid);&lt;/LI-CODE&gt;
&lt;H3&gt;Workloads&amp;nbsp;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Single-Shard Query&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;:&lt;/STRONG&gt;&amp;nbsp;The single-shard workload targets one shard using the distribution key.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;\set aid random(1, 100000 * scale) BEGIN; SELECT abalance FROM pgbench_accounts WHERE aid = :aid; END;&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Multi-Shard Query:&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;The multi-shard workload aggregates data across multiple shards.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;\set bid random(1, scale) BEGIN; SELECT sum(abalance) FROM pgbench_accounts WHERE bid = :bid; END;&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;How the Benchmarks Run&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Each benchmark:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Ran for&amp;nbsp;600 seconds&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Used&amp;nbsp;16&amp;nbsp;pgbench&amp;nbsp;worker threads&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Varied the number of client connections (-c) per run&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
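&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Putting those together, a typical invocation looked roughly like this (host and script file name are illustrative; the -c value changed per run):&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;# 600-second run, 16 worker threads, 64 client connections, custom workload script
pgbench -h mycluster.postgres.database.azure.com -p 7432 \
        -T 600 -j 16 -c 64 -f single_shard.sql postgres&lt;/LI-CODE&gt;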
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;For&amp;nbsp;each&amp;nbsp;cluster configuration, we gradually increased the client count. We stopped when we&amp;nbsp;observed&amp;nbsp;connection&amp;nbsp;or&amp;nbsp;out-of-memory&amp;nbsp;errors.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335551550&amp;quot;:1,&amp;quot;335551620&amp;quot;:1,&amp;quot;335559685&amp;quot;:0,&amp;quot;335559737&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335551550&amp;quot;:1,&amp;quot;335551620&amp;quot;:1,&amp;quot;335559685&amp;quot;:0,&amp;quot;335559737&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;Test Matrix&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335551550&amp;quot;:1,&amp;quot;335551620&amp;quot;:1,&amp;quot;335559685&amp;quot;:0,&amp;quot;335559737&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&lt;SPAN data-contrast="auto"&gt;We evaluated combinations of core counts, node counts, and query types:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 70.1852%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;Workload&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;Configuration tested&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-contrast="auto"&gt;Single-shard&amp;nbsp;&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;&lt;SPAN data-contrast="auto"&gt;2-core/4-node, 2-core/8-node, 4-core/4-node, 4-core/8-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-contrast="auto"&gt;Multi-shard&amp;nbsp;&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;&lt;SPAN data-contrast="auto"&gt;2-core/4-node, 2-core/8-node, 4-core/4-node, 4-core/8-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;SPAN data-contrast="auto"&gt;Single-shard with PgBouncer&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/td&gt;&lt;td&gt;&lt;SPAN data-contrast="auto"&gt;2-core/4-node, 2-core/8-node&lt;/SPAN&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Single-Shard Query Performance&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&lt;SPAN data-contrast="auto"&gt;Single&lt;/SPAN&gt;‑&lt;SPAN data-contrast="auto"&gt;shard queries are a good model for OLTP workloads. They&amp;nbsp;represent&amp;nbsp;indexed lookups that target a single row (or&amp;nbsp;a small&amp;nbsp;set of rows).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;colgroup&gt;&lt;col style="width: 50.0371%" /&gt;&lt;col style="width: 50.0371%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="lia-align-center" colspan="2"&gt;
&lt;P&gt;&lt;SPAN class="lia-text-color-15"&gt;&lt;STRONG&gt;Massive Throughput Gains with Scaling&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Peak throughput climbed from &lt;STRONG&gt;~11.4k TPS (2c4n)&lt;/STRONG&gt; to &lt;STRONG&gt;~48.3k TPS (4c8n)&lt;/STRONG&gt; – a &lt;STRONG&gt;4.3×&lt;/STRONG&gt; increase from doubling both per-node cores and node count (4× the total cores).&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;SPAN class="lia-text-color-15"&gt;&lt;STRONG&gt;Higher Concurrency = Saturation Point&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Each configuration has an optimal operating point for throughput.&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;SPAN class="lia-text-color-15"&gt;&lt;STRONG&gt;Near-Linear Scaling&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Doubling per-node cores more than doubled throughput (≈2.4–2.5× gains), and doubling nodes added ~70–75% more TPS.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-align-center"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-align-center"&gt;&lt;img /&gt;
&lt;P class="lia-clear-both"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;&lt;/DIV&gt;
&lt;H3&gt;Key Takeaways from Single-shard Experiments&lt;/H3&gt;
&lt;H4&gt;1. &lt;SPAN data-contrast="auto"&gt;Near-Linear Scaling with Resources&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Doubling CPU cores&amp;nbsp;approximately doubled&amp;nbsp;throughput, and doubling node count added 70-75% more throughput:&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-solid" border="1" style="width: 51.3889%; height: 186px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Configuration&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Peak TPS&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Peak Clients&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;CPU at Peak&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;2-core, 4-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;11,400&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;64&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;~95%&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;2-core, 8-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;19,400&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;192&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;~93%&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;4-core, 4-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;27,500&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;128&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;~97%&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;4-core, 8-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;48,300&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;224&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;~90%&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;What this means&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;Adding compute resources effectively increases capacity. The slight sub-linearity when adding nodes (1.7× instead of 2×) is because the extra round-trips from explicit transactions only affect remote shard queries and the fraction for remote queries grows with the cluster size. In a 4-node cluster, 75% of queries hit a remote shard; in a 8-node cluster, 87.5% do. Since each remote query inside a BEGIN … END block pays the extra round-trip cost, the aggregate overhead increases as more nodes are added.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;2.Saturation Points Vary by Configuration&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Each cluster configuration reaches &lt;STRONG&gt;peak throughput at a specific client count&lt;/STRONG&gt;. Beyond this point,&amp;nbsp;additional&amp;nbsp;clients&amp;nbsp;lead to:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335551550&amp;quot;:1,&amp;quot;335551620&amp;quot;:1,&amp;quot;335559685&amp;quot;:0,&amp;quot;335559737&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Throughput plateau&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Latency rises as the system operates near maximum capacity&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Increased contention and context switching&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Smaller configurations reach peak throughput sooner&lt;SPAN data-contrast="auto"&gt; (64 clients for 2c4n), while larger ones sustain a couple hundred concurrent clients (224 for 4c8n) before peaking.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;3. Memory Constrains Maximum Connections, Not Node Count&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;A key insight is that &lt;STRONG&gt;adding nodes does not &lt;SPAN data-contrast="auto"&gt;necessarily &lt;/SPAN&gt;increase total client capacity when the workload is constrained by memory limits&lt;/STRONG&gt;.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;2-core clusters: Maximum ~500 concurrent clients (both 4-node and 8-node)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;4-core clusters: Maximum ~1000 concurrent clients (both 4-node and 8-node)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Why?&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;Each node&amp;nbsp;maintains&amp;nbsp;roughly&amp;nbsp;client_count&amp;nbsp;total connections:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;client_count&amp;nbsp;/&amp;nbsp;node_count&amp;nbsp;external client connections (via load balancer)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Remaining connections are cached internal&amp;nbsp;Citus&amp;nbsp;connections&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Once memory capacity is fully utilized, adding nodes alone does not always raise client capacity. External connections drop per node, but internal connections replace them, so each node still carries roughly the same load and can still run out of memory. For example, with 500 clients on a 4-node cluster, each node holds ~125 external connections plus up to ~375 cached internal connections from peer nodes, keeping the per-node total near 500.&lt;/P&gt;
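&lt;P&gt;To see what a node is actually carrying, you can break down its connections with the standard statistics views (a sketch; the grouping is one reasonable choice, not from the original benchmark):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Count client backends per user and state on one node.
-- Internal Citus connections show up here alongside external clients.
SELECT usename, state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY usename, state
ORDER BY connections DESC;&lt;/LI-CODE&gt;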
&lt;H4&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;4. CPU Utilization Reaches 90-97% at Peak&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;All configurations fully&amp;nbsp;utilized&amp;nbsp;available&amp;nbsp;CPU at peak throughput, confirming CPU as the primary throughput bottleneck (with memory limiting connection capacity separately).&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4 aria-level="3"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 3"&gt;5. Latency Characteristics&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Latency&amp;nbsp;remains&amp;nbsp;low (&amp;lt;5-10ms) until the system approaches saturation. Larger clusters&amp;nbsp;maintain better latency&amp;nbsp;under load:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;4c8n cluster&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;:&lt;/STRONG&gt; Sub-5ms latency up to ~200 clients&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;2c4n cluster&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;Latency exceeds 10-20ms once saturated (~64 clients)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Practical implication&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;:&amp;nbsp;&lt;/STRONG&gt;Right-sizing&amp;nbsp;your cluster provides headroom for traffic spikes without latency degradation.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Single-Shard Query Performance wit&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;h PgBouncer&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;On Elastic Clusters, you can enable PgBouncer using server parameters. After enabling it, you connect to PgBouncer instances through the load balancer &lt;STRONG&gt;on port 8432&lt;/STRONG&gt;. This connection pooling layer allows the cluster to handle far more concurrent client connections.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;PgBouncer&amp;nbsp;instances&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;One per node, behind load balancer&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Pool size&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;50 connections per node (default)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Pooling mode&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;Transaction pooling&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;table class="lia-border-style-solid" border="1" style="width: 93.5185%; height: 181.6px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr style="height: 104.8px;"&gt;&lt;td class="lia-align-center" style="height: 104.8px;"&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN class="lia-text-color-15"&gt;Eliminated&amp;nbsp;Connection Limits&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Handled thousands of concurrent clients without out-of-memory errors&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center" style="height: 104.8px;"&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN class="lia-text-color-15"&gt;Slightly Lower Peak TPS&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;~10% reduction in&amp;nbsp;maximum&amp;nbsp;throughput due to pooling overhead&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr style="height: 76.8px;"&gt;&lt;td class="lia-align-center" colspan="2" style="height: 76.8px;"&gt;
&lt;P&gt;&lt;SPAN class="lia-text-color-15"&gt;&lt;STRONG&gt;Linear Latency Growth&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Predictable queuing behavior once pool saturates&amp;nbsp;&lt;/SPAN&gt; &lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;img /&gt;&lt;img /&gt;&lt;/DIV&gt;
&lt;H3&gt;Key Takeaways from Single-shard Experiments with PgBouncer&lt;/H3&gt;
&lt;H4&gt;1. &lt;SPAN data-contrast="auto"&gt;Graceful handling of high connection counts&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt; &lt;/H4&gt;
&lt;P&gt;Without PgBouncer, 2‑core clusters reached memory capacity around 500 clients. &lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="auto"&gt;With&amp;nbsp;PgBouncer&amp;nbsp;enabled, tests successfully ran with 1000+ concurrent clients. Throughput plateaued as the pool saturated, but the system remained stable.&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4&gt;2. &lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="auto"&gt;Throughput-Latency Trade-Off&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Once the connection pool fills (~50 active connections per node),&amp;nbsp;additional&amp;nbsp;clients queue:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Throughput stabilizes at the pool's processing capacity&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Latency increases with queue depth&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Predictable, graceful behavior under high load&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
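&lt;P&gt;You can watch this queueing directly in PgBouncer's admin console (connect to the special &lt;CODE&gt;pgbouncer&lt;/CODE&gt; database on the pooler port); a minimal sketch:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- PgBouncer admin console commands (not regular SQL tables):
SHOW POOLS;  -- cl_active = clients being served, cl_waiting = clients queued
SHOW STATS;  -- per-database totals, including average query and wait times&lt;/LI-CODE&gt;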
&lt;H4&gt;3.&amp;nbsp;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="auto"&gt;When to use PgBouncer&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Recommended for:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Applications with&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;bursty connection patterns&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;(many short-lived connections)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;High connection counts&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;that exceed node memory capacity&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Workloads where&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;occasional queuing latency&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;is acceptable&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Not recommended for:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Applications requiring&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;maximum&amp;nbsp;throughput&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;from steady workloads&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Long-running transactions (incompatible with transaction pooling)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Scenarios where &lt;STRONG&gt;every millisecond of latency matters&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
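&lt;P&gt;A brief sketch of why transaction pooling constrains session-dependent workloads: consecutive transactions from one client may run on different server connections, so session state set outside a transaction is not guaranteed to persist (the &lt;CODE&gt;application_name&lt;/CODE&gt; example below is illustrative):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Under transaction pooling, a session-level SET may not survive into the
-- next transaction, because it can land on a different server connection:
SET application_name = 'worker-1';
BEGIN;
SELECT current_setting('application_name');  -- may no longer be 'worker-1'
COMMIT;

-- SET LOCAL applies only within one transaction and is pooling-safe:
BEGIN;
SET LOCAL application_name = 'worker-1';
SELECT current_setting('application_name');  -- 'worker-1' inside this txn
COMMIT;&lt;/LI-CODE&gt;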
&lt;H2&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Multi-Shard Query Performance&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&lt;SPAN data-contrast="auto"&gt;Multi-shard (fan-out) queries aggregate or join data across multiple nodes,&amp;nbsp;representing&amp;nbsp;analytical or reporting workloads.&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-solid" border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN class="lia-text-color-15"&gt;Massive Throughput Gains with Scaling&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Scaling up and out dramatically improved fan-out query throughput – from ~52 TPS to ~1008 TPS on the largest&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="none"&gt;&amp;nbsp;(≈20× gain)&lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;.&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&amp;nbsp;&lt;STRONG&gt;&lt;SPAN class="lia-text-color-15"&gt;Low Concurrency&amp;nbsp;Saturation&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Multi-shard queries peaked at low client counts—just 8 clients for 2-core clusters and ~96 for 4-core, 8-node setups.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&amp;nbsp;&lt;SPAN class="lia-text-color-15"&gt;&lt;STRONG&gt;Latency Improves with Scale&amp;nbsp;&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;Larger clusters&amp;nbsp;maintained&amp;nbsp;sub-100 ms&amp;nbsp;latency under higher concurrency, while smaller ones degraded quickly.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P aria-level="4"&gt;&lt;STRONG&gt;&lt;SPAN class="lia-text-color-15"&gt;Memory &amp;amp; I/O Bottlenecks&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="none"&gt;&lt;STRONG&gt;Sufficient memory is crucial&lt;/STRONG&gt; for fan-out queries, as memory starvation causes throughput to plateau well before CPU is fully&amp;nbsp;utilized.&lt;/SPAN&gt; &lt;SPAN data-ccp-props="{&amp;quot;335551550&amp;quot;:2,&amp;quot;335551620&amp;quot;:2}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H3&gt;Key Takeaways from Multi-shard Experiments&lt;/H3&gt;
&lt;H4&gt;1. Lower Absolute Throughput Compared to Single-shard Workloads&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="auto"&gt;Even the best-performing configuration (4c8n at ~1000 TPS) achieves ~2% of single-shard throughput (~48k TPS). This reflects the inherent complexity of the analytical fan-out queries: cross-node data aggregation, significantly large amount of data retrieval.&lt;/SPAN&gt; &amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;2. Scaling Provides Dramatic Gains&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="auto"&gt;While absolute TPS&amp;nbsp;remains&amp;nbsp;modest, the&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;20× improvement from smallest to largest&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;demonstrates&amp;nbsp;that multi-shard workloads benefit enormously from scaling.&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table class="lia-border-style-solid" border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Configuration&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Peak TPS&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Peak Clients&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Latency at Peak&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;2-core, 4-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;52&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;8&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;~150ms&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;2-core, 8-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;168&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;8&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;~50ms&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;4-core, 4-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;489&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;32&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;~65ms&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;4-core, 8-node&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;1,008&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;96&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;~95ms&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H4&gt;&lt;SPAN data-contrast="auto"&gt;3. Saturates at Low Concurrency&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Multi-shard queries reach peak throughput at fewer concurrent clients than single-shard queries:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;2-core clusters: Saturate at just 8 clients&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;4-core, 8-node: Saturates around 96 clients&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;After these points, the system maintains stable TPS, with latency rising as load increases.&lt;/P&gt;
&lt;H4&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;4. Memory and I/O Bottlenecks Dominate Small Configurations&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The 2-core configurations (8 GB RAM per node) &lt;/SPAN&gt;showed clear resource pressure:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Memory pressure&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;Working sets exceeded available RAM, causing paging&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;High IOPS&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;Thousands of disk operations per second&amp;nbsp;indicated&amp;nbsp;swapping to disk&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Throughput ceiling&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;&lt;SPAN data-contrast="none"&gt;Memory availability capped TPS before CPU was fully&amp;nbsp;utilized&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;In contrast, 4-core configurations (32 GB RAM per node) kept working sets in memory, achieving much higher throughput with minimal I/O.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Key insight&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;For multi-shard workloads,&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;sufficient memory is more important than CPU cores&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;. &lt;SPAN data-contrast="none"&gt;Adequate memory provisioning is essential to unlock full&amp;nbsp;performance&lt;/SPAN&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;5. Latency Escalates Rapidly Under Overload&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;All configurations&amp;nbsp;delivered&amp;nbsp;fast response times (&amp;lt;50ms) at low loads. Once saturated:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;2c4n cluster&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;:&lt;/STRONG&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="none"&gt;Latency increased noticeably under sustained overload&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;4c8n cluster&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;:&lt;/STRONG&gt; Remained under 100ms until approaching the 96-client saturation point&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Practical implication&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;For multi-shard query workloads, over-provision resources to maintain consistent and predictable latency.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Conclusion&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Connection scaling in Azure Database for PostgreSQL Elastic Clusters is a multifaceted challenge that depends on workload characteristics, cluster configuration, and resource constraints. &lt;STRONG&gt;Key takeaways from our benchmarking on read workloads&lt;/STRONG&gt;:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;For Single-Shard OLTP-like Workloads:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Scaling provides near-linear throughput gains (4.3× from 2c4n to 4c8n)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Memory, not node count,&amp;nbsp;determines&amp;nbsp;maximum concurrent connections&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;CPU becomes the throughput bottleneck at peak load&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;PgBouncer&amp;nbsp;trades ~10%&amp;nbsp;throughput for&amp;nbsp;almost&amp;nbsp;unlimited connection scalability&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;For Multi-Shard OLAP-like Workloads:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;SPAN data-ccp-props="{}"&gt;&lt;SPAN data-contrast="none"&gt;Throughput&amp;nbsp;operates&amp;nbsp;at a different scale than single&lt;/SPAN&gt;‑&lt;SPAN data-contrast="none"&gt;shard workloads&lt;/SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Relative gains from scaling are massive (20× improvement&amp;nbsp;observed)&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Memory sufficiency is critical—&lt;SPAN data-contrast="none"&gt;adequate RAM is essential to&amp;nbsp;maintain&amp;nbsp;strong performance&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-contrast="auto"&gt;Saturation occurs at low concurrency; keep client counts conservative&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;General Principles:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Scale up&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;(higher SKU) to support more connections and memory-intensive queries&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Scale out&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;(more nodes) to increase aggregate throughput and data capacity&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Use&amp;nbsp;PgBouncer&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;to manage connection bursts and exceed node memory limits&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;Monitor continuously&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;and adjust based on actual workload patterns&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;By understanding these dynamics and applying the decision frameworks provided, you can architect Elastic Clusters that deliver&amp;nbsp;optimal&amp;nbsp;performance, reliability, and cost-efficiency for your specific application requirements.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;References and Resources&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-elastic-clusters" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-elastic-clusters&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN data-teams="true"&gt;&lt;A href="https://learn.microsoft.com/en-us/postgresql/citus" target="_blank" rel="noopener" aria-label="Link https://learn.microsoft.com/en-us/postgresql/citus"&gt;https://learn.microsoft.com/en-us/postgresql/citus&lt;/A&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://www.postgresql.org/docs/current/runtime-config-connection.html" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;https://www.postgresql.org/docs/current/runtime-config-connection.html&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://www.postgresql.org/docs/current/pgbench.html" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;https://www.postgresql.org/docs/current/pgbench.html&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://techcommunity.microsoft.com/blog/adforpostgresql/analyzing-the-limits-of-connection-scalability-in-postgres/1757266" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Analyzing the Limits of Connection Scalability in Postgres | Microsoft Community Hub&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 22 Apr 2026 14:31:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/connection-scaling-in-elastic-clusters/ba-p/4509624</guid>
      <dc:creator>EbruAydin</dc:creator>
      <dc:date>2026-04-22T14:31:00Z</dc:date>
    </item>
    <item>
      <title>General Availability Refresh: Mirroring Azure Database for PostgreSQL in Microsoft Fabric</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/general-availability-refresh-mirroring-azure-database-for/ba-p/4511726</link>
      <description>&lt;H1&gt;Introduction&lt;/H1&gt;
&lt;P&gt;We are excited to announce the General Availability Refresh of Mirroring Azure Database for PostgreSQL in Microsoft Fabric. This update introduces several new capabilities designed to reduce friction, improve transparency, and increase trust in PostgreSQL data for analytics and AI workloads. Database professionals, DBAs, and developers can now leverage a more robust, flexible, and transparent solution for integrating PostgreSQL data in Microsoft Fabric.&lt;/P&gt;
&lt;P&gt;Mirroring Azure Database for PostgreSQL enables seamless integration of transactional data into Microsoft Fabric, supporting advanced analytics and AI scenarios. Previously, users faced limitations in data type support, operational requirements, and troubleshooting transparency. The General Availability Refresh directly addresses these challenges, delivering a more reliable and user-friendly experience.&lt;/P&gt;
&lt;H2&gt;Native Data Type Support&lt;/H2&gt;
&lt;P&gt;One of the most significant enhancements is support for PostgreSQL native data types, including JSON and JSONB. Users can now mirror complex, semi-structured data without conversion, preserving the full fidelity of source data. Additionally, transparent replication to &lt;EM&gt;varchar(max)&lt;/EM&gt; or &lt;EM&gt;varbinary&lt;/EM&gt; is now supported, ensuring that even custom or less common types can be accurately mirrored in Fabric.&lt;/P&gt;
&lt;P&gt;Previously, data type limitations often required workarounds or manual transformations, introducing complexity and risk. With this update, data flows more naturally from PostgreSQL to Fabric, streamlining analytics and AI pipelines and reducing the time spent on data preparation.&lt;/P&gt;
&lt;P&gt;Mirroring now preserves PostgreSQL-native types—including JSON and JSONB—without requiring schema transformations or type coercion. This ensures:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Full fidelity replication of semi-structured data&lt;/LI&gt;
&lt;LI&gt;Elimination of intermediate serialization (e.g., string encoding)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Tables containing JSON/JSONB columns are mirrored into Fabric with their structure intact, enabling:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Direct querying of nested JSON fields&lt;/LI&gt;
&lt;LI&gt;Consistent schema representation across source and analytical layers&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Once mirrored, JSON content can be queried natively within Fabric workloads, allowing:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Hybrid analytical scenarios (relational + semi-structured)&lt;/LI&gt;
&lt;LI&gt;Use of familiar SQL patterns for JSON traversal and extraction (see the sketch after this list)&lt;/LI&gt;
&lt;/UL&gt;
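&lt;P&gt;A small illustrative sketch (table and field names are hypothetical, not from the product documentation): a JSONB column defined on the source stays queryable as structured data once mirrored.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Hypothetical source table with a semi-structured column:
CREATE TABLE orders (
    id      bigint PRIMARY KEY,
    placed  timestamptz NOT NULL DEFAULT now(),
    details jsonb NOT NULL
);

-- Familiar JSON traversal on the PostgreSQL side; the mirrored copy keeps
-- the column's structure intact rather than flattening it to text:
SELECT id,
       details-&gt;'customer'-&gt;&gt;'city' AS city
FROM orders
WHERE details-&gt;&gt;'status' = 'shipped';&lt;/LI-CODE&gt;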
&lt;H2&gt;Flexible Server High Availability Support for PG versions earlier than 17&lt;/H2&gt;
&lt;P&gt;The refresh expands support for Azure Database for PostgreSQL Flexible Server with high availability enabled on versions earlier than 17. This means organizations can leverage mirroring with HA configurations without needing to upgrade their database engine, improving operational flexibility and minimizing disruption.&lt;/P&gt;
&lt;P&gt;In the past, mirroring required specific PostgreSQL versions, limiting adoption for organizations with established HA deployments. Now, the broader compatibility allows teams to maintain their preferred configurations and benefit from seamless data integration in Fabric.&lt;/P&gt;
&lt;H2&gt;Improved Transparency: Enhanced Error Messaging and Dedicated PostgreSQL UDFs&lt;/H2&gt;
&lt;P&gt;Transparency is critical for troubleshooting and maintaining trust in data pipelines. The General Availability Refresh introduces enhanced error messaging, providing clear and actionable insights into replication issues. Dedicated PostgreSQL user-defined functions (UDFs) further support monitoring and diagnostics, enabling users to quickly identify and resolve problems.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;--Validates that all system and configuration requirements are met before starting CDC mirroring. 
SELECT * FROM azure_cdc.check_prerequisites(); 

-- Quickly verify which extension version is deployed — critical for troubleshooting 
SELECT azure_cdc.azure_cdc_version(); 

-- Returns a detailed list of errors and issues detected during CDC operations 
SELECT * FROM azure_cdc.get_health_status('', ''); 

-- Scans every eligible user table in the database and returns the mirroring readiness status for each 
SELECT * FROM azure_cdc.get_all_tables_mirror_status();&lt;/LI-CODE&gt;
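&lt;P&gt;A typical workflow, based on the functions above: run &lt;EM&gt;azure_cdc.check_prerequisites()&lt;/EM&gt; before enabling mirroring, and reach for &lt;EM&gt;get_health_status()&lt;/EM&gt; and &lt;EM&gt;get_all_tables_mirror_status()&lt;/EM&gt; when replication stalls or a table fails to appear in Fabric.&lt;/P&gt;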
&lt;P&gt;These features streamline troubleshooting, reduce downtime, and empower teams to maintain robust mirroring operations. The improvements are a substantial step forward from previous releases, which offered limited visibility into replication health and errors.&lt;/P&gt;
&lt;H2&gt;Unblocking Mirroring for Servers with Read Replicas Enabled&lt;/H2&gt;
&lt;P&gt;We also removed the existing block on creating mirrored databases on Flexible Servers that have read replicas. Primary servers with one or more read replicas can now serve as a source for mirroring their databases into Fabric with no limitations.&lt;/P&gt;
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;P&gt;The new capabilities significantly reduce friction for analytics and AI workloads in Microsoft Fabric.&amp;nbsp;By removing technical barriers and increasing operational simplicity, the General Availability Refresh empowers organizations to unlock the full potential of their PostgreSQL data for advanced analytics and AI applications.&lt;/P&gt;
&lt;P&gt;The General Availability Refresh of Mirroring Azure Database for PostgreSQL in Microsoft Fabric delivers critical enhancements: native data type support, expanded Flexible Server HA compatibility, replication identity improvements, and better transparency. Together, these features make mirroring more robust, flexible, and trustworthy for analytics and AI workloads.&lt;/P&gt;
&lt;P&gt;We encourage technical professionals, DBAs, and developers to adopt these new capabilities and explore how they can transform data integration in Microsoft Fabric. For more information, visit the official documentation and resources:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/integration/concepts-fabric-mirroring" target="_blank" rel="noopener"&gt;Azure Database for PostgreSQL Documentation&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/fabric/mirroring/azure-database-postgresql" target="_blank" rel="noopener"&gt;Microsoft Fabric Documentation&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 22 Apr 2026 07:21:19 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/general-availability-refresh-mirroring-azure-database-for/ba-p/4511726</guid>
      <dc:creator>scoriani</dc:creator>
      <dc:date>2026-04-22T07:21:19Z</dc:date>
    </item>
    <item>
      <title>March 2026 Recap: Azure Database for PostgreSQL</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/march-2026-recap-azure-database-for-postgresql/ba-p/4511432</link>
      <description>&lt;P&gt;Hello Azure community,&lt;/P&gt;
&lt;P&gt;March was packed with major feature announcements for &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/" target="_blank" rel="noopener"&gt;Azure Database for PostgreSQL&lt;/A&gt;. From the general availability of SSDv2 and cascading read replicas to online migration improvements and new monitoring capabilities that help ensure logical replication slots are preserved, this update brings a range of improvements to performance, scale, and reliability.&amp;nbsp;&lt;/P&gt;
&lt;H1&gt;Features&lt;/H1&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-ssdv2" target="_self" rel="noopener"&gt;SSDv2 - Generally Available&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-replica" target="_self" rel="noopener"&gt;Cascading Read replica - Generally Available&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-migrate" target="_self" rel="noopener"&gt;Online migration using PgOutput plugin - Generally Available&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-alloydb" target="_self" rel="noopener"&gt;Google AlloyDB as a migration source - Generally Available&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-edb" target="_self" rel="noopener"&gt;EDB Extended Server as a migration source - Generally Available&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-repslot" target="_self" rel="noopener"&gt;Logical replication slot synchronization metrics - Preview&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-defender" target="_self" rel="noopener"&gt;Defender Security Assessments - Preview&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-vscode" target="_self" rel="noopener"&gt;New enhancements in the PostgreSQL VS Code Extension&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-minorversion" target="_self" rel="noopener"&gt;Latest PostgreSQL minor versions: 18.3, 17.9, 16.13, 15.17, 14.22&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-extension" target="_self" rel="noopener"&gt;New extension support for PostgreSQL 18 on Azure Database for PostgreSQL&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;A href="#community--1-guide" target="_self" rel="noopener"&gt;Guide on PostgreSQL Buffer Cache Analysis, query rewriting and elastic clusters&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 id="ssdv2"&gt;SSDv2 - Generally Available&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Premium SSD v2 is now generally available for Azure Database for PostgreSQL Flexible Server&lt;/STRONG&gt;, delivering significant performance and cost-efficiency improvements for I/O‑intensive workloads. It offers up to &lt;STRONG&gt;4× higher IOPS&lt;/STRONG&gt;, lower latency, and improved price‑performance.&lt;/P&gt;
&lt;P&gt;With &lt;STRONG&gt;independent scaling of storage and performance&lt;/STRONG&gt;, you only pay for what you need. Premium SSD v2 supports storage scaling up to &lt;STRONG&gt;64&lt;/STRONG&gt;&lt;STRONG&gt; TiB&lt;/STRONG&gt;, with performance reaching &lt;STRONG&gt;80,000 IOPS&lt;/STRONG&gt; and &lt;STRONG&gt;1,200&lt;/STRONG&gt;&lt;STRONG&gt; MiB/s throughput&lt;/STRONG&gt;, without tying performance to disk size. IOPS and throughput can be adjusted instantly, with no downtime.&lt;/P&gt;
&lt;P&gt;Additionally, built‑in baseline performance at no additional cost ensures consistent performance even for smaller deployments, making Premium SSD v2 a strong choice for modern, high‑demand PostgreSQL applications.&lt;/P&gt;
&lt;P&gt;For details about the Premium SSD v2 release, see the &lt;A href="https://techcommunity.microsoft.com/blog/adforpostgresql/premium-ssd-v2-is-now-generally-available-for-azure-database-for-postgresql/4508445?previewMessage=true" target="_blank" rel="noopener"&gt;GA Announcement Blog&lt;/A&gt; and &lt;A href="https://learn.microsoft.com/azure/postgresql/compute-storage/concepts-storage-premium-ssd-v2" target="_blank" rel="noopener"&gt;documentation&lt;/A&gt;&lt;/P&gt;
&lt;H2 id="replica"&gt;Cascading read replica - Generally available&lt;/H2&gt;
&lt;P&gt;Cascading read replicas are now generally available, giving customers greater flexibility to create read replicas from &lt;STRONG&gt;existing read replicas&lt;/STRONG&gt;. This capability supports up to &lt;STRONG&gt;two levels of replication&lt;/STRONG&gt; and up to &lt;STRONG&gt;30 read replicas&lt;/STRONG&gt; in total, with each read replica able to host up to &lt;STRONG&gt;five cascading replicas&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;With cascading read replicas, you can more effectively distribute read traffic across multiple replicas, deploy regional or hierarchical read replicas closer to end users, reduce read latency, and improve overall query performance for read‑heavy workloads. In addition, we’ve rolled out switchover support for both intermediate and cascading read replicas, making it easier to manage replica topologies. Learn more about cascading read replicas through&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/read-replica/concepts-read-replicas#create-cascading-read-replicas-preview" target="_blank" rel="noopener"&gt;our documentation&lt;/A&gt; and a &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/adforpostgresql/cascading-read-replicas-now-generally-available/4510610" target="_blank" rel="noopener" data-lia-auto-title="detailed blog walkthrough." data-lia-auto-title-active="0"&gt;detailed blog walkthrough.&lt;/A&gt;&lt;/P&gt;
&lt;H2 id="migrate"&gt;Online migration using PgOutput plugin - Generally Available&lt;/H2&gt;
&lt;P&gt;The new addition of the &lt;STRONG&gt;PgOutput plugin&lt;/STRONG&gt; helps make your online migration to Azure more &lt;STRONG&gt;robust and seamless&lt;/STRONG&gt;. The native, out-of-the-box support that PgOutput offers makes it better suited for online production migrations than other logical decoding plugins. PgOutput delivers &lt;STRONG&gt;higher throughput&lt;/STRONG&gt; and &lt;STRONG&gt;superior performance&lt;/STRONG&gt; compared to other logical decoding plugins, keeping downtime for your online migration very limited. PgOutput also offers fine-grained filtering using publications, so you can migrate specific tables and filter by specific operations (see the sketch below).&lt;/P&gt;
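&lt;P&gt;For illustration, a publication of the kind pgoutput consumes (the table name is hypothetical):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Replicate only one table, and only inserts and updates:
CREATE PUBLICATION migration_pub
    FOR TABLE public.orders
    WITH (publish = 'insert, update');&lt;/LI-CODE&gt;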
&lt;P&gt;For more details about this update, see the &lt;A href="https://learn.microsoft.com/azure/postgresql/migrate/migration-service/concepts-required-user-permissions#online-migration-using-pgoutput---required-publication-permissions" target="_blank" rel="noopener"&gt;documentation&lt;/A&gt;.&lt;/P&gt;
&lt;H2 id="alloydb"&gt;Google AlloyDB as a migration source - Generally Available&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Google AlloyDB&lt;/STRONG&gt; is now supported as a source in Azure Database for PostgreSQL Migration Service. You can use this capability to migrate your AlloyDB workloads directly to Azure Database for PostgreSQL, using either&lt;STRONG&gt; offline&lt;/STRONG&gt; or &lt;STRONG&gt;online &lt;/STRONG&gt;migration options. This support helps you move your PostgreSQL databases to Azure with confidence, while taking advantage of Azure’s flexibility and scalability.&lt;/P&gt;
&lt;P&gt;To know more about this feature, visit our &lt;A href="https://learn.microsoft.com/azure/postgresql/migrate/migration-service/tutorial-migration-service-alloy-db-offline?tabs=portal" target="_blank" rel="noopener"&gt;documentation&lt;/A&gt;.&lt;/P&gt;
&lt;H2 id="edb"&gt;EDB Extended Server as a migration source - Generally Available&lt;/H2&gt;
&lt;P&gt;Azure Database for PostgreSQL Migration Service now supports &lt;STRONG&gt;EDB&lt;/STRONG&gt; Extended Server as a migration source. This enables you to migrate EDB Extended Server workloads to Azure Database for PostgreSQL using both offline and online migration methods. With this addition, you can transition PostgreSQL databases to Azure smoothly and benefit from the scale and flexibility of the Azure platform.&lt;/P&gt;
&lt;P&gt;For more details about this update, see the &lt;A href="https://learn.microsoft.com/azure/postgresql/migrate/migration-service/tutorial-migration-service-enterprise-db-extended-server-offline?tabs=portal" target="_blank" rel="noopener"&gt;documentation&lt;/A&gt;.&lt;/P&gt;
&lt;H2 id="repslot"&gt;Logical replication slot sync status metric - Preview&lt;/H2&gt;
&lt;P&gt;You can now monitor whether your logical replication slots are &lt;STRONG&gt;failover‑ready&lt;/STRONG&gt; using the new &lt;STRONG&gt;logical_replication_slot_sync_status&lt;/STRONG&gt; metric, now in preview. This metric provides a simple binary signal indicating whether logical replication slots are &lt;STRONG&gt;synchronized&lt;/STRONG&gt; across high availability (HA) &lt;STRONG&gt;primary and standby nodes&lt;/STRONG&gt;. It helps you quickly assess failover readiness without digging into replication internals, which is especially valuable for CDC pipelines such as Debezium and Kafka, where data continuity during failover is critical.&lt;/P&gt;
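&lt;P&gt;If you prefer to check from inside the database as well, recent PostgreSQL versions expose related slot state in the catalog (a sketch; these columns require PostgreSQL 17 or later):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- On PostgreSQL 17+, pg_replication_slots reports whether a logical slot
-- is marked for failover and has been synchronized to the standby:
SELECT slot_name, failover, synced
FROM pg_replication_slots
WHERE slot_type = 'logical';&lt;/LI-CODE&gt;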
&lt;P&gt;Learn more about &lt;A class="lia-external-url" href="https://aka.ms/pg-flex-replication-metrics" target="_blank" rel="noopener"&gt;logical replication metrics in the documentation&lt;/A&gt;.&lt;/P&gt;
&lt;H2 id="defender"&gt;Defender Security Assessments - Preview&lt;/H2&gt;
&lt;P&gt;In March, we introduced two new Microsoft Defender for Cloud CSPM security recommendations for Azure Database for PostgreSQL Flexible Server, now available in &lt;A href="https://learn.microsoft.com/en-us/azure/defender-for-cloud/release-notes-recommendations-alerts#recommendations-alerts-and-incidents-updates" target="_blank" rel="noopener"&gt;public preview&lt;/A&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Geo-redundant backups should be enabled&lt;/STRONG&gt; for PostgreSQL Servers&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;require_secure_transport should be set to "on"&lt;/STRONG&gt; for PostgreSQL Servers&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These integrated assessments continuously evaluate database configuration settings against security best practices, helping customers proactively identify and manage security posture risks for their Azure PostgreSQL servers while maintaining alignment with internal and industry standards.&lt;/P&gt;
&lt;P&gt;Additional security posture assessments for Azure PostgreSQL will be introduced as they become available.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To learn more, refer to the&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/defender-for-cloud/recommendations-reference-data#geo-redundant-backups-should-be-enabled-for-postgresql-servers" target="_blank" rel="noopener"&gt;reference table for all data security recommendations in Microsoft Defender for Cloud.&lt;/A&gt;&lt;/P&gt;
&lt;H2 id="vscode"&gt;New enhancements in the PostgreSQL VS Code Extension&lt;/H2&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft/vscode-pgsql/blob/main/CHANGELOG.md" target="_blank" rel="noopener"&gt;The March release (v1.20)&lt;/A&gt; of the &lt;A class="lia-external-url" href="https://marketplace.visualstudio.com/items?itemName=ms-ossdata.vscode-pgsql" target="_blank" rel="noopener"&gt;PostgreSQL VS Code extension&lt;/A&gt; delivers new server management capabilities, enhanced query plan analysis, visual improvements, and a batch of bug fixes.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Clone Server: &lt;/STRONG&gt;You can now clone an Azure PostgreSQL Flexible Server directly from within the extension. The clone operation is available from the server management UI, allowing you to duplicate a server configuration including region, SKU, and settings without leaving VS Code.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Entra ID Authentication for AI-Powered Schema Conversion: &lt;/STRONG&gt;The Oracle-to-PostgreSQL migration experience now supports Microsoft Entra ID authentication for Azure OpenAI connectivity, replacing API key–based authentication. This enables enterprise-grade identity management and access control for AI-powered schema conversion workflows.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Query Plan Visualization Improvements: &lt;/STRONG&gt;The Copilot-powered “Analyze with Copilot” feature for query plans has been improved with more relevant optimization recommendations and smoother SQL attachment handling during plan analysis.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Apache AGE Graph Visualizer Enhancements: &lt;/STRONG&gt;The graph visualizer received a visual refresh with modernized edge rendering, a color-coded legend, and a new properties pane for exploring element details.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Object Explorer Deep Refresh: &lt;/STRONG&gt;The Object Explorer now supports refreshing expanded nodes in place, so newly created tables and objects appear immediately without needing to disconnect and reconnect.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Settings Management: &lt;/STRONG&gt;The extension now supports both global user settings and local .vscode/settings.json, providing more robust connection settings management across configuration sources.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Bug Fixes: &lt;/STRONG&gt;This release includes numerous bug fixes across script generation (DDL for triggers, materialized views, and functions), IntelliSense (foreign table support), JSON data export, query execution, and server connectivity.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 id="minorversion"&gt;Latest PostgreSQL minor versions: 18.3, 17.9, 16.13, 15.17, 14.22&lt;/H2&gt;
&lt;P&gt;Azure PostgreSQL now supports the latest PostgreSQL minor versions:&amp;nbsp;&lt;STRONG&gt;18.3, 17.9, 16.13, 15.17, and 14.22&lt;/STRONG&gt;. These updates are applied automatically during planned maintenance windows, ensuring your databases stay up to date with critical fixes and reliability improvements, with no manual action required. This is an out-of-cycle release that addresses regressions identified in the previous update. The release includes fixes across replication, JSON functions, query correctness, indexing, and extensions like &lt;EM&gt;pg_trgm&lt;/EM&gt;, improving overall stability and correctness of database operations.&lt;/P&gt;
&lt;P&gt;For details about the minor release, see the &lt;A href="https://www.postgresql.org/about/news/postgresql-183-179-1613-1517-and-1422-released-3246/" target="_blank" rel="noopener"&gt;PostgreSQL announcement&lt;/A&gt;.&lt;/P&gt;
&lt;H2 id="extension"&gt;New extension support for PostgreSQL 18 on Azure Database for PostgreSQL&lt;/H2&gt;
&lt;P&gt;Azure Database for PostgreSQL running PostgreSQL 18 now supports extensions that enable&lt;STRONG&gt; graph querying,&lt;/STRONG&gt; &lt;STRONG&gt;in‑database AI integration&lt;/STRONG&gt;, &lt;STRONG&gt;external storage access&lt;/STRONG&gt;, and &lt;STRONG&gt;scalable vector similarity search&lt;/STRONG&gt;, expanding the types of workloads that can be handled directly within PostgreSQL.&lt;/P&gt;
&lt;P&gt;Newly supported extensions include:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;AGE (Apache AGE v1.7.0): &lt;/STRONG&gt;Adds native graph data modeling and querying capabilities to PostgreSQL using openCypher, enabling hybrid relational–graph workloads within the same database.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;azure_ai: &lt;/STRONG&gt;Enables direct invocation of Microsoft Foundry models from PostgreSQL using SQL, allowing AI inference and embedding generation to be integrated into database workflows.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;azure_storage: &lt;/STRONG&gt;Provides native integration with Azure Blob Storage, enabling PostgreSQL to read from and write to external storage for data ingestion, export, and hybrid data architectures.&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/extensions/how-to-use-pgdiskann" target="_blank" rel="noopener"&gt;pg_diskann&lt;/A&gt;&lt;STRONG&gt;: &lt;/STRONG&gt;Introduces disk‑based approximate nearest neighbor (ANN) indexing for high-performance vector similarity search at scale, optimized for large vector datasets with constrained memory.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Together, these extensions allow PostgreSQL on Azure to support multi-model, AI‑assisted, and data‑intensive workloads while preserving compatibility with the open‑source PostgreSQL ecosystem.&lt;/P&gt;
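&lt;P&gt;As a minimal sketch of the vector-search piece, the following shows how a disk‑based ANN index might be created with pg_diskann. The &lt;EM&gt;items&lt;/EM&gt; table and its 1536‑dimensional &lt;EM&gt;embedding&lt;/EM&gt; column are hypothetical; the &lt;EM&gt;diskann&lt;/EM&gt; access method and &lt;EM&gt;vector_cosine_ops&lt;/EM&gt; operator class follow the pg_diskann documentation linked above:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Enable the extensions (pgvector provides the vector type pg_diskann builds on)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_diskann;

-- Hypothetical table with an embedding column
CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(1536)
);

-- Disk-based ANN index for cosine-distance similarity search
CREATE INDEX items_embedding_diskann_idx
    ON items USING diskann (embedding vector_cosine_ops);&lt;/LI-CODE&gt;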
&lt;H2 id="guide"&gt;Guide on PostgreSQL buffer cache analysis, query rewriting&lt;/H2&gt;
&lt;P&gt;We have rolled out two new blogs on PostgreSQL buffer cache analysis and PostgreSQL query rewriting and subqueries. These blogs help you better understand how PostgreSQL behaves under the hood and how to apply practical performance optimizations, whether you’re diagnosing memory usage, reducing unnecessary disk I/O, or reshaping queries to get more efficient execution plans as your workloads scale.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;PostgreSQL Buffer Cache Analysis&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This blog focuses on &lt;A href="https://techcommunity.microsoft.com/blog/adforpostgresql/postgresql-buffer-cache-analysis/4501264" target="_blank" rel="noopener" data-lia-auto-title="understanding PostgreSQL memory behavior through shared_buffers" data-lia-auto-title-active="0"&gt;understanding PostgreSQL memory behavior through shared_buffers&lt;/A&gt;, the database’s primary buffer cache. Using native statistics and the pg_buffercache extension, it provides a data‑driven approach to evaluate cache efficiency, identify when critical tables and indexes are served from memory, and detect cases where disk I/O may be limiting performance. The guide offers a repeatable methodology to support informed tuning decisions as workloads scale.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;PostgreSQL Query Rewriting and Subqueries&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This blog explores &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/adforpostgresql/postgresql-query-rewriting-and-subqueries/4499819" target="_blank" rel="noopener" data-lia-auto-title="how query structure directly impacts PostgreSQL execution plans and performance" data-lia-auto-title-active="0"&gt;how query structure directly impacts PostgreSQL execution plans and performance&lt;/A&gt;. It walks through common anti‑patterns and practical rewrites such as replacing correlated subqueries with set‑based joins, using semi‑joins, and pre‑aggregating large tables to reduce unnecessary work and enable more efficient execution paths. Each scenario includes clear explanations, example rewrites, and self‑contained test scripts you can run.&lt;/P&gt;
&lt;H1&gt;Azure Postgres Learning Bytes 🎓&lt;/H1&gt;
&lt;H4&gt;How to create and store vector embeddings in Azure Database for PostgreSQL&lt;/H4&gt;
&lt;P&gt;Vector embeddings sit at the core of many modern AI applications, from semantic search and recommendations to RAG‑based experiences. But an important question follows: how do you generate and store embeddings in your existing database server?&lt;/P&gt;
&lt;P&gt;With Azure Database for PostgreSQL, you can generate and store vector embeddings directly alongside your application data. By using the &lt;EM&gt;azure_ai&lt;/EM&gt; extension, PostgreSQL can seamlessly integrate with Azure OpenAI to create embeddings and store them in your database. This learning byte walks you through a step‑by‑step guide to generating and storing vector embeddings in Azure Database for PostgreSQL.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Enable the Azure AI extension&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Azure Database for PostgreSQL supports the &lt;STRONG&gt;azure_ai&lt;/STRONG&gt; extension, which allows you to call the &lt;STRONG&gt;Azure OpenAI&lt;/STRONG&gt; service.&lt;/P&gt;
&lt;P&gt;Connect to your database and run:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE EXTENSION IF NOT EXISTS azure_ai;&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Create (or use existing) Azure OpenAI resource&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;You need an &lt;STRONG&gt;Azure OpenAI&lt;/STRONG&gt; resource in your subscription with an embedding model deployed.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;In the Azure portal, create an &lt;STRONG&gt;Azure OpenAI&lt;/STRONG&gt; resource.&lt;/LI&gt;
&lt;LI&gt;Deploy an embedding model (for example, text-embedding-3-small).&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Azure OpenAI provides the &lt;STRONG&gt;endpoint URL&lt;/STRONG&gt; and &lt;STRONG&gt;API key&lt;/STRONG&gt; you will use in the next steps.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Get endpoint and API key&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Go to your &lt;STRONG&gt;Azure OpenAI resource&lt;/STRONG&gt; in the Azure portal.&lt;/LI&gt;
&lt;LI&gt;Select &lt;STRONG&gt;Keys and Endpoint&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Copy:
&lt;UL&gt;
&lt;LI&gt;
&lt;P&gt;Endpoint&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;API Key (Key 1 or Key 2)&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4: Configure Azure AI extension with OpenAI details&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Store the endpoint and key securely inside PostgreSQL:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT 
  azure_ai.set_setting(
    'azure_openai.endpoint', 'https://&amp;lt;your-endpoint&amp;gt;.openai.azure.com'
  );
SELECT 
  azure_ai.set_setting(
    'azure_openai.subscription_key', 
    '&amp;lt;your-api-key&amp;gt;'
  );
&lt;/LI-CODE&gt;
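&lt;P&gt;To confirm the configuration was saved, the extension also exposes a companion &lt;EM&gt;azure_ai.get_setting&lt;/EM&gt; function that reads a setting back by key:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Read back the configured endpoint (key names match set_setting above)
SELECT azure_ai.get_setting('azure_openai.endpoint');&lt;/LI-CODE&gt;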
&lt;P&gt;&lt;STRONG&gt;Step 5: Generate an embedding&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT 
  LEFT(
    azure_openai.create_embeddings(
      'text-embedding-3-small', 'Sample text for PostgreSQL Lab'
    ):: text, 
    100
  ) AS vector_preview;
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Step 6: Add a vector column&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Add a vector column to store embeddings (example uses 1536‑dimensional vectors):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;ALTER TABLE 
  &amp;lt; table - name &amp;gt; 
ADD 
  COLUMN embedding VECTOR(1536);&lt;/LI-CODE&gt;
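&lt;P&gt;Note that the &lt;EM&gt;vector&lt;/EM&gt; type comes from the &lt;STRONG&gt;pgvector&lt;/STRONG&gt; extension. If it isn’t already enabled on your server, create it first:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- pgvector provides the vector data type used by the embedding column
CREATE EXTENSION IF NOT EXISTS vector;&lt;/LI-CODE&gt;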
&lt;P&gt;&lt;STRONG&gt;Step 7: Store the embedding&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Update your table with the generated embedding:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;UPDATE 
  &amp;lt; table - name &amp;gt; 
SET 
  embedding = azure_openai.create_embeddings(
    'text-embedding-3-small', content
  );
&lt;/LI-CODE&gt;
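&lt;P&gt;With embeddings stored, you can run a semantic similarity search against the table. A minimal sketch, reusing the hypothetical &lt;EM&gt;table_name&lt;/EM&gt; and &lt;EM&gt;content&lt;/EM&gt; column from the previous steps; the &lt;EM&gt;real[]&lt;/EM&gt; array returned by &lt;EM&gt;create_embeddings&lt;/EM&gt; is cast to &lt;EM&gt;vector&lt;/EM&gt; so it works with pgvector’s cosine-distance operator:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Return the five rows whose embeddings are closest to the search phrase
SELECT content
FROM &amp;lt;table_name&amp;gt;
ORDER BY embedding &amp;lt;=&amp;gt; azure_openai.create_embeddings(
    'text-embedding-3-small', 'sample search phrase'
  )::vector
LIMIT 5;&lt;/LI-CODE&gt;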
&lt;H1&gt;Conclusion&lt;/H1&gt;
&lt;P&gt;That’s a wrap for our March 2026 recap. This month brought a set of meaningful updates focused on making Azure Database for PostgreSQL more performant, reliable, and scalable, whether you’re modernizing workloads, scaling globally, or strengthening your security posture.&lt;/P&gt;
&lt;P&gt;We’ll be back soon with more exciting announcements and key feature enhancements for Azure Database for PostgreSQL, so stay tuned! Your feedback is important to us. Have suggestions, ideas, or questions? We’d love to hear from you:&amp;nbsp;&lt;A href="https://aka.ms/pgfeedback" target="_blank" rel="noopener"&gt;https://aka.ms/pgfeedback&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Follow us here for the latest announcements, feature releases, and best practices:&amp;nbsp;&lt;A href="https://techcommunity.microsoft.com/category/azuredatabases/blog/adforpostgresql" target="_blank" rel="noopener" data-lia-auto-title="Microsoft Blog for PostgreSQL" data-lia-auto-title-active="0"&gt;Microsoft Blog for PostgreSQL&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Apr 2026 18:58:45 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/march-2026-recap-azure-database-for-postgresql/ba-p/4511432</guid>
      <dc:creator>gauri-kasar</dc:creator>
      <dc:date>2026-04-15T18:58:45Z</dc:date>
    </item>
    <item>
      <title>Combining pgvector and Apache AGE - knowledge graph &amp; semantic intelligence in a single engine</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/combining-pgvector-and-apache-age-knowledge-graph-semantic/ba-p/4508781</link>
      <description>&lt;P&gt;&lt;EM&gt;Inspired by &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/adforpostgresql/graphrag-and-postgresql-integration-in-docker-with-cypher-query-and-ai-agents-ve/4503586" target="_blank" rel="noopener" data-lia-auto-title="GraphRAG and PostgreSQL Integration in Docker with Cypher Query and AI Agents" data-lia-auto-title-active="0"&gt;GraphRAG and PostgreSQL Integration in Docker with Cypher Query and AI Agents&lt;/A&gt;, which demonstrated how Apache AGE brings Cypher based graph querying into PostgreSQL for GraphRAG pipelines. This post takes that idea further combining AGE's graph traversal with pgvector's semantic search to build a unified analytical engine where vectors and graphs reinforce each other in a single PostgreSQL instance.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;This post targets workloads where entity types, relationship semantics, and schema cardinality are known before ingestion. Embeddings are generated from structured attribute fields; graph edges are typed and written by deterministic ETL. No LLM is involved at any stage. You should use this approach when you have structured data and need operational query performance, and deterministic, auditable, sub-millisecond retrieval.&lt;/EM&gt;&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;The problem nobody talks about: the multi-database / multi-hop tax&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;If you run technology for a large enterprise, you already know the data problem. It is not that you do not have enough data. It is that your data lives in too many places, connected by too many fragile pipelines, serving too many conflicting views of the same reality.&lt;/P&gt;
&lt;P&gt;Here is a pattern that repeats across industries. One team needs to find entities "similar to" a reference item — not by exact attribute match, but by semantic meaning derived from unstructured text like descriptions, reviews, or specifications. That is a vector similarity problem.&lt;/P&gt;
&lt;P&gt;Another team needs to traverse relationships: trace dependency chains, map exposure paths, or answer questions like "if this node is removed, what downstream nodes are affected?" That is a graph traversal problem.&lt;/P&gt;
&lt;P&gt;Meanwhile, the authoritative master data (IDs, attributes, pricing, transactional history) already lives in Postgres.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Now you are operating three databases. Three bills. Three sets of credentials. Three backup strategies. A fragile ETL layer stitching entity IDs across systems, breaking silently whenever someone adds a new attribute to the master table. And worst of all, nobody can ask a question that spans all three systems without custom application code.&lt;/P&gt;
&lt;P&gt;An Azure PostgreSQL database can already do all three jobs. Two extensions, &lt;STRONG&gt;pgvector&lt;/STRONG&gt; for vector similarity search and the &lt;STRONG&gt;Apache AGE&lt;/STRONG&gt; extension for graph traversal, bring these capabilities natively into the database. No new infrastructure. No sync pipelines. No multi-database tax!&lt;/P&gt;
&lt;P&gt;This post walks through exactly how to combine them, why each piece matters at scale, and what kinds of queries become possible when you stop treating vectors and graphs as separate concerns.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;The architecture: Two extensions, One engine&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;pgvector&amp;nbsp;adds a native&amp;nbsp;vector&amp;nbsp;data type and distance operators (&amp;lt;=&amp;gt;,&amp;nbsp;&amp;lt;-&amp;gt;,&amp;nbsp;&amp;lt;#&amp;gt;) with HNSW and IVFFlat index support.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://aka.ms/pg-diskann-blog" target="_blank" rel="noopener"&gt;pg_diskann&lt;/A&gt;&amp;nbsp;adds a third index type that keeps the index on disk instead of in memory, enabling large scale vector search without proportional RAM.&lt;/P&gt;
&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;&lt;EM&gt;Example 1&lt;/EM&gt;&lt;/STRONG&gt;&lt;/U&gt;&lt;EM&gt; - a product similarity query that correlates related products sold across multiple markets, using cosine similarity.&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The LIMIT clause in the subquery restricts the similarity search to the single closest product recommendation.&lt;/LI&gt;
&lt;LI&gt;A high similarity threshold of &amp;gt; 0.75 (i.e., 75% similarity between embeddings) filters out weak matches.&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="sql"&gt;-- Table DDL - for illuatration purposes only 
CREATE TABLE IF NOT EXISTS products (
    id              SERIAL PRIMARY KEY,
    sku             TEXT UNIQUE NOT NULL,
    name            TEXT NOT NULL,
    brand           TEXT NOT NULL,
    category        TEXT NOT NULL,
    subcategory     TEXT,
    market          TEXT NOT NULL,
    region          TEXT,
    description     TEXT,
    ingredients     TEXT,
    avg_rating      FLOAT DEFAULT 0.0,
    review_count    INT DEFAULT 0,
    price_usd       FLOAT,
    launch_year     INT,
    status          TEXT DEFAULT 'active',         
    embedding       vector(384)               
);&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
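&lt;P&gt;At scale you would typically back this query with an approximate nearest neighbor index. A minimal sketch using pgvector's HNSW index on the table above (the vector_cosine_ops operator class matches the cosine-distance operator used in the query below):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Optional ANN index to accelerate cosine-distance ordering
CREATE INDEX IF NOT EXISTS products_embedding_hnsw_idx
    ON products USING hnsw (embedding vector_cosine_ops);&lt;/LI-CODE&gt;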
&lt;LI-CODE lang="sql"&gt;SELECT     us.name                                                    AS us_product,
           us.brand                                                   AS us_brand,
           in_p.name                                                  AS india_match,
           in_p.brand                                                 AS india_brand,
           Round((1 - (us.embedding &amp;lt;=&amp;gt; in_p.embedding))::NUMERIC, 4) AS similarity
FROM       products us
cross join lateral
           (
                    SELECT   name,
                             brand,
                             embedding
                    FROM     products
                    WHERE    market = 'India'
                    AND      category = us.category
                    ORDER BY embedding &amp;lt;=&amp;gt; us.embedding limit 1 ) in_p
WHERE      us.market = 'US'
AND        us.category = 'Skincare'
AND        us.avg_rating &amp;gt;= 4.0
AND        round((1 - (us.embedding &amp;lt;=&amp;gt; in_p.embedding))::NUMERIC, 4)&amp;gt; 0.75
ORDER BY   similarity DESC limit 20;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;AGE adds a&amp;nbsp;cypher()&amp;nbsp;function that executes Cypher queries against a labeled property graph stored in the database, managed and maintained under the&amp;nbsp;ag_catalog&amp;nbsp;schema. Vertices and edges become first-class PostgreSQL rows with&amp;nbsp;agtype properties.&lt;/P&gt;
&lt;P&gt;The AGE extension supports&amp;nbsp;MATCH,&amp;nbsp;CREATE,&amp;nbsp;MERGE,&amp;nbsp;WITH, and aggregations.&lt;/P&gt;
&lt;P&gt;&lt;U&gt;&lt;STRONG&gt;&lt;EM&gt;Example 2&lt;/EM&gt;&lt;/STRONG&gt;&lt;/U&gt;&lt;EM&gt; - a graph traversal query such as the one below, which returns common products sold through multiple retail channels.&lt;/EM&gt;&lt;/P&gt;
&lt;LI-CODE lang="cypher"&gt;SET search_path = ag_catalog, "$user", public;

SELECT * FROM cypher('cpg_graph', $$
    MATCH (p:Product)-[:SOLD_AT]-&amp;gt;(walmart:RetailChannel {name: 'Walmart'})
    MATCH (p)-[:SOLD_AT]-&amp;gt;(target:RetailChannel {name: 'Target'})
    MATCH (b:Brand)-[:MANUFACTURES]-&amp;gt;(p)
    RETURN b.name     AS brand,
           p.name     AS product,
           p.category AS category,
           p.market   AS market,
           p.price_usd AS price
    ORDER BY p.category, b.name
$$) AS (brand agtype, product agtype, category agtype,
        market agtype, price agtype);
&lt;/LI-CODE&gt;
&lt;P&gt;The critical takeaway here is that both extensions participate in the same query planner and executor. A CTE that calls pgvector's &amp;lt;=&amp;gt; operator can feed results into a cypher() call in the next CTE, all within a single transaction, sharing the same planning, execution, and concurrency control the database has to offer.&lt;/P&gt;
&lt;P&gt;Putting it together, the setup looks like this:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS age;

SET search_path = ag_catalog, "$user", public;
SELECT create_graph('knowledge_graph');
&lt;/LI-CODE&gt;
&lt;H2&gt;&lt;STRONG&gt;The bridge: pgvector → Apache AGE&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;This is the architectural centrepiece: the mechanism that turns vector similarity scores into traversable graph edges. Without this “bridge”, pgvector and AGE are two isolated extensions.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Why bridge at all?&lt;/STRONG&gt;&lt;/H4&gt;
&lt;LI-CODE lang="sql"&gt;pgvector answers: "What is similar to X?" 
AGE answers: "What is connected to Y, and how?" 
&lt;/LI-CODE&gt;
&lt;P&gt;These are fundamentally different questions operating on fundamentally different data structures. pgvector works on a flat vector space and every query is a distance calculation against an ANN index.&lt;/P&gt;
&lt;P&gt;AGE works on a labelled property graph where every query is a pattern match across typed nodes and edges.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;What if now the question is – What is like X and connected to Y and how?&lt;/LI-CODE&gt;
&lt;P&gt;This is where the bridge comes to life.&lt;/P&gt;
&lt;P&gt;It takes cosine similarity scores from pgvector and writes them as&amp;nbsp;&lt;EM&gt;SIMILAR_TO&lt;/EM&gt;&amp;nbsp;edges in the AGE property graph, turning a distance computation into a traversable relationship.&lt;/P&gt;
&lt;P&gt;Once similarity is an edge, Cypher queries can combine it with structural edges in a single declarative pattern.&lt;/P&gt;
&lt;LI-CODE lang="cypher"&gt;for ind_prod_id, us_prod_id, similarity in pairs:
    execute_cypher(cur, f"""
        MATCH (a:Product {{product_id: { ind_prod_id }}}),
              (b:Product {{product_id: { us_prod_id }}})
        CREATE (a)-[:SIMILAR_TO {{score: {score:.4f},
                                  method: 'pgvector_cosine'}}]-&amp;gt;(b)
        CREATE (b)-[:SIMILAR_TO {{score: {score:.4f},
                                  method: 'pgvector_cosine'}}]-&amp;gt;(a)
    """)
&lt;/LI-CODE&gt;
&lt;P&gt;The cypher()&amp;nbsp;function translates Cypher into DML against&amp;nbsp;ag_catalog&amp;nbsp;tables under the hood; these are plain PostgreSQL heap inserts, just like any other row.&lt;/P&gt;
&lt;P&gt;The score property is the edge weight on the SIMILAR_TO relationship. Its value is the similarity score computed from pgvector using cosine similarity, so a higher score means the two products are more semantically similar.&lt;/P&gt;
&lt;P&gt;The method property is metadata on that same edge. It records how the score was produced. In this case, pgvector_cosine is just a string label indicating that the relationship was derived using pgvector based cosine similarity.&lt;/P&gt;
&lt;P&gt;Cosine similarity is symmetric, but property graph traversal is directional, i.e. MATCH (a)-[:SIMILAR_TO]-&amp;gt;(b) won't find the reverse path unless both directional edges exist.&lt;/P&gt;
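&lt;P&gt;Once the edges exist, a score-filtered traversal is a single declarative pattern. A minimal sketch, following the &lt;EM&gt;cpg_graph&lt;/EM&gt; naming used in the earlier examples (the 0.80 threshold is illustrative):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Traverse SIMILAR_TO edges above a score threshold
SELECT * FROM cypher('cpg_graph', $$
    MATCH (a:Product)-[r:SIMILAR_TO]-&amp;gt;(b:Product)
    WHERE r.score &amp;gt; 0.80
    RETURN a.name AS product, b.name AS similar_product, r.score AS score
$$) AS (product agtype, similar_product agtype, score agtype);&lt;/LI-CODE&gt;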
&lt;img /&gt;
&lt;H4&gt;&lt;STRONG&gt;Why this combination matters&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;One backup strategy. One monitoring stack. One connection pool. One failover target. One set of credentials. One set of database restore considerations. For teams already running Azure PostgreSQL databases in production, this adds capabilities without adding any net new infrastructure.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Unified cost model&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;The planner assigns cost estimates to index scans for both execution engines using the same cost framework it uses for B-tree lookups and sequential scans. It can decide whether to use the HNSW index or fall back to a sequential scan based on table statistics and server parameters.&lt;/P&gt;
&lt;P&gt;In other words, there is no separate storage layer or database engine to learn.&lt;/P&gt;
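&lt;P&gt;You can see this unified planning in action with a plain EXPLAIN. A minimal sketch against the products table from example 1 (the SKU literal is illustrative):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- The plan shows whether the ANN index or a sequential scan was chosen
EXPLAIN (ANALYZE, BUFFERS)
SELECT name
FROM products
ORDER BY embedding &amp;lt;=&amp;gt; (SELECT embedding FROM products WHERE sku = 'US-SKU-001')
LIMIT 10;&lt;/LI-CODE&gt;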
&lt;H2&gt;&lt;STRONG&gt;Bringing all this knowledge together&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;Examples 1 and 2 demonstrated native vector search and native graph search, respectively, in a classic product catalog scenario. Now, let’s bring this to life: &lt;EM&gt;what if the question is “What is like X and connected to Y, and how?”&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;In this use case, pgvector finds the cross-market matches (as shown in example 1), then Cypher checks which of those matches are sold at both Walmart and Target:&lt;/EM&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SET search_path = ag_catalog, "$user", public;

-- Cross-market matching (pgvector) → Retail channel overlap (graph)
WITH cross_market AS (
    SELECT us.id    AS us_id,
           us.name  AS us_product,
           us.brand AS us_brand,
           in_p.id    AS india_id,
           in_p.name  AS india_match,
           in_p.brand AS india_brand,
           ROUND((1 - (us.embedding &amp;lt;=&amp;gt; in_p.embedding))::numeric, 4) AS similarity
    FROM products us
    CROSS JOIN LATERAL (
        SELECT id, name, brand, embedding
        FROM products
        WHERE market = 'India'
          AND category = us.category
        ORDER BY embedding &amp;lt;=&amp;gt; us.embedding
        LIMIT 1
    ) in_p
    WHERE us.market = 'US'
      AND us.category = 'Skincare'
      AND us.avg_rating &amp;gt;= 4.0
      AND ROUND((1 - (us.embedding &amp;lt;=&amp;gt; in_p.embedding))::numeric, 4) &amp;gt; 0.75
),
dual_channel AS (
    SELECT (pid::text)::int AS product_id,
           brand::text       AS brand
    FROM cypher('cpg_graph', $$
        MATCH (p:Product)-[:SOLD_AT]-&amp;gt;(w:RetailChannel {name: 'Walmart'})
        MATCH (p)-[:SOLD_AT]-&amp;gt;(t:RetailChannel {name: 'Target'})
        MATCH (b:Brand)-[:MANUFACTURES]-&amp;gt;(p)
        RETURN p.product_id AS pid,
               b.name       AS brand
    $$) AS (pid agtype, brand agtype)
)
SELECT cm.us_product,
       cm.us_brand,
       cm.india_match,
       cm.india_brand,
       cm.similarity,
       CASE WHEN dc.product_id IS NOT NULL
            THEN 'Yes' ELSE 'No'
       END AS india_match_at_walmart_and_target
FROM cross_market cm
LEFT JOIN dual_channel dc ON dc.product_id = cm.india_id
ORDER BY cm.similarity DESC
LIMIT 20;
&lt;/LI-CODE&gt;
&lt;H2&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;The Azure PostgreSQL database ecosystem has quietly assembled the components for a unified semantic + structural analytics engine, in the form of extensions.&lt;/P&gt;
&lt;P&gt;pgvector with pg_diskann delivers production-grade approximate nearest-neighbour search with ANN indexes.&lt;/P&gt;
&lt;P&gt;Apache AGE delivers Cypher-based property graph traversal. Together with the “bridge,” they enable query patterns that are impossible in either system alone, and they do it within the ACID guarantees, operational tooling, and SQL vocabulary you already have.&lt;/P&gt;
&lt;P&gt;Stop paying for three databases when one will do!&lt;/P&gt;
</description>
      <pubDate>Wed, 15 Apr 2026 11:59:07 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/combining-pgvector-and-apache-age-knowledge-graph-semantic/ba-p/4508781</guid>
      <dc:creator>Raunak</dc:creator>
      <dc:date>2026-04-15T11:59:07Z</dc:date>
    </item>
    <item>
      <title>Cascading Read Replicas Now Generally Available!</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/cascading-read-replicas-now-generally-available/ba-p/4510610</link>
      <description>&lt;P&gt;We’re excited to announce the &lt;STRONG&gt;General Availability of &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/read-replica/concepts-read-replicas#create-cascading-read-replicas-preview" target="_blank"&gt;cascading read replicas in Azure Database for PostgreSQL&lt;/A&gt;&lt;/STRONG&gt;. This capability allows you to create read replicas for your Azure Database for PostgreSQL instance not only from a primary server, but also from existing read replicas, enabling &lt;STRONG&gt;multi‑level replication chains&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Coordinating read‑heavy database workloads across multiple regions can be challenging, especially when you’re trying to deliver low‑latency read response experiences to users spread across different geographic locations. One effective way to address this is by placing read replicas closer to where your users are, allowing applications to serve read requests with significantly reduced latency and improved performance.&lt;/P&gt;
&lt;H2&gt;What are cascading read replicas?&lt;/H2&gt;
&lt;P&gt;With cascading read replicas, you can scale read‑intensive workloads more effectively, distribute read traffic efficiently, and support advanced deployment topologies such as globally distributed applications. Each read replica can act as a source for additional replicas, forming a &lt;STRONG&gt;tree‑like replication structure&lt;/STRONG&gt;. For example, if your primary server is deployed in one region, you can create direct replicas in nearby regions and then cascade additional replicas to more distant locations. This approach helps spread read traffic evenly while minimizing latency for users around the world. This feature supports up to two levels of replication: level 1 consists of read replicas created directly from the primary server, and level 2 consists of cascading read replicas created from level 1 replicas.&lt;/P&gt;
&lt;H2&gt;Why use cascading read replicas?&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Improved scalability&lt;/STRONG&gt;&lt;BR /&gt;Cascading read replicas support multi‑level replication, making it easier to handle high volumes of read traffic without overloading a single instance by scaling up to 30 read replicas.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Geographic distribution&lt;/STRONG&gt;&lt;BR /&gt;By placing replicas closer to your global user base, you can significantly reduce read latency and deliver faster, more responsive application experiences.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Efficient read traffic distribution&lt;/STRONG&gt;&lt;BR /&gt;Distributing read workloads across multiple replicas helps balance load, improving overall performance and reliability.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Additionally, cascading read replicas offer operational flexibility. If you observe replication lag, you can &lt;STRONG&gt;perform a switchover operation between a cascading read replica and its source or intermediate replica&lt;/STRONG&gt;, helping you maintain optimal performance and availability for your replicas.&lt;/P&gt;
&lt;H2&gt;How does replication work with cascading read replicas?&lt;/H2&gt;
&lt;P&gt;The primary server acts as the source for its read replicas, and data is replicated to them asynchronously. When you add cascading replicas, each existing replica acts as the data source for the replicas beneath it.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;In the diagram above, “primary-production-server” is the primary server with three read replicas. One of these replicas, “readreplica01”, serves as the source for another read replica, “readreplica11” which is a cascading read replica.&lt;/P&gt;
&lt;P&gt;With cascading read replicas, you can add up to five read replicas per source and replicate data across two levels, as shown in the diagram. This allows you to create up to 30 read replicas in total: five read replicas directly from the primary server, and up to 25 additional replicas at the second level (each first-level replica can have up to five replicas of its own).&lt;/P&gt;
&lt;P&gt;If you notice replication lag between an intermediate read replica and a cascading read replica, you can use a switchover operation to swap “readreplica01” and “readreplica11”, helping reduce the impact of lag.&lt;/P&gt;
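&lt;P&gt;To keep an eye on lag before deciding on a switchover, a minimal sketch using the standard &lt;EM&gt;pg_stat_replication&lt;/EM&gt; view (run it on whichever server is acting as the source, e.g. “readreplica01” for its cascading replicas):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- One row per directly attached replica, with write/flush/replay lag
SELECT application_name, state, write_lag, flush_lag, replay_lag
FROM pg_stat_replication;&lt;/LI-CODE&gt;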
&lt;P&gt;To learn more about cascading read replicas, please refer to our documentation: &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/read-replica/concepts-read-replicas#create-cascading-read-replicas" target="_blank"&gt;Cascading read replicas&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;Deploying cascading read replicas on Azure portal&lt;/H2&gt;
&lt;OL&gt;
&lt;LI&gt;Navigate to the “Replication” tab and then click on “Create replica” highlighted in red as shown below:&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;After creating a read replica, the screenshot below shows one read replica attached to the primary instance.&lt;img /&gt;&lt;/LI&gt;
&lt;LI&gt;Click on the created replica and navigate to the Replication tab. The source server is “read-replica-01”, and we will create a cascading read replica under it.&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;Once the cascading read replica is created, you can see the role of “read-replica-01” has changed to Source, Replica. You can perform a switchover operation by clicking the promote button for the cascading read replica.&lt;img /&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;Deploy cascading read replica with terraform:&lt;/H2&gt;
&lt;P&gt;Before you start, make sure you have:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;An existing &lt;STRONG&gt;primary PostgreSQL Flexible Server&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;At least &lt;STRONG&gt;one read replica&lt;/STRONG&gt; already created from the primary&lt;/LI&gt;
&lt;LI&gt;The latest version of the &lt;STRONG&gt;AzureRM provider&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Proper permissions on the Azure subscription and resource group&lt;/LI&gt;
&lt;/UL&gt;
&lt;OL&gt;
&lt;LI&gt;Configure the AzureRM Provider: Start by configuring the AzureRM provider in your Terraform project.&lt;LI-CODE lang="shell-session"&gt;terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~&amp;gt; 3.80"
    }
  }
}

provider "azurerm" {
  features {}
}&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Reference the existing read replica server using a data block.&lt;/P&gt;
&lt;LI-CODE lang="shell-session"&gt;data "azurerm_postgresql_flexible_server" "source_replica" {
  name                = "my-read-replica-1"
  resource_group_name = "my-resource-group"
}&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Now create a new PostgreSQL Flexible Server and point it to the replica by setting create_mode to "Replica" and source_server_id to the replica’s ID.&lt;/P&gt;
&lt;LI-CODE lang="shell-session"&gt;resource "azurerm_postgresql_flexible_server" "cascading_replica" {
  name                   = "my-cascading-replica"
  resource_group_name    = "my-resource-group"
  location               = data.azurerm_postgresql_flexible_server.source_replica.location
  version                = data.azurerm_postgresql_flexible_server.source_replica.version

  delegated_subnet_id    = data.azurerm_postgresql_flexible_server.source_replica.delegated_subnet_id
  private_dns_zone_id    = data.azurerm_postgresql_flexible_server.source_replica.private_dns_zone_id

  create_mode      = "Replica"
  source_server_id = data.azurerm_postgresql_flexible_server.source_replica.id

  storage_mb             = 32768
  sku_name               = "Standard_D4s_v3"

  depends_on = [
    data.azurerm_postgresql_flexible_server.source_replica
  ]
}&lt;/LI-CODE&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Apply the Terraform Configuration&lt;/P&gt;
&lt;LI-CODE lang="shell-session"&gt;terraform init
terraform plan
terraform apply&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;Key Considerations&lt;/H2&gt;
&lt;OL&gt;
&lt;LI&gt;Cascading read replicas allow up to five read replicas per source and two levels of replication.&lt;/LI&gt;
&lt;LI&gt;Creating cascading read replicas is supported in PostgreSQL version 14 and above.&lt;/LI&gt;
&lt;LI&gt;The promote operation is not supported for intermediate read replicas that have cascading read replicas attached.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;P&gt;Cascading read replicas in Azure Database for PostgreSQL offer a scalable way to distribute your read traffic across the same and different regions, reducing the read workload on the primary database. For globally distributed applications, this can improve read latency as well as resilience and performance. This design supports horizontal scaling as your application demand grows, ensuring you can handle a high volume of read requests without compromising speed. Get started with this feature today and scale your read workloads.&lt;/P&gt;
      <pubDate>Mon, 13 Apr 2026 17:23:08 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/cascading-read-replicas-now-generally-available/ba-p/4510610</guid>
      <dc:creator>gauri-kasar</dc:creator>
      <dc:date>2026-04-13T17:23:08Z</dc:date>
    </item>
    <item>
      <title>Premium SSD v2 Is Now Generally Available for Azure Database for PostgreSQL</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/premium-ssd-v2-is-now-generally-available-for-azure-database-for/ba-p/4508445</link>
      <description>&lt;P class="lia-align-left"&gt;We are excited to announce the General Availability (GA) of &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/compute-storage/concepts-storage-premium-ssd-v2" target="_blank" rel="noopener"&gt;Premium SSD v2&lt;/A&gt; for Azure Database for PostgreSQL flexible server. With Premium SSD v2, you can achieve&amp;nbsp;&lt;STRONG&gt;up to 4× higher IOPS, significantly lower latency, and better price-performance&lt;/STRONG&gt; for I/O-intensive PostgreSQL workloads. With independent scaling of storage and performance, you can now eliminate overprovisioning and unlock predictable, high-performance PostgreSQL at scale.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;This release is especially impactful for &lt;STRONG&gt;OLTP, SaaS, and high‑concurrency applications&lt;/STRONG&gt; that require consistent performance and reliable scaling under load.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&lt;STRONG&gt;In this post, we will cover:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;&lt;STRONG&gt;Why Premium SSD v2&lt;/STRONG&gt;: Core capabilities such as flexible disk sizing, higher performance, and independent scaling of capacity and I/O.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Premium SSD v2 vs. Premium SSD:&lt;/STRONG&gt; A side‑by‑side overview of what’s new and what’s improved.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pricing&lt;/STRONG&gt;: Pricing estimates.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Performance&lt;/STRONG&gt;: Benchmarking results across two workload scenarios.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Migration options:&lt;/STRONG&gt; How to move from Premium SSD to Premium SSD v2 using restore and read‑replica approaches.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Availability and support&lt;/STRONG&gt;: Regional availability, supported features, current limitations, and how to get started.&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H1&gt;&lt;SPAN class="lia-text-color-15"&gt;Why Premium SSD v2?&lt;/SPAN&gt;&lt;/H1&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;&lt;SPAN class="lia-text-color-21"&gt;&lt;STRONG&gt;Flexible Disk Size&lt;/STRONG&gt; - &lt;/SPAN&gt;Storage can be provisioned from&lt;STRONG&gt; 32 GiB to 64 TiB&lt;/STRONG&gt; in &lt;STRONG&gt;1 GiB&lt;/STRONG&gt; increments, allowing you to pay only for required capacity without scaling disk size for performance.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;High Performance&lt;/STRONG&gt; - Achieve up to &lt;STRONG&gt;80,000 IOPS&lt;/STRONG&gt; and &lt;STRONG&gt;1,200 MiB/s&lt;/STRONG&gt; throughput on a single disk, enabling high-throughput OLTP and mixed workloads.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Adapt instantly to workload changes:&amp;nbsp;&lt;/STRONG&gt; With Premium SSD v2, performance is no longer tied to disk size. Independently tune IOPS and throughput without downtime, ensuring your database keeps up with real-time demand.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Free baseline performance:&lt;/STRONG&gt; Premium SSD v2 includes built-in baseline performance at no additional cost. Disks up to&amp;nbsp;&lt;STRONG&gt;399&lt;/STRONG&gt;&lt;STRONG&gt; GiB&lt;/STRONG&gt; automatically include &lt;STRONG&gt;3,000 IOPS and 125&lt;/STRONG&gt;&lt;STRONG&gt; MiB/s&lt;/STRONG&gt;, while disks sized &lt;STRONG&gt;400&lt;/STRONG&gt;&lt;STRONG&gt;&amp;nbsp;GiB and larger&lt;/STRONG&gt; include &lt;STRONG&gt;up to 12,000 IOPS and 500&lt;/STRONG&gt;&lt;STRONG&gt;&amp;nbsp;MiB/s&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H1&gt;&lt;SPAN class="lia-text-color-15"&gt;Premium SSD v2 vs. Premium SSD: What’s new?&lt;/SPAN&gt;&lt;/H1&gt;
&lt;/DIV&gt;
&lt;P class="lia-clear-both lia-align-left"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H1&gt;&lt;SPAN class="lia-text-color-15"&gt;Pricing&lt;/SPAN&gt;&lt;/H1&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;Pricing for Premium SSD v2 is similar to Premium SSD, but will vary depending on the storage, IOPS, and bandwidth configuration set for a Premium SSD v2 disk. Pricing information is available on the&amp;nbsp;&lt;A href="https://azure.microsoft.com/pricing/details/postgresql/flexible-server/?msockid=0627d05ab15a6c403105c639b0d06d2c" target="_blank" rel="noopener"&gt;pricing page&lt;/A&gt; or &lt;A href="https://azure.microsoft.com/pricing/calculator/?msockid=0627d05ab15a6c403105c639b0d06d2c" target="_blank" rel="noopener"&gt;pricing calculator&lt;/A&gt;.&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H1&gt;&lt;SPAN class="lia-text-color-15"&gt;Performance&lt;/SPAN&gt;&lt;/H1&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;Premium SSD v2 is designed for IO‑intensive workloads that require sub‑millisecond disk latencies, high IOPS, and high throughput at a lower cost. To demonstrate the performance impact, we ran &lt;STRONG&gt;pgbench&lt;/STRONG&gt; on Azure Database for PostgreSQL using the test profile below.&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H4&gt;&lt;SPAN class="lia-text-color-15"&gt;Test Setup&lt;/SPAN&gt;&lt;/H4&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;To minimize external variability and ensure a fair comparison:&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;Client virtual machines and the database server were deployed in the same availability zone in the East US region.&lt;/LI&gt;
&lt;LI&gt;Compute, region, and availability zones were kept identical.&lt;/LI&gt;
&lt;LI&gt;The only variable changed was the storage tier.&lt;/LI&gt;
&lt;LI&gt;TPC-B benchmark using pgbench with a database size of 350 GiB.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-left"&gt;&lt;EM&gt;&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;&lt;SPAN class="lia-text-color-15"&gt;Test Scenario 1: Breaking the IOPS Ceiling with Premium SSD v2&lt;/SPAN&gt;&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;Premium SSD v2 eliminates the traditional storage bottleneck by scaling linearly up to &lt;STRONG&gt;80,000 IOPS&lt;/STRONG&gt;, while Premium SSD plateaus early due to fixed performance limits. To demonstrate this, we configured each storage tier with its maximum supported IOPS and throughput while keeping all other variables constant. Premium SSD v2 achieves up to &lt;STRONG&gt;4x higher IOPS at nearly half the cost&lt;/STRONG&gt;, without requiring large disk sizes.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp; &amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;Note:&lt;/EM&gt; &lt;/STRONG&gt;&lt;EM&gt;Premium SSD requires a 32 TiB disk to reach 20K IOPS, while SSD v2 achieves 80K IOPS even on a 160 GiB disk though we used 1 TiB disk in this test for a bigger scaling factor for pgbench test&lt;/EM&gt;&lt;STRONG&gt;&lt;EM&gt;.&lt;/EM&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;We ran pgbench across five workload profiles, ranging from 32 to 256 concurrent clients, with each test running for 20 minutes. The results go beyond incremental improvements and highlight a material shift in how applications scale with Premium SSD v2.&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;&lt;SPAN class="lia-text-color-15"&gt;Throughput Scaling&lt;/SPAN&gt;&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&lt;EM&gt;As concurrency increases, Premium SSD quickly reaches its IOPS limits while Premium SSD v2 continues to scale.&lt;/EM&gt;&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;EM&gt;At 32 clients&lt;/EM&gt;&lt;/STRONG&gt;&lt;EM&gt;: Premium SSD v2 achieved &lt;STRONG&gt;10,562 TPS&lt;/STRONG&gt; vs &lt;STRONG&gt;4,123 TPS&lt;/STRONG&gt; on Premium SSD representing a &lt;STRONG&gt;156%&lt;/STRONG&gt; performance improvement.&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;EM&gt;At 256 clients&lt;/EM&gt;&lt;/STRONG&gt;&lt;EM&gt;: At higher load, Premium SSD v2 achieved over &lt;STRONG&gt;43,000 TPS&amp;nbsp;&lt;/STRONG&gt;representing a &lt;STRONG&gt;279% improvement&lt;/STRONG&gt; compared to the &lt;STRONG&gt;11,465 TPS&lt;/STRONG&gt; observed on Premium SSD.&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-left"&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;&lt;SPAN class="lia-text-color-15"&gt;Latency Stability&lt;/SPAN&gt;&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;Throughput is an indication of how much work is done while latency reflects how quickly users experience it. Premium SSD v2 maintains consistently low latency even as workload increases.&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;EM&gt;Reduced Wait Times&lt;/EM&gt;&lt;/STRONG&gt;: &lt;STRONG&gt;61–74%&lt;/STRONG&gt; lower latency across all test phases.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;EM&gt;Consistency under Load&lt;/EM&gt;&lt;/STRONG&gt;: Premium SSD latency increased to &lt;STRONG&gt;22.3&lt;/STRONG&gt;&lt;STRONG&gt; ms&lt;/STRONG&gt;, while Premium SSD v2 maintained a &lt;STRONG&gt;latency of 5.8&lt;/STRONG&gt;&lt;STRONG&gt; ms,&lt;/STRONG&gt; remaining stable even under peak load.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;&lt;SPAN class="lia-text-color-15"&gt;IOPS Behavior&lt;/SPAN&gt;&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;The table below illustrates the IOPS behavior observed during benchmarking for both storage tiers.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-align-left"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;Dimension&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;Premium SSD&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;Premium SSD v2&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center" rowspan="2"&gt;
&lt;P&gt;&lt;STRONG&gt;IOPS &lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td rowspan="2"&gt;
&lt;P&gt;Lower baseline performance; hits limits early&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;~2× higher IOPS&lt;/STRONG&gt; at low concurrency&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Up to 4× higher IOPS at peak load&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;IOPS Plateau&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Throughput stalls at &lt;STRONG&gt;~20k IOPS&lt;/STRONG&gt; for 64–256 clients&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Scales from &lt;STRONG&gt;~29k IOPS (32 clients)&lt;/STRONG&gt; to &lt;STRONG&gt;~80k IOPS (256 clients)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;Additional Clients&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Adding clients does not increase throughput&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Additional clients continue to drive higher throughput&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;Primary Bottleneck&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Storage becomes the bottleneck&lt;/STRONG&gt; early&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;No single bottleneck observed&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;Scaling Behavior&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Stops scaling early&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;True linear scaling&lt;/STRONG&gt; with workload demand&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;Resource Utilization&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Disk saturation leaves CPU and memory underutilized&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Balanced utilization&lt;/STRONG&gt; across IOPS, CPU, and memory&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;Key Takeaway&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Storage limits performance before compute is fully used&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Unlocks higher throughput and lower latency&lt;/STRONG&gt; by fully utilizing compute resources&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;&lt;SPAN class="lia-text-color-15"&gt;Test Scenario 2: Better P&lt;/SPAN&gt;&lt;SPAN class="lia-text-color-15"&gt;erformance at same price&lt;/SPAN&gt;&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;At the same price point, Premium SSD v2 delivers higher throughput and lower latency than Premium SSD without requiring any application changes. To demonstrate this, we ran multiple pgbench tests using two workload configurations 8 clients / 8 threads and 32 clients / 32 threads with each run lasting 20 minutes. Results were consistent across all runs, with Premium SSD v2 consistently outperforming Premium SSD. &lt;EM&gt;Both configurations cost &lt;STRONG&gt;$578/month,&lt;/STRONG&gt; the only difference is storage performance.&lt;/EM&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN lia-align-left"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;img /&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H6&gt;&amp;nbsp;&lt;/H6&gt;
&lt;/DIV&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H6&gt;&amp;nbsp;&lt;/H6&gt;
&lt;/DIV&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H6&gt;&lt;SPAN class="lia-text-color-15"&gt;&lt;EM&gt;&lt;STRONG&gt;Results&lt;/STRONG&gt;:&lt;/EM&gt;&lt;/SPAN&gt;&lt;/H6&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;&lt;STRONG&gt;Moderate concurrency (8 clients)&lt;/STRONG&gt;&lt;BR /&gt;Premium SSD v2 delivered approximately &lt;STRONG&gt;154% higher throughput (Transactions Per Second)&lt;/STRONG&gt; than Premium SSD (&lt;STRONG&gt;1,813 &lt;EM&gt;TPS &lt;/EM&gt;&lt;/STRONG&gt;vs. &lt;STRONG&gt;715 TPS&lt;/STRONG&gt;), while average latency decreased by about &lt;STRONG&gt;60%&lt;/STRONG&gt; (from &lt;STRONG&gt;~11.1 ms&lt;/STRONG&gt; to &lt;STRONG&gt;~4.4 ms&lt;/STRONG&gt;).&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&lt;STRONG&gt;High concurrency (32 clients)&lt;/STRONG&gt;&lt;BR /&gt;The performance gap increases as concurrency grows, Premium SSD v2 delivered about &lt;STRONG&gt;169% higher throughput&lt;/STRONG&gt; than Premium SSD (&lt;STRONG&gt;3,643 TPS&lt;/STRONG&gt; vs. &lt;STRONG&gt;~1,352 TPS&lt;/STRONG&gt;) and reduced average latency by around &lt;STRONG&gt;67%&lt;/STRONG&gt; (from &lt;STRONG&gt;~26.3 ms&lt;/STRONG&gt; to &lt;STRONG&gt;~8.7 ms&lt;/STRONG&gt;).&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;&lt;img /&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;&amp;nbsp;&lt;/H3&gt;
&lt;/DIV&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H3&gt;&lt;SPAN class="lia-text-color-15"&gt;IOPS Behavior&lt;/SPAN&gt;&lt;/H3&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;In the 8‑client, 8‑thread test, Premium SSD reached its IOPS ceiling early, operating at 100% utilization, while Premium SSD v2 retained approximately 30% headroom under the same workload, delivering &lt;STRONG&gt;8,037 IOPS vs 3,761&lt;/STRONG&gt; IOPS with Premium SSD.&lt;/LI&gt;
&lt;LI&gt;When the workload increased to 32 clients and 32 threads, both tiers approached their IOPS limits; however, Premium SSD v2 sustained a significantly higher performance ceiling, delivering approximately &lt;STRONG&gt;2.75x higher IOPS (13,620 vs. 4,968)&lt;/STRONG&gt; under load.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-left"&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Key Takeaway:&lt;/STRONG&gt; With Premium SSD v2, you do not need to choose between cost and performance you get both. At the same price, applications&amp;nbsp; run faster, scale further, and maintain lower latency without any code changes.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;&lt;SPAN class="lia-text-color-15"&gt;Migrate from Premium SSD to Premium SSD v2&lt;/SPAN&gt;&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;Migrating is simple and fast. You can migrate from Premium SSD to Premium SSD v2 using the two strategies below with minimal downtime. These methods are generally quicker than logical migration strategies, such as exporting and restoring data using pg_dump and pg_restore.&lt;/P&gt;
&lt;UL class="lia-align-left"&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/compute-storage/concepts-storage-migrate-ssd-to-ssd-v2?tabs=portal-restore-custom-point" target="_blank" rel="noopener"&gt;Restore from Premium SSD to Premium SSD v2&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="http://learn.microsoft.com/azure/postgresql/compute-storage/concepts-storage-replicate-ssd-to-ssd-v2" target="_blank" rel="noopener"&gt;Migrate using Read Replicas&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-left"&gt;When migrating from Premium SSD to Premium SSD v2, using &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/read-replica/concepts-read-replicas-virtual-endpoints#using-virtual-endpoints-for-consistent-hostname-during-point-in-time-recovery-pitr-or-snapshot-restore" target="_blank" rel="noopener"&gt;a virtual endpoint&lt;/A&gt; helps keep downtime to a minimum and allows applications to continue operating without requiring configuration changes after the migration.&lt;/P&gt;
&lt;P class="lia-align-left"&gt;After the migration completes, you can stop the original server until your backup requirements are met. Once the required backup retention period has elapsed and all new backups are available on the new server, the original server can be safely deleted.&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;&lt;SPAN class="lia-text-color-15"&gt;Region Availability &amp;amp; Features Supported&lt;/SPAN&gt;&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;Premium SSD v2 is available in &lt;STRONG&gt;48 regions&lt;/STRONG&gt; worldwide for Azure Database for PostgreSQL – Flexible Server. For the most up‑to‑date information on regional availability, supported features, and current limitations, refer to the official Premium SSD v2 &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/compute-storage/concepts-storage-premium-ssd-v2#supported-features" target="_blank" rel="noopener"&gt;documentation&lt;/A&gt;.&lt;/P&gt;
&lt;DIV class="lia-align-left"&gt;
&lt;H2&gt;&lt;SPAN class="lia-text-color-15"&gt;Getting Started&lt;/SPAN&gt;&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class="lia-align-left"&gt;To learn more, review the official &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/compute-storage/concepts-storage" target="_blank" rel="noopener"&gt;documentation&lt;/A&gt; for storage configuration available with Azure Database for PostgreSQL. Your feedback is important to us. Have suggestions, ideas, or questions? We would love to hear from you:&amp;nbsp;&lt;A href="https://aka.ms/pgfeedback" target="_blank" rel="noopener"&gt;https://aka.ms/pgfeedback&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Apr 2026 17:29:16 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/premium-ssd-v2-is-now-generally-available-for-azure-database-for/ba-p/4508445</guid>
      <dc:creator>kabharati</dc:creator>
      <dc:date>2026-04-07T17:29:16Z</dc:date>
    </item>
    <item>
      <title>Handling Unique Constraint Conflicts in Logical Replication</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/handling-unique-constraint-conflicts-in-logical-replication/ba-p/4507066</link>
      <description>&lt;P&gt;&lt;EM&gt;Authors: Ashutosh Sharma, Senior Software Engineer, and Gauri Kasar, Product Manager&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Logical replication can keep your PostgreSQL environments in sync, helping replicate selected tables with minimal impact on the primary workload. But what happens when your subscriber hits a duplicate key error and replication grinds to a halt? If you’ve seen a &lt;A class="lia-external-url" href="https://www.postgresql.org/docs/current/logical-replication-conflicts.html" target="_blank" rel="noopener"&gt;unique‑constraint violation&lt;/A&gt; while replicating between Azure Database for PostgreSQL servers, you’re not alone. This blog covers common causes, prevention tips, and practical recovery options.&lt;/P&gt;
&lt;P&gt;In PostgreSQL logical replication, the subscriber can fail with a unique-constraint error when it tries to apply a change that would create a duplicate key.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;duplicate key value violates unique constraint&lt;/LI-CODE&gt;
&lt;H2&gt;Understanding why this happens&lt;/H2&gt;
&lt;P&gt;The error occurs when an INSERT or UPDATE would create a value that already exists in a column (or set of columns) protected by a UNIQUE constraint (including a PRIMARY KEY). In logical replication, this most commonly happens because of local writes on the subscriber or because the table is subscribed from multiple publishers. These conflicts are resolved on the subscriber side.&amp;nbsp;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Local writes on the subscriber&lt;/STRONG&gt;: a row with the same primary key/unique key is inserted on the subscriber before the apply worker processes the corresponding change from the publisher (see the sketch after this list).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG style="color: rgb(30, 30, 30);"&gt;Multi-origin / multi-master without conflict-free keys&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;: two origins generate (or replicate) the same unique key. &lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG style="color: rgb(30, 30, 30);"&gt;Initial data synchronization issues&lt;/STRONG&gt;&lt;SPAN style="color: rgb(30, 30, 30);"&gt;: the subscriber already contains data when the subscription is created with initial copy enabled, resulting in duplicate inserts during the initial table sync.&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
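&lt;P&gt;To make the first cause concrete, here is a minimal, hypothetical illustration (the table &lt;EM&gt;t1&lt;/EM&gt; and key value 42 are invented for the example):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- On the subscriber: a local write creates a row with id = 42 (hypothetical table).
INSERT INTO t1 (id, val) VALUES (42, 'local');

-- On the publisher: the same key is inserted and replicated.
INSERT INTO t1 (id, val) VALUES (42, 'remote');

-- The subscriber's apply worker now fails with:
--   ERROR: duplicate key value violates unique constraint "t1_pkey"&lt;/LI-CODE&gt;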
&lt;H2&gt;How to avoid this&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Avoid local writes on subscribed tables (treat the subscriber as read-only for replicated relations).&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;Avoid subscribing to the same table from multiple publishers unless you have explicit conflict handling and a conflict-free key design.&amp;nbsp;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Enabling server logs can help you identify and troubleshoot unique‑constraint conflicts more effectively. Refer to the official documentation &lt;A href="https://learn.microsoft.com/azure/postgresql/monitor/how-to-configure-and-access-logs?source=recommendations" target="_blank" rel="noopener"&gt;to configure and access PostgreSQL logs&lt;/A&gt;.&lt;/P&gt;
&lt;H2&gt;How to handle conflicts (recovery options)&amp;nbsp;&lt;/H2&gt;
&lt;H4&gt;Option 1: Delete the conflicting row on the subscriber&amp;nbsp;&lt;/H4&gt;
&lt;P&gt;Use the subscriber logs to identify the key (or row) causing the conflict, then delete the row on the subscriber with a DELETE statement. Resume apply and repeat if more conflicts appear.&amp;nbsp;&lt;/P&gt;
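&lt;P&gt;Continuing the hypothetical &lt;EM&gt;t1&lt;/EM&gt; example from above, a minimal recovery sketch looks like this (in practice, the table and key come from your own subscriber logs):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Inspect the conflicting row reported in the subscriber log.
SELECT * FROM t1 WHERE id = 42;

-- Remove the local copy so the replicated change can apply.
DELETE FROM t1 WHERE id = 42;

-- The apply worker retries automatically; check the logs and repeat
-- if further conflicts appear.&lt;/LI-CODE&gt;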
&lt;H4&gt;Option 2: Use conflict logs and skip the conflicting transaction (PostgreSQL 17+)&amp;nbsp;&lt;/H4&gt;
&lt;P&gt;Starting with PostgreSQL 17, logical replication provides &lt;STRONG&gt;detailed conflict logging&lt;/STRONG&gt; on the subscriber, making it easier to understand &lt;EM&gt;why&lt;/EM&gt; replication stopped and &lt;EM&gt;which transaction&lt;/EM&gt; caused the failure. When a replicated INSERT would violate a &lt;STRONG&gt;non‑deferrable unique constraint&lt;/STRONG&gt; on the subscriber (for example, when a row with the same key already exists), the apply worker detects this as an insert_exists conflict and stops replication. In this case, PostgreSQL logs the conflict along with the &lt;STRONG&gt;transaction’s finish LSN&lt;/STRONG&gt;, which uniquely identifies the failing transaction.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;ERROR: conflict detected on relation "public.t2": conflict=insert_exists
... in transaction 754, finished at 0/034F4090
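-- Use the finish LSN reported above to skip the failing transaction: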
ALTER SUBSCRIPTION &amp;lt;subscription_name&amp;gt; SKIP (lsn = '0/034F4090');
&lt;/LI-CODE&gt;
&lt;H4&gt;Option 3: Rebuild (re-sync) the table&lt;/H4&gt;
&lt;P&gt;Rebuilding (re‑syncing) a table is the &lt;STRONG&gt;safest and most deterministic way&lt;/STRONG&gt; to resolve logical replication conflicts caused by &lt;STRONG&gt;pre‑existing data differences&lt;/STRONG&gt; or &lt;STRONG&gt;local writes on the subscriber&lt;/STRONG&gt;. This approach is especially useful when a table repeatedly fails with unique‑constraint violations and it is unclear which rows are out of sync.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1 (subscriber):&lt;/STRONG&gt; Disable the subscription.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;ALTER SUBSCRIPTION &amp;lt;subscription_name&amp;gt; DISABLE;&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2 (subscriber):&lt;/STRONG&gt;&amp;nbsp;Remove the local copy of the table so it can be re-copied.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;TRUNCATE TABLE &amp;lt;conflicting_table&amp;gt;;&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3 (publisher):&lt;/STRONG&gt;&amp;nbsp;Ensure the publication will (re)send the table (one approach is to recreate the publication entry for that table).&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;ALTER PUBLICATION &amp;lt;pub_with_conflicting_table&amp;gt; DROP TABLE &amp;lt;conflicting_table&amp;gt;;
CREATE PUBLICATION &amp;lt;pub_with_conflicting_table_rebuild&amp;gt; FOR TABLE &amp;lt;conflicting_table&amp;gt;;
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4 (subscriber):&lt;/STRONG&gt;&amp;nbsp;Create a new subscription (or refresh the existing one) to re-copy the table.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE SUBSCRIPTION &amp;lt;sub_rebuild&amp;gt;
    CONNECTION '&amp;lt;connection_string&amp;gt;'
    PUBLICATION &amp;lt;pub_with_conflicting_table_rebuild&amp;gt;;
&lt;/LI-CODE&gt;
&lt;P&gt;&lt;STRONG&gt;Step 5 (subscriber):&lt;/STRONG&gt;&amp;nbsp;Re-enable the original subscription (if applicable).&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;ALTER SUBSCRIPTION &amp;lt;subscription_name&amp;gt; ENABLE;&lt;/LI-CODE&gt;
&lt;H3&gt;Conclusion&lt;/H3&gt;
&lt;P&gt;In most cases, these conflicts occur due to local changes on the subscriber or differences in data that existed before logical replication was fully synchronized. It is recommended to avoid direct modifications on subscribed tables and ensure that the replication setup is properly planned, especially when working with tables that have unique constraints.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2026 16:33:58 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/handling-unique-constraint-conflicts-in-logical-replication/ba-p/4507066</guid>
      <dc:creator>gauri-kasar</dc:creator>
      <dc:date>2026-04-02T16:33:58Z</dc:date>
    </item>
    <item>
      <title>No code left behind: How AI streamlines Oracle-to-PostgreSQL migration</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/no-code-left-behind-how-ai-streamlines-oracle-to-postgresql/ba-p/4506107</link>
      <description>&lt;H5&gt;&lt;EM&gt;Coauthored by Jonathon Frost, Aditya Duvuri and Shriram Muthukrishnan&lt;/EM&gt;&lt;/H5&gt;
&lt;P&gt;More and more organizations are choosing PostgreSQL over proprietary database platforms such as Oracle, and for good reasons. It’s fully open source and community supported with a steady pace of innovation. It’s also preferred by developers for its extensibility and flexibility, often being used for vector data along with relational data to support modern applications and agents. Still, organizations considering a shift from Oracle to PostgreSQL may hesitate due to the complexity that often accompanies an enterprise-scale migration project. Challenges such as incompatible data types, language mismatches, and the risk of breaking critical applications are hard to ignore.&lt;/P&gt;
&lt;P&gt;Recently, the &lt;A href="https://www.youtube.com/watch?v=LdCExagKS4Y" target="_blank" rel="noopener"&gt;Azure Postgres team released a new, free tool for migrations from Oracle to PostgreSQL&lt;/A&gt; that was designed to address these challenges, making the decision to migrate a lot less risky. The new AI-assisted Oracle-to-PostgreSQL migration tool, available in public preview via the &lt;A href="https://marketplace.visualstudio.com/items?itemName=ms-ossdata.vscode-pgsql" target="_blank" rel="noopener"&gt;PostgreSQL extension for Visual Studio Code&lt;/A&gt;, brings automation, validation, and AI-powered migration assistance into a single, user-friendly interface.&lt;/P&gt;
&lt;H2&gt;Meet your new migration assistant&lt;/H2&gt;
&lt;P&gt;The AI-assisted Oracle to PostgreSQL migration tool dramatically simplifies moving off Oracle databases. Accessible through VS Code, the tool uses intelligent automation, powered by GitHub Copilot, to convert Oracle database schemas and PL/SQL code into PostgreSQL-compatible formats. It can analyze an Oracle schema and automatically translate table definitions, data types, and even stored procedures/triggers into PostgreSQL equivalents, speeding up migrations that once took months of manual effort.&lt;/P&gt;
&lt;P&gt;By handling the heavy lifting of schema and code conversion, this tool allows teams to focus on higher-level testing and optimization rather than tedious code rewrites. Users are already reporting that migrations are now faster, safer, and more transparent. The tool is simple, free, and ready for you to use today. Let’s take a look at how it works by covering the following:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Creating the migration project&lt;/LI&gt;
&lt;LI&gt;Setting up the connections&lt;/LI&gt;
&lt;LI&gt;AI-assisted schema migration&lt;/LI&gt;
&lt;LI&gt;Reviewing schema migration report&lt;/LI&gt;
&lt;LI&gt;AI-assisted application migration&lt;/LI&gt;
&lt;LI&gt;Reviewing application migration report&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2&gt;Step by step with the AI-assisted Oracle-to-PostgreSQL migration tool&lt;/H2&gt;
&lt;H3&gt;Step 1 – Create the project in VS Code&lt;/H3&gt;
&lt;P&gt;Start by installing or updating the PostgreSQL extension for VS Code from the marketplace. Open the PostgreSQL extension panel and click “Create Migration Project.” You’ll name your project, which will create a folder to store all migration artifacts. This folder will house extracted and converted files, organized for version control and collaboration.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Step 2 - Connect to your databases and AI model&lt;/H3&gt;
&lt;P&gt;Before beginning the migration, you’ll need to connect to the Oracle databases and select an OpenAI model to leverage during the process. Enter the connection details for your source Oracle database, credentials, and the schema to migrate. Then, select a PostgreSQL scratch database. This temporary environment is used to validate converted DDL in real time. Next, you will be prompted to select an OpenAI model.&lt;/P&gt;
&lt;H3&gt;Step 3 – Begin schema migration&lt;/H3&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once you’ve set up your connections, click the button to start the schema migration. The tool performs an extraction of all relevant Oracle database objects: tables, views, packages, procedures, and more. The extracted DDL is saved as files in your project folder. This file-based approach functions like a software project, enabling change tracking, collaboration, and source control.&lt;/P&gt;
&lt;H4&gt;Enter - AI assistance&lt;/H4&gt;
&lt;P&gt;This is where the AI takes over. The tool breaks the extracted schema into manageable chunks, and each chunk is processed by a multi-agent orchestration system:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The Migration Specialist Agent converts Oracle DDL to PostgreSQL.&lt;/LI&gt;
&lt;LI&gt;The Migration Critic Agent validates the conversion by executing it in the PostgreSQL scratch database.&lt;/LI&gt;
&lt;LI&gt;The Documentation Agent captures follow up review tasks, metadata, and coding notes for later integration with the application code migration process.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Each chunk is converted, validated, and deployed. If validation fails, the agents auto correct and retry. This self-healing loop ensures high conversion accuracy. Essentially, the tool conducts compile-time validation against a live PostgreSQL instance to catch issues early and reduce downstream surprises.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Checkpoint - review the schema migration report&lt;/H3&gt;
&lt;P&gt;Some complex objects, like Oracle packages with intricate PL/SQL, may not convert cleanly on the first pass. These are flagged as “review tasks.” You can invoke GitHub Copilot’s agent mode directly from VS Code to assist. The tool constructs a composite prompt with the original Oracle DDL, the partially converted PostgreSQL version, and any validation errors. This context-rich prompt enables Copilot to generate more accurate fixes.&lt;/P&gt;
&lt;P&gt;With the schema fully converted, you can compare the original Oracle and new PostgreSQL versions side by side. Right-click any object in the project folder and select “Compare File Pair.” You can also use the “Visualize Schema” feature to see a graphical representation of the converted schema. This is ideal for verifying tables, relationships, and constraints.&lt;/P&gt;
&lt;P&gt;Once the schema migration is complete, the tool generates a detailed report that includes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Total number of objects converted&lt;/LI&gt;
&lt;LI&gt;Conversion success rate&lt;/LI&gt;
&lt;LI&gt;PostgreSQL version and extensions used&lt;/LI&gt;
&lt;LI&gt;List of converted objects by type&lt;/LI&gt;
&lt;LI&gt;Any flagged review tasks&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This report serves as both a validation summary and an audit artifact. It helps confirm success and identify any follow-up actions. If you have compliance or change management requirements you need to meet, this documentation is essential.&lt;/P&gt;
&lt;H3&gt;Step 4 – Begin application migration&lt;/H3&gt;
&lt;P&gt;The next phase that the tool supports is updating the application code that interacts with the schema. Migrations often stall when code is overlooked or when traditional tools treat SQL statements as simple strings rather than part of a cohesive system. The AI-assisted Oracle-to-PostgreSQL migration tool’s application conversion feature takes a more holistic, context-aware approach.&lt;/P&gt;
&lt;P&gt;Before starting, you’ll need to configure GitHub Copilot Agent Mode with a capable AI model. Then, navigate to the ‘application_code’ directory typically found in &lt;EM&gt;.github/postgres-migration/&amp;lt;project_name&amp;gt;/application_code&lt;/EM&gt;, and copy your source code into this directory. Keeping your application and converted schema together provides the AI with the structural context it needs to refactor your code accurately. To start the app migration, this time you’d select the "Migrate Application" button. Then select the folder containing your source code and the converted schema.&lt;/P&gt;
&lt;H4&gt;Enter - AI assistance&lt;/H4&gt;
&lt;P&gt;The AI orchestrator will analyze your application’s database interactions against the new Postgres schema and generate a series of transformation tasks. These tasks address SQL dialect changes, data access modifications, and library updates. This process goes beyond a simple search-and-replace operation. The AI queries your migrated PostgreSQL database to gain grounded context of your converted schema, and ensures that things like function signatures, data types, and ORM models are migrated correctly in the application code.&lt;/P&gt;
&lt;H3&gt;Checkpoint - review the app migration report&lt;/H3&gt;
&lt;P&gt;When the AI finishes converting your application, it produces a detailed summary. The report lists which files were migrated, notes any unresolved tasks, and outlines how the changes map to the database schema. This audit-ready document can help DBAs and developers collaborate effectively on follow-up actions and integration testing.&lt;/P&gt;
&lt;P&gt;You can use VS Code’s built-in diff viewer to compare each migrated file with its original. Right-click on a migrated file and select "Compare App Migration File Pairs" to open a side-by-side view. This comparison highlights differences in SQL queries, driver imports, and other code changes, allowing you to verify the updates.&lt;/P&gt;
&lt;H3&gt;Wrapping up the migration project&lt;/H3&gt;
&lt;P&gt;During schema migration, the tool created detailed coding notes summarizing data-type mappings, constraints, and package transformations. These notes are essential for understanding why specific changes were made and for guiding the application conversion. Use them as reference points when validating and refining the AI-generated application code.&lt;/P&gt;
&lt;H3&gt;Destination - PostgreSQL on Azure&lt;/H3&gt;
&lt;P&gt;The AI-assisted Oracle-to-PostgreSQL migration tool brings together automation, validation, and AI to make Oracle-to-PostgreSQL migrations faster, safer, and more transparent. With schema extraction, multi-agent orchestration, app conversion, real-time validation, and detailed reporting, it provides a clear, confident path to modernization so you can start taking advantage of the benefits of open-source Postgres.&lt;/P&gt;
&lt;H4&gt;What’s in store&lt;/H4&gt;
&lt;P&gt;On the other side of a successful migration project to PostgreSQL on Azure, you get:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;First-class support in Azure&lt;/LI&gt;
&lt;LI&gt;Significantly lower total cost of ownership from eliminating license fees and reducing vendor lock-in&lt;/LI&gt;
&lt;LI&gt;Unmatched extensibility, with support for custom data types, procedural languages and powerful extensions like PostGIS, TimescaleDB, pgvector, Azure AI, and DiskANN&lt;/LI&gt;
&lt;LI&gt;Frequent updates and cutting-edge features delivered via a vibrant open-source community&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Whether you’re migrating a single schema or leading a broader replatforming initiative, the AI-assisted Oracle-to-PostgreSQL migration tool helps you move forward with confidence without sacrificing control or visibility.&lt;/P&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/postgresql/migrate/oracle-schema-conversions/schema-conversions-overview" target="_blank" rel="noopener"&gt;Learn more about starting your own migration project.&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 31 Mar 2026 15:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/no-code-left-behind-how-ai-streamlines-oracle-to-postgresql/ba-p/4506107</guid>
      <dc:creator>TeneilLawrence</dc:creator>
      <dc:date>2026-03-31T15:00:00Z</dc:date>
    </item>
    <item>
      <title>PostgreSQL Buffer Cache Analysis</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/postgresql-buffer-cache-analysis/ba-p/4501264</link>
      <description>&lt;P&gt;PostgreSQL performance is often dictated not just by query design or indexing strategy, but by how effectively the database leverages memory. At the heart of this memory usage lies shared_buffers—PostgreSQL’s primary buffer cache. Understanding how well this cache is utilized can make the difference between a system that scales smoothly and one that struggles under load.&lt;/P&gt;
&lt;P&gt;In this post, we’ll walk you through a practical, data-driven approach to analyzing PostgreSQL buffer cache behavior using native statistics and the pg_buffercache extension. The goal is to answer a few critical questions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Is the current shared_buffers configuration sufficient?&lt;/LI&gt;
&lt;LI&gt;Are high-value tables and indexes actually being served from memory?&lt;/LI&gt;
&lt;LI&gt;Is PostgreSQL spending too much time going to disk when it shouldn’t?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;By the end, you’ll have a repeatable methodology to assess cache efficiency and make informed tuning decisions.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why Buffer Cache Analysis Matters&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;PostgreSQL relies heavily on its buffer cache to minimize disk I/O. Every time a query needs a data or index page, PostgreSQL first checks whether that page already exists in shared_buffers. If it does, the page is served directly from memory—fast and efficient. If not, PostgreSQL must fetch it from disk (or the OS page cache), which is significantly slower.&lt;/P&gt;
&lt;P&gt;While metrics like query latency and IOPS can tell you that performance is degraded, buffer cache analysis helps explain why. It allows you to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Validate whether frequently accessed objects stay hot in cache&lt;/LI&gt;
&lt;LI&gt;Identify cache pollution caused by large, low-value tables&lt;/LI&gt;
&lt;LI&gt;Determine whether increasing shared_buffers would provide real benefits or just waste memory&lt;/LI&gt;
&lt;/UL&gt;
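&lt;P&gt;Before inspecting individual buffers, a quick overall signal is the cumulative cache hit ratio from pg_stat_database. Here is a minimal sketch using the standard statistics view (not part of the walkthrough below); note that blks_read counts reads from outside shared_buffers, some of which may still be served by the OS page cache:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Overall shared_buffers hit ratio for the current database.
SELECT datname,
       round(100.0 * blks_hit / NULLIF(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();&lt;/LI-CODE&gt;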
&lt;P&gt;&lt;STRONG&gt;Inspecting Shared Buffers with pg_buffercache&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The pg_buffercache extension provides a real-time view into PostgreSQL’s shared buffers. Unlike cumulative statistics, it shows what is in memory right now—which relations are cached, how many blocks they occupy, and how frequently those buffers are reused.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Enabling the Extension&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;pg_buffercache is not enabled by default and requires superuser privileges:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE EXTENSION pg_buffercache;&lt;/LI-CODE&gt;
&lt;P&gt;Once enabled, you can directly query the contents of shared buffers across databases, tables, and indexes.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Analyzing Cache Distribution&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Understanding where your shared buffers are being consumed is the first step toward meaningful tuning.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Database-Level Cache Distribution&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This query shows how shared buffers are distributed across databases in the server:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT CASE
  WHEN c.reldatabase IS NULL THEN ''
  WHEN c.reldatabase = 0 THEN ''
  ELSE d.datname
END AS database,
count(*) AS cached_blocks
FROM pg_buffercache AS c
LEFT JOIN pg_database AS d ON c.reldatabase = d.oid
WHERE d.datname IS NULL OR d.datname NOT LIKE 'template%' -- keep unused/shared buffers (datname is NULL for those rows)
GROUP BY d.datname, c.reldatabase
ORDER BY d.datname, c.reldatabase;&lt;/LI-CODE&gt;
&lt;P&gt;This is particularly useful in multi-database environments where one workload may be evicting cache pages needed by another.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Table and Index-Level Cache Consumption&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;To understand which relations dominate the cache, the following query breaks buffer usage down by tables and indexes:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT c.relname, c.relkind, count(*)
FROM pg_database AS a, pg_buffercache AS b, pg_class AS c
WHERE c.relfilenode = b.relfilenode
AND b.reldatabase = a.oid
AND a.datname = current_database() -- relfilenode values are only meaningful within the current database
GROUP BY 1, 2
ORDER BY 3 DESC, 1;&lt;/LI-CODE&gt;
&lt;P&gt;This helps answer an important question: Are your most business-critical tables and indexes actually resident in memory, or are they constantly being evicted?&lt;/P&gt;
&lt;P&gt;If large, rarely used tables consume a disproportionate share of buffers, it may indicate cache churn or the need for workload isolation.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Understanding Buffer Usage Count (Hot vs Cold Data)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Each buffer in shared memory carries a usage count, which reflects how frequently it has been accessed before eviction. Higher values indicate hotter data.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT c.relname, c.relkind, usagecount, count(*) AS buffers
FROM pg_database AS a, pg_buffercache AS b, pg_class AS c
WHERE c.relfilenode = b.relfilenode
AND b.reldatabase = a.oid
AND a.datname = current_database()
GROUP BY 1, 2, 3
ORDER BY 3 DESC, 1;&lt;/LI-CODE&gt;
&lt;P&gt;A healthy system typically shows a meaningful number of buffers with higher usage counts (for example, 4–5), indicating frequently reused data that benefits from caching.&lt;/P&gt;
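&lt;P&gt;A simple way to check this distribution across the whole cache is to group directly on the usage count; a minimal sketch:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Distribution of buffer usage counts; usagecount IS NULL means the buffer is unused.
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;&lt;/LI-CODE&gt;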
&lt;P&gt;&lt;STRONG&gt;Buffer Cache Percentages: Putting Numbers in Context&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Raw buffer counts are useful, but percentages make interpretation easier. The following query shows:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;How much of shared_buffers each relation occupies&lt;/LI&gt;
&lt;LI&gt;What percentage of the relation itself is cached&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="sql"&gt;SELECT c.relname,
pg_size_pretty(count(*) * 8192) AS buffered,
round(100.0 * count(*) / (SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer, 1) AS buffers_percent,
round(100.0 * count(*) * 8192 / pg_relation_size(c.oid), 1) AS percent_of_relation
FROM pg_class c
JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
JOIN pg_database d ON b.reldatabase = d.oid AND d.datname = current_database()
GROUP BY c.oid, c.relname
ORDER BY 3 DESC
LIMIT 10;&lt;/LI-CODE&gt;
&lt;P&gt;This view is especially powerful when validating whether performance-critical objects are adequately cached relative to their size.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Complementing Cache Views with I/O Statistics&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;While pg_buffercache shows the current state of memory, I/O statistics reveal long-term trends. PostgreSQL exposes these via pg_statio_user_tables and pg_statio_user_indexes.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Table Heap Hit Ratios&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT relname,
heap_blks_hit::numeric / (heap_blks_hit + heap_blks_read) AS hit_pct,
heap_blks_hit,
heap_blks_read
FROM pg_catalog.pg_statio_user_tables
WHERE (heap_blks_hit + heap_blks_read) &amp;gt; 0
ORDER BY hit_pct;&lt;/LI-CODE&gt;
&lt;P&gt;Hit ratios close to 1 indicate that table data is largely served from memory rather than disk.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Index Hit Ratios&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT relname,
idx_blks_hit::numeric / (idx_blks_hit + idx_blks_read) AS hit_pct,
idx_blks_hit,
idx_blks_read
FROM pg_catalog.pg_statio_user_tables
WHERE (idx_blks_hit + idx_blks_read) &amp;gt; 0
ORDER BY hit_pct;&lt;/LI-CODE&gt;
&lt;P&gt;Poor index hit ratios often point to insufficient cache or inefficient query patterns that bypass indexes.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Including TOAST and Index Reads&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;For large objects, TOAST activity can significantly impact I/O. This query provides a more holistic view:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT *,
(heap_blks_read + toast_blks_read + tidx_blks_read) AS total_blocks_read,
(heap_blks_hit + toast_blks_hit + tidx_blks_hit) AS total_blocks_hit
FROM pg_catalog.pg_statio_user_tables;&lt;/LI-CODE&gt;
&lt;P&gt;This helps identify tables whose TOAST data and TOAST indexes are frequently read from disk and may benefit from better caching or query rewrites.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;How to Interpret the Results&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;When reviewing buffer cache and I/O metrics, keep the following guidelines in mind:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Validate cache residency of critical objects&lt;/STRONG&gt;: If business-critical tables and indexes occupy a meaningful share of shared_buffers, your cache sizing is likely reasonable.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Correlate buffer data with hit ratios:&lt;/STRONG&gt; High hit ratios in pg_statio_user_tables and pg_statio_user_indexes confirm effective caching. Persistently low ratios may justify increasing shared_buffers.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Analyze usage count distribution:&lt;/STRONG&gt; A healthy number of buffers with higher usage counts indicates hot data benefiting from cache reuse.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Avoid over-tuning&lt;/STRONG&gt;: If most buffers have low usage counts but hit ratios remain high, increasing shared_buffers further may not yield measurable gains.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Buffer cache analysis bridges the gap between theory and reality in PostgreSQL performance tuning. By combining real-time cache inspection with long-term I/O statistics, you gain a clear picture of how memory is actually used—and whether changes to shared_buffers will deliver tangible benefits.&lt;/P&gt;
&lt;P&gt;Rather than tuning memory blindly, this approach lets you optimize with confidence, grounded in data that reflects your real workload.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Mar 2026 12:51:22 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/postgresql-buffer-cache-analysis/ba-p/4501264</guid>
      <dc:creator>Gayathri_Paderla</dc:creator>
      <dc:date>2026-03-31T12:51:22Z</dc:date>
    </item>
    <item>
      <title>Bidirectional Replication with pglogical on Azure Database for PostgreSQL - a VNET guide</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/bidirectional-replication-with-pglogical-on-azure-database-for/ba-p/4506319</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Editor’s Note&lt;/STRONG&gt;: This article was written by &lt;STRONG&gt;Raunak Jhawar&lt;/STRONG&gt;, a Chief Architect. Paula Berenguel and Guy Bowerman assisted with the final review, formatting and publication.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Overview&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Bidirectional replication is one of the most requested topologies for workloads that need writes in multiple locations, selective sync, or geo-distributed active-active operation, even when that means accepting eventual consistency.&lt;/P&gt;
&lt;P&gt;This is a deep technical walkthrough for implementing bidirectional (active‑active) replication on private Azure Database for PostgreSQL servers using pglogical, with a strong emphasis on VNET‑injected architectures. It explains the underlying networking and execution model, covering replication worker placement, DNS resolution paths, outbound connectivity, and conflict resolution mechanics, to show why true private, server‑to‑server replication is only achievable with VNET injection and not with Private Endpoints. It also analyzes the operational and architectural trade‑offs needed to safely run geo-distributed, multi-write PostgreSQL workloads in production.&lt;/P&gt;
&lt;P&gt;This blog post focuses on pglogical. If you are looking for steps to implement this with native logical replication, or for the pros and cons of each approach, please refer to my definitive guide to bi-directional replication in Azure Database for PostgreSQL&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/adforpostgresql/the-definitive-guide-to-bi-directional-replication-in-azure-database-for-postgre/4450550" target="_blank" rel="noopener" data-lia-auto-title="blog post" data-lia-auto-title-active="0"&gt;blog post&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why is this important?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This understanding prevents fundamental architectural mistakes (such as assuming Private Endpoints provide private outbound replication), reduces deployment failures caused by hidden networking constraints, and enables teams to design secure, compliant, low‑RPO active/active or migration architectures that behave predictably under real production conditions. It turns a commonly misunderstood problem into a repeatable, supportable design pattern rather than a trial‑and‑error exercise.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Active-Active bidirectional replication between instances&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Architecture context&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This scenario targets a multi-region active-active write topology where both nodes are injected into the same Azure VNET (or into peered VNETs on Azure, or even networks peered with on-premises), and both accept writes.&lt;/P&gt;
&lt;P&gt;Common use case: Geo distributed OLTP with regional write affinity.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Azure Infrastructure Prerequisites&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Both server instances must be deployed with VNET injection. This is a deploy-time decision: you cannot migrate a publicly accessible instance (with or without a private endpoint) to VNET injection after creation without rebuilding it.&lt;/P&gt;
&lt;P&gt;Each instance must live in a delegated subnet: Microsoft.DBforPostgreSQL/Servers. The subnet delegation is non-negotiable and prevents you from placing other resource types in the same subnet, so plan your address space accordingly.&lt;/P&gt;
&lt;P&gt;If nodes are in different VNETs, configure VNET peering before continuing along with private DNS integration. Ensure there are no overlapping address spaces amongst the peered networks.&lt;/P&gt;
&lt;P&gt;NSG rules must allow port 5432 between the two delegated subnets, both inbound and outbound. You may narrow the NSG rules to specific source/target allow or deny lists to meet your organization’s requirements and policies.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Server Parameter Configuration&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;On both nodes, configure the following server parameters via the Azure Portal (Server Parameters blade) or Azure CLI. These cannot be set via ALTER SYSTEM SET commands.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;wal_level = logical                   -- enables logical replication, required for pglogical to function
max_worker_processes = 16             -- allows more worker processes, which can help replication performance
max_replication_slots = 10            -- replication slots are needed for pglogical to manage replication connections
max_wal_senders = 10                  -- WAL sender processes send replication data to subscribers
track_commit_timestamp = on           -- lets pglogical track commit timestamps for conflict resolution and lag monitoring
shared_preload_libraries = pglogical  -- loads the pglogical extension at server startup
azure.extensions = pglogical          -- allows the pglogical extension in the Azure Postgres PaaS environment&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Both nodes require a restart after shared_preload_libraries and wal_level changes. &lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Note that max_worker_processes is shared across all background workers in the instance. Each pglogical subscription consumes workers. If you are running other extensions, account for their worker consumption here or you will hit startup failures for pglogical workers.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Extension and Node Initialization&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Create a dedicated replication user on both nodes. Do not use the admin account for replication.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE ROLE replication_user WITH LOGIN REPLICATION PASSWORD 'your_password';
GRANT USAGE ON SCHEMA public TO replication_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replication_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO replication_user;&lt;/LI-CODE&gt;
&lt;P&gt;Log into Server A, either via a VM in the specified VNET or via an Azure Bastion host, and run the following, which creates the extension, registers the node, and defines a replication set and its table memberships.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE EXTENSION IF NOT EXISTS pglogical;

-- Register Server A as a pglogical node.
SELECT pglogical.create_node(
    node_name := 'node_a',
    dsn := 'host=fqdn-for-server-a port=5432 dbname=preferred-database user=replication_user password=&amp;lt;strong_password&amp;gt;');

-- Define the replication set for Server A, specifying which tables to replicate
-- and the types of operations to include (inserts, updates, deletes).
SELECT pglogical.create_replication_set(
    set_name := 'node_a_set',
    replicate_insert := true,
    replicate_update := true,
    replicate_delete := true,
    replicate_truncate := false);

-- Add sales_aus_central table explicitly
SELECT pglogical.replication_set_add_table(
    set_name := 'node_a_set',
    relation := 'public.sales_aus_central',
    synchronize_data := true);

-- Add purchase_aus_central table explicitly
SELECT pglogical.replication_set_add_table(
    set_name := 'node_a_set',
    relation := 'public.purchase_aus_central',
    synchronize_data := true);

-- OR add all tables in the public schema to the default replication set
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

-- Now repeat on Server B, again via a VM in the specified VNET or an Azure Bastion host.
CREATE EXTENSION IF NOT EXISTS pglogical;

-- Register Server B as a pglogical node.
SELECT pglogical.create_node(
    node_name := 'node_b',
    dsn := 'host=fqdn-for-server-b port=5432 dbname=preferred-database user=replication_user password=&amp;lt;strong_password&amp;gt;');

-- Define the replication set for Server B.
SELECT pglogical.create_replication_set(
    set_name := 'node_b_set',
    replicate_insert := true,
    replicate_update := true,
    replicate_delete := true,
    replicate_truncate := false);

-- Add sales_aus_east table explicitly
SELECT pglogical.replication_set_add_table(
    set_name := 'node_b_set',
    relation := 'public.sales_aus_east',
    synchronize_data := true);

-- Add purchase_aus_east table explicitly
SELECT pglogical.replication_set_add_table(
    set_name := 'node_b_set',
    relation := 'public.purchase_aus_east',
    synchronize_data := true);

-- OR add all tables in the public schema to the default replication set
SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);&lt;/LI-CODE&gt;
&lt;P&gt;It is recommended to confirm DNS resolution on all servers involved in the replication. For VNET-injected scenarios, the lookup must return the private IP.&lt;/P&gt;
&lt;P&gt;As a sanity check, you can run nslookup on the target server’s FQDN or use the \conninfo command to see the connection details. One such example is here:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 4: Configuring the subscribers&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Run on Server A: create a subscription to receive changes from Server B.
SELECT pglogical.create_subscription(
    subscription_name := 'node_a_to_node_b',
    replication_sets := array['default'],
    synchronize_data := true,
    forward_origins := '{}',  -- replicate only locally originated changes, preventing loops
    provider_dsn := 'host=fqdn-for-server-b port=5432 dbname=preferred-database user=replication_user password=&amp;lt;strong_password&amp;gt;');

-- Run on Server B: create a subscription to receive changes from Server A.
SELECT pglogical.create_subscription(
    subscription_name := 'node_b_to_node_a',
    replication_sets := array['default'],
    synchronize_data := true,
    forward_origins := '{}',  -- replicate only locally originated changes, preventing loops
    provider_dsn := 'host=fqdn-for-server-a port=5432 dbname=preferred-database user=replication_user password=&amp;lt;strong_password&amp;gt;');&lt;/LI-CODE&gt;
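&lt;P&gt;Once both subscriptions exist, it is worth confirming that they are actually replicating. A minimal check using pglogical's built-in status function (run on each node):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- A status of 'replicating' indicates a healthy stream.
SELECT subscription_name, status, provider_node
FROM pglogical.show_subscription_status();&lt;/LI-CODE&gt;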
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;For most OLTP workloads, last_update_wins using the commit timestamp is the most practical choice. It requires track_commit_timestamp = on, which you must set as a server parameter.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;The FQDN must be used rather than using the direct private IP of the server itself.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Bidirectional replication between server instances with private endpoints – does it work, and does it weaken your server’s security posture?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Where do pglogical workers run?&lt;/P&gt;
&lt;P&gt;With VNET injection, the server's network interface lives inside your delegated subnet. The PostgreSQL process, including all pglogical background workers, initiates connections from within your VNET (the delegated subnet), and your routing tables, NSGs, and peering apply to both inbound &lt;EM&gt;and&lt;/EM&gt; outbound traffic from the server.&lt;/P&gt;
&lt;P&gt;With Private Endpoint, the architecture is fundamentally different:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Private endpoint is a one-way private channel for your clients or applications to reach the server securely. It does not give any of the server’s internal processes outbound access to your VNET.&lt;/P&gt;
&lt;P&gt;pglogical subscription workers trying to connect to another server initiate those connections from Microsoft's managed infrastructure, not from your VNET.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;What works?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Scenario A: Client connectivity via private endpoint&lt;/P&gt;
&lt;P&gt;Here you have application servers or VMs in your VNET connecting to a server configured with a private endpoint, your app VM connects to 10.0.0.15 (the private endpoint NIC), traffic flows over Private Link to the server, and everything stays private. This is not server-to-server replication.&lt;/P&gt;
&lt;P&gt;Scenario B: Two servers, both with private endpoints&lt;/P&gt;
&lt;P&gt;Here both servers are in Microsoft's managed network. They can reach each other's public endpoints, but not each other's private endpoints (which are in customer VNETs). The only path for bidirectional replication worker connections is to enable public network access on both servers, with firewall rules locked down to Azure service IPs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here you have private endpoints deployed alongside public access. Inside your VNET, SERVER A resolves to the private endpoint IP via the privatelink.postgres.database.azure.com private DNS zone. But the pglogical worker running in Microsoft's network does not have access to your private DNS zone and it resolves via public DNS, which returns the public IP.&lt;/P&gt;
&lt;P&gt;This means if you are using the public FQDN for replication, the resolution path is consistent from the server's perspective (always public DNS, always public IP using the allow access to Azure services flag as shown above). Your application clients in the VNET will still resolve to the private endpoint.&lt;/P&gt;
&lt;P&gt;If your requirement is genuinely private replication with no public endpoint exposure, VNET injection is the correct answer, and private endpoint cannot replicate that capability for pglogical.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The most compelling benefit of the VNET-injected topology is network isolation without sacrificing replication capability. You get the security posture of private connectivity (no public endpoints, NSG-controlled traffic, private DNS resolution) while keeping a live bidirectional data pipeline. This satisfies most enterprise compliance requirements around data transit encryption and network boundary control.&lt;/P&gt;
&lt;P&gt;The hub/spoke migration (specifically, on-premises or external cloud to Azure) scenarios are where this approach shines. The ability to run both systems in production simultaneously, with live bidirectional sync during the cutover window, reduces migration risk when compared to a hard cutover.&lt;/P&gt;
&lt;P&gt;From a DR perspective, bidirectional pglogical gives you an RPO measured in seconds (replication lag dependent) without the cost of synchronous replication. For workloads that can tolerate eventual consistency and have well-designed conflict avoidance this is a compelling alternative to synchronous streaming replication via read replicas, which are strictly unidirectional.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Mar 2026 15:30:26 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/bidirectional-replication-with-pglogical-on-azure-database-for/ba-p/4506319</guid>
      <dc:creator>pberenguel</dc:creator>
      <dc:date>2026-03-30T15:30:26Z</dc:date>
    </item>
    <item>
      <title>PostgreSQL Query Rewriting and Subqueries</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/postgresql-query-rewriting-and-subqueries/ba-p/4499819</link>
      <description>&lt;P&gt;In PostgreSQL, the way a query is written has a direct impact on the execution plan the optimizer selects—and consequently on the overall performance of the system. Although the optimizer evaluates multiple plan alternatives, it can only consider strategies that the SQL structure logically allows. Suboptimal query patterns often force PostgreSQL into performing unnecessary work, such as processing more rows than needed, repeating lookups row‑by‑row or applying expensive deduplication or aggregation operations after an overly large join.&lt;/P&gt;
&lt;P&gt;By rewriting queries in more optimizer‑friendly forms, we can drastically reduce the amount of data scanned, improve join efficiency, eliminate redundant operations, and enable PostgreSQL to choose faster, more scalable execution paths. The scenarios in this document demonstrate how simple transformations—such as replacing joins with semi‑joins, pre‑aggregating large tables, or converting correlated subqueries into set‑based joins—allow PostgreSQL to leverage more efficient physical operators and achieve significant performance gains.&lt;/P&gt;
&lt;P&gt;This document explains common PostgreSQL scenarios, where rewriting a query into a subquery (or rewriting a subquery into a different form) can significantly improve performance. Each scenario includes an explanation, example rewrite, and a self-contained test script you can run.&lt;/P&gt;
&lt;P&gt;The tests below were performed on a 4 vCore SKU with a 128 GB storage disk.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Test setup/ script:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;

CREATE TABLE customers (
  customer_id int PRIMARY KEY,
  region text NOT NULL
);

CREATE TABLE orders (
  order_id bigserial PRIMARY KEY,
  customer_id int NOT NULL REFERENCES customers(customer_id),
  created_at timestamptz NOT NULL
);

INSERT INTO customers
SELECT g, CASE WHEN g % 5 = 0 THEN 'E' ELSE 'W' END
FROM generate_series(1,200000) g;

INSERT INTO orders(customer_id, created_at)
SELECT 1 + ((g*37) % 200000), now() - ((g % 365) || ' days')::interval
FROM generate_series(1,5000000) g;

CREATE INDEX orders_customer_id_idx ON orders(customer_id);

ANALYZE customers;
ANALYZE orders;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
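&lt;P&gt;To reproduce the comparisons in each scenario, wrap the queries in EXPLAIN (ANALYZE, BUFFERS) and compare runtimes and buffer counts between the original and rewritten forms. For example, for the first query below:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Shows the actual plan, execution time, and shared-buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT c.customer_id
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE c.region = 'E';&lt;/LI-CODE&gt;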
&lt;P&gt;&lt;STRONG&gt;--------------------------------------------------&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Scenario 1: Semi-join filtering (JOIN → EXISTS)&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;--------------------------------------------------&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;When it helps:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Use this when you only need to check for the existence of related rows and do not need columns from the joined table. EXISTS allows PostgreSQL to use a semi-join strategy and avoids row multiplication.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Before:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT DISTINCT c.customer_id
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE c.region = 'E';&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Query plan:&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Explanation:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This query forces PostgreSQL to:&lt;BR /&gt;• Perform a &lt;STRONG&gt;full join&lt;/STRONG&gt; between customers and orders&lt;BR /&gt;• Produce &lt;STRONG&gt;multiple rows per customer&lt;/STRONG&gt; (one for each matching order)&lt;BR /&gt;• Apply a &lt;STRONG&gt;Unique/Aggregate&lt;/STRONG&gt; step to remove duplicates&lt;/P&gt;
&lt;P&gt;This results in:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;More rows flowing through the join&lt;/LI&gt;
&lt;LI&gt;Higher memory usage&lt;/LI&gt;
&lt;LI&gt;An expensive deduplication step&lt;/LI&gt;
&lt;LI&gt;Possible large hash tables or repeated nested loop iterations&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;After:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT c.customer_id
FROM customers c
WHERE c.region = 'E'
  AND EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
  );&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Query plan:&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Explanation:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;With the EXISTS rewrite, PostgreSQL can switch to a &lt;STRONG&gt;Semi Join&lt;/STRONG&gt;, which has two crucial advantages:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Stops scanning orders after the first match&lt;/STRONG&gt;&lt;BR /&gt;A semi‑join checks only for the &lt;EM&gt;existence&lt;/EM&gt; of at least one matching row — it does not need all of them. Early termination dramatically reduces I/O.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;No need for DISTINCT&lt;/STRONG&gt;&lt;BR /&gt;Because no row multiplication occurs, the result is naturally unique.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;STRONG&gt;Why performance improves&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Fewer rows scanned&lt;/LI&gt;
&lt;LI&gt;No row explosion&lt;/LI&gt;
&lt;LI&gt;No sort/aggregate for DISTINCT&lt;/LI&gt;
&lt;LI&gt;Semi‑join is cheaper both in CPU and memory&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Net effect:&lt;/STRONG&gt; Large reduction in join work and downstream processing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;--------------------------------------------------&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Scenario 2: Pre-aggregate using a subquery&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;--------------------------------------------------&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;When it helps:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;If you only need aggregated results from a large fact table, aggregate first in a subquery and then join. This reduces join cardinality and memory usage.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Before:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT c.region, count(*) AS order_ct
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.region;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Query plan:&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Explanation:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The original query aggregates at the end, &lt;STRONG&gt;after joining&lt;/STRONG&gt; the entire orders table with customers.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The join processes millions of rows&lt;/LI&gt;
&lt;LI&gt;Grouping is done on the &lt;STRONG&gt;expanded&lt;/STRONG&gt; join result&lt;/LI&gt;
&lt;LI&gt;Memory consumption grows significantly&lt;/LI&gt;
&lt;LI&gt;Hash join must handle a very large input&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;After:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT c.region, sum(x.order_ct) AS order_ct
FROM customers c
JOIN (
  SELECT customer_id, count(*) AS order_ct
  FROM orders
  GROUP BY customer_id
) x ON x.customer_id = c.customer_id
GROUP BY c.region;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Query plan:&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Explanation:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The rewritten query aggregates orders &lt;EM&gt;before&lt;/EM&gt; joining.&lt;/P&gt;
&lt;P&gt;This reduces the orders table from millions of rows to just thousands (one per customer). PostgreSQL can now join a &lt;STRONG&gt;tiny, aggregated set&lt;/STRONG&gt; with customers.&lt;/P&gt;
&lt;P&gt;PostgreSQL’s plan changes accordingly:&lt;BR /&gt;• &lt;STRONG&gt;HashAggregate&lt;/STRONG&gt; on orders (small output)&lt;BR /&gt;• Join happens on significantly reduced data&lt;BR /&gt;• Final grouping is trivial because input has already been summarized&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why performance improves&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Drastic reduction in join input size&lt;/LI&gt;
&lt;LI&gt;Much smaller hash tables&lt;/LI&gt;
&lt;LI&gt;Fewer rows grouped at the final stage&lt;/LI&gt;
&lt;LI&gt;Less disk spill risk&lt;/LI&gt;
&lt;LI&gt;Better CPU and memory efficiency&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Net effect:&lt;/STRONG&gt; Doing heavy work early shrinks the workload for the rest of the query.&lt;/P&gt;
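&lt;P&gt;If you want the pre‑aggregation step itself to be cheap, an index on the join/group key helps; a small sketch (the index name is illustrative):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- An index on orders(customer_id) can let the inner GROUP BY run as an
-- index-only scan when the visibility map is current (e.g., after VACUUM).
CREATE INDEX IF NOT EXISTS idx_orders_customer_id ON orders (customer_id);&lt;/LI-CODE&gt;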
&lt;P&gt;&lt;STRONG&gt;--------------------------------------------------&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Scenario 3: Correlated scalar subquery → JOIN&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;--------------------------------------------------&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;When it helps:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Correlated scalar subqueries can behave like N+1 queries. Rewriting them as joins allows PostgreSQL to use more efficient join strategies.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Before:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT o.order_id,
       (SELECT c.region FROM customers c WHERE c.customer_id = o.customer_id) AS region
FROM orders o;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Query plan:&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Explanation:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Correlated Scalar Subquery&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A correlated subquery runs &lt;STRONG&gt;once per row&lt;/STRONG&gt; of the outer table (orders).&lt;/P&gt;
&lt;P&gt;PostgreSQL has no choice but to generate a &lt;STRONG&gt;parameterized Nested Loop&lt;/STRONG&gt;, which means:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Scan one order&lt;/LI&gt;
&lt;LI&gt;Perform an index lookup into customers&lt;/LI&gt;
&lt;LI&gt;Repeat for every order&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The performance degrades linearly with table size.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;After:&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT o.order_id, c.region
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id;
&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Query plan:&lt;BR /&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Explanation:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The rewritten query is a standard join, which removes the row‑by‑row dependency. Now PostgreSQL can:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Scan both tables once&lt;/LI&gt;
&lt;LI&gt;Build a hash table on customers&lt;/LI&gt;
&lt;LI&gt;Use a &lt;STRONG&gt;Hash Join&lt;/STRONG&gt; or &lt;STRONG&gt;Merge Join&lt;/STRONG&gt; to match rows efficiently&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This transforms the execution pattern from row‑by‑row lookups to a &lt;STRONG&gt;set‑based join&lt;/STRONG&gt;, which is far more efficient.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why performance improves&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Eliminates repeated index probes&lt;/LI&gt;
&lt;LI&gt;Replaces per‑row index lookups with a single O(N) scan + O(N) hash join&lt;/LI&gt;
&lt;LI&gt;Better cache utilization&lt;/LI&gt;
&lt;LI&gt;Fully parallelizable (correlated subquery loops are not)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Net effect:&lt;/STRONG&gt; Orders and customers are processed together in a single, optimized join rather than tens or hundreds of thousands of micro‑queries.&lt;/P&gt;
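&lt;P&gt;One correctness note: a scalar subquery returns NULL when no customer row matches, so a strictly equivalent rewrite uses a LEFT JOIN; a minimal sketch:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Equivalent to the scalar subquery even for orders with no matching
-- customer: unmatched rows keep region = NULL instead of being dropped.
SELECT o.order_id, c.region
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id;&lt;/LI-CODE&gt;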
&lt;P&gt;&lt;STRONG&gt;Conclusion:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The rewritten queries enable PostgreSQL to choose far more efficient physical operations—such as semi‑joins, early aggregation, and set‑based hash/merge joins—dramatically reducing row processing, memory usage, and repetitive work. Overall, these improvements streamline execution paths, allowing the optimizer to operate on smaller, cleaner datasets and produce faster, more scalable query plans.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Mar 2026 20:40:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/postgresql-query-rewriting-and-subqueries/ba-p/4499819</guid>
      <dc:creator>Gayathri_Paderla</dc:creator>
      <dc:date>2026-03-26T20:40:00Z</dc:date>
    </item>
    <item>
      <title>GraphRAG and PostgreSQL integration in docker with Cypher query and AI agents (Version 2*)</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/graphrag-and-postgresql-integration-in-docker-with-cypher-query/ba-p/4503586</link>
<description>&lt;P&gt;&lt;STRONG&gt;This is an update to the previous blog (version 1):&amp;nbsp;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/graphrag-and-postgresql-integration-in-docker-with-cypher-query-and-ai-agents/4420623" target="_blank" rel="noopener" data-lia-auto-title="GraphRAG and PostgreSQL integration in docker with Cypher query and AI agents | Microsoft Community Hub" data-lia-auto-title-active="0"&gt;GraphRAG and PostgreSQL integration in docker with Cypher query and AI agents | Microsoft Community Hub&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;Review the business needs of this solution from version 1&lt;/STRONG&gt;&lt;/H2&gt;
&lt;img /&gt;
&lt;H2&gt;&lt;STRONG&gt;What's new in version 2?&lt;/STRONG&gt;&lt;/H2&gt;
&lt;H2&gt;&lt;STRONG&gt;MCP tools for GraphRAG and PostgreSQL with Apache AGE&amp;nbsp;&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;This solution now includes MCP tools for GraphRAG and PostgreSQL. There are five MCP tools exposed:&amp;nbsp;&lt;BR /&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;[graphrag_search]&lt;/STRONG&gt;&amp;nbsp;&lt;BR /&gt;Used to run a query (local or global) with runtime-tunable API parameters. Importantly, query behavior can be tuned at runtime without changing the underlying index.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;[age_get_schema_cached]&lt;/STRONG&gt;&lt;BR /&gt;Used for schema inspection and diagnostics. It returns the graph schema (node labels and relationship types) from cache by default; and can optionally refresh the cache by re‑querying the database. This tool is typically used for introspection or debugging, not for answering user questions about data.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;[age_entity_lookup]&lt;/STRONG&gt;&lt;BR /&gt;Used for quick entity discovery and disambiguation. It performs a simple substring match on entity names or titles and is especially useful for questions like&amp;nbsp;&lt;EM&gt;“Who is X?”&lt;/EM&gt;&amp;nbsp;or as a preliminary step before issuing more complex graph queries.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;[age_cypher_query]&lt;/STRONG&gt;&lt;BR /&gt;Executes a user‑provided Cypher query directly against the AGE graph. This is intended for advanced users who already know the graph structure and want full control over traversal logic and filters (a minimal sketch follows this list).&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;[age_nl2cypher_query]&lt;/STRONG&gt;&lt;BR /&gt;Bridges natural language and Cypher. This tool converts a natural‑language question into a Cypher query (using only Entity nodes and RELATED_TO edges), executes it, and returns the results. It is most effective for multi‑hop or structurally complex questions where semantic interpretation is needed first, but execution must remain deterministic.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
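&lt;P&gt;For orientation, here is a minimal sketch of the kind of statement the Cypher tools execute under the hood via Apache AGE's cypher() function; the graph name 'graphrag' is an illustrative assumption, and the labels follow the Entity/RELATED_TO convention described above:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Load AGE and put its catalog on the search path (per session).
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

-- One-hop traversal over Entity nodes and RELATED_TO edges.
SELECT * FROM cypher('graphrag', $$
    MATCH (a:Entity {name: 'X'})-[:RELATED_TO]-(b:Entity)
    RETURN b.name
$$) AS (name agtype);&lt;/LI-CODE&gt;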
&lt;P&gt;Besides that,&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;This solution now uses the Microsoft Agent Framework. It enables clean orchestration over MCP tools, allowing the agent to dynamically select between GraphRAG and graph query capabilities at runtime, with looser coupling and a clearer execution model than traditional Semantic Kernel function plugins.&lt;/LI&gt;
&lt;LI&gt;The new Docker image includes GraphRAG 3.0.5. This version stabilizes the 3.x configuration‑driven, API‑based architecture and improves indexing reliability, making graph construction more predictable and easier to integrate into real workflows.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;&lt;STRONG&gt;New architecture&lt;/STRONG&gt;&lt;/H2&gt;
&lt;img /&gt;
&lt;H2&gt;&lt;STRONG&gt;Updated Step 7 - run query in Jupyter notebook&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;This step runs the Jupyter notebook in Docker, the same as in the previous blog.&amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;gt; docker compose up query-notebook&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;After clicking the link highlighted in the above screenshot, you can explore all project files inside the Docker container, then find &lt;EM&gt;&lt;U&gt;query-notebook.ipynb&lt;/U&gt;&lt;/EM&gt;.&lt;/P&gt;
&lt;P&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/postgreSQL-graphRAG-docker/blob/main/project_folder/query-notebook.ipynb" target="_blank" rel="noopener"&gt;https://github.com/Azure-Samples/postgreSQL-graphRAG-docker/blob/main/project_folder/query-notebook.ipynb&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;In this new version of the notebook, however, GraphRAG 3.0.5 uses a different library for local search and global search.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;New Step 8 - run agent and MCP tools in Jupyter notebook&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;This step runs a Jupyter notebook in Docker.&amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&amp;gt; docker compose up mcp-agent&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;After clicking the highlighted URL, you can start working on&lt;U&gt;&amp;nbsp;&lt;EM&gt;agent-notebook.ipynb&lt;/EM&gt;&lt;/U&gt;. &amp;nbsp;&lt;BR /&gt;&lt;A class="lia-external-url" href="https://github.com/Azure-Samples/postgreSQL-graphRAG-docker/blob/main/project_folder/agent-notebook.ipynb" target="_blank" rel="noopener"&gt;https://github.com/Azure-Samples/postgreSQL-graphRAG-docker/blob/main/project_folder/agent-notebook....&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Multiple scenarios of agents with MCP tools are included in the notebook:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;GraphRAG search: local and global search examples with a direct MCP call.&lt;/LI&gt;
&lt;LI&gt;GraphRAG search: local and global search examples with an agent that includes the MCP tools.&lt;/LI&gt;
&lt;LI&gt;Cypher queries via a direct MCP call.&lt;/LI&gt;
&lt;LI&gt;An agent that queries in natural language, with the NL2Cypher MCP tool included for conversion.&lt;/LI&gt;
&lt;LI&gt;An agent with unified MCP (all five MCP tools) that routes each question to the corresponding tool.
&lt;UL&gt;
&lt;LI&gt;['graphrag_search', 'age_get_schema_cached', 'age_cypher_query', 'age_entity_lookup', 'age_nl2cypher_query']&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;&lt;STRONG&gt;Router agent: selecting the right MCP tool&lt;/STRONG&gt;&lt;/H2&gt;
&lt;P&gt;The notebook also includes a&amp;nbsp;&lt;STRONG&gt;router agent&lt;/STRONG&gt;&amp;nbsp;that has access to all five MCP tools and decides which one to invoke based on the user’s question. Rather than hard‑coding execution paths, the agent reasons about intent and selects the most appropriate capability at runtime.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;General routing guidance used in this solution&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Use [graphrag_search]&lt;/STRONG&gt; when the question requires:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;full dataset understanding,&lt;/LI&gt;
&lt;LI&gt;themes, patterns, or trends across documents,&lt;/LI&gt;
&lt;LI&gt;exploratory or open‑ended analysis,&lt;/LI&gt;
&lt;LI&gt;global understanding or evaluation where we have a corpus of many tokens.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In these cases, GraphRAG’s semantic retrieval and aggregation are a better fit than explicit graph traversal.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Use AGE‑based tools&lt;/STRONG&gt;&lt;BR /&gt;[age_get_schema_cached, age_entity_lookup, age_cypher_query, age_nl2cypher_query] when the question involves:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;specific entities or explicit relationships,&lt;/LI&gt;
&lt;LI&gt;deterministic graph traversal or filtering,&lt;/LI&gt;
&lt;LI&gt;questions that depend on graph structure rather than document semantics,&lt;/LI&gt;
&lt;LI&gt;complex graph queries involving multiple entities or multi‑hop paths.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Within the AGE toolset:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;[age_entity_lookup] is typically used for quick entity discovery or disambiguation.&lt;/LI&gt;
&lt;LI&gt;[age_cypher_query] is used when a precise Cypher query is already known.&lt;/LI&gt;
&lt;LI&gt;[age_nl2cypher_query] is used when the question is expressed in natural language but requires a non‑trivial Cypher query to answer.&lt;/LI&gt;
&lt;LI&gt;[age_get_schema_cached] is reserved for schema inspection and diagnostics.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The router agent dynamically selects between semantic search and deterministic graph tools based on question intent, keeping retrieval, graph execution, and orchestration clearly separated and extensible.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Note:&lt;/STRONG&gt;&amp;nbsp;The repository also includes [age_get_schema] and [age_get_schema_details] MCP tools for debugging and development purposes. These are not exposed to agents by default and are superseded by [age_get_schema_cached] for normal use.&lt;/P&gt;
&lt;H2&gt;&lt;STRONG&gt;Key takeaways&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;GraphRAG and PostgreSQL AGE querying serve different purposes, and each has its advantages.&lt;/LI&gt;
&lt;LI&gt;MCP tools provide a&amp;nbsp;uniform interface to both semantic search and deterministic graph operations.&lt;/LI&gt;
&lt;LI&gt;Microsoft Agent Framework enables&amp;nbsp;tool‑centric orchestration, where agents select the right capability at runtime instead of hard‑coding logic in prompts.&lt;/LI&gt;
&lt;LI&gt;The Jupyter‑based agent workflow makes it easy to experiment with different interaction patterns, from direct tool calls to fully routed agent execution.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;&lt;STRONG&gt;What's next&lt;/STRONG&gt;&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;In this solution, the MCP server and agent runtime are architecturally separated but deployed together in a single Docker container to demonstrate how MCP tools work and to keep local experimentation simple.&lt;/LI&gt;
&lt;LI&gt;There are other deployment options, such as running MCP servers remotely, where tools can be hosted and operated independently of the agent runtime. Contributions and enhancements are welcome.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Mon, 23 Mar 2026 12:08:09 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/graphrag-and-postgresql-integration-in-docker-with-cypher-query/ba-p/4503586</guid>
      <dc:creator>Helen_Zeng</dc:creator>
      <dc:date>2026-03-23T12:08:09Z</dc:date>
    </item>
    <item>
      <title>February 2026 Recap: Azure Database for PostgreSQL</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/february-2026-recap-azure-database-for-postgresql/ba-p/4501093</link>
      <description>&lt;P&gt;Hello Azure Community,&lt;/P&gt;
&lt;P&gt;We’re excited to share the February 2026 recap for &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/" target="_blank" rel="noopener"&gt;Azure Database for PostgreSQL&lt;/A&gt;, featuring a set of updates focused on speed, simplicity, and better visibility. From Terraform support for Elastic Clusters and a refreshed VM SKU selection experience in the Azure portal to built‑in Grafana dashboards, these improvements make it easier to build, operate, and scale PostgreSQL on Azure. This recap also includes practical GIN index tuning guidance, enhancements to the PostgreSQL VS Code extension, and improved connectivity for azure_pg_admin users.&lt;/P&gt;
&lt;H1&gt;Features&lt;/H1&gt;
&lt;OL&gt;
&lt;LI style="font-weight: bold;"&gt;&lt;STRONG&gt;&lt;A href="#community--1-tf" target="_self" rel="noopener"&gt;Terraform support for Elastic Clusters - Generally Available&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: bold;"&gt;&lt;STRONG&gt;&lt;A href="#community--1-dashboard" target="_self" rel="noopener"&gt;Dashboards with Grafana - Generally Available&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: bold;"&gt;&lt;STRONG&gt;&lt;A href="#community--1-sku" target="_self" rel="noopener"&gt;Easier way to choose VM SKUs on portal – Generally Available&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: bold;"&gt;&lt;STRONG&gt;&lt;A href="#community--1-vscode" target="_self" rel="noopener"&gt;What’s New in the PostgreSQL VS Code Extension&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: bold;"&gt;&lt;STRONG&gt;&lt;A href="#community--1-admin" target="_self" rel="noopener"&gt;Priority Connectivity to azure_pg_admin users&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-weight: bold;"&gt;&lt;STRONG&gt;&lt;A href="#community--1-gin" target="_self" rel="noopener"&gt;Guide on 'gin_pending_list_limit' indexes&lt;/A&gt;&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;H2 id="tf"&gt;Terraform support for Elastic Clusters&lt;/H2&gt;
&lt;P&gt;Terraform now supports provisioning and managing Azure Database for PostgreSQL Elastic Clusters, enabling customers to define and operate elastic clusters using infrastructure‑as‑code workflows. With this support, you can create, scale, and manage multi‑node PostgreSQL clusters through Terraform, making it easier to automate deployments, replicate environments, and integrate elastic clusters into CI/CD pipelines. This improves operational consistency and simplifies management for horizontally scalable PostgreSQL workloads.&lt;/P&gt;
&lt;P&gt;Learn more about building and scaling with &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/elastic-clusters/concepts-elastic-clusters" target="_blank" rel="noopener"&gt;Azure Database for PostgreSQL elastic clusters.&lt;/A&gt;&lt;/P&gt;
&lt;H2 id="dashboard"&gt;Dashboards with Grafana — Now Built-In&lt;/H2&gt;
&lt;img /&gt;
&lt;P&gt;Grafana dashboards are now natively integrated into the Azure Portal for Azure Database for PostgreSQL. This removes the need to deploy or manage a separate Grafana instance. With just a few clicks, you can visualize key metrics and logs side by side, correlate events by timestamp, and gain deep insights into performance, availability, and query behavior all in one place.&lt;/P&gt;
&lt;P&gt;Whether you're troubleshooting a spike, monitoring trends, or sharing insights with your team, this built-in experience simplifies day-to-day observability with no added cost or complexity.&lt;/P&gt;
&lt;P&gt;Try it under &lt;STRONG&gt;Azure Portal &amp;gt; Dashboards with Grafana&lt;/STRONG&gt; in your PostgreSQL server view.&lt;/P&gt;
&lt;P&gt;For more details, see the &lt;A href="https://aka.ms/azure-postgres-dashboards-grafana" target="_blank" rel="noopener"&gt;blog post: &lt;EM&gt;Dashboards with Grafana — Now in Azure Portal for PostgreSQL&lt;/EM&gt;&lt;/A&gt;.&lt;/P&gt;
&lt;H2 id="sku"&gt;Easier way to choose VM SKUs on portal&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-teams="true"&gt;We’ve improved the VM SKU selection experience in the Azure portal to make it easier to find and compare the right compute options for your PostgreSQL workload. The updated experience organizes SKUs in a clearer, more scannable view, helping you quickly compare key attributes like vCores and memory without extra clicks. This streamlined approach reduces guesswork and makes selecting the right SKU faster and more intuitive.&lt;/SPAN&gt;&lt;/P&gt;
&lt;A href="&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h2 id=" target="_blank" rel="noopener"&gt;
&lt;H2 id="vscode"&gt;What’s New in the PostgreSQL VS Code Extension&lt;/H2&gt;
&lt;/A&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="&amp;gt;&amp;lt;/h2&amp;gt;
&amp;lt;h2 id=" target="_blank" rel="noopener"&gt;The&amp;nbsp;&lt;/A&gt;&lt;A href="https://marketplace.visualstudio.com/items?itemName=ms-ossdata.vscode-pgsql" target="_blank" rel="noopener"&gt;VS Code extension for PostgreSQL&lt;/A&gt; helps developers and database administrators work with PostgreSQL directly from VS Code. It provides capabilities for querying, schema exploration, diagnostics, and Azure PostgreSQL management allowing users to stay within their editor while building and troubleshooting. &lt;A class="lia-external-url" href="https://github.com/microsoft/vscode-pgsql/blob/main/CHANGELOG.md" target="_blank" rel="noopener"&gt;This release focuses&lt;/A&gt; on improving developer productivity and diagnostics. It introduces new visualization capabilities, Copilot-powered experiences, enhanced schema navigation, and deeper Azure PostgreSQL management directly from VS Code.&lt;/P&gt;
&lt;H4&gt;New Features &amp;amp; Enhancements&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Query Plan Visualization:&lt;/STRONG&gt; Graphical execution plans can now be viewed directly in the editor, making it easier to diagnose slow queries without leaving VS Code.&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;AGE Graph Rendering:&lt;/STRONG&gt; Support is now available for automatically rendering graph visualizations from Cypher queries, improving the experience of working with graph data in PostgreSQL.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Object Explorer Search: &lt;/STRONG&gt;A new graphical search experience in Object Explorer allows users to quickly find tables, views, functions, and other objects across large schemas, addressing one of the highest-rated user feedback requests.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure PostgreSQL Backup Management:&lt;/STRONG&gt; Users can now manage Azure Database for PostgreSQL backups directly from the Server Dashboard, including listing backups and configuring retention policies.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Server Logs Dashboard:&lt;/STRONG&gt; A new Server Dashboard view surfaces Azure Database for PostgreSQL server logs and retention settings for faster diagnostics. Logs can be opened directly in VS Code and analyzed using the built-in GitHub Copilot integration.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This release also includes several reliability improvements and bug fixes, including resolving connection pool exhaustion issues, fixing Docker container creation failures when no password is provided, and improving stability around connection profiles and schema-related operations.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2 id="admin"&gt;Priority Connectivity to azure_pg_admin Users&lt;/H2&gt;
&lt;P&gt;Members of the azure_pg_admin role can now use connections from the pg_use_reserved_connections pool. This ensures that an admin always has at least one available connection, even if all standard client connections from the server connection pool are in use. By making sure admin users can log in when the client connection pool is full, this change prevents lockout situations and lets admins handle emergencies without competing for available open connection slots.&lt;/P&gt;
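&lt;P&gt;On community PostgreSQL 16 and later, the underlying mechanism is the reserved_connections parameter together with the pg_use_reserved_connections predefined role; a minimal sketch (the role name ops_admin is illustrative):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- How many connection slots are set aside for explicitly privileged roles:
SHOW reserved_connections;

-- On community PostgreSQL you grant access to the reserved pool yourself;
-- on Azure Flexible Server, azure_pg_admin membership now covers this.
GRANT pg_use_reserved_connections TO ops_admin;&lt;/LI-CODE&gt;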
&lt;H2 id="gin"&gt;Guide on 'gin_pending_list_limit' indexes&lt;/H2&gt;
&lt;P&gt;Struggling with slow GIN index inserts in PostgreSQL? This post dives into the often-overlooked &lt;EM&gt;gin_pending_list_limit&lt;/EM&gt; parameter and how it directly impacts insert performance. Learn how GIN’s pending list works, why the right limit matters, and practical guidance on tuning it to strike the perfect balance between write performance and index maintenance overhead.&lt;/P&gt;
&lt;P&gt;For a deeper dive into &lt;EM&gt;gin_pending_list_limit&lt;/EM&gt; and tuning guidance, see the full blog &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/adforpostgresql/mastering-gin-pending-list-limit-how-this-parameter-shapes-gin-index-insert-perf/4494203" target="_blank" rel="noopener" data-lia-auto-title="here" data-lia-auto-title-active="0"&gt;here&lt;/A&gt;.&lt;/P&gt;
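&lt;P&gt;As a quick reference, the parameter can be tuned per session or per index, and the pending list can be flushed on demand; a hedged sketch (the index name is illustrative):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Session-level setting (the value is in kB unless a unit is given):
SET gin_pending_list_limit = '8MB';

-- Per-index storage parameter (kB):
ALTER INDEX idx_docs_gin SET (gin_pending_list_limit = 8192);

-- Flush the pending list into the main GIN structure on demand:
SELECT gin_clean_pending_list('idx_docs_gin'::regclass);&lt;/LI-CODE&gt;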
&lt;H1&gt;Learning Bytes&lt;/H1&gt;
&lt;P&gt;Create Azure Database for PostgreSQL elastic clusters with Terraform:&lt;/P&gt;
&lt;P&gt;Elastic clusters in Azure Database for PostgreSQL let you scale PostgreSQL horizontally using a managed, multi‑node architecture. With elastic clusters now generally available, you can provision and manage them using infrastructure‑as‑code, making it easier to automate deployments, standardize environments, and integrate PostgreSQL into CI/CD workflows.&lt;/P&gt;
&lt;P&gt;Elastic clusters are a good fit when you need:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Horizontal scale for large or fast‑growing PostgreSQL workloads&lt;/LI&gt;
&lt;LI&gt;Multi‑tenant applications or sharded data models&lt;/LI&gt;
&lt;LI&gt;Repeatable and automated deployments across environments&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The following example shows a basic Terraform configuration to create an Azure Database for PostgreSQL flexible server configured as an elastic cluster.&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;resource "azurerm_postgresql_flexible_server" "elastic_cluster" {
  name                   = "pg-elastic-cluster"
  resource_group_name    = &amp;lt;rg-name&amp;gt;
  location               = &amp;lt;region&amp;gt;
  administrator_login    = var.admin_username
  administrator_password = var.admin_password
  version                = "17"
  sku_name   = "GP_Standard_D4ds_v5"
  storage_mb = 131072
  cluster {
    size = 3
  }
}&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H1&gt;Conclusion&lt;/H1&gt;
&lt;P&gt;That’s a wrap for the February 2026 Azure Database for PostgreSQL recap. We’re continuing to focus on making PostgreSQL on Azure easier to build, operate, and scale, whether that’s through better automation with Terraform, improved observability, or a smoother day‑to‑day developer and admin experience. Your feedback is important to us. Have suggestions, ideas, or questions? We’d love to hear from you:&amp;nbsp;&lt;A href="https://aka.ms/pgfeedback" target="_blank" rel="noopener"&gt;https://aka.ms/pgfeedback&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Mon, 16 Mar 2026 01:25:38 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/february-2026-recap-azure-database-for-postgresql/ba-p/4501093</guid>
      <dc:creator>gauri-kasar</dc:creator>
      <dc:date>2026-03-16T01:25:38Z</dc:date>
    </item>
    <item>
      <title>Understanding Hash Join Memory Usage and OOM Risks in PostgreSQL</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/understanding-hash-join-memory-usage-and-oom-risks-in-postgresql/ba-p/4500308</link>
      <description>&lt;H2&gt;Background: Why Memory Usage May Exceed work_mem&lt;/H2&gt;
&lt;P&gt;work_mem is commonly assumed to be a hard upper bound on per‑query memory usage.&lt;/P&gt;
&lt;P&gt;However, for Hash Join operations, memory consumption depends not only on this parameter but also on:&lt;/P&gt;
&lt;P&gt;✅ Data cardinality&lt;/P&gt;
&lt;P&gt;✅ Hash table internal bucket distribution&lt;/P&gt;
&lt;P&gt;✅ Join column characteristics&lt;/P&gt;
&lt;P&gt;✅ Number of batches created&lt;/P&gt;
&lt;P&gt;✅ Parallel workers involved&lt;/P&gt;
&lt;P&gt;Under low‑cardinality conditions, a Hash Join may place an extremely large number of rows into very few buckets—sometimes a single bucket. This causes unexpectedly large memory allocations that exceed the nominal work_mem limit.&lt;/P&gt;
&lt;H2&gt;Background: What work_mem&amp;nbsp;&lt;EM&gt;really&lt;/EM&gt; means for Hash Joins&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;work_mem controls the amount of memory available &lt;STRONG&gt;per operation&lt;/STRONG&gt; (e.g., a sort or a hash) &lt;STRONG&gt;per node&lt;/STRONG&gt; (and per parallel worker) before spilling to disk. Hash operations can additionally use hash_mem_multiplier×work_mem for their hash tables (see the snippet after this list). &lt;A href="https://www.postgresql.org/docs/current/runtime-config-resource.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;, &lt;A href="https://postgresqlco.nf/doc/en/param/hash_mem_multiplier/" target="_blank" rel="noopener"&gt;[postgresqlco.nf]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;The &lt;STRONG&gt;Hash Join&lt;/STRONG&gt; algorithm builds a hash table for the “build/inner” side and probes it with the “outer” side. The table is split into &lt;STRONG&gt;buckets&lt;/STRONG&gt;; if it doesn’t fit in memory, PostgreSQL partitions work into &lt;STRONG&gt;batches&lt;/STRONG&gt; (spilling to temporary files). &lt;STRONG&gt;Skewed distributions&lt;/STRONG&gt; (e.g., very few distinct join keys) pack many rows into the same bucket(s), exploding memory usage even when work_mem is small. &lt;A href="https://postgrespro.com/blog/pgsql/5969673" target="_blank" rel="noopener"&gt;[postgrespro.com]&lt;/A&gt;, &lt;A href="https://www.interdb.jp/pg/pgsql03/05/03.html" target="_blank" rel="noopener"&gt;[interdb.jp]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;In EXPLAIN (ANALYZE) you’ll see &lt;STRONG&gt;Buckets:&lt;/STRONG&gt;, &lt;STRONG&gt;Batches:&lt;/STRONG&gt;, and &lt;STRONG&gt;Memory Usage:&lt;/STRONG&gt; on the Hash node; Batches &amp;gt; 1 indicates spilling/partitioning. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;, &lt;A href="https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan" target="_blank" rel="noopener"&gt;[thoughtbot.com]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;The default for hash_mem_multiplier is &lt;STRONG&gt;version‑dependent&lt;/STRONG&gt; (introduced in PG13; 1.0 in early versions, later 2.0). Tune with care; it scales the memory that hash operations may consume relative to work_mem. &lt;A href="https://pgpedia.info/h/hash_mem_multiplier.html" target="_blank" rel="noopener"&gt;[pgpedia.info]&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
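&lt;P&gt;To make that budget concrete, you can inspect both knobs in a session; a minimal snippet (the values shown are the common defaults; yours may differ):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- The effective per-hash-table budget is roughly
-- hash_mem_multiplier * work_mem, per Hash node and per parallel worker.
SHOW work_mem;              -- 4MB by default
SHOW hash_mem_multiplier;   -- 2 by default on recent versions (PG15+)&lt;/LI-CODE&gt;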
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;A safe, reproducible demo (containerized community PostgreSQL)&lt;/H2&gt;
&lt;P&gt;The goal is to show that &lt;STRONG&gt;data distribution alone&lt;/STRONG&gt; can drive &lt;STRONG&gt;order(s) of magnitude&lt;/STRONG&gt; difference in hash table memory, using conservative settings.&lt;/P&gt;
&lt;P&gt;To simulate the behavior, we’ll use the &lt;STRONG&gt;pg_hint_plan&lt;/STRONG&gt; extension to guide the execution plans, and we’ll create a data distribution that wouldn’t make sense in real application logic, purely to force and demonstrate the behavior.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Start PostgreSQL 16 container&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="bash"&gt;docker run --name=postgresql16.8 -p 5414:5432 -e POSTGRES_PASSWORD=&amp;lt;password&amp;gt; -d postgres:16.8
docker exec -it postgresql16.8 /bin/bash -c "apt-get update -y;apt-get install procps -y;apt-get install postgresql-16-pg-hint-plan -y;apt-get install vim -y;apt-get install htop -y"
docker exec -it postgresql16.8 /bin/bash

vi /var/lib/postgresql/data/postgresql.conf
-- Adding pg_hint_plan to shared_preload_libraries
psql -h localhost -U postgres
create extension pg_hint_plan;

docker stop postgresql16.8
docker start postgresql16.8
&lt;/LI-CODE&gt;
&lt;P&gt;To connect to our Docker container from the host, we use:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;psql -h localhost -p 5414 -U postgres&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Connect and apply conservative session-level settings&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We’ll &lt;EM&gt;discourage&lt;/EM&gt; other join methods so the planner prefers &lt;STRONG&gt;Hash Join&lt;/STRONG&gt;; the GUC toggles need no extension, while the pg_hint_plan settings only matter where we explicitly hint the join method.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;set hash_mem_multiplier=1;
set max_parallel_workers=0;
set max_parallel_workers_per_gather=0;
set enable_parallel_hash=off;
set enable_material=off;
set enable_sort=off;
set pg_hint_plan.debug_print=verbose;
set client_min_messages=notice;
set pg_hint_plan.enable_hint_table=on;&lt;/LI-CODE&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Create tables and load data&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We’ll create two tables for the join: table_s with a single row, and table_h initially with 10 million rows.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;drop table table_s;
create table table_s (column_a text);
insert into table_s values ('30020');
vacuum full table_s;

drop table if exists table_h;
create table table_h(column_a text,column_b text);

INSERT INTO table_h(column_a,column_b)
SELECT i::text, i::text
FROM generate_series(1, 10000000) AS t(i);
vacuum full table_h;&lt;/LI-CODE&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Run Hash Join (high cardinality)&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We’ll run the join on column_a in both tables; as created above, column_a has high cardinality in table_h.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;explain (analyze,buffers,costs,verbose) SELECT /*+ HashJoin(s h) Leading((s h)) */ COUNT(*)
FROM table_s s
JOIN table_h h
  ON s.column_a= h.column_a;&lt;/LI-CODE&gt;
&lt;P&gt;You should see a Hash node with &lt;STRONG&gt;small Memory Usage (a few MB)&lt;/STRONG&gt; and &lt;STRONG&gt;Batches: 256 or similar&lt;/STRONG&gt; due to our small (default 4MB) work_mem, but no ballooning. Exact numbers vary by hardware, version, and statistics. (EXPLAIN fields and their interpretation are documented here.) &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;                                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=280930.01..280930.02 rows=1 width=8) (actual time=1902.965..1902.968 rows=1 loops=1)
   Output: count(*)
   Buffers: shared read=54055, temp read=135 written=34041
   -&amp;gt;  Hash Join  (cost=279054.00..280805.01 rows=50000 width=0) (actual time=1900.539..1902.949 rows=1 loops=1)
         Hash Cond: (s.column_a = h.column_a)
         Buffers: shared read=54055, temp read=135 written=34041
         -&amp;gt;  Seq Scan on public.table_s s  (cost=0.00..1.01 rows=1 width=32) (actual time=0.021..0.022 rows=1 loops=1)
               Output: s.column_a
               Buffers: shared read=1
         -&amp;gt;  Hash  (cost=154054.00..154054.00 rows=10000000 width=32) (actual time=1896.895..1896.896 rows=10000000 loops=1)
               Output: h.column_a
               Buckets: 65536  Batches: 256  Memory Usage: 2031kB
               Buffers: shared read=54054, temp written=33785
               -&amp;gt;  Seq Scan on public.table_h h  (cost=0.00..154054.00 rows=10000000 width=32) (actual time=2.538..638.830 rows=10000000 loops=1)
                     Output: h.column_a
                     Buffers: shared read=54054
 Query Identifier: 334721522907995613
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.302 ms
 JIT:
   Functions: 11
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.441 ms, Inlining 0.000 ms, Optimization 0.236 ms, Emission 2.339 ms, Total 3.017 ms
 Execution Time: 1903.472 ms
(25 rows)&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Findings (1):&lt;/STRONG&gt; With fully distributed, high‑cardinality data, the hash table takes only &lt;STRONG&gt;2031kB&lt;/STRONG&gt; of memory usage (work_mem), &lt;STRONG&gt;shared hit/read=54055&lt;/STRONG&gt;.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Force&amp;nbsp;&lt;STRONG&gt;low cardinality / skew&lt;/STRONG&gt; and re‑run&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We’ll update table_h, setting column_a to '30020' in every row, so the column has only one distinct value across the whole table.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;update table_h set column_a='30020', column_b='30020';
vacuum full table_h;&lt;/LI-CODE&gt;
&lt;P&gt;Checking the execution plan:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=279056.04..279056.05 rows=1 width=8) (actual time=3568.936..3568.938 rows=1 loops=1)
   Output: count(*)
   Buffers: shared read=54056, temp read=63480 written=63480
   -&amp;gt;  Hash Join  (cost=279055.00..279056.03 rows=1 width=0) (actual time=2650.696..3228.610 rows=10000000 loops=1)
         Hash Cond: (s.column_a = h.column_a)
         Buffers: shared read=54056, temp read=63480 written=63480
         -&amp;gt;  Seq Scan on public.table_s s  (cost=0.00..1.01 rows=1 width=32) (actual time=0.007..0.008 rows=1 loops=1)
               Output: s.column_a
               Buffers: shared read=1
         -&amp;gt;  Hash  (cost=154055.00..154055.00 rows=10000000 width=7) (actual time=1563.987..1563.989 rows=10000000 loops=1)
               Output: h.column_a
               Buckets: 131072 (originally 131072)  Batches: 512 (originally 256)  Memory Usage: 371094kB
               Buffers: shared read=54055, temp written=31738
               -&amp;gt;  Seq Scan on public.table_h h  (cost=0.00..154055.00 rows=10000000 width=7) (actual time=2.458..606.422 rows=10000000 loops=1)
                     Output: h.column_a
                     Buffers: shared read=54055
 Query Identifier: 334721522907995613
 Planning:
   Buffers: shared hit=6 read=1 dirtied=1
 Planning Time: 0.237 ms
 JIT:
   Functions: 11
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.330 ms, Inlining 0.000 ms, Optimization 0.203 ms, Emission 2.311 ms, Total 2.844 ms
 Execution Time: 3584.439 ms
(25 rows)&lt;/LI-CODE&gt;
&lt;P&gt;Now, the Hash node typically reports &lt;STRONG&gt;hundreds of MB&lt;/STRONG&gt; of Memory Usage, with more/larger temp spills (higher Batches, more temp_blks_*). What changed? &lt;STRONG&gt;Only the distribution (cardinality)&lt;/STRONG&gt;. (Why buckets/batches behave this way is covered in the algorithm references.) &lt;A href="https://postgrespro.com/blog/pgsql/5969673" target="_blank" rel="noopener"&gt;[postgrespro.com]&lt;/A&gt;, &lt;A href="https://www.interdb.jp/pg/pgsql03/05/03.html" target="_blank" rel="noopener"&gt;[interdb.jp]&lt;/A&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Findings (2):&lt;/STRONG&gt;&amp;nbsp;With LOW‑cardinality data, the hash table takes &lt;STRONG&gt;371094kB&lt;/STRONG&gt; of memory usage (work_mem), shared hit/read=&lt;STRONG&gt;54056&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;So the query handles the same amount of data in terms of shared buffers, but shows a totally different work_mem usage pattern: low cardinality makes the Hash Join put most of those rows into a single bucket, and since that memory is not limited by default, it can cause OOM errors at any time.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Scale up rows to observe linear growth&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We’ll insert the same rows into table_h several times so we can test with more data (still low cardinality).&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;insert into table_h select * from table_h;
vacuum full table_h;&lt;/LI-CODE&gt;
&lt;P&gt;You’ll see &lt;STRONG&gt;Memory Usage and temp I/O scale with rowcount under skew&lt;/STRONG&gt;. (Beware: this can become I/O and RAM heavy—do this incrementally.) &lt;A href="https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan" target="_blank" rel="noopener"&gt;[thoughtbot.com]&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;NumRows table_h&lt;/th&gt;&lt;th&gt;Shared read/hit&lt;/th&gt;&lt;th&gt;Dirtied&lt;/th&gt;&lt;th&gt;Written&lt;/th&gt;&lt;th&gt;Temp read/written&lt;/th&gt;&lt;th&gt;Memory Usage (work_mem)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;10M&lt;/td&gt;&lt;td&gt;54056&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;63480+63480&lt;/td&gt;&lt;td&gt;371094kB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;20M&lt;/td&gt;&lt;td&gt;108110&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;126956+126956&lt;/td&gt;&lt;td&gt;742188kB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;80M&lt;/td&gt;&lt;td&gt;432434 (1,64GB)&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;253908+253908&lt;/td&gt;&lt;td&gt;2968750kB (2,8GB)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Observability: what you will (and won’t) see&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;EXPLAIN is your friend&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;EXPLAIN (ANALYZE, BUFFERS) exposes &lt;STRONG&gt;Memory Usage&lt;/STRONG&gt;, &lt;STRONG&gt;Buckets:&lt;/STRONG&gt;, &lt;STRONG&gt;Batches:&lt;/STRONG&gt; in the Hash node and &lt;STRONG&gt;temp block&lt;/STRONG&gt; I/O. &lt;STRONG&gt;Batches &amp;gt; 1&lt;/STRONG&gt; is a near‑certain sign of spilling. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;, &lt;A href="https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan" target="_blank" rel="noopener"&gt;[thoughtbot.com]&lt;/A&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Query Store / pg_stat_statements limitations&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Azure Database for PostgreSQL – Flexible Server&lt;/STRONG&gt; Query Store aggregates runtime and (optionally) wait stats over time windows and stores them in azure_sys, with views under query_store.*. It’s fantastic to find &lt;STRONG&gt;which&lt;/STRONG&gt; queries chew CPU/I/O or wait, &lt;STRONG&gt;but&lt;/STRONG&gt; it doesn’t report &lt;STRONG&gt;per‑query transient memory usage&lt;/STRONG&gt; (e.g., “how many MB did that hash table peak at?”) you can estimate reviewing the temporary blocks. &lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/monitor/concepts-query-store" target="_blank" rel="noopener"&gt;[learn.microsoft.com]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Under the hood, what you &lt;EM&gt;do&lt;/EM&gt; get—whether via Query Store or vanilla PostgreSQL pg_stat_statements—are cumulative counters like shared_blks_read, shared_blks_hit, temp_blks_read, temp_blks_written, timings, etc. Those help confirm buffer/temp activity, yet &lt;STRONG&gt;no direct hash table memory metric exists&lt;/STRONG&gt;. Combine them with EXPLAIN and server metrics to triangulate; a sample query follows this list. &lt;A href="https://www.postgresql.org/docs/current/pgstatstatements.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
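&lt;P&gt;As noted in the list above, a sample pg_stat_statements query to surface the statements producing the most temp I/O (assumes the extension is installed):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Statements ranked by temp blocks written (a proxy for hash/sort spills).
SELECT queryid, calls,
       temp_blks_read, temp_blks_written,
       shared_blks_hit, shared_blks_read
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 10;&lt;/LI-CODE&gt;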
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Tip (Azure Flexible Server)&lt;/STRONG&gt;&lt;BR /&gt;Enable Query Store in &lt;EM&gt;Server parameters&lt;/EM&gt; via pg_qs.query_capture_mode and (optionally) wait sampling via pgms_wait_sampling.query_capture_mode, then use query_store.qs_view to correlate &lt;STRONG&gt;temp block&lt;/STRONG&gt; usage and execution times across intervals. &lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/monitor/concepts-query-store" target="_blank" rel="noopener"&gt;[learn.microsoft.com]&lt;/A&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;Typical OOM symptom in logs&lt;/H2&gt;
&lt;P&gt;In extreme skew with concurrent executions, you may encounter:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;ERROR: out of memory&lt;/P&gt;
&lt;P&gt;DETAIL: Failed on request of size 32800 in memory context "HashBatchContext".&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is a classic signature of hash join memory pressure. &lt;A href="https://www.postgresql.org/message-id/B743D886-5469-4FB1-A75E-F262F399E7BA%40gmail.com" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;, &lt;A href="https://thisistheway.wiki/posts/software/postgres_memory/" target="_blank" rel="noopener"&gt;[thisistheway.wiki]&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;What to do about it (mitigations &amp;amp; best practices)&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Don’t force Hash Join unless required&lt;/STRONG&gt;&lt;BR /&gt;If you used planner hints (e.g., pg_hint_plan) or GUCs (Grand Unified Configuration) to force Hash Join, remove them and let the planner re‑evaluate. (If you &lt;EM&gt;must&lt;/EM&gt; hint, be aware pg_hint_plan is a third‑party extension and not available in all environments.) &lt;A href="https://pg-hint-plan.readthedocs.io/en/latest/" target="_blank" rel="noopener"&gt;[pg-hint-pl...thedocs.io]&lt;/A&gt;, &lt;A href="https://pg-hint-plan.readthedocs.io/en/latest/hint_table.html" target="_blank" rel="noopener"&gt;[pg-hint-pl...thedocs.io]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Fix skew / cardinality at the source&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="list-style-type: none;"&gt;
&lt;UL&gt;
&lt;LI&gt;Re‑model data to avoid low‑NDV (Number of Distinct Values in a column) joins (e.g., pre‑aggregate, filter earlier, or exclude degenerate keys).&lt;/LI&gt;
&lt;LI&gt;Ensure &lt;STRONG&gt;statistics are current&lt;/STRONG&gt; so the planner estimates are realistic. (Skew awareness is limited; poor estimates → risky sizing.) &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pick a safer join strategy when appropriate&lt;/STRONG&gt;&lt;BR /&gt;If distribution is highly skewed, &lt;STRONG&gt;Merge Join&lt;/STRONG&gt; (with supporting indexes/sort order) or &lt;STRONG&gt;Nested Loop&lt;/STRONG&gt; (for selective probes) might be more memory‑predictable. Let the planner choose, or enable alternatives by undoing GUCs that disabled them. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Bound memory consciously&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="list-style-type: none;"&gt;
&lt;UL&gt;
&lt;LI&gt;Keep work_mem modest for mixed/OLTP workloads; remember it’s &lt;STRONG&gt;per operation, per node, per worker&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI&gt;Adjust hash_mem_multiplier judiciously (introduced in PG13; default now commonly 2.0) if you understand the spill trade‑offs. &lt;A href="https://postgresqlco.nf/doc/en/param/hash_mem_multiplier/" target="_blank" rel="noopener"&gt;[postgresqlco.nf]&lt;/A&gt;, &lt;A href="https://pgpedia.info/h/hash_mem_multiplier.html" target="_blank" rel="noopener"&gt;[pgpedia.info]&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Observe spills and tune iteratively&lt;/STRONG&gt;&lt;BR /&gt;Use EXPLAIN (ANALYZE, BUFFERS) to see Batches (spills) and Memory Usage; use Query Store/pg_stat_statements to find &lt;EM&gt;which&lt;/EM&gt; queries generate the most &lt;STRONG&gt;temp I/O&lt;/STRONG&gt;. Raise work_mem for a &lt;EM&gt;session&lt;/EM&gt; only when justified (see the sketch after this list). &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;, &lt;A href="https://www.postgresql.org/docs/current/pgstatstatements.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Parallelism awareness&lt;/STRONG&gt;&lt;BR /&gt;Each worker can perform its own memory‑using operations; parallel hash join has distinct behavior. If you aren’t sure, temporarily disable parallelism to simplify analysis, then re‑enable once you understand the footprint. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
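&lt;P&gt;When a targeted bump is justified, scope it to a single transaction rather than changing the server default; a minimal sketch:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;BEGIN;
-- SET LOCAL reverts automatically at COMMIT/ROLLBACK.
SET LOCAL work_mem = '64MB';
-- run the offending query (or EXPLAIN (ANALYZE, BUFFERS) on it) here
COMMIT;&lt;/LI-CODE&gt;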
&lt;H2&gt;Validating on Azure Database for PostgreSQL – Flexible Server&lt;/H2&gt;
&lt;P&gt;The behavior is &lt;STRONG&gt;not Azure‑specific&lt;/STRONG&gt;, but you can reproduce the same sequence on Flexible Server (e.g., General Purpose). A few notes:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Confirm/adjust work_mem, hash_mem_multiplier, enable_* planner toggles as session settings. (Azure exposes standard PostgreSQL parameters.) &lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/server-parameters/param-resource-usage-memory" target="_blank" rel="noopener"&gt;[learn.microsoft.com]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Use &lt;STRONG&gt;Query Store&lt;/STRONG&gt; to confirm stable &lt;STRONG&gt;shared/temporary block&lt;/STRONG&gt; patterns across executions, then use EXPLAIN (ANALYZE, BUFFERS) per query to spot &lt;STRONG&gt;hash table memory&lt;/STRONG&gt; footprints. &lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/monitor/concepts-query-store" target="_blank" rel="noopener"&gt;[learn.microsoft.com]&lt;/A&gt;, &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;H4&gt;Changing some default parameters&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We’ll repeat the previous steps in Azure Database for PostgreSQL Flexible Server:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;set hash_mem_multiplier=1;
set max_parallel_workers=0;
set max_parallel_workers_per_gather=0;
set enable_parallel_hash=off;
set enable_material=off;
set enable_sort=off;
set pg_hint_plan.debug_print=verbose;
set client_min_messages=notice;
set pg_hint_plan.enable_hint_table=on;&lt;/LI-CODE&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Creating and populating tables&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="sql"&gt;drop table table_s;
create table table_s (column_a text);
insert into table_s values ('30020');
vacuum full table_s;
 
drop table if exists table_h;
create table table_h(column_a text,column_b text);
 
INSERT INTO table_h(column_a,column_b)
SELECT i::text, i::text
FROM generate_series(1, 10000000) AS t(i);
vacuum full table_h;
vacuum full table_s;&lt;/LI-CODE&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Query &amp;amp; Execution plan&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="sql"&gt;explain (analyze,buffers,costs,verbose) SELECT /*+ HashJoin(s h) Leading((s h)) */ COUNT(*)
FROM table_s s
JOIN table_h h
  ON s.column_a= h.column_a;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=279052.88..279052.89 rows=1 width=8) (actual time=3171.186..3171.191 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=33 read=54023, temp read=135 written=34042
   I/O Timings: shared read=184.869, temp read=0.278 write=333.970
   -&amp;gt;  Hash Join  (cost=279051.84..279052.88 rows=1 width=0) (actual time=3147.288..3171.182 rows=1 loops=1)
         Hash Cond: (s.column_a = h.column_a)
         Buffers: shared hit=33 read=54023, temp read=135 written=34042
         I/O Timings: shared read=184.869, temp read=0.278 write=333.970
         -&amp;gt;  Seq Scan on public.table_s s  (cost=0.00..1.01 rows=1 width=32) (actual time=0.315..0.316 rows=1 loops=1)
               Output: s.column_a
               Buffers: shared read=1
               I/O Timings: shared read=0.018
         -&amp;gt;  Hash  (cost=154053.04..154053.04 rows=9999904 width=7) (actual time=3109.278..3109.279 rows=10000000 loops=1)
               Output: h.column_a
               Buckets: 131072  Batches: 256  Memory Usage: 2551kB
               Buffers: shared hit=32 read=54022, temp written=33786
               I/O Timings: shared read=184.851, temp write=332.059
               -&amp;gt;  Seq Scan on public.table_h h  (cost=0.00..154053.04 rows=9999904 width=7) (actual time=0.019..1258.472 rows=10000000 loops=1)
                     Output: h.column_a
                     Buffers: shared hit=32 read=54022
                     I/O Timings: shared read=184.851
 Query Identifier: 5636209387670245929
 Planning:
   Buffers: shared hit=37
 Planning Time: 0.575 ms
 Execution Time: 3171.375 ms
(26 rows)&lt;/LI-CODE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;STRONG&gt;Findings (3):&lt;/STRONG&gt;&amp;nbsp;In Azure Database for PostgreSQL Flexible Server, with fully distributed, high‑cardinality data the hash table takes only &lt;STRONG&gt;2551kB&lt;/STRONG&gt; of memory usage (work_mem), shared hit/read=54056.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;H4&gt;Skew it to LOW cardinality&lt;/H4&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;As before, we update column_a so that all rows in table_h share a single value.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;update table_h set column_a='30020', column_b='30020';
vacuum full table_h;&lt;/LI-CODE&gt;
&lt;P&gt;In this case we force the join method with pg_hint_plan:&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;explain (analyze,buffers,costs,verbose) SELECT /*+ HashJoin(s h) Leading((s h)) */ COUNT(*)
FROM table_s s
JOIN table_h h
  ON s.column_a= h.column_a;

                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=279056.04..279056.05 rows=1 width=8) (actual time=4397.556..4397.560 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=2 read=54055, temp read=63480 written=63480
   I/O Timings: shared read=89.396, temp read=90.377 write=300.290
   -&amp;gt;  Hash Join  (cost=279055.00..279056.03 rows=1 width=0) (actual time=3271.145..3987.154 rows=10000000 loops=1)
         Hash Cond: (s.column_a = h.column_a)
         Buffers: shared hit=2 read=54055, temp read=63480 written=63480
         I/O Timings: shared read=89.396, temp read=90.377 write=300.290
         -&amp;gt;  Seq Scan on public.table_s s  (cost=0.00..1.01 rows=1 width=32) (actual time=0.006..0.008 rows=1 loops=1)
               Output: s.column_a
               Buffers: shared hit=1
         -&amp;gt;  Hash  (cost=154055.00..154055.00 rows=10000000 width=7) (actual time=1958.729..1958.731 rows=10000000 loops=1)
               Output: h.column_a
               Buckets: 262144 (originally 262144)  Batches: 256 (originally 128)  Memory Usage: 371094kB
               Buffers: shared read=54055, temp written=31738
               I/O Timings: shared read=89.396, temp write=149.076
               -&amp;gt;  Seq Scan on public.table_h h  (cost=0.00..154055.00 rows=10000000 width=7) (actual time=0.159..789.449 rows=10000000 loops=1)
                     Output: h.column_a
                     Buffers: shared read=54055
                     I/O Timings: shared read=89.396
 Query Identifier: 8893575855188549861
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.157 ms
 Execution Time: 4414.268 ms
(25 rows)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;NumRows table_h&lt;/th&gt;&lt;th&gt;Shared read/hit&lt;/th&gt;&lt;th&gt;Dirtied&lt;/th&gt;&lt;th&gt;Written&lt;/th&gt;&lt;th&gt;Temp read/written&lt;/th&gt;&lt;th&gt;Memory Usage (work_mem)&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;10M&lt;/td&gt;&lt;td&gt;54056&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;63480+63480&lt;/td&gt;&lt;td&gt;371094kB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;20M&lt;/td&gt;&lt;td&gt;108110&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;126956+126956&lt;/td&gt;&lt;td&gt;742188kB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;80M&lt;/td&gt;&lt;td&gt;432434&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;253908+253908&lt;/td&gt;&lt;td&gt;2968750kB&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;col style="width: 16.67%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;These numbers match what we observed on our Docker installation.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;See the extension docs for installation/usage and the&amp;nbsp;&lt;STRONG&gt;hint table&lt;/STRONG&gt; for cases where you want to force a specific join method.&amp;nbsp;&lt;A href="https://pg-hint-plan.readthedocs.io/en/latest/" target="_blank" rel="noopener"&gt;[pg-hint-pl...thedocs.io]&lt;/A&gt;, &lt;A href="https://pg-hint-plan.readthedocs.io/en/latest/hint_table.html" target="_blank" rel="noopener"&gt;[pg-hint-pl...thedocs.io]&lt;/A&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;H2&gt;FAQ&lt;/H2&gt;
&lt;P&gt;&lt;STRONG&gt;Q: I set work_mem = 4MB. Why did my Hash Join report ~371MB Memory Usage?&lt;/STRONG&gt;&lt;BR /&gt;A: Hash joins can use up to hash_mem_multiplier × work_mem &lt;EM&gt;per hash table&lt;/EM&gt;, and skew can cause large per‑bucket chains. Multiple nodes/workers multiply usage. work_mem is not a global hard cap. &lt;A href="https://postgresqlco.nf/doc/en/param/hash_mem_multiplier/" target="_blank" rel="noopener"&gt;[postgresqlco.nf]&lt;/A&gt;, &lt;A href="https://pgpedia.info/h/hash_mem_multiplier.html" target="_blank" rel="noopener"&gt;[pgpedia.info]&lt;/A&gt;&lt;/P&gt;
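&lt;P&gt;As a quick illustration (the values here are examples, not recommendations), you can inspect both settings and scope any increase to a single transaction:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SHOW work_mem;              -- e.g. 4MB
SHOW hash_mem_multiplier;   -- 2.0 by default on PostgreSQL 15+

-- One hash table may grow to roughly hash_mem_multiplier * work_mem before spilling.
-- If you must raise it, prefer a transaction-local bump for one analytic query:
BEGIN;
SET LOCAL hash_mem_multiplier = 4.0;
-- ... run the analytic query here ...
COMMIT;&lt;/LI-CODE&gt;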
&lt;P&gt;&lt;STRONG&gt;Q: How do I know if a Hash Join spilled to disk?&lt;/STRONG&gt;&lt;BR /&gt;A: In EXPLAIN (ANALYZE), the Hash node shows Batches: N. &lt;STRONG&gt;N &amp;gt; 1&lt;/STRONG&gt; indicates partitioning and temp I/O; you’ll also see temp_blks_read/written in buffers and Temp I/O timings. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;, &lt;A href="https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan" target="_blank" rel="noopener"&gt;[thoughtbot.com]&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Q: Can Query Store tell me per‑query memory consumption?&lt;/STRONG&gt;&lt;BR /&gt;A: Not directly. It gives time‑sliced runtime and wait stats (plus temp/shared block counters via underlying stats), but no “peak MB used by this hash table” metric. Use EXPLAIN and server metrics. &lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/monitor/concepts-query-store" target="_blank" rel="noopener"&gt;[learn.microsoft.com]&lt;/A&gt;, &lt;A href="https://www.postgresql.org/docs/current/pgstatstatements.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/P&gt;
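&lt;P&gt;For example, assuming pg_stat_statements is enabled, a sketch like this surfaces the statements doing the most temp I/O over time (a typical sign of spilling hash joins or sorts):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Top statements by temp I/O written
SELECT queryid,
       calls,
       temp_blks_read,
       temp_blks_written,
       left(query, 60) AS query_text
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 10;&lt;/LI-CODE&gt;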
&lt;P&gt;&lt;STRONG&gt;Q: I hit “Failed on request … in HashBatchContext.” What’s that?&lt;/STRONG&gt;&lt;BR /&gt;A: That’s an OOM raised by the executor while allocating memory. Reduce skew, avoid forced hash joins, or review per‑query memory and concurrency.&amp;nbsp;&lt;A href="https://www.postgresql.org/message-id/B743D886-5469-4FB1-A75E-F262F399E7BA%40gmail.com" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/P&gt;
&lt;H2&gt;Further reading&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Server parameters &amp;amp; memory&lt;/STRONG&gt; (official docs): guidance on work_mem, shared_buffers, parallelism. &lt;A href="https://www.postgresql.org/docs/current/runtime-config-resource.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Hash joins under the hood&lt;/STRONG&gt;: deep dives into buckets, batches, and memory footprints. &lt;A href="https://postgrespro.com/blog/pgsql/5969673" target="_blank" rel="noopener"&gt;[postgrespro.com]&lt;/A&gt;, &lt;A href="https://www.pgcon.org/2017/schedule/attachments/455_pgcon-2017-hash-joins.pdf" target="_blank" rel="noopener"&gt;[pgcon.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;hash_mem_multiplier&lt;/STRONG&gt;: history and defaults by version. &lt;A href="https://pgpedia.info/h/hash_mem_multiplier.html" target="_blank" rel="noopener"&gt;[pgpedia.info]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;EXPLAIN primer&lt;/STRONG&gt;: how to read Hash node details, Batches, Memory Usage. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;, &lt;A href="https://thoughtbot.com/blog/reading-an-explain-analyze-query-plan" target="_blank" rel="noopener"&gt;[thoughtbot.com]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Query Store (Azure Flexible)&lt;/STRONG&gt;: enable, query, and interpret. &lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/monitor/concepts-query-store" target="_blank" rel="noopener"&gt;[learn.microsoft.com]&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Ready‑to‑use mitigation checklist (DBA quick wins)&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Remove join hints/GUC overrides that force Hash Join; re‑plan. &lt;A href="https://pg-hint-plan.readthedocs.io/en/latest/" target="_blank" rel="noopener"&gt;[pg-hint-pl...thedocs.io]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Refresh stats; confirm realistic rowcount/NDV estimates. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Consider alternate join strategies (Merge/Index‑Nested‑Loop) when skew is high. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Keep work_mem conservative for OLTP; consider session‑scoped bumps only for specific analytic queries. &lt;A href="https://www.postgresql.org/docs/current/runtime-config-resource.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Tune hash_mem_multiplier carefully only after understanding spill patterns. &lt;A href="https://postgresqlco.nf/doc/en/param/hash_mem_multiplier/" target="_blank" rel="noopener"&gt;[postgresqlco.nf]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Use EXPLAIN (ANALYZE, BUFFERS) to verify Batches and Memory Usage. &lt;A href="https://www.postgresql.org/docs/current/using-explain.html" target="_blank" rel="noopener"&gt;[postgresql.org]&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;Use Query Store/pg_stat_statements to find heavy temp/shared I/O offenders over time.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Mon, 09 Mar 2026 13:38:54 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/understanding-hash-join-memory-usage-and-oom-risks-in-postgresql/ba-p/4500308</guid>
      <dc:creator>FranciscoPardillo</dc:creator>
      <dc:date>2026-03-09T13:38:54Z</dc:date>
    </item>
    <item>
      <title>Dashboards with Grafana - Now in Azure Portal for PostgreSQL</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/dashboards-with-grafana-now-in-azure-portal-for-postgresql/ba-p/4497607</link>
      <description>&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;Monitoring Azure Database for PostgreSQL just got significantly simpler. With the new&lt;STRONG&gt; &lt;/STRONG&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/azure-monitor/visualize/visualize-use-grafana-dashboards" target="_blank" rel="noopener"&gt;Azure Monitor Dashboards with Grafana&lt;/A&gt;, you can visualize key metrics and logs directly inside Azure Portal - no Grafana servers to deploy, no configuration to manage, and no additional cost&lt;/P&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;In this post, we’ll walk through how these built-in Grafana dashboards help you troubleshoot faster, understand database behavior at a glance, and decide when you might still want Azure Managed Grafana for advanced scenarios.&lt;/P&gt;
&lt;H1&gt;Native Grafana Dashboards — No Setup, No Hosting, No Cost&lt;/H1&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;We are thrilled to announce that &lt;A class="lia-external-url" href="https://azure.microsoft.com/products/postgresql" target="_blank" rel="noopener"&gt;Azure Database for PostgreSQL&lt;/A&gt; users can now access &lt;STRONG&gt;prebuilt Grafana dashboards directly within the Azure portal&lt;/STRONG&gt; - with &lt;STRONG&gt;no additional cost or configuration required&lt;/STRONG&gt;. This integration eliminates the complexity of deploying and administering self-hosted or managed Grafana instances. Grafana’s powerful visualization capabilities are now embedded directly in the Azure experience&lt;BR /&gt;&lt;BR /&gt;From the moment you open the Azure Portal you have immediate access to dashboards for PostgreSQL. Simply navigate to Azure Database for PostgreSQL server in the Azure Portal and select “&lt;STRONG&gt;Dashboards with Grafana&lt;/STRONG&gt;” and choose &lt;EM&gt;Featured dashboards&lt;/EM&gt;. Within seconds, you have a rich, real-time view of your database server’s health - no custom queries or manual wiring required.&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-center"&gt;&lt;SPAN class="lia-text-color-19"&gt;&lt;EM&gt;Figure 1: Azure Portal showing the “Dashboards with Grafana” blade , featuring the prebuilt monitoring dashboard tile.&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;Comprehensive PostgreSQL Metrics at a Glance&lt;/H3&gt;
&lt;img /&gt;
&lt;P class="lia-align-center" style="font-size: 18px; line-height: 1.7;"&gt;&lt;SPAN class="lia-text-color-19"&gt;&lt;EM&gt;Figure 2: Azure PostgreSQL Grafana dashboard showing resource utilization, performance metrics, and server configuration.&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;As shown above, the new Grafana dashboard provides &lt;STRONG&gt;at-a-glance visibility&lt;/STRONG&gt; into all the key metrics that matter for Azure Database for PostgreSQL. These dashboards are purpose-built to surface the health and performance of your database server, so you can immediately spot trends or issues.&lt;/P&gt;
&lt;H3&gt;Quick Configuration Snapshot&lt;/H3&gt;
&lt;img /&gt;
&lt;P class="lia-align-center"&gt;&lt;SPAN class="lia-text-color-19"&gt;&lt;EM&gt;Figure 3: PostgreSQL server details, showing version, region, compute size, availability, and resource usage gauges.&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;Every monitoring session starts with instant answers to critical questions:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Is the server up?&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Is High Availability configured?&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;How much storage is available?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;The summary panel provides:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Instance status (Up/Down)&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;High Availability and replica configuration&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Azure region, PostgreSQL version, and SKU&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Live resource usage (CPU, memory, storage)&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;No extra clicks. No custom queries. Just clarity.&lt;/P&gt;
&lt;H2&gt;Metrics Coverage&lt;/H2&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;The prebuilt dashboards visualize telemetry emitted by Azure Database for PostgreSQL, including:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;&lt;STRONG&gt;Server availability&lt;/STRONG&gt; and status&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;&lt;STRONG&gt;Active connections&lt;/STRONG&gt; and connection failures&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;&lt;STRONG&gt;CPU&lt;/STRONG&gt; and &lt;STRONG&gt;memory&lt;/STRONG&gt; utilization&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;&lt;STRONG&gt;Storage usage&lt;/STRONG&gt; and WAL consumption&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;&lt;STRONG&gt;Disk I/O&lt;/STRONG&gt; (IOPS and throughput)&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;&lt;STRONG&gt;Network&lt;/STRONG&gt; ingress and egress&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;&lt;STRONG&gt;Transaction&lt;/STRONG&gt; rates, commits, and rollbacks&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;These metrics are collected via Azure Monitor platform metrics and refreshed at near-real-time intervals (depending on metric type). For a complete list, see the&amp;nbsp;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/monitor/concepts-monitoring" target="_blank" rel="noopener"&gt;Azure Database for PostgreSQL monitoring&lt;/A&gt; documentation.&lt;/P&gt;
&lt;H1&gt;Metrics and Logs—Together&lt;/H1&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;Ever struggled to trace a spike in CPU to the actual query behind it? With PostgreSQL logs and metrics visualized side-by-side, you can now correlate the &lt;STRONG&gt;exact timestamp&lt;/STRONG&gt; of a CPU surge with detailed &lt;STRONG&gt;query logs&lt;/STRONG&gt; in seconds.&lt;/P&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P class="lia-align-center"&gt;&lt;SPAN class="lia-text-color-19"&gt;&lt;EM&gt;Figure 4:&amp;nbsp;&lt;/EM&gt;&lt;EM data-start="1045" data-end="1202"&gt;CPU usage metrics co-relation with PostgreSQL log entries in Azure Monitor, highlighting slow query detection and log integration.&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;&lt;SPAN class="lia-text-color-13"&gt;&lt;STRONG&gt;💡Note&lt;/STRONG&gt;: To view logs in Grafana, make sure diagnostic settings are enabled to send PostgreSQL logs to Azure Monitor Logs. You can configure this in the Azure Portal under your PostgreSQL resource &amp;gt; Monitoring &amp;gt; Diagnostic settings.&lt;/SPAN&gt;&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/postgresql/monitor/how-to-configure-and-access-logs" target="_blank" rel="noopener"&gt;Learn how&lt;/A&gt;.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;In the example above, high CPU usage (&lt;STRONG&gt;73.2%&lt;/STRONG&gt;) aligns precisely with poor-running queries against a large salesorderdetail_big table. This helps engineers instantly validate and pinpoint slow queries without jumping between tools.&lt;/P&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;The unified Metric + Logs view, you can:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Plot query errors over time&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Correlate failed logins with resource spikes&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Investigate locking or memory pressure using timestamps&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;Grafana &lt;STRONG&gt;Explore mode&lt;/STRONG&gt; is also available for deep-dive troubleshooting without altering dashboards.&lt;/P&gt;
&lt;H1&gt;First-Class Azure Integration&lt;/H1&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;This is not just embedded Grafana - it is &lt;STRONG&gt;first-class&lt;/STRONG&gt; Azure-native:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Dashboards are &lt;STRONG&gt;Azure resources&lt;/STRONG&gt;, scoped to subscriptions and resource groups&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Access is controlled using &lt;STRONG&gt;Azure RBAC&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Dashboards can be exported and deployed via &lt;STRONG&gt;ARM&lt;/STRONG&gt; templates&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Easy sharing and migration across environments&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;You get the flexibility of open-source Grafana with Azure’s enterprise-grade governance.&lt;/P&gt;
&lt;H1&gt;Getting Started&lt;/H1&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;To use the pre-built dashboard&lt;/P&gt;
&lt;OL class="lia-align-justify"&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Open the Azure portal&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Navigate to &lt;STRONG&gt;Azure Database for PostgreSQL&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Select &lt;STRONG&gt;Dashboards with Grafana&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Open the PostgreSQL featured dashboard&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;To customize a dashboard:&lt;/P&gt;
&lt;OL class="lia-align-justify"&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Open the prebuilt PostgreSQL dashboard&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Select &lt;STRONG&gt;Save As&lt;/STRONG&gt; to create a copy&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Modify panels or add new visualizations&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Connect additional data sources (metrics or logs)&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Save and share with your team&lt;/LI&gt;
&lt;/OL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;For advanced customization, refer to the &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/azure-monitor/visualize/visualize-use-grafana-dashboards" target="_blank" rel="noopener"&gt;Azure Monitor + Grafana Learn documentation&lt;/A&gt;.&lt;/P&gt;
&lt;H1&gt;When to Use Azure Managed Grafana?&lt;/H1&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;Dashboards with Grafana in the Azure portal cover most common PostgreSQL monitoring scenarios. &lt;A class="lia-external-url" href="https://azure.microsoft.com/products/managed-grafana" target="_blank" rel="noopener"&gt;Azure Managed Grafana&lt;/A&gt; is still the right choice when you need:&lt;/P&gt;
&lt;UL class="lia-align-justify"&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Extended plugin support (community and OSS plugins)&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Advanced authentication and provisioning APIs&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Fine-grained, multi-tenant access control&lt;/LI&gt;
&lt;LI style="font-size: 18px; line-height: 1.7;"&gt;Multi-cloud or hybrid data source connectivity&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;See the &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/azure-monitor/visualize/visualize-grafana-overview#solution-comparison" target="_blank" rel="noopener"&gt;detailed comparison&lt;/A&gt; to choose the right option.&lt;/P&gt;
&lt;H1&gt;Learn More&lt;/H1&gt;
&lt;UL&gt;
&lt;LI class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/monitor/concepts-monitoring" target="_blank" rel="noopener"&gt;Azure PostgreSQL Monitoring Overview&lt;/A&gt;&lt;/LI&gt;
&lt;LI class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/azure-monitor/visualize/visualize-grafana-overview" target="_blank" rel="noopener"&gt;Visualize Azure Monitor Data with Grafana&lt;/A&gt;&lt;/LI&gt;
&lt;LI class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/azureobservabilityblog/announcing-general-availability-azure-monitor-dashboards-with-grafana/4468972" target="_blank" rel="noopener" data-lia-auto-title="GA Blog: Azure Monitor Dashboard with Grafana" data-lia-auto-title-active="0"&gt;GA Blog: Azure Monitor Dashboard with Grafana&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-align-justify" style="font-size: 18px; line-height: 1.7;"&gt;Start visualizing your &lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/postgresql/" target="_blank" rel="noopener"&gt;Azure PostgreSQL&lt;/A&gt; data instantly—right where you already work.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Feb 2026 19:19:45 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/dashboards-with-grafana-now-in-azure-portal-for-postgresql/ba-p/4497607</guid>
      <dc:creator>varun-dhawan</dc:creator>
      <dc:date>2026-02-26T19:19:45Z</dc:date>
    </item>
    <item>
      <title>Nasdaq builds thoughtfully designed AI for board governance with PostgreSQL on Azure</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/nasdaq-builds-thoughtfully-designed-ai-for-board-governance-with/ba-p/4493078</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Authored by: Charles Feddersen, Partner Director of Product Management for PostgreSQL at Microsoft, and Mohsin Shafqat, Senior Manager, Software Engineering at Nasdaq&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When people think of Nasdaq, they usually think of markets, trading floors, and financial data moving at extraordinary speed. But behind the scenes, Nasdaq also plays an equally critical role in how boards of directors govern, deliberate, and make decisions.&lt;/P&gt;
&lt;P&gt;Nasdaq Boardvantage® is the company’s governance platform, used by more than 4,400 organizations worldwide—including nearly half of the Fortune 100. It’s where directors review board books, collaborate in an environment designed with robust security, and prepare for meetings that often involve some of the most sensitive information a company has.&lt;/P&gt;
&lt;P&gt;In recent years, Nasdaq set out to modernize Nasdaq Boardvantage with AI, without compromising security and reliability. That journey was featured in a Microsoft Ignite session, “&lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=BkOcPQntsk4" target="_blank" rel="noopener"&gt;Nasdaq Boardvantage: AI-Driven Governance on PostgreSQL and Foundry&lt;/A&gt;.” It offers a practical look at how Azure Database for PostgreSQL can support AI-driven applications where precision, isolation, and data control are non-negotiable.&lt;/P&gt;
&lt;H2&gt;Introducing AI where trust is everything&lt;/H2&gt;
&lt;P&gt;Board governance isn’t a typical productivity workload. Board packets can run 400 to 600 pages, meeting minutes are legal records, and any AI-generated insight must be confined to a customer’s own data.&lt;/P&gt;
&lt;P&gt;“Our customers trust us with some of their most strategic, sensitive data,” said Mohsin Shafqat, Senior Manager of Software Development at Nasdaq. That trust meant tackling several core challenges upfront, including:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;How do you minimize AI hallucinations in a governance context?&lt;/LI&gt;
&lt;LI&gt;How do you guarantee tenant isolation at scale?&lt;/LI&gt;
&lt;LI&gt;How do you keep data regional across a global customer base?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;A cloud foundation built for governance&lt;/H2&gt;
&lt;P&gt;Before adding intelligence, Nasdaq decided to re-architect Nasdaq Boardvantage on Microsoft Azure, using Azure Kubernetes Service (AKS) to run containerized, multi-tenant workloads with strong isolation boundaries. Microsoft Foundry provides the managed foundation for deploying, governing, and operating AI models across this architecture, adding consistency, security, and control as intelligence is introduced.&lt;/P&gt;
&lt;P&gt;At the data layer, Azure Database for PostgreSQL and Azure Database for MySQL became the backbone for governance data. PostgreSQL, in particular, plays a central role in managing structured governance information alongside vector embeddings that support AI-driven features. Together, these services give Nasdaq the performance, security, and operational control required for a highly regulated, multi-tenant environment, while still moving quickly.&lt;/P&gt;
&lt;P&gt;Key architectural choices included:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Tenant isolation by design, with separate databases and storage&lt;/LI&gt;
&lt;LI&gt;Regional deployments to align with data residency requirements&lt;/LI&gt;
&lt;LI&gt;High availability and managed operations, so teams could focus on product innovation instead of infrastructure maintenance&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;PostgreSQL and pgvector: Powering context-aware AI&lt;/H2&gt;
&lt;P&gt;With that foundation in place, Nasdaq was ready to carefully introduce AI. One of the first AI capabilities was intelligent document summarization. Board materials that once took hours to review could now be condensed into concise, contextually accurate summaries.&lt;/P&gt;
&lt;P&gt;Under the hood, this required more than just calling an LLM. Nasdaq uses pgvector, natively supported in Azure Database for PostgreSQL, to store and query embeddings generated from board documents. This allows the platform to perform hybrid searches that combine traditional SQL queries with vector similarity to retrieve the most relevant context before sending anything to a language model.&lt;/P&gt;
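&lt;P&gt;To make the idea concrete, here is a minimal sketch of such a hybrid query, assuming a hypothetical board_documents table (this is illustrative, not Nasdaq’s actual schema):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical schema: one row per document chunk, embeddings stored with pgvector
CREATE TABLE board_documents (
    id        bigserial PRIMARY KEY,
    tenant_id bigint NOT NULL,
    chunk     text NOT NULL,
    embedding vector(1536)  -- dimension depends on the embedding model
);

-- Hybrid retrieval: a relational filter (tenant isolation) combined with
-- vector similarity; $1 is the query embedding, $2 the tenant, bound by the app
SELECT chunk
FROM board_documents
WHERE tenant_id = $2
ORDER BY embedding &amp;lt;=&amp;gt; $1   -- cosine distance
LIMIT 5;&lt;/LI-CODE&gt;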
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Instead of treating AI as a black box, the team built a pipeline where:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Documents are processed with Azure Document Intelligence to preserve structure and meaning&lt;/LI&gt;
&lt;LI&gt;Content is chunked and embedded&lt;/LI&gt;
&lt;LI&gt;Embeddings are stored in PostgreSQL with pgvector&lt;/LI&gt;
&lt;LI&gt;Vector similarity searches retrieve precise context for each AI task&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Because this runs inside PostgreSQL, the same database benefits from Azure’s built-in high availability, security controls, and operational tooling – delivering tangible results: a 25% reduction in overall board preparation time, and internal testing shows 91–97% accuracy for AI-generated summaries and meeting minutes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;From summaries to an AI Board Assistant&lt;/H2&gt;
&lt;P&gt;With summarization working in production, Nasdaq expanded further. The team is now building an AI-powered Board Assistant that will help directors prepare for upcoming meetings by surfacing trends, risks, and insights from prior discussions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This introduces a new level of scale. Years of board data across thousands of customers translate into millions of embeddings. PostgreSQL continues to anchor this architecture, storing vectors for semantic retrieval while MySQL supports complementary non-vector workloads. Across Nasdaq Boardvantage, users are advised to always review AI outputs, and no customer data is shared or used to train external models. “We designed AI for governance, not the other way around,” Shafqat said.&lt;/P&gt;
&lt;P&gt;More importantly, customers trust the system because security, isolation, and data control were engineered in from day one.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H2&gt;Looking ahead&lt;/H2&gt;
&lt;P&gt;Nasdaq’s work shows how Azure Database for PostgreSQL can support AI workloads that demand both intelligence and integrity. With PostgreSQL at the core, Nasdaq has built a governance platform that scales globally, respects regulatory boundaries, and introduces AI in a way that feels dependable and not experimental.&lt;/P&gt;
&lt;P&gt;What started as a modernization of Nasdaq Boardvantage is now influencing how Nasdaq approaches AI across the enterprise.&lt;/P&gt;
&lt;P&gt;To dive deeper into the architecture and hear directly from the engineers behind it, watch the Ignite session and check out these resources:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://www.youtube.com/watch?v=BkOcPQntsk4" target="_blank" rel="noopener"&gt;Watch the Ignite breakout session&lt;/A&gt; for a technical walkthrough of how Nasdaq Boardvantage is built, including PostgreSQL on Azure, pgvector, and Microsoft Foundry in production.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://www.microsoft.com/en/customers/story/25682-nasdaq-azure" target="_blank" rel="noopener"&gt;Read the case study&lt;/A&gt; to see how Nasdaq introduced AI into board governance and what changed for directors, administrators, and decision-making.&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://ignite.microsoft.com/en-US/sessions/STUDIO34?source=sessions" target="_blank" rel="noopener"&gt;Watch the Ignite broadcast&lt;/A&gt; for a candid discussion on Azure Database for PostgreSQL, Azure HorizonDB, and what it takes to scale AI-driven governance.&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 25 Feb 2026 05:34:02 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/nasdaq-builds-thoughtfully-designed-ai-for-board-governance-with/ba-p/4493078</guid>
      <dc:creator>charlesfeddersenMS</dc:creator>
      <dc:date>2026-02-25T05:34:02Z</dc:date>
    </item>
    <item>
      <title>Microsoft at PGConf India 2026</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/microsoft-at-pgconf-india-2026/ba-p/4496182</link>
      <description>&lt;P&gt;I’m genuinely excited about &lt;A href="https://www.pgconf.in/conferences/pgconfin2026" target="_blank" rel="noopener"&gt;PGConf India 2026&lt;/A&gt;. Over the past few editions, the conference has continued to grow year over year—both in size and in impact—and it has firmly established itself as one of the key events on the global PostgreSQL calendar. That momentum was very evident again in the depth, breadth, and overall quality of the program for PGConf India 2026. Microsoft is proud to be a &lt;A class="lia-external-url" href="https://live.pgconf.in/sponsors" target="_blank" rel="noopener"&gt;diamond sponsor&lt;/A&gt; for the conference again this year.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;At Microsoft, we &lt;A href="https://techcommunity.microsoft.com/blog/adforpostgresql/whats-new-with-postgres-at-microsoft-2025-edition/4410710#community-4410710-postgres-core" target="_blank" rel="noopener"&gt;continue our contributions&lt;/A&gt; to the upstream PostgreSQL open-source project, and we continue to serve our customers with our Postgres managed service offerings, both &lt;A href="https://learn.microsoft.com/azure/postgresql/overview" target="_blank" rel="noopener"&gt;Azure Database for PostgreSQL&lt;/A&gt; and our newest Postgres offering, &lt;A href="https://techcommunity.microsoft.com/blog/adforpostgresql/announcing-azure-horizondb/4469710" target="_blank" rel="noopener"&gt;Azure HorizonDB&lt;/A&gt;. On the open-source front, Microsoft had 540 commits in PG18, including major features like Asynchronous IO. We’re also excited to grow our Postgres open-source contributors team, and are so happy to welcome Noah Misch to our team. Noah is a Postgres committer who has deep expertise in PostgreSQL security and is focused on correctness and reliability in PostgreSQL’s core.&lt;/P&gt;
&lt;H2&gt;Microsoft at PGConf India 2026: Highlights from Our Speakers&lt;/H2&gt;
&lt;P&gt;PGConf India has several tracks, all of which have some great talks I am looking forward to. First, the plug. 😊 Microsoft has some amazing content this year, with 8 different talks spread across all the tracks.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/175" target="_blank" rel="noopener"&gt;Postgres on Azure : Scaling with Azure HorizonDB, AI, and Developer Workflows&lt;/A&gt;, by Aditya Duvuri &amp;amp; Divya Bhargov&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/143" target="_blank" rel="noopener"&gt;Resizing shared buffer pool in a running PostgreSQL server: important, yet impossible&lt;/A&gt;,&amp;nbsp; by Ashutosh Bapat&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/158" target="_blank" rel="noopener"&gt;Ten Postgres Hacker Journeys—and what they teach us&lt;/A&gt;, by Claire Giordano&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/136" target="_blank" rel="noopener"&gt;How Postgres can leverage disk bandwidth for better TPS&lt;/A&gt;, by Nikhil Chawla&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/156" target="_blank" rel="noopener"&gt;AWSM FSM! Free Space Maps Decoded&lt;/A&gt; by Nikhil Sontakke&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/160" target="_blank" rel="noopener"&gt;Journey of developing a Performance Optimization Feature in PostgreSQL&lt;/A&gt;, by Rahila Syed&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/trainings" target="_blank" rel="noopener"&gt;Build Agentic AI with Semantic Kernel and Graph RAG on PostgreSQL&lt;/A&gt;, by Shriram Muthukrishnan &amp;amp; Palak Chaturvedi&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/174" target="_blank" rel="noopener"&gt;All things Postgres @ Microsoft (2026 edition)&lt;/A&gt; by Sumedh Pathak&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Claire is an amazing speaker and has done a lot of work over the last several years documenting and understanding PostgreSQL committers and hackers. Her talk will definitely have some key insights and nuggets of information.&lt;/P&gt;
&lt;P&gt;Rahila’s talk will go in depth on performance optimization features and how best to test and benchmark them, and all the tools and tricks she has used as part of the feature development. This should be a must-see talk for anyone doing performance work.&lt;/P&gt;
&lt;H2&gt;Diving Deep: Case Studies &amp;amp; Technical Tracks&lt;/H2&gt;
&lt;P&gt;One of the tracks I’m really excited about is the Case Study track. I see these as similar to ‘Experience’ papers in academia. An experience paper documents what actually happened when applying a technique or system in the real world, what worked, what didn’t, and why. One of the talks I’m looking forward to is ‘&lt;A href="https://live.pgconf.in/schedule/139" target="_blank" rel="noopener"&gt;Operating Postgres Logical Replication at Massive Scale&lt;/A&gt;’ by Sai Srirampur from Clickhouse. Logical Replication is an extremely useful tool, and I’m curious to learn more about pitfalls and lessons learnt when running this at large scale. Another interesting one I’m curious to hear is &lt;A href="https://live.pgconf.in/schedule/101" target="_blank" rel="noopener"&gt;‘Understanding the importance of the commit log through a database corruption’&lt;/A&gt; by Amit Kumar Singh from EDB.&lt;/P&gt;
&lt;P&gt;The Database Engine Developers track allows us to go deep into the PostgreSQL code base and get a better understanding of how PostgreSQL is built. Even if you are not a database developer, this track is useful to understand how and why PostgreSQL does things, helping you be a better user of the database.&lt;/P&gt;
&lt;P&gt;With the rise of larger machines and memory available in the Cloud, different and newer memory architectures/tiers and serverless product offerings, there is a lot of deep dive in PostgreSQL’s memory architecture. There are some great talks focused on this area, which should be must-see for anyone interested in this topic:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/143" target="_blank" rel="noopener"&gt;Resizing shared buffer pool in a running PostgreSQL server: important, yet impossible&lt;/A&gt; by Ashutosh Bapat from Microsoft&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/132" target="_blank" rel="noopener"&gt;From Disk to Data: Exploring PostgreSQL's Buffer Management&lt;/A&gt; by Lalit Choudhary from PurnaBIT&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://live.pgconf.in/schedule/114" target="_blank" rel="noopener"&gt;Beyond shared_buffers: On-Demand Memory in Modern PostgreSQL&lt;/A&gt; by Vaibhav Popat from Google&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Finally, the Database Administration and Application Developer tracks have some really great content as well. They cover a wide range of topics, from PII data, HA/DR, and query tuning to connection pooling and understanding conflict detection and resolution.&lt;/P&gt;
&lt;H2&gt;PostgreSQL in India: A Community Effort Worth Celebrating&lt;/H2&gt;
&lt;P&gt;Conferences like these are a rich source of information, dramatically increasing my personal understanding of the product and the ecosystem. Separately, they are also a great way to meet other practitioners in the space and connect with people in the industry. For people in Bangalore, another great option is the &lt;A href="https://www.pgblr.in/" target="_blank" rel="noopener"&gt;PostgreSQL Bangalore Meetup&lt;/A&gt;, and I’m super happy that Microsoft was able to join the ranks of other companies to &lt;A class="lia-external-url" href="https://www.linkedin.com/posts/nitin-jadhav-b4807950_what-an-incredible-postgres-bangalore-activity-7427227967803629568-78qg" target="_blank" rel="noopener"&gt;host the eighth iteration&lt;/A&gt; of this meetup.&lt;/P&gt;
&lt;P&gt;Finally, I would be remiss in not mentioning the hard work done by the &lt;A class="lia-external-url" href="https://live.pgconf.in/organisers" target="_blank" rel="noopener"&gt;PGConf India organizing team&lt;/A&gt; including Pavan Deolasse, Ashish Mehra, Nikhil Sontakke, Hari Kiran, and Rushabh Lathia who are making all of this happen. Also, a big shout out to the &lt;A class="lia-external-url" href="https://live.pgconf.in/paper-committee" target="_blank" rel="noopener"&gt;PGConf India Program Committee&lt;/A&gt; (Amul Sul, Dilip Kumar, Marc Linster, Thomas Munro, Vigneshwaran C) for putting together an amazing set of talks.&lt;/P&gt;
&lt;P&gt;I look forward to meeting all of you in Bangalore! Be sure to drop by the Microsoft booth to say hello (and to snag a free pair of our famous socks). I’d love to learn more about how you’re using Postgres.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Feb 2026 17:13:49 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/microsoft-at-pgconf-india-2026/ba-p/4496182</guid>
      <dc:creator>sumedhpathak</dc:creator>
      <dc:date>2026-02-24T17:13:49Z</dc:date>
    </item>
    <item>
      <title>Mastering gin_pending_list_limit: How This parameter shapes GIN index insert performance</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/mastering-gin-pending-list-limit-how-this-parameter-shapes-gin/ba-p/4494203</link>
      <description>&lt;P&gt;GIN (Generalized Inverted Index) indexes are a cornerstone of PostgreSQL when working with JSONB, arrays, and full-text search. They provide excellent read performance, but their write behavior—especially under sustained insert workloads—can vary dramatically depending on how data is primarily written and how GIN index maintenance is configured.&lt;/P&gt;
&lt;P&gt;One often-overlooked configuration in this space is the interaction between the&amp;nbsp;fastupdate storage parameter and the gin_pending_list_limit server parameter. While these settings do not directly impact query performance, they play a critical role in insert throughput, CPU usage, and worst-case write latency for GIN-indexed tables.&lt;/P&gt;
&lt;P&gt;This post explains how the GIN pending list works, how gin_pending_list_limit governs its behavior, and why choosing the right configuration can make or break write performance on large datasets and write-intensive database operations.&lt;/P&gt;
&lt;P&gt;Let us now go through the two parameters discussed above.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Fastupdate&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The fastupdate storage parameter controls how PostgreSQL handles writes to a GIN index:&lt;/P&gt;
&lt;P&gt;When fastupdate = ON (default)&lt;/P&gt;
&lt;P&gt;GIN uses an in-memory buffer containing the pending list of new index entries before flushing them into the main index structure.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;gin_pending_list_limit&lt;/STRONG&gt;&lt;BR /&gt;The gin_pending_list_limit parameter in PostgreSQL controls the maximum size of the pending list for a GIN (Generalized Inverted Index) index before it's flushed to the main index structure. This setting can significantly affect insert performance and index maintenance behavior. By default, this parameter is set to 4 MB.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;New GIN entries are first written to the pending list.&lt;/LI&gt;
&lt;LI&gt;This makes inserts much faster because PostgreSQL batches writes and avoids expensive GIN maintenance on each individual insert.&lt;/LI&gt;
&lt;LI&gt;The pending list is later merged into the main GIN index by:
&lt;UL&gt;
&lt;LI&gt;Autovacuum or a VACUUM operation or&lt;/LI&gt;
&lt;LI&gt;When the pending list exceeds gin_pending_list_limit.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;During heavy insert workloads, these merges can cause latency spikes.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In short, the pending list makes writes cheap—until the cleanup happens.&lt;/P&gt;
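&lt;P&gt;Both knobs can be inspected and tuned; here is a minimal sketch (the index name is illustrative):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SHOW gin_pending_list_limit;  -- 4MB by default

-- Per-index overrides via storage parameters (value is in kB)
ALTER INDEX idx_docs_gin SET (gin_pending_list_limit = 16384);
ALTER INDEX idx_docs_gin SET (fastupdate = off);  -- write directly to the main index

-- Flush an index's pending list on demand, outside of VACUUM
SELECT gin_clean_pending_list('idx_docs_gin');&lt;/LI-CODE&gt;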
&lt;P&gt;&lt;STRONG&gt;How fastupdate Interacts With gin_pending_list_limit&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Together, these parameters decide how much index maintenance work each insert must perform.&lt;/P&gt;
&lt;P&gt;When fastupdate = ON&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Pending list absorbs writes efficiently → best for single inserts&lt;/LI&gt;
&lt;LI&gt;Flush cycles controlled by gin_pending_list_limit&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;When fastupdate = OFF&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Inserts bypass the pending list and write directly to the index&lt;/LI&gt;
&lt;LI&gt;This increases CPU costs dramatically during both single and batch inserts&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Test run /results&lt;/STRONG&gt; &lt;STRONG&gt;and analysis&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Below are some test run results based on the different options highlighted in the sections above.&lt;/P&gt;
&lt;P&gt;A test table (of size 3 TB) uses a GIN index. We tested insert performance under different configurations of fastupdate (ON/OFF) for single and batch inserts.&lt;BR /&gt;SKU: 16 vCores and 8 TB storage&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Setup:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Create a table with a jsonb column and a GIN index on that column.&lt;/LI&gt;
&lt;LI&gt;Insert around 2 TB of data.&lt;/LI&gt;
&lt;LI&gt;Run a workload performing single-row inserts for 15 minutes.&lt;/LI&gt;
&lt;LI&gt;For the second run, perform batch inserts for 15 minutes (see the sketch after this list).&lt;/LI&gt;
&lt;/UL&gt;
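&lt;P&gt;A minimal sketch of this setup (table, index, and payload shapes are illustrative, not the exact test harness):&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE TABLE gin_test (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL
);
CREATE INDEX gin_test_doc_idx ON gin_test USING gin (doc);

-- Single-row insert, issued repeatedly during the 15-minute window
INSERT INTO gin_test (doc) VALUES ('{"k": "v", "n": 1}');

-- Batch-insert variant
INSERT INTO gin_test (doc)
SELECT jsonb_build_object('k', md5(i::text), 'n', i)
FROM generate_series(1, 1000) AS i;&lt;/LI-CODE&gt;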
&lt;P&gt;&lt;STRONG&gt;Key Factor:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Autovacuum &lt;EM&gt;was turned off&lt;/EM&gt;, so pending list cleanup did not occur automatically.&lt;/LI&gt;
&lt;LI&gt;Runs were captured over a 15-minute window.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Single Inserts:&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Batch Inserts:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Analysis:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Fastupdate &lt;STRONG&gt;ON&lt;/STRONG&gt; is optimal for &lt;EM&gt;single-row, write-heavy workloads&lt;/EM&gt; (much lower per-insert latency) and uses significantly less CPU, but it hurts sustained throughput and worst-case latency due to pending-list cleanups.&lt;BR /&gt;Fastupdate &lt;STRONG&gt;OFF&lt;/STRONG&gt; consistently wins for &lt;EM&gt;batch/bulk inserts&lt;/EM&gt;, delivering ~1.5–1.7× higher throughput, significantly lower max execution time, and more predictable behavior despite higher CPU consumption, making it the better choice for controlled batch loads.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;GIN indexes are often treated as “build once and forget,” but for write-heavy systems, that mindset leaves significant performance on the table. By understanding how the pending list works—and tuning fastupdate and gin_pending_list_limit intentionally—you can dramatically improve both throughput and stability in large-scale PostgreSQL workloads.&lt;/P&gt;
&lt;P&gt;If you routinely work with heavy JSONB or array ingestion, these settings deserve a permanent spot in your performance toolbox.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Feb 2026 17:03:18 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/mastering-gin-pending-list-limit-how-this-parameter-shapes-gin/ba-p/4494203</guid>
      <dc:creator>Gayathri_Paderla</dc:creator>
      <dc:date>2026-02-23T17:03:18Z</dc:date>
    </item>
    <item>
      <title>Distribute PostgreSQL 18 with Citus 14</title>
      <link>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/distribute-postgresql-18-with-citus-14/ba-p/4494868</link>
      <description>&lt;P&gt;The Citus 14.0 release is out and includes PostgreSQL 18 support! We know you've been waiting, and we've been hard at work adding features we believe will take your experience to the next level, focusing on bringing the&amp;nbsp;&lt;A href="https://www.postgresql.org/docs/18/release-18.html" target="_blank" rel="noopener"&gt;Postgres 18 exciting improvements&lt;/A&gt;&amp;nbsp;to you at distributed scale.&lt;/P&gt;
&lt;P&gt;The Citus database is an&amp;nbsp;&lt;A href="https://github.com/citusdata/citus" target="_blank" rel="noopener"&gt;open-source extension of Postgres&lt;/A&gt;&amp;nbsp;that brings the power of Postgres to any scale, from a single node to a distributed database cluster. Since Citus is an extension, using Citus means you're also using Postgres, giving you direct access to Postgres features. And the latest of those features came with the Postgres 18 release!&lt;/P&gt;
&lt;P&gt;PostgreSQL 18 is a substantial release: asynchronous I/O (AIO), skip-scan for multicolumn B-tree indexes,&amp;nbsp;uuidv7(), virtual generated columns by default, OAuth authentication,&amp;nbsp;&lt;EM&gt;RETURNING OLD/NEW&lt;/EM&gt;, and temporal constraints. For those of you who are interested in upgrading to Postgres 18 and scaling these new features of Postgres: you can upgrade to Citus 14.0!&lt;/P&gt;
&lt;P&gt;Let's take a closer look at what's new in Citus 14.0.&lt;/P&gt;
&lt;H2&gt;Postgres 18 support in Citus 14.0&lt;/H2&gt;
&lt;P&gt;Citus 14.0 introduces support for PostgreSQL 18. This means that simply by enabling PG18 with Citus 14.0, all of PG18's query performance improvements apply directly to Citus distributed queries, and several optimizer improvements benefit Citus out of the box! Among the many new features in PG 18, the following capabilities enabled in Citus 14.0 are especially noteworthy for Citus users.&lt;/P&gt;
&lt;P&gt;To learn more about how you can use Citus 14.0 + PostgreSQL 18, as well as currently unsupported features and future work, you can consult the&amp;nbsp;Citus 14.0 Updates page, which gives you detailed release notes.&lt;/P&gt;
&lt;H2&gt;PostgreSQL 18 highlights that benefit Citus clusters&lt;/H2&gt;
&lt;P&gt;Because Citus is implemented as a Postgres extension, the following PG18 improvements benefit your distributed cluster&amp;nbsp;&lt;STRONG&gt;automatically, &lt;/STRONG&gt;no Citus-specific changes needed.&lt;/P&gt;
&lt;H3&gt;Faster scans and maintenance via AIO&lt;/H3&gt;
&lt;P&gt;Postgres 18 adds an&amp;nbsp;&lt;STRONG&gt;asynchronous I/O subsystem&lt;/STRONG&gt;&amp;nbsp;that can improve sequential scans, bitmap heap scans, and vacuuming—workloads that show up constantly in shard-heavy distributed clusters. This means your Citus cluster can benefit from faster table scans and more efficient maintenance operations without any code changes.&lt;/P&gt;
&lt;P&gt;You can control the I/O method via the new&amp;nbsp;&lt;EM&gt;io_method&amp;nbsp;&lt;/EM&gt;GUC:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Check the current I/O method
SHOW io_method;&lt;/LI-CODE&gt;
&lt;H3&gt;Better index usage with skip-scan&lt;/H3&gt;
&lt;P&gt;Postgres 18 expands when&amp;nbsp;&lt;STRONG&gt;multicolumn B-tree indexes&lt;/STRONG&gt;&amp;nbsp;can be used via&amp;nbsp;&lt;STRONG&gt;skip scan&lt;/STRONG&gt;, helping common multi-tenant schemas where predicates don't always constrain the leading index column. This is particularly valuable for Citus users with multi-tenant applications where queries often filter by non-leading columns.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Multi-tenant index: (tenant_id, created_at) 
-- PG18 skip-scan lets this query use the index even without tenant_id 
SELECT * FROM events 
WHERE created_at &amp;gt; now() - interval '1 day' 
ORDER BY created_at DESC
LIMIT 100;&lt;/LI-CODE&gt;
&lt;H3&gt;uuidv7() for time-ordered UUIDs&lt;/H3&gt;
&lt;P&gt;Time-ordered UUIDs can reduce index churn and improve locality; Postgres 18 adds&amp;nbsp;uuidv7(). This is especially useful for distributed tables where you want predictable ordering and better index performance across shards.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Use uuidv7() as a time-ordered primary key 
CREATE TABLE events (
   id uuid DEFAULT uuidv7() PRIMARY KEY, 
   tenant_id bigint, 
   payload jsonb 
);
 
SELECT create_distributed_table('events', 'tenant_id');&lt;/LI-CODE&gt;
&lt;H3&gt;OAuth authentication support&lt;/H3&gt;
&lt;P&gt;Postgres 18 adds&amp;nbsp;&lt;STRONG&gt;OAuth authentication&lt;/STRONG&gt;, making it easier to plug database auth into modern SSO flows, often a practical requirement in multi-node deployments. This simplifies authentication management across your Citus coordinator and worker nodes.&lt;/P&gt;
&lt;H2&gt;What Citus 14 adds for PostgreSQL 18 compatibility&lt;/H2&gt;
&lt;P&gt;While the highlights above work out of the box, PG18 also introduces&amp;nbsp;&lt;STRONG&gt;new SQL syntax and behavior changes&lt;/STRONG&gt; that require Citus-specific work: parsing/deparsing, DDL propagation across coordinator + workers, and distributed execution correctness. Here's what we built to make these work end-to-end.&lt;/P&gt;
&lt;H3&gt;JSON_TABLE() COLUMNS&lt;/H3&gt;
&lt;P&gt;PG18 expands SQL/JSON&amp;nbsp;&lt;EM&gt;JSON_TABLE()&lt;/EM&gt;&amp;nbsp;with a richer&amp;nbsp;&lt;EM&gt;COLUMNS&amp;nbsp;&lt;/EM&gt;clause, making it easy to extract multiple fields from JSON documents in a single, typed table expression. Citus 14 ensures the syntax can be parsed/deparsed and executed consistently in distributed queries.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE TABLE pg18_json_test (id serial PRIMARY KEY, data JSON);

SELECT jt.name, jt.age
FROM pg18_json_test,
     JSON_TABLE(
       data,
       '$.user'
       COLUMNS (
         age  INT  PATH '$.age',
         name TEXT PATH '$.name'
       )
     ) AS jt
WHERE jt.age BETWEEN 25 AND 35
ORDER BY jt.age, jt.name;&lt;/LI-CODE&gt;
&lt;H3&gt;Temporal constraints&lt;/H3&gt;
&lt;P&gt;Postgres 18 adds temporal constraint syntax that Citus must propagate and preserve correctly:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;WITHOUT OVERLAPS&lt;/EM&gt;&amp;nbsp;for&amp;nbsp;&lt;EM&gt;PRIMARY KEY&lt;/EM&gt;&amp;nbsp;/&amp;nbsp;&lt;EM&gt;UNIQUE&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;PERIOD&amp;nbsp;&lt;/EM&gt;for&amp;nbsp;&lt;EM&gt;FOREIGN KEY&lt;/EM&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="sql"&gt;CREATE TABLE temporal_rng (
  id int4range,
  valid_at daterange,
  CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

SELECT create_distributed_table('temporal_rng', 'id');&lt;/LI-CODE&gt;
&lt;H3&gt;CREATE FOREIGN TABLE ... LIKE&lt;/H3&gt;
&lt;P&gt;Postgres 18 supports&amp;nbsp;&lt;EM&gt;CREATE FOREIGN TABLE ... LIKE&lt;/EM&gt;, letting you define a foreign table by copying the column layout (and optionally defaults/constraints/indexes) from an existing table. Citus 14 includes coverage so FDW workflows remain compatible in distributed environments.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Copy column layout from an existing table
CREATE FOREIGN TABLE my_ft (LIKE my_local_table EXCLUDING ALL)
  SERVER foreign_server
  OPTIONS (schema_name 'public', table_name 'my_local_table');&lt;/LI-CODE&gt;
&lt;H3&gt;Generated columns (Virtual by Default)&lt;/H3&gt;
&lt;P&gt;PostgreSQL 18 changes generated column behavior significantly:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Virtual by default:&lt;/STRONG&gt;&amp;nbsp;Generated columns are now computed on read rather than stored, reducing write amplification&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Logical replication support:&lt;/STRONG&gt;&amp;nbsp;New&amp;nbsp;&lt;EM&gt;publish_generated_columns&lt;/EM&gt;&amp;nbsp;publication option for replicating generated values&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="sql"&gt;CREATE TABLE events (
  id bigint,
  payload jsonb,
  payload_hash text GENERATED ALWAYS AS (md5(payload::text)) VIRTUAL
);

SELECT create_distributed_table('events', 'id');&lt;/LI-CODE&gt;
&lt;H3&gt;VACUUM/ANALYZE ONLY semantics&lt;/H3&gt;
&lt;P&gt;Postgres 18 introduces&amp;nbsp;ONLY&amp;nbsp;for&amp;nbsp;&lt;EM&gt;VACUUM&amp;nbsp;&lt;/EM&gt;and&amp;nbsp;&lt;EM&gt;ANALYZE&amp;nbsp;&lt;/EM&gt;so you can explicitly&amp;nbsp;&lt;STRONG&gt;target only the parent&lt;/STRONG&gt;&amp;nbsp;of a partitioned/inheritance tree without automatically processing children. Citus 14 adapts distributed utility-command behavior so&amp;nbsp;&lt;EM&gt;ONLY&amp;nbsp;&lt;/EM&gt;works as intended.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Parent-only: do not recurse into partitions/children
VACUUM (ANALYZE) ONLY metrics;
ANALYZE ONLY metrics;&lt;/LI-CODE&gt;
&lt;H3&gt;Constraints: NOT ENFORCED + partitioned-table additions&lt;/H3&gt;
&lt;P&gt;Postgres 18 expands constraint syntax in several ways that Citus must parse/deparse and propagate across coordinator + workers:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;CHECK&amp;nbsp;&lt;/EM&gt;constraints can be marked&amp;nbsp;&lt;EM&gt;NOT ENFORCED&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;FOREIGN KEY&lt;/EM&gt;&amp;nbsp;constraints can be marked&amp;nbsp;&lt;EM&gt;NOT ENFORCED&lt;/EM&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;NOT VALID&lt;/EM&gt;&amp;nbsp;foreign keys on partitioned tables&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;DROP CONSTRAINT ONLY&lt;/EM&gt;&amp;nbsp;on partitioned tables&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="sql"&gt;ALTER TABLE orders
  ADD CONSTRAINT orders_amount_positive CHECK (amount &amp;gt; 0) NOT ENFORCED;

ALTER TABLE orders
  ADD CONSTRAINT orders_customer_fk
  FOREIGN KEY (customer_id) REFERENCES customers(id)
  NOT ENFORCED;&lt;/LI-CODE&gt;
&lt;H3&gt;DML: RETURNING OLD/NEW&lt;/H3&gt;
&lt;P&gt;Postgres 18 lets&amp;nbsp;&lt;EM&gt;RETURNING&amp;nbsp;&lt;/EM&gt;reference both the&amp;nbsp;&lt;STRONG&gt;previous&lt;/STRONG&gt;&amp;nbsp;(&lt;EM&gt;old&lt;/EM&gt;) and&amp;nbsp;&lt;STRONG&gt;new&lt;/STRONG&gt;&amp;nbsp;(&lt;EM&gt;new&lt;/EM&gt;) row values in&amp;nbsp;INSERT/UPDATE/DELETE/MERGE. Citus 14 preserves these semantics in distributed execution.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;UPDATE t
SET v = v + 1
WHERE id = 42
RETURNING old.v AS old_v, new.v AS new_v;&lt;/LI-CODE&gt;
&lt;H3&gt;COPY expansions&lt;/H3&gt;
&lt;P&gt;PG18 adds two useful COPY improvements that Citus 14 supports in distributed queries:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;&lt;STRONG&gt;COPY ... REJECT_LIMIT&lt;/STRONG&gt;&lt;/EM&gt;: set a threshold for how many rows can be rejected before the COPY fails, useful for resilient bulk loading into sharded tables&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;&lt;EM&gt;COPY table TO&lt;/EM&gt;&amp;nbsp;from materialized views&lt;/STRONG&gt;: export data directly from materialized views&lt;/LI&gt;
&lt;/UL&gt;
&lt;LI-CODE lang="sql"&gt;-- Tolerate up to 10 bad rows during bulk load
COPY my_distributed_table FROM '/data/import.csv'
  WITH (FORMAT csv, REJECT_LIMIT 10);&lt;/LI-CODE&gt;
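&lt;P&gt;The materialized-view export needs no new options. Illustrative, assuming a hypothetical&amp;nbsp;&lt;EM&gt;daily_rollup&lt;/EM&gt;&amp;nbsp;materialized view:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- PG18: COPY ... TO works directly on a materialized view
-- (previously this required COPY (SELECT * FROM daily_rollup) TO ...)
COPY daily_rollup TO '/data/daily_rollup.csv' WITH (FORMAT csv, HEADER true);&lt;/LI-CODE&gt;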
&lt;H3&gt;MIN()/MAX() on arrays and composite types&lt;/H3&gt;
&lt;P&gt;PG18 extends&amp;nbsp;&lt;EM&gt;MIN()&lt;/EM&gt;&amp;nbsp;and&amp;nbsp;&lt;EM&gt;MAX()&lt;/EM&gt;&amp;nbsp;aggregates to work on arrays and composite types. Citus 14 ensures these aggregates work correctly in distributed queries.&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE TABLE sensor_data (
  tenant_id bigint,
  readings int[]
);
SELECT create_distributed_table('sensor_data', 'tenant_id');

-- Now works with array columns
SELECT MIN(readings), MAX(readings) FROM sensor_data;&lt;/LI-CODE&gt;
&lt;H3&gt;Nondeterministic collations&lt;/H3&gt;
&lt;P&gt;PG18 extends&amp;nbsp;&lt;EM&gt;LIKE&amp;nbsp;&lt;/EM&gt;and text-position search functions to work with nondeterministic collations. Citus 14 verifies these work correctly across distributed queries.&lt;/P&gt;
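&lt;P&gt;As a quick illustration, here is a minimal sketch using a case-insensitive ICU collation; the collation, table, and data names are examples, not from the release notes:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Case-insensitive collation: strings that differ only in case compare equal
CREATE COLLATION ci (provider = icu, locale = 'und-u-ks-level2', deterministic = false);

CREATE TABLE customers_ci (
  id   bigint,
  name text COLLATE ci
);
SELECT create_distributed_table('customers_ci', 'id');

-- PG18: LIKE now works with nondeterministic collations
SELECT * FROM customers_ci WHERE name LIKE 'smith%'; -- also matches 'Smith ...'&lt;/LI-CODE&gt;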
&lt;H3&gt;sslkeylogfile connection parameter&lt;/H3&gt;
&lt;P&gt;PG18 adds the&amp;nbsp;&lt;EM&gt;sslkeylogfile&lt;/EM&gt;&amp;nbsp;libpq connection parameter for dumping SSL key material, which is useful for debugging encrypted connections. Citus 14 allows configuring this via&amp;nbsp;&lt;EM&gt;citus.node_conninfo&lt;/EM&gt;&amp;nbsp;so it works across inter-node connections.&lt;/P&gt;
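&lt;P&gt;A minimal sketch of wiring this up for inter-node traffic; the log path is an example, and any client with a PG18 libpq can likewise set the parameter in its own connection string:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- Have Citus inter-node connections dump TLS session keys for debugging
-- (takes effect for new connections after a configuration reload)
ALTER SYSTEM SET citus.node_conninfo = 'sslmode=require sslkeylogfile=/tmp/citus_tls_keys.log';
SELECT pg_reload_conf();&lt;/LI-CODE&gt;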
&lt;H3&gt;Planner fix: enable_self_join_elimination&lt;/H3&gt;
&lt;P&gt;PG18 introduces the&amp;nbsp;&lt;EM&gt;enable_self_join_elimination&lt;/EM&gt;&amp;nbsp;planner optimization. Citus 14 ensures it behaves correctly for joins between distributed and local tables, avoiding the wrong results that could occur with early PG18 integration.&lt;/P&gt;
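&lt;P&gt;For a feel of what the planner now does, here is an illustrative self-join on a unique key (the table and query are made up); the redundant join collapses to a single scan:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;CREATE TABLE items (id bigint PRIMARY KEY, label text);

SHOW enable_self_join_elimination; -- on by default in PG18

EXPLAIN (COSTS OFF)
SELECT a.id, a.label
FROM items a
JOIN items b ON a.id = b.id; -- the self-join is eliminated; one scan of items remains&lt;/LI-CODE&gt;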
&lt;H3&gt;Utility/Ops plumbing and observability&lt;/H3&gt;
&lt;P&gt;Citus 14 adapts to PG18 interface/output changes that affect tooling and extension plumbing:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;New GUC&amp;nbsp;&lt;EM&gt;file_copy_method&amp;nbsp;&lt;/EM&gt;for&amp;nbsp;&lt;EM&gt;CREATE DATABASE ... STRATEGY=FILE_COPY&lt;/EM&gt; (see the sketch after this list)&lt;/LI&gt;
&lt;LI&gt;&lt;EM&gt;EXPLAIN (WAL)&lt;/EM&gt;&amp;nbsp;adds a "WAL buffers full" field; Citus propagates it through distributed EXPLAIN output&lt;/LI&gt;
&lt;LI&gt;New extension macro&amp;nbsp;&lt;EM&gt;PG_MODULE_MAGIC_EXT&lt;/EM&gt;&amp;nbsp;so extensions can report name/version metadata&lt;/LI&gt;
&lt;/UL&gt;
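&lt;P&gt;As a quick sketch of the first bullet, the new GUC controls how&amp;nbsp;&lt;EM&gt;CREATE DATABASE&lt;/EM&gt;&amp;nbsp;physically copies the template; the value and database name here are illustrative:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;-- 'clone' uses filesystem cloning where the OS supports it; 'copy' is the classic behavior
SET file_copy_method = 'clone';
CREATE DATABASE analytics_copy TEMPLATE template0 STRATEGY = FILE_COPY;&lt;/LI-CODE&gt;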
&lt;H2&gt;Diving deeper into Citus 14.0 and distributed Postgres&lt;/H2&gt;
&lt;P&gt;To learn more about Citus 14.0, you can:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Check out the&amp;nbsp;&lt;A href="https://github.com/citusdata/citusdata.com/blob/2fabdda10a900ffdf12f71226e2b960c46937ae0/updates/v14-0" target="_blank" rel="noopener"&gt;14.0 Updates page&lt;/A&gt;&amp;nbsp;to get the detailed release notes.&lt;/LI&gt;
&lt;LI&gt;As of this release, &lt;A href="https://learn.microsoft.com/postgresql/citus/?view=citus-14" target="_blank"&gt;Citus documentation&lt;/A&gt; is hosted on Microsoft Learn.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;With Citus 14, &lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/adforpostgresql/postgresql-for-the-enterprise-scale-secure-simplify/4470412#community-4470412-ec" data-lia-auto-title="elastic clusters" data-lia-auto-title-active="0" target="_blank"&gt;elastic clusters&lt;/A&gt; will soon support PostgreSQL 18, which is now available in Azure Database for PostgreSQL.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can stay connected on the &lt;A href="https://slack.citusdata.com/" target="_blank" rel="noopener"&gt;Citus Slack&lt;/A&gt;&amp;nbsp;and visit the&amp;nbsp;&lt;A href="https://github.com/citusdata/citus" target="_blank" rel="noopener"&gt;Citus open source GitHub repo&lt;/A&gt; to see recent developments as well. If there's something you'd like to see next in Citus, feel free to also open a feature request issue :)&lt;/P&gt;</description>
      <pubDate>Tue, 17 Feb 2026 16:19:55 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/microsoft-blog-for-postgresql/distribute-postgresql-18-with-citus-14/ba-p/4494868</guid>
      <dc:creator>mehmetyilmaz</dc:creator>
      <dc:date>2026-02-17T16:19:55Z</dc:date>
    </item>
  </channel>
</rss>

