Postgres
Azure PostgreSQL Lesson Learned #3: Fix FATAL: sorry, too many clients already
We encountered a support case involving Azure Database for PostgreSQL Flexible Server where the application started failing with connection errors. This blog explains the root cause, resolution steps, and best practices to prevent similar issues.
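While the full post walks through the root cause and fix, a minimal first-response sketch (using standard PostgreSQL views and settings, not anything specific to the incident in the post) is to compare current connections against the configured limit:

SHOW max_connections;
-- Count sessions, separating the idle ones that a connection pooler would normally reclaim
SELECT count(*) AS total_connections,
       count(*) FILTER (WHERE state = 'idle') AS idle_connections
FROM pg_stat_activity;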
Azure PostgreSQL Lesson Learned #1: Fix Cannot Execute in a Read-Only Transaction After HA Failover

We encountered a support case involving Azure Database for PostgreSQL Flexible Server where the database returned a read-only error after a High Availability (HA) failover. This blog explains the root cause, resolution steps, and best practices to prevent similar issues. The issue occurred when the application attempted write operations immediately after an HA failover. The failover caused the primary role to switch, but the client continued connecting to the old primary (now standby), which is in read-only mode.
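Before following the detailed steps in the post, a minimal sketch for confirming which role the server you are connected to is currently playing (standard PostgreSQL functions and settings; the interpretation assumes the HA scenario described above):

-- Returns true when connected to a standby still in recovery, false on the writable primary
SELECT pg_is_in_recovery();
-- 'on' means transactions in this session are read-only (always the case on a hot standby)
SHOW transaction_read_only;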
Azure PostgreSQL Lesson Learned #2: Fixing Read-Only Mode - Storage Threshold Explained

Co-authored with angesalsaa

The issue occurred when the server's storage usage reached approximately 95% of the allocated capacity and automatic storage scaling was disabled. Symptoms included:
- Server switching to read-only mode
- Application errors indicating write failures
- No prior alerts or warnings received by the customer
Example error: ERROR: cannot execute %s in a read-only transaction

Root Cause
The root cause was the server hitting the configured storage usage threshold (95%), which triggered an automatic transition to read-only mode to prevent data corruption or loss. See Storage options - Azure Database for PostgreSQL | Microsoft Learn. If your storage usage is below 95% but you are still seeing the same error, refer to Azure PostgreSQL Lesson Learned #1: Fix Cannot Execute in a Read-Only Transaction After HA Failover.
Contributing factors:
- Automatic storage scaling was disabled
- Lack of proactive monitoring of storage usage
- High data ingestion rate during peak hours
Specific conditions:
- Customer had a custom workload with large batch inserts
- No alerts configured for storage usage thresholds

Mitigation
To resolve the issue, we:
- Increased the allocated storage manually via the Azure Portal. No restart is needed after scaling up storage because it is an online operation in most cases: growing the disk between any two sizes in the 32 GiB-4 TiB range, or between any two sizes in the 8 TiB-32 TiB range, is performed while the server stays online. However, growing the disk from 4096 GiB or less to any size above 4096 GiB requires a server restart, and you are asked to confirm that you understand the consequences of the operation. See Scale storage size - Azure Database for PostgreSQL | Microsoft Learn.
- Verified that the server returned to read-write mode.
Steps:
- Navigate to Azure Portal > PostgreSQL Flexible Server > Compute & Storage
- Increase the storage size (e.g., from 100 GB to 150 GB)
Post-resolution:
- Server resumed normal operations
- Write operations were successful

Prevention & Best Practices
- Enable automatic storage scaling to prevent hitting usage limits > Configure Storage Autogrow - Azure Database for PostgreSQL | Microsoft Learn
- Set up alerts for storage usage thresholds (e.g., 80%, 90%)
- Monitor storage metrics regularly using Azure Monitor or custom dashboards

Why This Matters
Failing to monitor storage and configure scaling can lead to:
- Application downtime
- Read-only errors impacting business-critical transactions
By following these practices, customers can ensure seamless operations and avoid unexpected read-only transitions.

Key Takeaways
- Symptom: Server switched to read-only mode, causing write failures (ERROR: cannot execute INSERT in a read-only transaction).
- Root Cause: Storage usage hit the 95% threshold, triggering read-only mode to prevent corruption.
- Contributing factors: Automatic storage scaling disabled; no alerts for storage thresholds; high ingestion during peak hours with large batch inserts.
- Mitigation: Increased storage manually via the Azure Portal (an online operation unless crossing the 4 TiB boundary, which requires a restart). Server returned to read-write mode.
- Prevention & best practices:
  - Enable automatic storage scaling.
  - Configure alerts for storage usage (e.g., 80%, 90%).
  - Monitor storage metrics regularly using Azure Monitor or dashboards.
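To complement the alerting guidance above, a hedged sketch for spot-checking consumption from inside the server (standard catalog functions; the sizes are per database and exclude some server-level overhead, so treat the Azure storage metric as authoritative):

-- Largest databases first
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
-- If this returns 'on', the server has been put into read-only mode
SHOW default_transaction_read_only;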
Prevent Accidental Deletion of an Instance in Azure Postgres

Did you know that accidental deletion of database servers is a leading source of support tickets? Read this blog post to learn how you can safeguard your Azure Database for PostgreSQL Flexible Server instances using ARM's CanNotDelete lock, an easy best practice that helps prevent accidental deletions while keeping regular operations seamless. 🌐 Prevent Accidental Deletion of an Instance in Azure Postgres
PostgreSQL 18 Vacuuming Improvements Explained

Introduction
The PostgreSQL 18 release brings one of the most significant collections of VACUUM and ANALYZE improvements in years. These updates include faster heap scans through asynchronous I/O (AIO), flexible on-the-fly autovacuum scaling, earlier triggers for large tables, safer tail shrinking, and more robust observability.

At a Glance: What's New
- VACUUM runs faster by overlapping reads with processing via asynchronous I/O (AIO).
- Autovacuum throughput is adjustable without restarts: set a startup ceiling with autovacuum_worker_slots and tune autovacuum_max_workers on the fly.
- Autovacuum can trigger earlier on big tables using autovacuum_vacuum_max_threshold (a hard cap on dead tuples).
- Normal VACUUM can "eager-freeze" all-visible pages to amortize anti-wraparound work; control this with vacuum_max_eager_freeze_failure_rate.
- VACUUM and ANALYZE extend recursively into child tables by default; use the ONLY table modifier to target the parent level only.
- Tail shrinking is explicit and predictable using the server setting vacuum_truncate or the per-command TRUNCATE option.
- New visibility: per-table cumulative times and explicit cost-delay timing (track_cost_delay_timing).

Deep Dive: Changes That Matter

1) Vacuum is faster with asynchronous I/O (AIO)
A new asynchronous I/O subsystem lets VACUUM queue up reads and continue working while data is fetched, reducing heap-scan wait time on cache misses and smoothing throughput during large table passes.
How to enable and verify: the behavior is controlled by the io_method server parameter.
- SHOW io_method; (typically 'worker' by default)
- SHOW io_workers; (default 3)
- Increase workers gradually if VACUUM remains I/O-bound, e.g. io_workers = 6 (test on non-prod first).
Note: Raise workers gradually (e.g., 3 → 6 → 12) and monitor server storage metrics and VACUUM durations.

2) Adjust autovacuum workers on the fly
Autovacuum scaling is now much more flexible. Previously, changing autovacuum_max_workers required a server restart, which could be painful for time-based tuning or urgent bloat cleanup. The new approach separates two concepts:
- autovacuum_worker_slots: the total number of slots reserved at startup (server restart required).
- autovacuum_max_workers: the number of workers allowed to run concurrently (reloadable on the fly).
If you set autovacuum_max_workers higher than the reserved slots, PostgreSQL caps the number and logs a warning.
Note: When you scale up workers, consider enlarging the shared cost budget so the per-worker share does not collapse. Either raise autovacuum_vacuum_cost_limit or reduce autovacuum_vacuum_cost_delay. A configuration sketch follows below.
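A minimal sketch of the reload-only tuning described above (values are illustrative; on Azure Database for PostgreSQL Flexible Server you would typically change these through the server parameters page rather than ALTER SYSTEM):

-- Ceiling reserved at startup; changing this one still requires a restart
-- autovacuum_worker_slots = 16
-- Raise concurrent workers without a restart
ALTER SYSTEM SET autovacuum_max_workers = 8;
-- Keep the shared cost budget in step so each worker's share does not collapse
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 800;
SELECT pg_reload_conf();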
3) Force earlier autovacuum on very large tables
Extremely large tables can accumulate a significant number of dead tuples while staying below the classic trigger threshold:
threshold = reltuples * autovacuum_vacuum_scale_factor + autovacuum_vacuum_threshold
To fix this, a new parameter autovacuum_vacuum_max_threshold adds a hard cap on dead tuples (default: 100,000,000). Once a table crosses this number of dead rows, autovacuum runs immediately even if the scale factor says "wait." This prevents bloat buildup between runs on very large relations. Set it to -1 to disable the feature.
Examples:
- Server-wide cap (example: 50M dead tuples): autovacuum_vacuum_max_threshold = 50000000
- Per-table override for a hot relation: ALTER TABLE orders SET (autovacuum_vacuum_max_threshold = 20000000);
- Disable on a table if needed: ALTER TABLE staging SET (autovacuum_vacuum_max_threshold = -1);
Note: Prefer per-table caps (e.g., 10-50M) for a few giant tables instead of lowering the server-wide cap too far.

4) Eager freezing during normal vacuum
On very large tables, "normal" VACUUM often skips all-visible pages and leaves a backlog of pages that are all visible but not all frozen. When an anti-wraparound (aggressive) VACUUM finally triggers, it must scan all those pages, causing long, I/O-heavy maintenance runs that compete for resources with your foreground workload. With the new eager freezing, normal VACUUM can proactively freeze tuples on all-visible pages, spreading the freeze workload across routine runs and shortening future anti-wraparound passes on very large tables.
Examples:
- Server-wide, be a bit more eager: vacuum_max_eager_freeze_failure_rate = 0.05
- Per-table, even more eager on a hot relation: ALTER TABLE events SET (vacuum_max_eager_freeze_failure_rate = 0.10);
- Disable for churny tables: ALTER TABLE temp_staging SET (vacuum_max_eager_freeze_failure_rate = 0.0);
Note: Increase it if anti-wraparound VACUUMs on big tables run too long or spike I/O. Decrease it if routine VACUUMs become noticeably longer without observable benefit.

5) Vacuum/analyze: recurse by default; use the ONLY modifier when needed
PostgreSQL now simplifies maintenance for partitioned tables. For inheritance-based partitioning, VACUUM and ANALYZE automatically include child tables by default, removing the need for manual recursion. Use the ONLY keyword when you want to process just the parent. Declarative partitioning already behaved this way, but the new ability to run ANALYZE ONLY on a partitioned parent is a key improvement: it lets you refresh the parent's statistics without scanning all partitions, which is ideal when partition stats are already current but accurate parent-level estimates are needed for better query planning.
Examples:
Inheritance-based partitioning, recurse by default (new behavior):
VACUUM VERBOSE public.legacy_parent; -- vacuum parent + child tables
ANALYZE public.legacy_parent; -- analyze parent + child tables
Restrict to the parent only:
VACUUM VERBOSE ONLY public.legacy_parent; -- old behavior
ANALYZE ONLY public.legacy_parent; -- old behavior
Declarative partitioning, refresh just the parent's stats (skip partitions):
ANALYZE ONLY public.sales_partitioned;
VACUUM ONLY on a partitioned parent warns and does no work:
VACUUM VERBOSE ONLY public.sales_partitioned;
Note: For large partitioned tables, schedule parent-only ANALYZE during peak hours and full recursive runs off-peak.

6) Tail shrinking made explicit (vacuum_truncate)
Autovacuum's tail-shrink step (removing empty pages from the end of a table) can take an ACCESS EXCLUSIVE lock that is replayed on hot standbys, briefly stalling read operations. To make this more predictable, PostgreSQL added a top-level switch (vacuum_truncate) so you can disable automatic truncation and perform this operation only when you choose.
Examples:
- Prefer explicit control, disable automatic tail shrinks: vacuum_truncate = off
- Later, shrink the tail of a large log table in a maintenance window: VACUUM (VERBOSE, TRUNCATE ON) public.pgbench_history;
- Or keep truncation off for specific hot tables: ALTER TABLE public.orders SET (vacuum_truncate = off);
Note: If replicas serve heavy read traffic, keep automatic truncation off and run explicit truncation within your maintenance windows.

7) Easier to see vacuum/analyze overhead
Per-table time counters: the cumulative total_vacuum_time / total_autovacuum_time columns in the pg_stat_all_tables views expose where maintenance time is spent, which is great for finding your top offenders.
Throttling timing: flip track_cost_delay_timing = on to record the actual sleep time from cost-based delays in pg_stat_progress_vacuum and in VERBOSE output, perfect for telling "we waited" apart from "we hit disk." (This setting is off by default to avoid adding timing overhead.)
Example: top tables by total maintenance time:
SELECT relname, total_vacuum_time, total_autovacuum_time
FROM pg_stat_user_tables
ORDER BY (COALESCE(total_vacuum_time, 0) + COALESCE(total_autovacuum_time, 0)) DESC NULLS LAST
LIMIT 10;
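To put the observability pieces above together, a hedged sketch for checking whether a running VACUUM is waiting on cost-based delays or actually scanning (the setting and the progress view are the ones described in this section; the column selection is illustrative):

-- Record time spent sleeping in cost-based delays (off by default)
ALTER SYSTEM SET track_cost_delay_timing = on;
SELECT pg_reload_conf();
-- Watch per-relation vacuum progress
SELECT p.pid, c.relname, p.phase, p.heap_blks_scanned, p.heap_blks_total
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;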
Conclusion
These improvements make your maintenance operations faster, more predictable, and easier to tune. Get started by enabling AIO, right-sizing autovacuum concurrency and cost budgets, setting hard caps for your biggest tables, and adopting explicit tail shrinking if you have hot standbys. With better observability, you can quickly confirm whether VACUUM is throttled by cost-based delays or genuinely storage-bound, and adjust accordingly. Before applying changes in production, validate these settings in a staging environment, monitor pg_stat_progress_vacuum, pg_stat_io, and the table-level counters, and iterate gradually. The net effect is smoother autovacuum behavior, shorter aggressive VACUUMs, and fewer unexpected locks.

September 2025 Recap: Azure Database for PostgreSQL

Hello Azure Community,
We are back with another round of updates for Azure Database for PostgreSQL! September is packed with powerful enhancements, from the public preview of PostgreSQL 18 to the general availability of Azure Confidential Computing, plus several new capabilities designed to boost performance, security, and developer experience. Stay tuned as we dive deeper into each of these feature updates. Before we get to the feature highlights, let's take a look at PGConf NYC 2025.

PGConf NYC 2025 Highlights
Our Postgres team was glad to be part of PGConf NYC 2025! As a Platinum sponsor, Microsoft joined the global PostgreSQL community for three days of sessions covering performance, extensibility, cloud, and AI, highlighted by Claire Giordano's keynote, "What Microsoft is Building for Postgres—2025 in Review," along with deep dives from core contributors and engineers. If you missed it, you can catch up here:
- Keynote slides: What Microsoft is Building for Postgres—2025 in Review by Claire Giordano at PGConf NYC 2025
- Day 3 wrap-up: Key takeaways, highlights, and insights from the Azure Database for PostgreSQL team

Feature Highlights
- Near Zero Downtime scaling for High Availability (HA) enabled servers - Generally Available
- Azure Confidential Computing for Azure Database for PostgreSQL - Generally Available
- PostgreSQL 18 on Azure Database for PostgreSQL - Public Preview
- PostgreSQL Discovery & Assessment in Azure Migrate - Public Preview
- LlamaIndex Integration with Azure Postgres
- Latest Minor Versions
- GitHub Samples: Entra ID Token Refresh for PostgreSQL
- VS Code Extension for PostgreSQL enhancements

Near Zero Downtime scaling for High Availability (HA) enabled servers - Generally Available
Scaling compute for high availability (HA) enabled Azure Database for PostgreSQL servers just got faster. With Near Zero Downtime (NZD) scaling, compute changes such as vCore or tier modifications now complete with minimal interruption, typically under 30 seconds, using an HA failover that preserves the connection string. The service provisions a new primary and standby instance with the updated configuration, synchronizes them with the existing setup, and performs a quick failover. This significantly reduces downtime compared to traditional scaling (which could take 2-10 minutes), improving overall availability. Visit our documentation for full details on how Near Zero Downtime scaling works.

Azure Confidential Computing for Azure Database for PostgreSQL - Generally Available
Azure Confidential Computing (ACC) Confidential Virtual Machines (CVMs) are now generally available for Azure Database for PostgreSQL. This capability brings hardware-based protection for data in use, ensuring your most sensitive information remains secure, even while being processed. With CVMs, your PostgreSQL flexible server instance runs inside a Trusted Execution Environment (TEE), a secure, hardware-backed enclave that encrypts memory and isolates it from the host OS, hypervisor, and even Azure operators. This means your data enjoys end-to-end protection: at rest, in transit, and in use.
Key Benefits:
- End-to-End Security: Data protected at rest, in transit, and in use
- Enhanced Privacy: Blocks unauthorized access during processing
- Compliance Ready: Meets strict security standards for regulated workloads
- Confidence in Cloud: Hardware-backed isolation for critical data
To discover how Azure Confidential Computing enhances PostgreSQL, check out the blog announcement.
PostgreSQL 18 on Azure Database for PostgreSQL - Public Preview
PostgreSQL 18 is now available in public preview on Azure Database for PostgreSQL, launched the same day as the PostgreSQL community release. PostgreSQL 18 introduces new performance, scalability, and developer productivity improvements, and this preview gives you early access to the latest community release on a fully managed Azure service. By running PostgreSQL 18 on flexible server, you can test application compatibility, explore new SQL and performance features, and prepare for upgrades well before general availability. The preview release lets you validate your workloads, extensions, and development pipelines in a dedicated preview environment while taking advantage of the security, high availability, and management capabilities in Azure. With PostgreSQL 18 in preview, you are among the first to experience the next generation of PostgreSQL on Azure, ensuring your applications are ready to adopt it when it reaches general availability. To learn more about the preview, read https://aka.ms/pg18

PostgreSQL Discovery & Assessment in Azure Migrate - Public Preview
The PostgreSQL Discovery & Assessment feature is now available in public preview in Azure Migrate, making it easier to plan your migration journey to Azure. Migrating PostgreSQL workloads can be challenging without clear visibility into your existing environment. This feature solves that problem by delivering deep insights into on-premises PostgreSQL deployments, making migration planning easier and more informed. With this feature, you can discover PostgreSQL instances across your infrastructure, assess migration readiness and identify potential blockers, receive configuration-based SKU recommendations for Azure Database for PostgreSQL, and estimate the cost of running your workloads in Azure, all in one unified experience.
Key Benefits:
- Comprehensive Visibility: Understand your on-premises PostgreSQL landscape
- Risk Reduction: Identify blockers before migration
- Optimized Planning: Get tailored SKU and cost insights
- Faster Migration: Streamlined assessment for a smooth transition
Learn more in our blog: PostgreSQL Discovery and Assessment in Azure Migrate

LlamaIndex Integration with Azure Postgres
Native LlamaIndex integration is now available for Azure Database for PostgreSQL! This enhancement brings seamless connectivity between Azure Database for PostgreSQL and LlamaIndex, allowing developers to use Azure PostgreSQL as a secure and high-performance vector store for their AI agents and applications. Specifically, this package adds support for:
- Microsoft Entra ID (formerly Azure AD) authentication when connecting to your Azure Database for PostgreSQL instances, and
- the DiskANN indexing algorithm when indexing your (semantic) vectors.
This package makes it easy to connect LlamaIndex to your Azure PostgreSQL instances, whether you're building intelligent agents, semantic search, or retrieval-augmented generation (RAG) systems. Explore the full guide here: https://aka.ms/azpg-llamaindex

Latest Postgres minor versions: 17.6, 16.9, 15.13, 14.18 and 13.21
PostgreSQL minor versions 17.6, 16.9, 15.13, 14.18 and 13.21 are now supported by Azure Database for PostgreSQL. These minor version upgrades are performed automatically as part of the monthly planned maintenance in Azure Database for PostgreSQL. The upgrade automation ensures that your databases are always running the latest optimized versions without requiring manual intervention.
This release fixes 3 security vulnerabilities and more than 55 bugs reported over the last several months. PostgreSQL minor versions are backward-compatible, so updates won't affect your applications. For details about the release, see the PostgreSQL community announcement.

GitHub Samples: Entra ID Token Refresh for PostgreSQL
We have introduced code samples for Entra ID token refresh, built specifically for Azure Database for PostgreSQL. These samples simplify implementing automatic token acquisition and refresh, helping you maintain secure, uninterrupted connectivity without manual intervention. By using these examples, you can keep sessions secure, prevent connection drops from expired tokens, and streamline integration with the Azure Identity libraries for PostgreSQL workloads.
What's Included:
- Ready-to-use code snippets for token acquisition and refresh for Python and .NET
- Guidance for integrating with Azure Identity libraries
Explore the samples repository at https://aka.ms/pg-access-token-refresh and start implementing today.

VS Code Extension for PostgreSQL enhancements
A new version of the VS Code Extension for PostgreSQL is out! This update introduces a Server Dashboard that provides high-level metadata and real-time performance metrics, along with historical insights for Azure Database for PostgreSQL Flexible Server. You can even use GitHub Copilot Chat to ask performance questions in natural language and receive diagnostic SQL queries in response.
Additional enhancements include:
- A new keybinding for "Run Current Statement" in the Query Editor
- Support for dragging Object Explorer entities into the editor with properly quoted identifiers
- Ability to connect to databases via socket file paths
Key fixes:
- Preserves the state of the Explain Analyze toolbar toggle
- Removes inadvertent logging of sensitive information from extension logs
- Stabilizes memory usage during long-running dashboard sessions
Don't forget to update to the latest version in the marketplace to take advantage of these enhancements, and visit our GitHub repository to learn more about this month's release. We'd love your feedback! Help us improve the Server Dashboard and other features by sharing your thoughts on GitHub.

Azure Postgres Learning Bytes 🎓
Setting up logical replication between two servers
This section walks through setting up logical replication between two Azure Database for PostgreSQL flexible server instances. Logical replication replicates data changes from a source (publisher) server to a target (subscriber) server.
Prerequisites (a quick verification sketch follows this list):
- PostgreSQL versions supported by logical replication (publisher/subscriber compatible).
- Network connectivity: the subscriber must be able to connect to the publisher (VNet/NSG/firewall rules).
- A replication role on the publisher (or a role with the REPLICATION privilege).
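A minimal verification sketch for the prerequisites above (standard catalog views; <replication_user> is the same placeholder used in the steps below):

-- Logical replication requires wal_level = logical (set in Step 1)
SHOW wal_level;
-- Confirm the publishing role carries the REPLICATION attribute (granted in Step 2)
SELECT rolname, rolreplication FROM pg_roles WHERE rolname = '<replication_user>';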
Step 1: Configure server parameters on both publisher and subscriber
On the publisher:
wal_level = logical
max_worker_processes = 16
max_replication_slots = 10
max_wal_senders = 10
track_commit_timestamp = on
On the subscriber:
wal_level = logical
max_worker_processes = 16
max_replication_slots = 10
max_wal_senders = 10
track_commit_timestamp = on
max_sync_workers_per_subscription = 6
autovacuum = off (during the initial copy)
max_wal_size = 64GB
checkpoint_timeout = 3600

Step 2: Create the publication (publisher) and alter the role with the replication privilege
ALTER ROLE <replication_user> WITH REPLICATION;
CREATE PUBLICATION pub FOR ALL TABLES;

Step 3: Create the subscription (subscriber)
CREATE SUBSCRIPTION <subscription-name>
CONNECTION 'host=<publisher_host> dbname=<db> user=<user> password=<pwd>'
PUBLICATION <publication-name>;

Step 4: Monitor
Publisher: this shows active processes on the publisher, including replication workers.
SELECT application_name, wait_event_type, wait_event, query, backend_type
FROM pg_stat_activity
WHERE state = 'active';
Subscriber: the pg_stat_progress_copy view tracks the progress of the initial data copy for each table.
SELECT * FROM pg_stat_progress_copy;
To explore more details on how to get started with logical replication, visit our blog on Tuning logical replication for Azure Database for PostgreSQL.

Conclusion
That's all for the September 2025 feature highlights! We remain committed to making Azure Database for PostgreSQL more powerful and secure with every release. Stay up to date on the latest enhancements by visiting our Azure Database for PostgreSQL blog. Your feedback matters and helps us shape the future of PostgreSQL on Azure. If you have suggestions, ideas, or questions, we'd love to hear from you: https://aka.ms/pgfeedback. We look forward to sharing even more exciting capabilities in the coming months. Stay tuned!