azure
34 TopicsSupporting ChatGPT on PostgreSQL in Azure
Affan Dar, Vice President of Engineering, PostgreSQL at Microsoft Adam Prout, Partner Architect, PostgreSQL at Microsoft Panagiotis Antonopoulos, Distinguished Engineer, PostgreSQL at Microsoft The OpenAI engineering team recently published a blog post describing how they scaled their databases by 10x over the past year, to support 800 million monthly users. To do so, OpenAI relied on Azure Database for PostgreSQL to support important services like ChatGPT and the Developer API. Collaborating with a customer experiencing rapid user growth has been a remarkable journey. One key observation is that PostgreSQL works out of box for very large-scale points. As many in the public domain have noted, ChatGPT grew to 800M+ users before OpenAI started moving new and shardable workloads to Azure Cosmos DB. Nevertheless, supporting the growth of one of the largest Postgres deployments was a great learning experience for both of our teams. Our OpenAI friends did an incredible job at reacting fast and adjusting their systems to handle the growth. Similarly, the Postgres team at Azure worked to further tune the service to support the increasing OpenAI workload. The changes we made were not limited to OpenAI, hence all our Azure Database for PostgreSQL customers with demanding workloads have benefited. A few of the enhancements and the work that led to these are listed below. Changing the network congestion protocol to reduce replication lag Azure Database for PostgreSQL used the default CUBIC congestion control algorithm for replication traffic to replicas both within and outside the region. Leading up to one of the OpenAI launch events, we observed that several geo-distributed read replicas occasionally experienced replication lag. Replication from the primary server to the read replicas would typically operate without issues; however, at times, the replicas would unexpectedly begin falling behind the primary for reasons that were not immediately clear. This lag would not recover on its own and would grow to a point when, eventually, automation would restart the read replica. Once restarted, the read replica would once again catch up, only to repeat this cycle again within a day or less. After an extensive debugging effort, we traced the root cause to how the TCP congestion control algorithm handled a higher rate of packet drops. These drops were largely a result of high point-to-point traffic between the primary server and its replicas, compounded by the existing TCP window settings. Packet drops across regions are not unexpected; however, the default congestion control algorithm (CUBIC) treats packet loss as a sign of congestion and does an aggressive backoff. In comparison, the Bottleneck Bandwidth and Round-trip propagation time (BBR) congestion control algorithm is less sensitive to packet drops. Switching to BBR, adding SKU specific TCP window settings, and switching to fair queuing network discipline (which can control pacing of outgoing packets at hardware level) resolved this issue. We’ll also note that one of our seasoned PostgreSQL committers provided invaluable insights during this process, helping us pinpoint the issue more effectively. Scaling out with Read replicas PostgreSQL primaries, if configured properly, work amazingly well in supporting a large number of read replicas. In fact, as noted in the OpenAI engineering blog, a single primary has been able to power around 50+ replicas across multiple regions. However, going beyond this increases the chance of impacting the primary. For this reason, we added the cascading replica support to scale out reads even further. But this brings in a number of additional failure modes that need to be handled. The system must carefully orchestrate repairs around lagging and failing intermediary nodes, safely repointing replicas to new intermediary nodes while performing catch up or rewind in a mission critical setup. Furthermore, disaster recovery (DR) scenarios can require a fast rebuild of a replica and as data movement across regions is a costly and time-consuming operation, we developed the ability to create a geo replica from a snapshot of another replica in the same region. This feature avoids the traditional full data copy process, which may take hours or even days depending on the size of the data, by leveraging data for that cluster that already exists in that region. This feature will soon be available for all our customers as well. Scaling out Writes These improvements solved the read replica lag problems and read scale but did not help address the growing write scale for OpenAI. At some point, the balance tipped and it was obvious that the IOPs limits of a single PostgreSQL primary instance will not cut it anymore. As a result OpenAI decided to move new and shardable workloads to Azure Azure Cosmos DB, which is our default recommended NoSQL store for fully elastic workloads. However, some workloads, as noted in the OpenAI blog are much harder to shard. While OpenAI is using Azure Database for PostgreSQL flexible server, several of the write scaling requirements that came up have been baked into our new Azure HorizonDB offering, which entered private preview in November 2025. Some of the architectural innovations are described in the following sections. Azure HorizonDB scalability design To better support more demanding workloads, Azure HorizonDB introduces a new storage layer for Postgres that delivers significant performance and reliability enhancements: More efficient read scale out. Postgres read replicas no longer need to maintain their own copy of the data. They can read pages from the single copy maintained by the storage layer. Lower latency Write-Ahead Logging (WAL) writes and higher throughput page reads via two purpose-built storage services designed for WAL storage and Page storage. Durability and high availability responsibilities are shifted from the Postgres primary to the storage layer, allowing Postgres to dedicate more resources to executing transactions and queries. Postgres failovers are faster and more reliable. To understand how Azure HorizonDB delivers these capabilities, let’s look at its high‑level architecture as shown in Figure 1. It follows a log-centric storage model, where the PostgreSQL writeahead log (WAL) is the sole mechanism used to durably persist changes to storage. PostgreSQL compute nodes never write data pages to storage directly in Azure HorizonDB. Instead, pages and other on-disk structures are treated as derived state and are reconstructed and updated from WAL records by the data storage fleet. Azure HorizonDB storage uses two separate storage services for WAL and data pages. This separation allows each to be designed and optimized for the very different patterns of reads and writes PostgreSQL does against WAL files in contrast to data pages. The WAL server is optimized for very low latency writes to the tail of a sequential WAL stream and the Page server is designed for random reads and writes across potentially many terabytes of pages. These two separate services work together to enable Postgres to handle IO intensive OLTP workloads like OpenAI’s. The WAL server can durably write a transaction across 3 availability zones using a single network hop. The typical PostgreSQL replication setup with a hot standby (Figure 2) requires 4 hops to do the same work. Each hop is a component that can potentially fail or slow down and delay a commit. Azure HorizonDB page service can scale out page reads to many hundreds of thousands of IOPs for each Postgres instance. It does this by sharding the data in Postgres data files across a fleet of page servers. This spreads the reads across many high performance NVMe disks on each page server. 2 - WAL Writes in HorizonDB Another key design principle for Azure HorizonDB was to move durability and high availability related work off PostgreSQL compute allowing it to operate as a stateless compute engine for queries and transactions. This approach gives Postgres more CPU, disk and network to run your application’s business logic. Table 1 summarizes the different tasks that community PostgreSQL has to do, which Azure HorizonDB moves to its storage layer. Work like dirty page writing and checkpointing are no longer done by a Postgres primary. The work for sending WAL files to read replicas is also moved off the primary and into the storage layer – having many read replicas puts no load on the Postgres primary in Azure HorizonDB. Backups are handled by Azure Storage via snapshots, Postgres isn’t involved. Task Resource Savings Postgres Process Moved WAL sending to Postgres replicas Disk IO, Network IO Walsender WAL archiving to blob storage Disk IO, Network IO Archiver WAL filtering CPU, Network IO Shared Storage Specific (*) Dirty Page Writing Disk IO background writer Checkpointing Disk IO checkpointer PostgreSQL WAL recovery Disk IO, CPU startup recovering PostgreSQL read replica redo Disk IO, CPU startup recovering PostgreSQL read replica shared storage Disk IO background, checkpointer Backups Disk IO pg_dump, pg_basebackup, pg_backup_start, pg_backup_stop Full page writes Disk IO Backends doing WAL writing Hot standby feedback Vacuum accuracy walreceiver Table 1 - Summary of work that the Azure HorizonDB storage layer takes over from PostgreSQL The shared storage architecture of Azure HorizonDB is the fundamental building block for delivering exceptional read scalability and elasticity which are critical for many workloads. Users can spin up read replicas instantly without requiring any data copies. Page Servers are able to scale and serve requests from all replicas without any additional storage costs. Since WAL replication is entirely handled by the storage service, the primary’s performance is not impacted as the number of replicas changes. Each read replica can scale independently to serve different workloads, allowing for workload isolation. Finally, this architecture allows Azure HorizonDB to substantially improve the overall experience around high availability (HA). HA replicas can now be added without any data copying or storage costs. Since the data is shared between the replicas and continuously updated by Page Servers, secondary replicas only replay a portion of the WAL and can easily keep up with the primary, reducing failover times. The shared storage also guarantees that there is a single source of truth and the old primary never diverges after a failover. This prevents the need for expensive reconciliation, using pg_rewind, or other techniques and further improves availability. Azure HorizonDB was designed from the ground up with learnings from large scale customers, to meet the requirements of the most demanding workloads. The improved performance, scalability and availability of the Azure HorizonDB architecture make Azure a great destination for Postgres workloads.3.7KViews11likes0CommentsPostgreSQL for the enterprise: scale, secure, simplify
This week at Microsoft Ignite, along with unveiling the new Azure HorizonDB cloud native database service, we’re announcing multiple improvements to our fully managed open-source Azure Database for PostgreSQL service, delivering significant advances in performance, analytics, security, and AI-assisted migration. Let’s walk through nine of the top Azure Database for PostgreSQL features and improvements we’re announcing at Microsoft Ignite 2025. Feature Highlights New Intel and AMD v6-series SKUs (Preview) Scale to multiple nodes with Elastic Clusters (GA) PostgreSQL 18 (GA) Realtime analytics with Fabric Mirroring (GA) Analytical queries inside PostgreSQL with the pg_duckdb extension (Preview) Adding Parquet to the azure_storage extension (GA) Meet compliance requirements with the credcheck, anon & ip4r extensions (GA) Integrated identity with Entra token-refresh libraries for Python AI-Assisted Oracle to PostgreSQL Migration Tool (Preview) Performance and scale New Intel and AMD v6 series SKUs (Preview) You can run your most demanding Postgres workloads on new Intel and AMD v6 General Purpose and Memory Optimized hardware SKUs, now availble in preview These SKUs deliver massive scale for high-performance OLTP, analytics and complex queries, with improved price performance and higher memory ceilings. AMD Confidential Compute v6 SKUs are also in Public Preview, enabling enhanced security for sensitive workloads while leveraging AMD’s advanced hardware capabilities. Here’s what you need to know: Processors: Powered by 5th Gen Intel® Xeon® processor (code-named Emerald Rapids) and AMD's fourth Generation EPYC™ 9004 processors Scale: VM size options scale up to 192 vCores and 1.8 TiB IO: Using the NVMe protocol for data disk access, IO is parallelized to the number of CPU cores and processed more efficiently, offering significant IO improvements Compute tier: Available in our General Purpose and Memory Optimized tiers. You can scale up to these new compute SKUs as needed with minimal downtime. Learn more: Here's a quick summary of the v6 SKUs we’re launching, with links to more information: Processor SKU Max vCores Max Mem Intel Ddsv6 192 768 GiB Edsv6 192 1.8 TiB AMD Dadsv6 96 384 GiB Eadsv6 96 672 GiB DCadsv6 96 386 GiB ECadsv6 96 672 GiB Scale to multiple nodes with Elastic clusters (GA) Elastic clusters are now generally available in Azure Database for PostgreSQL. Built on Citus open-source technology, elastic clusters bring the horizontal scaling of a distributed database to the enterprise features of Azure Database for PostgreSQL. Elastic clusters enable horizontal scaling of databases running across multiple server nodes in a “shared nothing” architecture. This is ideal for workloads with high-throughput and storage-intensive demands such as multi-tenant SaaS and IoT-based workloads. Elastic clusters come with all the enterprise-level capabilities that organizations rely upon in Azure Database for PostgreSQL, including high availability, read replicas, private networking, integrated security and connection pooling. Built-in sharding support at both row and schema level enables you to distribute your data across a cluster of compute resources and run queries in parallel, dramatically increasing throughput and capacity. Learn more: Elastic clusters in Azure Database for PostgreSQL PostgreSQL 18 (GA) When PostgreSQL 18 was released in September, we made a preview available on Azure on the same day. Now we’re announcing that PostgreSQL 18 is generally available on Azure Database for PostgreSQL, with full Major Version Upgrade (MVU) support, marking our fastest-ever turnaround from open-source release to managed service general availability. This release reinforces our commitment to delivering the latest PostgreSQL community innovations to Azure customers, so you can adopt the latest features, performance improvements, and security enhancements on a fully managed, production-ready platform without delay. ^Note: MVU to PG18 is currently available in the NorthCentralUS and WestCentralUS regions, with additional regions being enabled over the next few weeks Now you can: Deploy PostgreSQL 18 in all public Azure regions. Perform in-place major version upgrades to PG18 with no endpoint or connection string changes. Use Microsoft Entra ID authentication for secure, centralized identity management in all PG versions. Enable Query Store and Index Tuning for built-in performance insights and automated optimization. Leverage the 90+ Postgres extensions supported by Azure Database for PostgreSQL. PostgreSQL 18 also delivers major improvements under the hood, ranging from asynchronous I/O and enhanced vacuuming to improved indexing and partitioning, ensuring Azure continues to lead as the most performant, secure, and developer-friendly PostgreSQL managed service in the cloud. Learn more: PostgreSQL 18 open-source release announcement Supported versions of PostgreSQL in Azure Database for PostgreSQL Analytics Real-time analytics with Fabric Mirroring (GA) With Fabric mirroring in Azure Database for PostgreSQL, now generally available, you can run your Microsoft Fabric analytical workloads and capabilities on near-real-time replicated data, without impacting the performance of your production PostgreSQL databases, and at no extra cost. Mirroring in Fabric connects your operational and analytical platforms with continuous data replication from PostgreSQL to Fabric. Transactions are mirrored to Fabric in near real-time, enabling advanced analytics, machine learning, and reporting on live data sets without waiting for traditional batch ETL processes to complete. This approach eliminates the overhead of custom integrations or data pipelines. Production PostgreSQL servers can run mission-critical transactional workloads without being affected by surges in analytical queries and reporting. With our GA announcement Fabric mirroring is ready for production workloads, with secure networking (VNET integration and Private Endpoints supported), Entra ID authentication for centralized identity management, and support for high availability enabled servers, ensuring business continuity for mirroring sessions. Learn more: Mirroring Azure Database for PostgreSQL flexible server Adding Parquet support to the azure_storage extension (GA) In addition to mirroring data directly to Microsoft Fabric, there are many other scenarios that require moving operational data into data lakes for analytics or archival. The complexity of building and maintaining ETL pipelines can be expensive and time-consuming. Azure Database for PostgreSQL now natively supports Parquet via the azure_storage extension, enabling direct SQL-based read/write to Parquet files in Azure Storage. This makes it easy to import and export data in Postgres without external tools or scripts. Parquet is a popular columnar storage format often used in big data and analytics environments (like Spark and Azure Data Lake) because of its efficient compression and query performance for large datasets. Now you can use the azure_storage extension to can skip an entire step: just issue a SQL command to write to and query from a Parquet file in Azure Blob Storage. Learn more: Azure storage extension in Azure Database for PostgreSQL Analytical queries inside PostgreSQL with the pg_duckdb extension (Preview) DuckDB’s columnar engine excels at high performance scans, aggregations and joins over large tables, making it particularly well-suited for analytical queries. The pg_duckdb extension, now available in preview for Azure Database for PostgreSQL combines PostgreSQL’s transactional performance and reliability with DuckDB’s analytical speed for large datasets. Together pg_duckdb and PostgreSQL are an ideal combination for hybrid OLTP + OLAP environments where you need to run analytical queries directly in PostgreSQL without sacrificing performance., To see the pg_duckdb extension in action check out this demo video: https://aka.ms/pg_duckdb Learn more: pg_duckdb – PostgreSQL extension for DuckDB Security Meet compliance requirements with the credcheck, anon & ip4r extensions (GA) Operating in a regulated industry such as Finance, Healthcare and Government means negotiating compliance requirements like HIPAA and PCI-DSS, GDPR that include protection for personalized data and password complexity, expiration and reuse. This week the anon extension, previously in preview, is now generally available for Azure Database for PostgreSQL adding support for dynamic and static masking, anonymized exports, randomization and many other advanced masking techniques. We’ve also added GA support for the credcheck extension, which provides credential checks for usernames, and password complexity, including during user creation, password change and user renaming. This is particularly useful if your application is not using Entra ID and needs to rely on native PostgreSQL users and passwords. If you need to store and query IP ranges for scenarios like auditing, compliance, access control lists, intrusion detection and threat intelligence, another useful extension announced this week is the ip4r extension which provides a set of data types for IPv4 and IPv6 network addresses. Learn more: PostgreSQL Anonymizer credcheck – PostgreSQL username/password checks IP4R - IPv4/v6 and IPv4/v6 range index type for PostgreSQL The Azure team maintains an active pipeline of new PostgreSQL extensions to onboard and upgrade to Azure Database for PostgreSQL For example, another important extension upgraded this week is pg_squeeze which removes unused space from a table. The updated 1.9.1 version adds important stability improvements. Learn more: List of extensions and modules by name Integrated identity with Entra token-refresh libraries for Python In a modern cloud-connected enterprise, identity becomes the most important security perimeter. Azure Database for PostgreSQL is the only managed PostgreSQL service with full Entra integration, but coding applications to take care of Entra token refresh can be complex. This week we’re announcing a new Python library to simplify Entra token refresh. The library automatically refreshes authentication tokens before they expire, eliminating manual token handling and reducing connection failures. The new python_azure_pg_auth library provides seamless Azure Entra ID authentication and supports the latest psycopg and SQLAlchemy drivers with automatic token acquisition, validation, and refresh. Built-in connection pooling is available for both synchronous and asynchronous workloads. Designed for cross-platform use (Windows, Linux, macOS), the package features clean architecture and flexible installation options for different driver combinations. This is our first milestone in a roadmap to add token refresh for additional programming languages and frameworks. Learn more, with code samples to get started here: https://aka.ms/python-azure-pg-auth Migration AI-Assisted Oracle to PostgreSQL Migration Tool (Preview) Database migration is a challenging and time-consuming process, with multiple manual steps requiring schema and apps specific information. The growing popularity, maturity and low cost of PostgreSQL has led to a healthy demand for migration tooling to simplify these steps. The new AI-assisted Oracle Migration Tool preview announced this week greatly simplifies moving from Oracle databases to Azure Database for PostgreSQL. Available in the VS Code PostgreSQL extension the new migration tool combines GitHub Copilot, Azure OpenAI, and custom Language Model Tools to convert Oracle schema, database code and client applications into PostgreSQL-compatible formats. Unlike traditional migration tools that rely on static rules, Azure’s approach leverages Large Language Models (LLMs) and validates every change against a running Azure Database for PostgreSQL instance. This system not only translates syntax but also detects and fixes errors through iterative re-compilation, flagging any items that require human review. Application codebases like Spring Boot and other popular frameworks are refactored and converted. The system also understands context by querying the target Postgres instance for version and installed extensions. It can even invoke capabilities from other VS Code extensions to validate the converted code. The new AI-assisted workflow reduces risk, eliminates significant manual effort, and enables faster modernization while lowering costs. Learn more: https://aka.ms/pg-migration-tooling Be sure to follow the Microsoft Blog for PostgreSQL for regular updates from the Postgres on Azure team at Microsoft. We publish monthly recaps about new features in Azure Database for PostgreSQL, as well as an annual blog about what’s new in Postgres at Microsoft.3.2KViews9likes0CommentsAnnouncing Azure HorizonDB
Affan Dar, Vice President of Engineering, PostgreSQL at Microsoft Charles Feddersen, Partner Director of Program Management, PostgreSQL at Microsoft Today at Microsoft Ignite, we’re excited to unveil the preview of Azure HorizonDB, a fully managed Postgres-compatible database service designed to meet the needs of modern enterprise workloads. The cloud native architecture of Azure HorizonDB delivers highly scalable shared storage, elastic scale-out compute, and a tiered cache optimized for running cloud applications of any scale. Postgres is transforming industries worldwide and is emerging as the foundation of modern data solutions across all sectors at an unprecedented pace. For developers, it is the database of choice for building new applications with its rich set of extensions, open-source API, and expansive ecosystems of tools and libraries. At the same time, but at the opposite end of the workload spectrum, enterprises around the world are also increasingly turning to Postgres to modernize their existing applications. Azure HorizonDB is designed to support applications across the entire workload spectrum from the first line of code in a new app to the migration of large-scale, mission-critical solutions. Developers benefit from the robust Postgres ecosystem and seamless integration with Azure’s advanced AI capabilities, while enterprises can gain a secure, highly available, and performant cloud database to host their business applications. Whether you’re building from scratch or transforming legacy infrastructure, Azure HorizonDB empowers you to innovate and scale with confidence, today and into the future. Azure HorizonDB introduces new levels of performance and scalability to PostgreSQL. The scale-out compute architecture supports up to 3,072 vCores across primary and replica nodes, and the auto-scaling shared storage supports up to 128TB databases while providing sub-millisecond multi-zone commit latencies. This storage innovation enables Azure HorizonDB to deliver up to 3x more throughput when compared with open-source Postgres for transactional workloads. Azure HorizonDB is enterprise ready on day one. With native support for Entra ID, Private Endpoints, and data encryption, it provides compliance and security for sensitive data stored in the cloud. All data is replicated across availability zones by default and maintenance operations are transparent with near-zero downtime. Backups are fully automated, and integration with Azure Defender for Cloud provides additional protection for highly sensitive data. All up, Azure HorizonDB offers enterprise-grade security, compliance, and reliability, making it ready for business use today. Since the launch of ChatGPT, there has been an explosion of new AI apps being built, and Postgres has become the database of choice due in large part to its vector index support. Azure HorizonDB extends the AI capabilities of Postgres further with two key features. We are introducing advanced filtering capabilities to the DiskANN vector index which enable query predicate pushdowns directly into the vector similarity search. This provides significant performance and scalability improvements over pgvector HNSW while maintaining accuracy and is ideal for similarity search over transactional data in Postgres. The second feature is built-in AI model management that seamlessly integrates generative, embedding, and reranking models from Microsoft Foundry for developers to use in the database with zero configuration. In addition to enhanced vector indexing and simplified model management to build powerful new AI apps, we’re also pleased to announce the general availability of Microsoft’s PostgreSQL Extension for VS Code that provides the tooling for Postgres developers to maximize their productivity. Using this extension, GitHub Copilot is context aware of the Postgres database which means less prompting and higher quality answers, and in the Ignite release, we’ve added live monitoring with one-click GitHub Copilot debugging where Agent mode can launch directly from the performance monitoring dashboard to diagnose Postgres performance issues and guide users to a fix. Alpha Life Sciences are an existing Azure customers “I’m truly excited about how Azure HorizonDB empowers our AI development. Its seamless support for Vector DB, RAG, and Agentic AI allows us to build intelligent features directly on a reliable Postgres foundation. With Azure HorizonDB, I can focus on advancing AI capabilities instead of managing infrastructure complexities. It’s a smart, forward-looking solution that perfectly aligns with how we design and deliver AI-powered applications.” Pengcheng Xu, CTO Alpha Life Sciences For enterprises that are modernizing their applications to Postgres in the cloud, the security and availability of Azure HorizonDB make it an ideal platform. However, these migrations are often complex and time consuming for large legacy codebase conversions. To simplify this and reduce the risk, we’re pleased to announce the preview of GitHub Copilot powered Oracle migration built into the PostgreSQL Extension for VS Code. Built into VS Code, teams of engineers can work with GitHub Copilot to automate the end-to-end conversion of complex database code using rich code editing, version control, text authoring, and deployment in an integrated development environment. Azure HorizonDB is the next generation of fully managed, cloud native PostgreSQL database service. Built on the latest Azure infrastructure with state-of-the-art cloud architecture, Azure HorizonDB is ready to for the most demanding application workloads. In addition to our portfolio of managed Postgres services in Azure, Microsoft is deeply invested into the open source Postgres project and is one of the top corporate upstream contributors and sponsors for the PostgreSQL project, with 19 Postgres project contributors employed by Microsoft. As a hyperscale Postgres vendor, it’s critical to actively participate in the open-source project. It enables us to better support our customers down to the metal in Azure, and to contribute our learnings from running Postgres at scale back to the community. We’re committed to continuing our investment to push the Postgres project forward, and the team is already active in making contributions to Postgres 19 to be released in 2026. Ready to explore Azure HorizonDB? Azure HorizonDB is initially available in Central US, West US3, UK South and Australia East regions. Customers are invited to apply for early preview access to Azure HorizonDB and get hands-on experience with this new service. Participation is limited, apply now at aka.ms/PreviewHorizonDBIntroducing support for Graph data in Azure Database for PostgreSQL (Preview)
We are excited to announce the addition of Apache AGE extension in Azure Database for PostgreSQL, a significant advancement that provides graph processing capabilities within the PostgreSQL ecosystem. This new extension brings a powerful toolset for developers looking to leverage a graph database with the robust enterprise features of Azure Database for PostgreSQL.9.3KViews6likes7CommentsScaling PostgreSQL at OpenAI: Lessons in Reliability, Efficiency, and Innovation
At POSETTE: An Event for Postgres 2025, Bohan Zhang of OpenAI delivered a compelling talk on how OpenAI has scaled Azure Database for PostgreSQL- Flexible Server to meet the demands of one of the world’s most advanced AI platforms running at planetary scale. The Postgres team at Microsoft has partnered deeply with OpenAI for years to enhance the service to meet their performance, scale, and availability requirements, and it is great to see how OpenAI is now deploying and depending on Flexible Server as a core component of ChatGPT. Hearing firsthand about their challenges and breakthroughs is a reminder of what’s possible when innovation meets real-world needs. This blog post captures the key insights from Bohan’s POSETTE talk, paired with how Azure’s cloud platform supports innovation at scale. PostgreSQL at the Heart of OpenAI As Bohan shared during his talk, PostgreSQL is the backbone of OpenAI’s most critical systems. Because PostgreSQL plays a critical role in powering services like ChatGPT, Open AI has prioritized making it more resilient and scalable to avoid any disruptions. That’s why OpenAI has invested deeply in optimizing PostgreSQL for reliability and scale. Why Azure Database for PostgreSQL? OpenAI has long operated PostgreSQL on Azure, initially using a single primary instance without sharding. This architecture worked well—until write scalability limits emerged. Azure’s managed PostgreSQL service provides the flexibility to scale read replicas, optimize performance, and maintain high availability to provide global low latency reads without the burden of managing infrastructure. This is why we designed Azure Database for PostgreSQL to support precisely these kinds of high-scale, mission-critical workloads, and OpenAI’s use case is a powerful validation of that vision. Tackling Write Bottlenecks PostgreSQL’s MVCC (Multi-Version Concurrency Control) design presents challenges for write-heavy workloads—such as index bloat, autovacuum tuning complexity, and version churn. OpenAI addressed this by: Reducing unnecessary writes at the application level Using lazy writes and controlled backfills to smooth spikes Migrating extreme write-heavy workloads with natural sharding keys to other systems. These strategies allowed OpenAI to preserve PostgreSQL’s strengths while mitigating its limitations. Optimizing Read-Heavy Workloads With writes offloaded, OpenAI focused on scaling read-heavy workloads. Key optimizations included: Offloading read queries to replicas Avoiding long-running queries and expensive multi-way join queries Using PgBouncer for connection pooling, reducing latency from 50ms to under 5ms Categorizing requests by priority and assigning dedicated read replicas to high-priority traffic As Bohan noted, “After all the optimization we did, we are super happy with Postgres right now for our read-heavy workloads.” Schema Governance and Resilience OpenAI also implemented strict schema governance to avoid full table rewrites and production disruptions. Only lightweight schema changes are allowed, and long-running queries are monitored to prevent them from blocking migrations. To ensure resilience, we categorized requests by priority and implemented multi-level rate limiting—at the application, connection, and query digest levels. This helped prevent resource exhaustion and service degradation. Takeaway OpenAI’s journey is a masterclass in how to operate PostgreSQL at hyper-scale. By offloading writes, scaling read replicas, and enforcing strict schema governance, OpenAI demonstrated PostgreSQL on Azure meets the demands of cutting-edge AI systems. It also reinforces the value of Azure’s managed database services in enabling teams to focus on innovation rather than infrastructure. We’re proud of the work we’ve done to co-innovate with OpenAI and excited to see how other organizations can apply these lessons to their own PostgreSQL deployments. Check out the on-demand talk “Scaling Postgres to the next level at OpenAI” and many more PostgreSQL community sessions from POSETTE.1.7KViews5likes0CommentsNew ESG study validates how fully managed PostgreSQL on Azure delivers economic wins
Migrating your PostgreSQL databases to Azure delivers cost, performance and productivity benefits, while laying a strong foundation for innovation. But don’t just take our word for it. We’ve worked with the Enterprise Strategy Group (ESG), now a part of Omdia, to validate how organizations benefit economically from moving their PostgreSQL databases to Azure. Whether you’re modernizing your mission-critical applications or developing the next groundbreaking feature, migrating to Azure gives you the freedom, flexibility and continuous improvements of open source backed by the reliability, security and efficiency of Azure. Read the full PostgreSQL report PostgreSQL is the preferred choice of developers building the next generation of intelligent applications, according to the 2025 Stack Overflow survey. However, many teams are finding that managing these open-source databases on-premises is increasingly challenging, especially as their innovation initiatives demand more and more resources. Because of this, organizations are rapidly modernizing their database infrastructure to better support these next-gen initiatives. At a glance – benefits of migrating to Azure Database for PostgreSQL Increasing complexity is nothing new to today’s IT and developer teams. Some of the key drivers contributing to this complexity include integrating emerging tech like AI and managing cybersecurity concerns—two things that the fully managed Azure Database for PostgreSQL service handles very well. Built-in GenAI capabilities, performance recommendations, and enterprise-grade security, scalability, compliance and availability make PostgreSQL on Azure a natural fit for teams looking to build intelligent enterprise applications. The ESG report highlights: 58% lower total cost of ownership 65% improvement in database performance $770K in savings from avoiding downtime “We have seen wins on both sides of the financial equation. Our costs are down across the board, and we have increased our revenue specifically because of the capabilities that moving our Azure Database for PostgreSQL.” Review the Azure Database for PostgreSQL Economic Validation Infographic A closer look – how fully managed PostgreSQL on Azure delivers economic wins for the enterprise Lower total cost of ownership Migration dramatically lowers the total cost of ownership of enterprise databases. By shifting from on-premises infrastructure to Azure’s managed service, enterprises eliminate many capital and operational expenses. Elimination of hardware and maintenance costs: On-premises PostgreSQL deployments require investing in servers, storage, networking hardware, as well as ongoing power, cooling, and data center space. Migrating to Azure removes these needs entirely. Companies no longer have to purchase or refresh hardware or pay for associated facilities and utilities, directly cutting capex and support costs. Reduced licensing and support expenses: Azure’s model also eliminates traditional database licensing fees, third-party support contracts, and expensive monitoring tools for on-premises systems. Organizations reported saving thousands on separate support agreements or software licenses for their PostgreSQL instances. Pay-as-you-go flexibility: Azure Database for PostgreSQL offers pay-as-you-go and reserved pricing models, so enterprises only pay for the compute and storage they actually use. There’s no more overprovisioning resources to handle peak loads, and dynamic scaling ensures capacity matches demand. Operational efficiency: By offloading database management to Microsoft, organizations also reduce administrative overhead, which indirectly lowers labor costs. In ESG’s study, moving to Azure cut the monthly DBA hours per database from 2.1 hours to just 0.6 hours, a ~70% decrease in effort, effectively saving payroll expenditure on routine upkeep. Improved performance and scalability Enterprises see substantial improvements in database performance and scalability after migrating to Azure. Because Azure Database for PostgreSQL runs on high-end cloud infrastructure with intelligent optimizations, applications can achieve faster response times and handle greater workloads. Higher throughput and lower latency: ESG’s interviews found average database performance improved by ~65%, and in one case a customer saw a 9× increase in throughput for its primary application after migration. Such gains come from Azure’s optimized compute, premium SSD storage options, and features like automatic performance tuning that are difficult to replicate on-premises. Elastic scaling on demand: In on-premises environments, supporting peak workloads often meant overprovisioning. Azure Database for PostgreSQL completely changes this paradigm with cloud elasticity. The ability to instantly right-size resources means applications always have the performance they need, and users experience responsive, low-latency service. Handling growth with ease: As an enterprise’s data and user base grows, Azure’s global infrastructure can seamlessly accommodate that expansion. This cloud scalability gives enterprises headroom to innovate and onboard more customers without performance bottlenecks. In contrast, scaling an on-premises PostgreSQL often requires complex sharding or hardware upgrades. Accelerated time to value: Improved performance and scalability directly impact business agility. Batch processes complete faster, reports generate sooner, and websites or applications can serve more customers per second. ESG noted that by removing infrastructure constraints, Azure empowered businesses to accelerate their time-to-value and respond faster to market demands. Operational agility and developer productivity By migrating to a fully managed service, enterprises gain agility and allow their IT/development teams to focus on innovation. Offloading database management to Azure not only saves costs but also frees up technical staff from mundane maintenance. This shift translates into faster project delivery and greater productivity: Less time spent “keeping the lights on”: ESG found that after migration, companies saw a major reduction in the effort required to manage databases. Administrators went from spending 2+ hours per database per month on upkeep to less than one hour. This over 70% drop in DBA workload means IT teams are no longer bogged down by routine chores. Faster development and release cycles: ESG observed that organizations enjoyed increased development velocity after migrating, since their engineers could devote time to coding and testing new features instead of managing database infrastructure. For example, one company in the study was able to increase its software release frequency significantly. Improved business agility: The combination of easier scaling, better performance, and less ops overhead means the organization can respond to opportunities faster. Some enterprises even credited the move to Azure with helping increase their revenue, because it allowed them to deliver new capabilities to market sooner. Focus on core competencies: After migration, organizations can let Azure handle the heavy lifting of database administration and instead concentrate on work that differentiates them in the marketplace. Developers spend more time building applications and analyzing areas that drive business value rather than performing software updates or fixing replication issues. Enhanced security, compliance, and reliability Azure Database for PostgreSQL provides enterprise-grade security and reliability features that far exceed what most companies can achieve on-premises. This results in a stronger risk posture, reducing the likelihood of breaches or downtime while also easing compliance burdens. Built-in high availability and disaster recovery: ESG’s modeled scenario saw annual PostgreSQL downtime drop from 10 hours on-premises to just 5 hours on Azure. With a 99.99% availability SLA for Azure Database for PostgreSQL, unplanned outages that used to disrupt business are largely a thing of the past. One ESG case study estimated about $770K in costs were avoided thanks to preventing downtime and the associated business disruptions. Strong security and data protection: PostgreSQL instances on Azure benefit from Microsoft’s massive investments in cybersecurity and compliance. One customer highlighted, “We are much more secure since we moved to Azure Database for PostgreSQL. We use Azure AI to set our security standards and get constant recommendations on how to increase our security even more.” Automated updates and governance: Azure takes care of updating PostgreSQL with the latest security fixes and can even upgrade the database engine version with minimal downtime. Furthermore, features like audit logging, advanced threat protection, and integration with Azure Security Center provide continuous oversight of database activity. Geo-redundancy and backup management: For disaster recovery, Azure allows geo-redundant backups and read replicas in different regions, improving an enterprise’s resilience to regional outages or disasters. Should data restoration be needed, it’s as simple as clicking a button. Azure Database for PostgreSQL offers enterprises a frictionless path to greater efficiency, innovation, and growth. By lowering costs and management burdens, it lets you redirect resources to strategic projects. By boosting performance and scalability, it ensures your applications can keep up with business demands. And by enhancing security and reliability, it safeguards one of your most precious assets—your data—while meeting the strict requirements of enterprise IT. The benefits outlined in the ESG study make a strong business case: migrating on-premises databases to Azure’s managed PostgreSQL can transform your IT operations and deliver tangible business value from day one. Tested, approved, trusted Migrating to a fully managed PostgreSQL service supports digital transformation. It allows enterprises to modernize their data estate without abandoning the familiarity of PostgreSQL. Developers can continue using the open-source tools and skills they know, but now with cloud-powered capabilities at their fingertips. Azure integrations (with AI services, analytics tools, etc.) further enable organizations to do more with their data. For example, companies can readily infuse AI or machine learning into their applications or take advantage of advanced analytics on their PostgreSQL data, since that data is easily accessible in the cloud. Read the full report for more details about the quantified benefits and customer testimonials. If you’re ready to start your journey, check out our migration guides. With Azure’s fully managed PostgreSQL, you can supercharge your data strategy, empower your developers, and ultimately accelerate your path to an AI-driven future.265Views4likes0Comments