Scaling PostgreSQL at OpenAI: Lessons in Reliability, Efficiency, and Innovation
At POSETTE: An Event for Postgres 2025, Bohan Zhang of OpenAI delivered a compelling talk on how OpenAI has scaled Azure Database for PostgreSQL - Flexible Server to meet the demands of one of the world’s most advanced AI platforms, running at planetary scale. The Postgres team at Microsoft has partnered deeply with OpenAI for years to enhance the service to meet their performance, scale, and availability requirements, and it is great to see OpenAI now deploying and depending on Flexible Server as a core component of ChatGPT. Hearing firsthand about their challenges and breakthroughs is a reminder of what’s possible when innovation meets real-world needs. This blog post captures the key insights from Bohan’s POSETTE talk, paired with how Azure’s cloud platform supports innovation at scale.

PostgreSQL at the Heart of OpenAI
As Bohan shared during his talk, PostgreSQL is the backbone of OpenAI’s most critical systems, including services like ChatGPT. Because any PostgreSQL disruption would affect those services directly, OpenAI has invested deeply in making PostgreSQL more resilient and scalable.

Why Azure Database for PostgreSQL?
OpenAI has long operated PostgreSQL on Azure, initially using a single primary instance without sharding. This architecture worked well until write scalability limits emerged. Azure’s managed PostgreSQL service provides the flexibility to scale read replicas, optimize performance, and maintain high availability, delivering low-latency reads globally without the burden of managing infrastructure. We designed Azure Database for PostgreSQL to support precisely these kinds of high-scale, mission-critical workloads, and OpenAI’s use case is a powerful validation of that vision.
Tackling Write Bottlenecks
PostgreSQL’s MVCC (Multi-Version Concurrency Control) design presents challenges for write-heavy workloads, such as index bloat, autovacuum tuning complexity, and version churn. OpenAI addressed this by:
- Reducing unnecessary writes at the application level
- Using lazy writes and controlled backfills to smooth out write spikes
- Migrating extremely write-heavy workloads that have natural sharding keys to other systems
These strategies allowed OpenAI to preserve PostgreSQL’s strengths while mitigating its limitations.

Optimizing Read-Heavy Workloads
With the most write-heavy workloads offloaded, OpenAI focused on scaling read-heavy workloads. Key optimizations included:
- Offloading read queries to replicas
- Avoiding long-running queries and expensive multi-way joins
- Using PgBouncer for connection pooling, which reduced connection latency from 50 ms to under 5 ms
- Categorizing requests by priority and assigning dedicated read replicas to high-priority traffic
As Bohan noted, “After all the optimization we did, we are super happy with Postgres right now for our read-heavy workloads.”

Schema Governance and Resilience
OpenAI also implemented strict schema governance to avoid full table rewrites and production disruptions. Only lightweight schema changes are allowed, and long-running queries are monitored so they cannot block migrations.
To ensure resilience, OpenAI categorized requests by priority and implemented multi-level rate limiting at the application, connection, and query digest levels. This helped prevent resource exhaustion and service degradation.

Takeaway
OpenAI’s journey is a masterclass in operating PostgreSQL at hyper-scale. By offloading writes, scaling read replicas, and enforcing strict schema governance, OpenAI has demonstrated that PostgreSQL on Azure can meet the demands of cutting-edge AI systems. It also reinforces the value of Azure’s managed database services in enabling teams to focus on innovation rather than infrastructure.
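The talk mentions rate limiting at three levels (application, connection, and query digest) but does not describe how OpenAI implements it. The Python sketch below shows one way such a scheme could work, assuming a token bucket per key at each level. The class names, the `limits` values, and the regex-based `query_digest` normalizer are hypothetical illustrations, not OpenAI's code. In a real deployment, PostgreSQL's pg_stat_statements extension already computes a normalized query fingerprint (`queryid`) that could serve as the digest key instead.

```python
import hashlib
import re
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so short bursts are admitted
        self.updated = clock()

    def has_token(self) -> bool:
        # Lazily refill based on the time elapsed since the last check.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        return self.tokens >= 1.0

    def take(self) -> None:
        self.tokens -= 1.0


def query_digest(sql: str) -> str:
    """Hypothetical digest: strip literals, case, and spacing so same-shaped queries share a key."""
    normalized = re.sub(r"'(?:[^']|'')*'", "?", sql)            # string literals
    normalized = re.sub(r"\b\d+(?:\.\d+)?\b", "?", normalized)  # numeric literals
    normalized = " ".join(normalized.lower().split())           # case and whitespace
    return hashlib.sha1(normalized.encode()).hexdigest()[:16]


class MultiLevelLimiter:
    """Admit a request only if its application, connection, and query-digest buckets all have budget."""

    LEVELS = ("application", "connection", "query_digest")

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic):
        # Maps each level to (refill rate per second, burst capacity); values are illustrative.
        self.limits = limits
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def _bucket(self, level: str, key: str) -> TokenBucket:
        if (level, key) not in self.buckets:
            rate, capacity = self.limits[level]
            self.buckets[(level, key)] = TokenBucket(rate, capacity, self.clock)
        return self.buckets[(level, key)]

    def admit(self, app: str, conn: str, sql: str) -> bool:
        keys = (app, conn, query_digest(sql))
        buckets = [self._bucket(level, key) for level, key in zip(self.LEVELS, keys)]
        # Check every level before consuming anything, so a rejection at
        # one level does not burn tokens at the others.
        if all(b.has_token() for b in buckets):
            for b in buckets:
                b.take()
            return True
        return False
```

Checking all levels before taking any tokens means a request rejected at the query-digest level does not also drain its application's budget. Keying the innermost bucket on the normalized query shape lets a single runaway query pattern be throttled without affecting other traffic from the same application or connection.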
We’re proud of the work we’ve done to co-innovate with OpenAI and excited to see how other organizations can apply these lessons to their own PostgreSQL deployments. Check out the on-demand talk “Scaling Postgres to the next level at OpenAI” and many more PostgreSQL community sessions from POSETTE.

POSETTE - What’s New with Azure Database for PostgreSQL - Flexible Server in 2025 🆕
Talk Recap
I had the opportunity to present at POSETTE: An Event for Postgres 2025, where I shared what’s new with Azure Database for PostgreSQL – Flexible Server. The session covers:
- Recent feature updates in performance, storage, and compute
- New AI-ready extensions like AZURE_AI, DISKANN, and PGVECTOR
- Improvements in high availability, GeoDR, and major version upgrades
- Enterprise-grade security enhancements (CMK, Entra ID)
- Tuning and monitoring improvements to simplify day-to-day operations
🎥 Watch the talk here: https://youtu.be/GnA8Z1Ojnk0?si=r1dbJb57JKjTGl68
I would love your feedback, and I’m happy to answer any follow-up questions in this thread!

Calling Postgres speakers, POSETTE CFP is open until Apr 7th 2024
Call for Proposals (CFP) is open until Sun Apr 7 at 11:59pm PDT for POSETTE: An Event for Postgres, a free and virtual event. What’s your Postgres story? We’d love to see your talk proposal, whether you’re a first-time speaker, a regular on the Postgres conference circuit, or somewhere in between.

Need feedback on blog post "What's new with Postgres at Microsoft, 2024 edition"
Just published a brand-new deep-dive blog post highlighting all the Azure and open source work done by the Postgres team at Microsoft over the last 8 months. The title: What's new with Postgres at Microsoft, 2024 edition. And I'd like to know: do you find this useful? The post contains a detailed (handmade) infographic that gives you a visual outline of all the different Postgres workstreams our engineering and PM teams have been driving. And because the Postgres 17 code freeze just happened last month, I included highlights of some of the new PG17 capabilities our Postgres contributor team worked on as well. If you're an Azure Database for PostgreSQL - Flexible Server customer, you won't be disappointed: lots of new features rolled out in the last 8 months.