Azure Database for PostgreSQL Blog

Scaling PostgreSQL at OpenAI: Lessons in Reliability, Efficiency, and Innovation

Jun 30, 2025

By: Charles Feddersen, Director of Product Management, PostgreSQL and MySQL

At POSETTE: An Event for Postgres 2025, Bohan Zhang of OpenAI delivered a compelling talk on how OpenAI has scaled Azure Database for PostgreSQL - Flexible Server to meet the demands of one of the world’s most advanced AI platforms running at planetary scale. The Postgres team at Microsoft has partnered closely with OpenAI for years to enhance the service to meet their performance, scale, and availability requirements, and it is great to see OpenAI now deploying and depending on Flexible Server as a core component of ChatGPT. Hearing firsthand about their challenges and breakthroughs is a reminder of what’s possible when innovation meets real-world needs.

This blog post captures the key insights from Bohan’s POSETTE talk, paired with how Azure’s cloud platform supports innovation at scale.

PostgreSQL at the Heart of OpenAI

As Bohan shared during his talk, PostgreSQL is the backbone of OpenAI’s most critical systems. Because it powers services like ChatGPT, OpenAI has invested deeply in making PostgreSQL more resilient and scalable so that those services are not disrupted.

Why Azure Database for PostgreSQL?

OpenAI has long operated PostgreSQL on Azure, initially using a single primary instance without sharding. This architecture worked well until write scalability limits emerged. Azure’s managed PostgreSQL service provides the flexibility to scale read replicas, optimize performance, and maintain high availability, delivering low-latency reads globally without the burden of managing infrastructure.

We designed Azure Database for PostgreSQL to support precisely these kinds of high-scale, mission-critical workloads, and OpenAI’s use case is a powerful validation of that vision.

Tackling Write Bottlenecks

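PostgreSQL’s MVCC (Multi-Version Concurrency Control) design presents challenges for write-heavy workloads: every update writes a new row version and leaves the old one behind for autovacuum to clean up, which leads to index bloat, autovacuum tuning complexity, and version churn. OpenAI addressed this by: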

  • Reducing unnecessary writes at the application level
  • Using lazy writes and controlled backfills to smooth spikes
  • Migrating extremely write-heavy workloads with natural sharding keys to other systems
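
The "controlled backfills" idea above can be sketched in Python as a loop that writes in small batches and caps how many batches run per second. This is a minimal illustration under stated assumptions, not OpenAI's implementation: controlled_backfill, apply_batch, the batch size, and the rate are hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Iterator, Sequence


def batched(ids: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of ids, each at most batch_size long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def controlled_backfill(
    ids: Sequence[int],
    apply_batch: Callable[[Sequence[int]], None],
    batch_size: int = 1000,
    max_batches_per_sec: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply apply_batch to ids in small batches, pausing between batches.

    apply_batch is a hypothetical callback that would issue one short
    UPDATE or INSERT for the given ids. Returns the number of ids processed.
    """
    if batch_size <= 0 or max_batches_per_sec <= 0:
        raise ValueError("batch_size and max_batches_per_sec must be positive")
    min_interval = 1.0 / max_batches_per_sec
    processed = 0
    for batch in batched(ids, batch_size):
        started = time.monotonic()
        apply_batch(batch)
        processed += len(batch)
        # Sleep off the rest of the interval so writes arrive at a steady
        # rate instead of as one large spike.
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
    return processed
```

In a real job, apply_batch would run one short transaction per batch, so each batch holds locks only briefly and autovacuum has a chance to keep up between batches.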

These strategies allowed OpenAI to preserve PostgreSQL’s strengths while mitigating its limitations.

Optimizing Read-Heavy Workloads

With writes offloaded, OpenAI focused on scaling read-heavy workloads. Key optimizations included:

  • Offloading read queries to replicas
  • Avoiding long-running queries and expensive multi-way join queries
  • Using PgBouncer for connection pooling, reducing latency from 50ms to under 5ms
  • Categorizing requests by priority and assigning dedicated read replicas to high-priority traffic
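
As a rough sketch of the routing described in the list above, the Python class below sends writes to the primary and spreads reads across a replica pool chosen by request priority. It is an illustration only: ReadRouter, the priority labels "high" and "low", and the host names in the usage note are hypothetical and do not come from the talk.

```python
import itertools
from typing import Dict, List


class ReadRouter:
    """Route writes to the primary and reads to a per-priority replica pool."""

    def __init__(self, primary: str, pools: Dict[str, List[str]],
                 default_priority: str = "low") -> None:
        if default_priority not in pools:
            raise ValueError("default_priority must name one of the pools")
        if any(not hosts for hosts in pools.values()):
            raise ValueError("every pool needs at least one replica")
        self.primary = primary
        self._default = default_priority
        # One round-robin iterator per priority, so high-priority traffic
        # never shares replicas with lower-priority traffic.
        self._cycles = {name: itertools.cycle(hosts)
                        for name, hosts in pools.items()}

    def route(self, is_write: bool, priority: str = "low") -> str:
        """Return the host that should serve this request."""
        if is_write:
            return self.primary
        cycle = self._cycles.get(priority, self._cycles[self._default])
        return next(cycle)
```

For example, ReadRouter("primary:5432", {"high": ["replica-a", "replica-b"], "low": ["replica-c"]}) would alternate high-priority reads between replica-a and replica-b, while everything else reads from replica-c.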

As Bohan noted, “After all the optimization we did, we are super happy with Postgres right now for our read-heavy workloads.”

Schema Governance and Resilience

OpenAI also implemented strict schema governance to avoid full table rewrites and production disruptions. Only lightweight schema changes are allowed, and long-running queries are monitored to prevent them from blocking migrations.
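
One way to enforce a "lightweight changes only" rule is a review-time check that flags statements which commonly force a full table rewrite in PostgreSQL. The Python sketch below is an assumption about how such a gate could look, not a description of OpenAI's tooling. check_migration is a hypothetical helper, and the pattern list is deliberately incomplete and approximate.

```python
import re
from typing import List, Pattern, Tuple

# Statements that usually rewrite the whole table in PostgreSQL.
# Illustrative only: real policies need a fuller list and a proper SQL parser.
HEAVY_PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("VACUUM FULL",
     re.compile(r"^\s*VACUUM\b.*\bFULL\b", re.I | re.S)),
    ("CLUSTER",
     re.compile(r"^\s*CLUSTER\b", re.I)),
    ("ALTER COLUMN ... TYPE",
     re.compile(r"\bALTER\s+TABLE\b.*\bALTER\s+(COLUMN\s+)?\S+\s+"
                r"(SET\s+DATA\s+)?TYPE\b", re.I | re.S)),
]


def check_migration(sql: str) -> List[str]:
    """Return the names of the heavy-change rules that the statement matches.

    An empty list means the statement passed this (approximate) check.
    """
    return [name for name, pattern in HEAVY_PATTERNS if pattern.search(sql)]
```

A migration pipeline could refuse to run any statement for which check_migration returns a non-empty list, leaving those changes for manual review.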

To ensure resilience, OpenAI categorized requests by priority and implemented multi-level rate limiting at the application, connection, and query digest levels. This helped prevent resource exhaustion and service degradation.
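
Multi-level rate limiting of this kind can be sketched with one token bucket per level and key, where a request is admitted only if every level has capacity. The Python below is a minimal illustration, not OpenAI's implementation: TokenBucket, MultiLevelLimiter, and the limits in the usage note are hypothetical, and "query digest" is assumed here to mean a normalized fingerprint of the query text.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def available(self) -> bool:
        """Refill based on elapsed time and report whether a token is free."""
        now = self._clock()
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        return self._tokens >= 1.0

    def take(self) -> None:
        self._tokens -= 1.0


class MultiLevelLimiter:
    """Admit a request only if all three levels have a token available."""

    LEVELS = ("application", "connection", "query_digest")

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # limits maps each level name to (rate per second, burst capacity).
        self._limits = limits
        self._clock = clock
        self._buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def _bucket(self, level: str, key: str) -> TokenBucket:
        if (level, key) not in self._buckets:
            rate, capacity = self._limits[level]
            self._buckets[(level, key)] = TokenBucket(rate, capacity, self._clock)
        return self._buckets[(level, key)]

    def allow(self, application: str, connection: str, query_digest: str) -> bool:
        keys = (application, connection, query_digest)
        buckets = [self._bucket(lvl, key) for lvl, key in zip(self.LEVELS, keys)]
        # Check every level first so a rejected request consumes no tokens.
        if not all(bucket.available() for bucket in buckets):
            return False
        for bucket in buckets:
            bucket.take()
        return True
```

With limits such as {"application": (10.0, 10.0), "connection": (10.0, 10.0), "query_digest": (1.0, 2.0)}, a single expensive query shape is throttled after a short burst while other queries from the same application continue to be admitted.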

Takeaway

OpenAI’s journey is a masterclass in how to operate PostgreSQL at hyper-scale. By offloading writes, scaling read replicas, and enforcing strict schema governance, OpenAI demonstrated that PostgreSQL on Azure can meet the demands of cutting-edge AI systems. It also reinforces the value of Azure’s managed database services in enabling teams to focus on innovation rather than infrastructure.

We’re proud of the work we’ve done to co-innovate with OpenAI and excited to see how other organizations can apply these lessons to their own PostgreSQL deployments.

Check out the on-demand talk “Scaling Postgres to the next level at OpenAI” and many more PostgreSQL community sessions from POSETTE.

Updated Jun 30, 2025