databases
12 Topics- Scaling PostgreSQL at OpenAI: Lessons in Reliability, Efficiency, and InnovationAt POSETTE: An Event for Postgres 2025, Bohan Zhang of OpenAI delivered a compelling talk on how OpenAI has scaled Azure Database for PostgreSQL- Flexible Server to meet the demands of one of the world’s most advanced AI platforms running at planetary scale. The Postgres team at Microsoft has partnered deeply with OpenAI for years to enhance the service to meet their performance, scale, and availability requirements, and it is great to see how OpenAI is now deploying and depending on Flexible Server as a core component of ChatGPT. Hearing firsthand about their challenges and breakthroughs is a reminder of what’s possible when innovation meets real-world needs. This blog post captures the key insights from Bohan’s POSETTE talk, paired with how Azure’s cloud platform supports innovation at scale. PostgreSQL at the Heart of OpenAI As Bohan shared during his talk, PostgreSQL is the backbone of OpenAI’s most critical systems. Because PostgreSQL plays a critical role in powering services like ChatGPT, Open AI has prioritized making it more resilient and scalable to avoid any disruptions. That’s why OpenAI has invested deeply in optimizing PostgreSQL for reliability and scale. Why Azure Database for PostgreSQL? OpenAI has long operated PostgreSQL on Azure, initially using a single primary instance without sharding. This architecture worked well—until write scalability limits emerged. Azure’s managed PostgreSQL service provides the flexibility to scale read replicas, optimize performance, and maintain high availability to provide global low latency reads without the burden of managing infrastructure. This is why we designed Azure Database for PostgreSQL to support precisely these kinds of high-scale, mission-critical workloads, and OpenAI’s use case is a powerful validation of that vision. Tackling Write Bottlenecks PostgreSQL’s MVCC (Multi-Version Concurrency Control) design presents challenges for write-heavy workloads—such as index bloat, autovacuum tuning complexity, and version churn. OpenAI addressed this by: Reducing unnecessary writes at the application level Using lazy writes and controlled backfills to smooth spikes Migrating extreme write-heavy workloads with natural sharding keys to other systems. These strategies allowed OpenAI to preserve PostgreSQL’s strengths while mitigating its limitations. Optimizing Read-Heavy Workloads With writes offloaded, OpenAI focused on scaling read-heavy workloads. Key optimizations included: Offloading read queries to replicas Avoiding long-running queries and expensive multi-way join queries Using PgBouncer for connection pooling, reducing latency from 50ms to under 5ms Categorizing requests by priority and assigning dedicated read replicas to high-priority traffic As Bohan noted, “After all the optimization we did, we are super happy with Postgres right now for our read-heavy workloads.” Schema Governance and Resilience OpenAI also implemented strict schema governance to avoid full table rewrites and production disruptions. Only lightweight schema changes are allowed, and long-running queries are monitored to prevent them from blocking migrations. To ensure resilience, we categorized requests by priority and implemented multi-level rate limiting—at the application, connection, and query digest levels. This helped prevent resource exhaustion and service degradation. Takeaway OpenAI’s journey is a masterclass in how to operate PostgreSQL at hyper-scale. By offloading writes, scaling read replicas, and enforcing strict schema governance, OpenAI demonstrated PostgreSQL on Azure meets the demands of cutting-edge AI systems. It also reinforces the value of Azure’s managed database services in enabling teams to focus on innovation rather than infrastructure. We’re proud of the work we’ve done to co-innovate with OpenAI and excited to see how other organizations can apply these lessons to their own PostgreSQL deployments. Check out the on-demand talk “Scaling Postgres to the next level at OpenAI” and many more PostgreSQL community sessions from POSETTE.
- April 2025 Recap: Azure Database for PostgreSQL Flexible ServerHello Azure Community, April has brought powerful capabilities to Azure Database for PostgreSQL flexible server, On-Demand backups are now Generally Available, a new Terraform version for our latest REST API has been released, the Public Preview of the MCP Server is now live, and there are also a few other updates that we are excited to share in this blog. Stay tuned as we dive into the details of these new features and how they can benefit you! Feature Highlights General Availability of On-Demand Backups Public Preview of Model Context Protocol (MCP) Server Additional Tuning Parameters in PG 17 Terraform resource released for latest REST API version General Availability of pg_cron extension in PG 17 General Availability of On-Demand Backups We are excited to announce General Availability of On-Demand backups for Azure Database for PostgreSQL flexible server. With this it becomes easier to streamline the process of backup management, including automated, scheduled storage volume snapshots encompassing the entire database instance and all associated transaction logs. On-demand backups provide you with the flexibility to initiate backups at any time, supplementing the existing scheduled backups. This capability is useful for scenarios such as application upgrades, schema modifications, or major version upgrades. For instance, before making schema changes, you can take a database backup, in an unlikely case, if you run into any issues, you can quickly restore (PITR) database back to a point before the schema changes were initiated. Similarly, during major version upgrades, on-demand backups provide a safety net, allowing you to revert to a previous state if anything goes wrong. In the absence of on-demand backup, the PITR could take much longer as it would need to take the last snapshot which could be 24 hours earlier and then replay the WAL. Azure Database for PostgreSQL flexible server already does on-demand backup behind the scenes for you and then deletes it when the upgrade is successful. Key Benefits: Immediate Backup Creation: Trigger full backups instantly. Cost Control: Delete on-demand backups when no longer needed. Improved Safety: Safeguard data before major changes or refreshes. Easy Access: Use via Azure Portal, CLI, ARM templates, or REST APIs. For more details and on how to get started, check out this announcement blog post. Create your first on-demand backup using the Azure portal or Azure CLI. Public Preview of Model Context Protocol (MCP) Server Model Context Protocol (MCP) is a new and emerging open protocol designed to integrate AI models with the environments where your data and tools reside in a scalable, standardized, and secure manner. We are excited to introduce the Public Preview of MCP Server for Azure Database for PostgreSQL flexible server which enables your AI applications and models to talk to your data hosted in Azure Database for PostgreSQL flexible servers according to the MCP standard. The MCP Server exposes a suite of tools including listing databases, tables, and schema information, reading and writing data, creating and dropping tables, listing Azure Database for PostgreSQL configurations, retrieving server parameter values, and more. You can either build custom AI apps and agents with MCP clients to invoke these capabilities or use AI tools like Claude Desktop and GitHub Copilot in Visual Studio Code to interact with your Azure PostgreSQL data simply by asking questions in plain English. For more details and demos on how to get started, check out this announcement blog post. Additional Tuning Parameters in PG17 We have now provided an expanded set of configuration parameters in Azure Database for PostgreSQL flexible server (V17) that allows you to modify and have greater control to optimize your database performance for unique workloads. You can now tune internal buffer settings like commit timestamp, multixact member and offset, notify, serializable, subtransaction, and transaction buffers, allowing you to better manage memory and concurrency in high-throughput environments. Additionally, you can also configure parallel append, plan cache mode, and event triggers that opens powerful optimization and automation opportunities for analytical workloads and custom logic execution. This gives you more control for memory intensive and high-concurrency applications, increased control over execution plans and allowing parallel execution of queries. To get started, all newly modifiable parameters are available now through the Azure portal, Azure CLI, and ARM templates, just like any other server configuration setting. To learn more, visit our Server Parameter Documentation. Terraform resource released for latest REST API version A new version of the Terraform resource for Azure Databases for PostgreSQL flexible server is now available, this brings several key improvements including the ability to easily revive dropped databases with geo-redundancy and customer-managed keys (Geo + CMK - Revive Dropped), seamless switchover of read replicas to a new site (Read Replicas - Switchover), improved connectivity through virtual endpoints for read replicas, and using on-demand backups for your servers. To get started with Terraform support, please follow this link: Deploy Azure Database for PostgreSQL flexible server with Terraform General Availability of pg_cron extension in PG 17 We’re excited to announce that the pg_cron extension is now supported in Azure Database for PostgreSQL flexible server major versions including PostgreSQL 17. This extension enables simple, time-based job scheduling directly within your database, making maintenance and automation tasks easier than ever. You can get started today by enabling the extension through the Azure portal or CLI. To learn more, please refer Azure Database for PostgreSQL flexible server list of extensions. Azure Postgres Learning Bytes 🎓 Setting up alerts for Azure Database PostgreSQL flexible server using Terraform Monitoring metrics and setting up alerts for your Azure Database for PostgreSQL flexible server instance is crucial for maintaining optimal performance and troubleshooting workload issues. By configuring alerts, you can track key metrics like CPU usage and storage etc. and receive notifications by creating an action group for your alert metrics. This guide will walk you through the process of setting up alerts using Terraform. First, create an instance of Azure Database for PostgreSQL flexible server (if not already created) Next, create a Terraform File and add these resources 'azurerm_monitor_action_group', 'azurerm_monitor_metric_alert' as shown below. resource "azurerm_monitor_action_group" "example" { name = "<action-group-name>" resource_group_name = "<rg-name>" short_name = "<short-name>" email_receiver { name = "sendalerts" email_address = "<youremail>" use_common_alert_schema = true } } resource "azurerm_monitor_metric_alert" "example" { name = "<alert-name>" resource_group_name = "<rg-name>" scopes = [data.azurerm_postgresql_flexible_server.demo.id] description = "Alert when CPU usage is high" severity = 3 frequency = "PT5M" window_size = "PT5M" enabled = true criteria { metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers" metric_name = "cpu_percent" aggregation = "Average" operator = "GreaterThan" threshold = 80 } action { action_group_id = azurerm_monitor_action_group.example.id } } 3. Run the terraform initialize, plan and apply commands to create an action group and attach a metric to the Azure Database for PostgreSQL flexible server instance. terraform init -upgrade terraform plan -out <file-name> terraform apply <file-name>.tfplan Note: This script assumes you have already created an Azure Database for PostgreSQL flexible server instance. To verify your alert, check the Azure portal under Monitoring -> Alerts -> Alert Rules tab. Conclusion That's a wrap for the April 2025 feature updates! Stay tuned for our Build announcements, as we have a lot of exciting updates and enhancements for Azure Database for PostgreSQL flexible server coming up this month. We’ve also published our Yearly Recap Blog, highlighting many improvements and announcements we’ve delivered over the past year. Take a look at our yearly recap blog here: What's new with Postgres at Microsoft, 2025 edition We are always dedicated to improving our service with new array of features, if you have any feedback or suggestions we would love to hear from you. 📢 Share your thoughts here: aka.ms/pgfeedback Thanks for being part of our growing Azure Postgres community.