open source
Stop Drawing Architecture Diagrams Manually: Meet the Open-Source AI Architecture Review Agents
Designing and documenting software architecture is often a battle against static diagrams that become outdated the moment they are drawn. The Architecture Review Agent changes that by turning your design process into a dynamic, AI-powered workflow. In this post, we explore how to leverage Microsoft Foundry Hosted Agents, Azure OpenAI, and Excalidraw to build an open-source tool that instantly converts messy text descriptions, YAML, or README files into editable architecture diagrams. Beyond just drawing boxes, the agent acts as a technical co-pilot, delivering prioritized risk assessments, highlighting single points of failure, and mapping component dependencies. Discover how to eliminate manual diagramming, catch security flaws early, and deploy your own enterprise-grade agent with zero infrastructure overhead.

Integrating Microsoft Foundry with OpenClaw: Step by Step Model Configuration
Step 1: Deploying Models on Microsoft Foundry

Let us kick things off in the Azure portal. To get our OpenClaw agent thinking like a genius, we need to deploy our models in Microsoft Foundry. For this guide, we are going to focus on deploying gpt-5.2-codex on Microsoft Foundry for use with OpenClaw. Navigate to your AI Hub, head over to the model catalog, choose the model you wish to use with OpenClaw, and hit deploy. Once your deployment is successful, head to the endpoints section.

Important: Grab your Endpoint URL and your API Keys right now and save them in a secure note. We will need these exact values to connect OpenClaw in a few minutes.

Step 2: Installing and Initializing OpenClaw

Next up, we need to get OpenClaw running on your machine. Open up your terminal and run the official installation script:

curl -fsSL https://openclaw.ai/install.sh | bash

The wizard will walk you through a few prompts. Here is exactly how to answer them to link up with our Azure setup:

First Page (Model Selection): Choose "Skip for now".
Second Page (Provider): Select azure-openai-responses.
Model Selection: Select gpt-5.2-codex. For now, only the models hosted on Microsoft Foundry that are listed in the picture below are available to be used with OpenClaw.

Follow the rest of the standard prompts to finish the initial setup.

Step 3: Editing the OpenClaw Configuration File

Now for the fun part. We need to manually configure OpenClaw to talk to Microsoft Foundry. Open your configuration file located at ~/.openclaw/openclaw.json in your favorite text editor.
Replace the contents of the models and agents sections with the following code block:

{
  "models": {
    "providers": {
      "azure-openai-responses": {
        "baseUrl": "https://<YOUR_RESOURCE_NAME>.openai.azure.com/openai/v1",
        "apiKey": "<YOUR_AZURE_OPENAI_API_KEY>",
        "api": "openai-responses",
        "authHeader": false,
        "headers": {
          "api-key": "<YOUR_AZURE_OPENAI_API_KEY>"
        },
        "models": [
          {
            "id": "gpt-5.2-codex",
            "name": "GPT-5.2-Codex (Azure)",
            "reasoning": true,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 400000,
            "maxTokens": 16384,
            "compat": { "supportsStore": false }
          },
          {
            "id": "gpt-5.2",
            "name": "GPT-5.2 (Azure)",
            "reasoning": false,
            "input": ["text", "image"],
            "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 272000,
            "maxTokens": 16384,
            "compat": { "supportsStore": false }
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "azure-openai-responses/gpt-5.2-codex" },
      "models": { "azure-openai-responses/gpt-5.2-codex": {} },
      "workspace": "/home/<USERNAME>/.openclaw/workspace",
      "compaction": { "mode": "safeguard" },
      "maxConcurrent": 4,
      "subagents": { "maxConcurrent": 8 }
    }
  }
}

You will notice a few placeholders in that JSON. Here is exactly what you need to swap out:

| Placeholder Variable | What It Is | Where to Find It |
| <YOUR_RESOURCE_NAME> | The unique name of your Azure OpenAI resource. | Found in your Azure Portal under the Azure OpenAI resource overview. |
| <YOUR_AZURE_OPENAI_API_KEY> | The secret key required to authenticate your requests. | Found in Microsoft Foundry under your project endpoints or Azure Portal keys section. |
| <USERNAME> | Your local computer's user profile name. | Open your terminal and type whoami to find this. |

Step 4: Restart the Gateway

After saving the configuration file, you must restart the OpenClaw gateway for the new Foundry settings to take effect.
Run this simple command:

openclaw gateway restart

Configuration Notes & Deep Dive

If you are curious about why we configured the JSON that way, here is a quick breakdown of the technical details.

Authentication Differences

Azure OpenAI uses the api-key HTTP header for authentication. This is entirely different from the standard OpenAI Authorization: Bearer header. Our configuration file addresses this in two ways:

Setting "authHeader": false completely disables the default Bearer header.
Adding "headers": { "api-key": "<key>" } forces OpenClaw to send the API key via Azure's native header format.

Important Note: Your API key must appear in both the apiKey field AND the headers.api-key field within the JSON for this to work correctly.

The Base URL

Azure OpenAI's v1-compatible endpoint follows this specific format:

https://<your_resource_name>.openai.azure.com/openai/v1

The beautiful thing about this v1 endpoint is that it is largely compatible with the standard OpenAI API and does not require you to manually pass an api-version query parameter.

Model Compatibility Settings

"compat": { "supportsStore": false } disables the store parameter since Azure OpenAI does not currently support it.
"reasoning": true enables the thinking mode for GPT-5.2-Codex. This supports low, medium, high, and xhigh levels.
"reasoning": false is set for GPT-5.2 because it is a standard, non-reasoning model.

Model Specifications & Cost Tracking

If you want OpenClaw to accurately track your token usage costs, you can update the cost fields from 0 to the current Azure pricing.
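As a rough sketch of the arithmetic behind those cost fields, the Python below estimates what a single request would cost from the per-1M-token prices quoted in this post. It assumes the cost fields are expressed in USD per 1M tokens and that cached input tokens are billed at the cacheRead rate; neither assumption is confirmed here by the OpenClaw documentation, and the function name estimate_cost is hypothetical.

```python
# Prices in USD per 1M tokens, copied from the cost table in this post.
# The key names mirror the "cost" object in openclaw.json; treating them
# as per-1M-token prices is an assumption made for this sketch.
PRICES = {
    "gpt-5.2-codex": {"input": 1.75, "output": 14.00, "cacheRead": 0.175},
    "gpt-5.2": {"input": 2.00, "output": 8.00, "cacheRead": 0.50},
}

TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(model, input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated cost in USD for one request (hypothetical helper)."""
    price = PRICES[model]
    total = (
        input_tokens * price["input"]
        + output_tokens * price["output"]
        + cached_tokens * price["cacheRead"]
    )
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # 20k fresh input tokens, 2k output tokens, 100k cached input tokens.
    print(round(estimate_cost("gpt-5.2-codex", 20_000, 2_000, 100_000), 4))
```

Output tokens dominate quickly at these prices, so long agent runs are worth tracking even when the input side is mostly cached.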
Here are the specs and costs for the models we just deployed:

Model Specifications

| Model | Context Window | Max Output Tokens | Image Input | Reasoning |
| gpt-5.2-codex | 400,000 tokens | 16,384 tokens | Yes | Yes |
| gpt-5.2 | 272,000 tokens | 16,384 tokens | Yes | No |

Current Cost (Adjust in JSON)

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cached Input (per 1M tokens) |
| gpt-5.2-codex | $1.75 | $14.00 | $0.175 |
| gpt-5.2 | $2.00 | $8.00 | $0.50 |

Conclusion

And there you have it! You have successfully bridged the gap between the enterprise-grade infrastructure of Microsoft Foundry and the local autonomy of OpenClaw. By following these steps, you are not just running a chatbot; you are running a sophisticated agent capable of reasoning, coding, and executing tasks with the full power of GPT-5.2-Codex behind it. The combination of Azure's reliability and OpenClaw's flexibility opens up a world of possibilities. Whether you are building an automated DevOps assistant, a research agent, or just exploring the bleeding edge of AI, you now have a robust foundation to build upon. Now it is time to let your agent loose on some real tasks. Go forth, experiment with different system prompts, and see what you can build. If you run into any interesting edge cases or come up with a unique configuration, let me know in the comments below. Happy coding!

Microsoft at PGConf India 2026
I’m genuinely excited about PGConf India 2026. Over the past few editions, the conference has continued to grow year over year—both in size and in impact—and it has firmly established itself as one of the key events on the global PostgreSQL calendar. That momentum was very evident again in the depth, breadth, and overall quality of the program for PGConf India 2026. Microsoft is proud to be a diamond sponsor for the conference again this year. At Microsoft, we continue our contributions to the upstream PostgreSQL open-source project—as well as to serve our customers with our Postgres managed service offerings, both Azure Database for PostgreSQL and our newest Postgres offering, Azure HorizonDB . On the open-source front, Microsoft had 540 commits in PG18, including major features like Asynchronous IO. We’re also excited to grow our Postgres open-source contributors team, and so happy to welcome Noah Misch to our team. Noah is a Postgres committer who has deep expertise in PostgreSQL security and is focused on correctness and reliability in PostgreSQL’s core. Microsoft at PGConf India 2026: Highlights from Our Speakers PGConf India has several tracks, all of which have some great talks I am looking forward to. First, the plug. 😊 Microsoft has some amazing talks this year, and we have 8 different talks spread across all the tracks. Postgres on Azure : Scaling with Azure HorizonDB, AI, and Developer Workflows, by Aditya Duvuri & Divya Bhargov Resizing shared buffer pool in a running PostgreSQL server: important, yet impossible, by Ashutosh Bapat Ten Postgres Hacker Journeys—and what they teach us, by Claire Giordano How Postgres can leverage disk bandwidth for better TPS, by Nikhil Chawla AWSM FSM! 
Free Space Maps Decoded by Nikhil Sontakke Journey of developing a Performance Optimization Feature in PostgreSQL, by Rahila Syed Build Agentic AI with Semantic Kernel and Graph RAG on PostgreSQL, by Shriram Muthukrishnan & Palak Chaturvedi All things Postgres @ Microsoft (2026 edition) by Sumedh Pathak Claire is an amazing speaker and has done a lot of work over the last several years documenting and understanding PostgreSQL committers and hackers. Her talk will definitely have some key insights and nuggets of information. Rahila’s talk will go in depth on performance optimization features and how best to test and benchmark them, and all the tools and tricks she has used as part of the feature development. This should be a must-see talk for anyone doing performance work. Diving Deep: Case Studies & Technical Tracks One of the tracks I’m really excited about is the Case Study track. I see these as similar to ‘Experience’ papers in academia. An experience paper documents what actually happened when applying a technique or system in the real world, what worked, what didn’t, and why. One of the talks I’m looking forward to is ‘Operating Postgres Logical Replication at Massive Scale’ by Sai Srirampur from Clickhouse. Logical Replication is an extremely useful tool, and I’m curious to learn more about pitfalls and lessons learnt when running this at large scale. Another interesting one I’m curious to hear is ‘Understanding the importance of the commit log through a database corruption’ by Amit Kumar Singh from EDB. The Database Engine Developers track allows us to go deep into the PostgreSQL code base and get a better understanding of how PostgreSQL is built. Even if you are not a database developer, this track is useful to understand how and why PostgreSQL does things, helping you be a better user of the database. 
With the rise of larger machines and memory available in the Cloud, different and newer memory architectures/tiers and serverless product offerings, there is a lot of deep dive in PostgreSQL’s memory architecture. There are some great talks focused on this area, which should be must-see for anyone interested in this topic: Resizing shared buffer pool in a running PostgreSQL server: important, yet impossible by Ashutosh Bapat from Microsoft From Disk to Data: Exploring PostgreSQL's Buffer Management by Lalit Choudhary from PurnaBIT Beyond shared_buffers: On-Demand Memory in Modern PostgreSQL by Vaibhav Popat from Google Finally, the Database Administration and Application Developer tracks have some really great content as well. They cover a wide range of topics, from PII data, HA/DR, Query Tuning to connection pooling and understanding conflict detection and resolution. PostgreSQL in India: A Community Effort Worth Celebrating Conferences like these are a rich source of information, dramatically increasing my personal understanding of the product and the ecosystem. Separately, they are also a great way to meet other practitioners in the space and connect with people in the industry. For people in Bangalore, another great option is the PostgreSQL Bangalore Meetup, and I’m super happy that Microsoft was able to join the ranks of other companies to host the eighth iteration of this meetup. Finally, I would be remiss in not mentioning the hard work done by the PGConf India organizing team including Pavan Deolasse, Ashish Mehra, Nikhil Sontakke, Hari Kiran, and Rushabh Lathia who are making all of this happen. Also, a big shout out to the PGConf India Program Committee (Amul Sul, Dilip Kumar, Marc Linster, Thomas Munro, Vigneshwaran C) for putting together an amazing set of talks. I look forward to meeting all of you in Bangalore! Be sure to drop by the Microsoft booth to say hello (and to snag a free pair of our famous socks). 
I’d love to learn more about how you’re using Postgres.

Distribute PostgreSQL 18 with Citus 14
The Citus 14.0 release is out and includes PostgreSQL 18 support! We know you've been waiting, and we've been hard at work adding features we believe will take your experience to the next level, focusing on bringing the Postgres 18 exciting improvements to you at distributed scale. The Citus database is an open-source extension of Postgres that brings the power of Postgres to any scale, from a single node to a distributed database cluster. Since Citus is an extension, using Citus means you're also using Postgres, giving you direct access to the Postgres features. And the latest of such features came with Postgres 18 release! PostgreSQL 18 is a substantial release: asynchronous I/O (AIO), skip-scan for multicolumn B-tree indexes, uuidv7(), virtual generated columns by default, OAuth authentication, RETURNING OLD/NEW, and temporal constraints. For those of you who are interested in upgrading to Postgres 18 and scaling these new features of Postgres: you can upgrade to Citus 14.0! Let's take a closer look at what's new in Citus 14.0. Postgres 18 support in Citus 14.0 Citus 14.0 introduces support for PostgreSQL 18. This means that just by enabling PG18 in Citus 14.0, all the query performance improvements directly reflect on the Citus distributed queries, and several optimizer improvements benefit queries in Citus out of the box! Among the many new features in PG 18, the following capabilities enabled in Citus 14.0 are especially noteworthy for Citus users. To learn more about how you can use Citus 14.0 + PostgreSQL 18, as well as currently unsupported features and future work, you can consult the Citus 14.0 Updates page, which gives you detailed release notes. PostgreSQL 18 highlights that benefit Citus clusters Because Citus is implemented as a Postgres extension, the following PG18 improvements benefit your distributed cluster automatically, no Citus-specific changes needed. 
Faster scans and maintenance via AIO Postgres 18 adds an asynchronous I/O subsystem that can improve sequential scans, bitmap heap scans, and vacuuming—workloads that show up constantly in shard-heavy distributed clusters. This means your Citus cluster can benefit from faster table scans and more efficient maintenance operations without any code changes. You can control the I/O method via the new io_method GUC: -- Check the current I/O method SHOW io_method; Better index usage with skip-scan Postgres 18 expands when multicolumn B-tree indexes can be used via skip scan, helping common multi-tenant schemas where predicates don't always constrain the leading index column. This is particularly valuable for Citus users with multi-tenant applications where queries often filter by non-leading columns. -- Multi-tenant index: (tenant_id, created_at) -- PG18 skip-scan lets this query use the index even without tenant_id SELECT * FROM events WHERE created_at > now() - interval '1 day' ORDER BY created_at DESC LIMIT 100; uuidv7() for time-ordered UUIDs Time-ordered UUIDs can reduce index churn and improve locality; Postgres 18 adds uuidv7(). This is especially useful for distributed tables where you want predictable ordering and better index performance across shards. -- Use uuidv7() as a time-ordered primary key CREATE TABLE events ( id uuid DEFAULT uuidv7() PRIMARY KEY, tenant_id bigint, payload jsonb ); SELECT create_distributed_table('events', 'tenant_id'); OAuth authentication support Postgres 18 adds OAuth authentication, making it easier to plug database auth into modern SSO flows often a practical requirement in multi-node deployments. This simplifies authentication management across your Citus coordinator and worker nodes. 
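To make the time-ordering claim about uuidv7() concrete, here is a minimal Python sketch of the UUIDv7 bit layout described in RFC 9562: a 48-bit Unix millisecond timestamp in the most significant bits, followed by the version, variant, and random bits. This is an illustration only, not how Postgres implements uuidv7(); the function name uuid7_sketch is hypothetical, and the sketch makes no attempt at ordering within the same millisecond.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUID with the RFC 9562 version 7 layout (illustrative only)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp, bits 127..80
    value |= 0x7 << 76                         # version 7, bits 79..76
    value |= rand_a << 64                      # rand_a, bits 75..64
    value |= 0b10 << 62                        # RFC variant, bits 63..62
    value |= rand_b                            # rand_b, bits 61..0
    return uuid.UUID(int=value)


if __name__ == "__main__":
    earlier = uuid7_sketch(1_700_000_000_000)
    later = uuid7_sketch(1_700_000_000_001)
    # The timestamp occupies the high bits, so later IDs compare greater.
    print(earlier < later, earlier.version)
```

Because the timestamp leads, new keys land near the right-hand edge of a B-tree index instead of at random positions, which is where the reduced index churn comes from.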
What Citus 14 adds for PostgreSQL 18 compatibility While the highlights above work out of the box, PG18 also introduces new SQL syntax and behavior changes that require Citus-specific work parsing/deparsing, DDL propagation across coordinator + workers, and distributed execution correctness. Here's what we built to make these work end-to-end. JSON_TABLE() COLUMNS PG18 expands SQL/JSON JSON_TABLE() with a richer COLUMNS clause, making it easy to extract multiple fields from JSON documents in a single, typed table expression. Citus 14 ensures the syntax can be parsed/deparsed and executed consistently in distributed queries. CREATE TABLE pg18_json_test (id serial PRIMARY KEY, data JSON); SELECT jt.name, jt.age FROM pg18_json_test, JSON_TABLE( data, '$.user' COLUMNS ( age INT PATH '$.age', name TEXT PATH '$.name' ) ) AS jt WHERE jt.age BETWEEN 25 AND 35 ORDER BY jt.age, jt.name; Temporal constraints Postgres 18 adds temporal constraint syntax that Citus must propagate and preserve correctly: WITHOUT OVERLAPS for PRIMARY KEY / UNIQUE PERIOD for FOREIGN KEY CREATE TABLE temporal_rng ( id int4range, valid_at daterange, CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) ); SELECT create_distributed_table('temporal_rng', 'id'); CREATE FOREIGN TABLE ... LIKE Postgres 18 supports CREATE FOREIGN TABLE ... LIKE, letting you define a foreign table by copying the column layout (and optionally defaults/constraints/indexes) from an existing table. Citus 14 includes coverage so FDW workflows remain compatible in distributed environments. 
-- Copy column layout from an existing table CREATE FOREIGN TABLE my_ft (LIKE my_local_table EXCLUDING ALL) SERVER foreign_server OPTIONS (schema_name 'public', table_name 'my_local_table'); Generated columns (Virtual by Default) PostgreSQL 18 changes generated column behavior significantly: Virtual by default: Generated columns are now computed on read rather than stored, reducing write amplification Logical replication support: New publish_generated_columns publication option for replicating generated values CREATE TABLE events ( id bigint, payload jsonb, payload_hash text GENERATED ALWAYS AS (md5(payload::text)) VIRTUAL ); SELECT create_distributed_table('events', 'id'); VACUUM/ANALYZE ONLY semantics Postgres 18 introduces ONLY for VACUUM and ANALYZE so you can explicitly target only the parent of a partitioned/inheritance tree without automatically processing children. Citus 14 adapts distributed utility-command behavior so ONLY works as intended. -- Parent-only: do not recurse into partitions/children VACUUM (ANALYZE) ONLY metrics; ANALYZE ONLY metrics; Constraints: NOT ENFORCED + partitioned-table additions Postgres 18 expands constraint syntax in several ways that Citus must parse/deparse and propagate across coordinator + workers: CHECK constraints can be marked NOT ENFORCED FOREIGN KEY constraints can be marked NOT ENFORCED NOT VALID foreign keys on partitioned tables DROP CONSTRAINT ONLY on partitioned tables ALTER TABLE orders ADD CONSTRAINT orders_amount_positive CHECK (amount > 0) NOT ENFORCED; ALTER TABLE orders ADD CONSTRAINT orders_customer_fk FOREIGN KEY (customer_id) REFERENCES customers(id) NOT ENFORCED; DML: RETURNING OLD/NEW Postgres 18 lets RETURNING reference both the previous (old) and new (new) row values in INSERT/UPDATE/DELETE/MERGE. Citus 14 preserves these semantics in distributed execution. 
UPDATE t SET v = v + 1 WHERE id = 42 RETURNING old.v AS old_v, new.v AS new_v; COPY expansions PG18 adds two useful COPY improvements that Citus 14 supports in distributed queries: COPY ... REJECT_LIMIT: set a threshold for how many rows can be rejected before the COPY fails, useful for resilient bulk loading into sharded tables COPY table TO from materialized views: export data directly from materialized views -- Tolerate up to 10 bad rows during bulk load COPY my_distributed_table FROM '/data/import.csv' WITH (FORMAT csv, REJECT_LIMIT 10); MIN()/MAX() on arrays and composite types PG18 extends MIN() and MAX() aggregates to work on arrays and composite types. Citus 14 ensures these aggregates work correctly in distributed queries. CREATE TABLE sensor_data ( tenant_id bigint, readings int[] ); SELECT create_distributed_table('sensor_data', 'tenant_id'); -- Now works with array columns SELECT MIN(readings), MAX(readings) FROM sensor_data; Nondeterministic collations PG18 extends LIKE and text-position search functions to work with nondeterministic collations. Citus 14 verifies these work correctly across distributed queries. sslkeylogfile connection parameter PG18 adds the sslkeylogfile libpq connection parameter for dumping SSL key material, useful for debugging encrypted connections. Citus 14 allows configuring this via citus.node_conn_info so it works across inter-node connections. Planner fix: enable_self_join_elimination PG18 introduces the enable_self_join_elimination planner optimization. Citus 14 ensures this works correctly for joins between distributed and local tables, avoiding wrong results that could occur in early PG18 integration. Utility/Ops plumbing and observability Citus 14 adapts to PG18 interface/output changes that affect tooling and extension plumbing: New GUC file_copy_method for CREATE DATABASE ... 
STRATEGY=FILE_COPY
EXPLAIN (WAL) adds a "WAL buffers full" field; Citus propagates it through distributed EXPLAIN output
New extension macro PG_MODULE_MAGIC_EXT so extensions can report name/version metadata
New libpq parameter sslkeylogfile support via citus.node_conn_info

Diving deeper into Citus 14.0 and distributed Postgres

To learn more about Citus 14.0, you can:

Check out the 14.0 Updates page to get the detailed release notes.
As of this release, Citus documentation is now hosted on Microsoft Learn.
With Citus 14, elastic clusters will soon support PostgreSQL 18, now available in Azure Database for PostgreSQL.

You can stay connected on the Citus Slack and visit the Citus open source GitHub repo to see recent developments as well. If there's something you'd like to see next in Citus, feel free to also open a feature request issue :)

January 2026 Recap: Azure Database for PostgreSQL
We just dropped the January 2026 recap for Azure Database for PostgreSQL, and this one’s all about developer velocity, resiliency, and production-ready upgrades.

January 2026 Recap: Azure Database for PostgreSQL
• PostgreSQL 18 support via Terraform (create + upgrade)
• Premium SSD v2 (Preview) with HA, replicas, Geo-DR & MVU
• Latest PostgreSQL minor version releases
• Ansible module GA with latest REST API features
• Zone-redundant HA now configurable via Azure CLI
• SDKs GA (Go, Java, JS, .NET, Python) on stable APIs

Read the full January 2026 recap here and see what’s new (and what’s coming): January 2026 Recap: Azure Database for PostgreSQL

Supporting ChatGPT on PostgreSQL in Azure
Affan Dar, Vice President of Engineering, PostgreSQL at Microsoft Adam Prout, Partner Architect, PostgreSQL at Microsoft Panagiotis Antonopoulos, Distinguished Engineer, PostgreSQL at Microsoft The OpenAI engineering team recently published a blog post describing how they scaled their databases by 10x over the past year, to support 800 million monthly users. To do so, OpenAI relied on Azure Database for PostgreSQL to support important services like ChatGPT and the Developer API. Collaborating with a customer experiencing rapid user growth has been a remarkable journey. One key observation is that PostgreSQL works out of box for very large-scale points. As many in the public domain have noted, ChatGPT grew to 800M+ users before OpenAI started moving new and shardable workloads to Azure Cosmos DB. Nevertheless, supporting the growth of one of the largest Postgres deployments was a great learning experience for both of our teams. Our OpenAI friends did an incredible job at reacting fast and adjusting their systems to handle the growth. Similarly, the Postgres team at Azure worked to further tune the service to support the increasing OpenAI workload. The changes we made were not limited to OpenAI, hence all our Azure Database for PostgreSQL customers with demanding workloads have benefited. A few of the enhancements and the work that led to these are listed below. Changing the network congestion protocol to reduce replication lag Azure Database for PostgreSQL used the default CUBIC congestion control algorithm for replication traffic to replicas both within and outside the region. Leading up to one of the OpenAI launch events, we observed that several geo-distributed read replicas occasionally experienced replication lag. Replication from the primary server to the read replicas would typically operate without issues; however, at times, the replicas would unexpectedly begin falling behind the primary for reasons that were not immediately clear. 
This lag would not recover on its own and would grow to a point when, eventually, automation would restart the read replica. Once restarted, the read replica would once again catch up, only to repeat this cycle again within a day or less. After an extensive debugging effort, we traced the root cause to how the TCP congestion control algorithm handled a higher rate of packet drops. These drops were largely a result of high point-to-point traffic between the primary server and its replicas, compounded by the existing TCP window settings. Packet drops across regions are not unexpected; however, the default congestion control algorithm (CUBIC) treats packet loss as a sign of congestion and does an aggressive backoff. In comparison, the Bottleneck Bandwidth and Round-trip propagation time (BBR) congestion control algorithm is less sensitive to packet drops. Switching to BBR, adding SKU specific TCP window settings, and switching to fair queuing network discipline (which can control pacing of outgoing packets at hardware level) resolved this issue. We’ll also note that one of our seasoned PostgreSQL committers provided invaluable insights during this process, helping us pinpoint the issue more effectively. Scaling out with Read replicas PostgreSQL primaries, if configured properly, work amazingly well in supporting a large number of read replicas. In fact, as noted in the OpenAI engineering blog, a single primary has been able to power around 50+ replicas across multiple regions. However, going beyond this increases the chance of impacting the primary. For this reason, we added the cascading replica support to scale out reads even further. But this brings in a number of additional failure modes that need to be handled. The system must carefully orchestrate repairs around lagging and failing intermediary nodes, safely repointing replicas to new intermediary nodes while performing catch up or rewind in a mission critical setup. 
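To illustrate one of those failure modes, here is a toy Python sketch of repointing replicas when an intermediary node in a cascading topology fails. It only models the bookkeeping (who streams from whom); the catch up and rewind work mentioned above is not shown, and the node names and the repoint_children helper are hypothetical, not part of Azure Database for PostgreSQL.

```python
def repoint_children(upstream_of, failed):
    """Return a new topology with `failed` removed and its direct
    children attached to the node that `failed` was streaming from.

    `upstream_of` maps each node name to its upstream node
    (None for the primary). Hypothetical helper for illustration.
    """
    if failed not in upstream_of:
        raise KeyError(f"unknown node: {failed}")
    new_upstream = upstream_of[failed]
    if new_upstream is None:
        raise ValueError("the primary cannot be handled by repointing")

    return {
        node: (new_upstream if upstream == failed else upstream)
        for node, upstream in upstream_of.items()
        if node != failed
    }


if __name__ == "__main__":
    topology = {
        "primary": None,
        "mid-1": "primary",
        "leaf-a": "mid-1",
        "leaf-b": "mid-1",
        "mid-2": "primary",
        "leaf-c": "mid-2",
    }
    # mid-1 fails: leaf-a and leaf-b are repointed to the primary.
    print(repoint_children(topology, "mid-1"))
```

A production system also has to decide whether the orphaned replicas should move up a level, as here, or be spread across healthy intermediaries so the primary's fan-out stays bounded.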
Furthermore, disaster recovery (DR) scenarios can require a fast rebuild of a replica, and because data movement across regions is a costly and time-consuming operation, we developed the ability to create a geo replica from a snapshot of another replica in the same region. This feature avoids the traditional full data copy process, which may take hours or even days depending on the size of the data, by leveraging data for that cluster that already exists in that region. This feature will soon be available for all our customers as well.

Scaling out Writes

These improvements solved the read replica lag problems and read scale but did not help address the growing write scale for OpenAI. At some point, the balance tipped and it was obvious that the IOPS limits of a single PostgreSQL primary instance would not cut it anymore. As a result, OpenAI decided to move new and shardable workloads to Azure Cosmos DB, which is our default recommended NoSQL store for fully elastic workloads. However, some workloads, as noted in the OpenAI blog, are much harder to shard. While OpenAI is using Azure Database for PostgreSQL flexible server, several of the write scaling requirements that came up have been baked into our new Azure HorizonDB offering, which entered private preview in November 2025. Some of the architectural innovations are described in the following sections.

Azure HorizonDB scalability design

To better support more demanding workloads, Azure HorizonDB introduces a new storage layer for Postgres that delivers significant performance and reliability enhancements:

More efficient read scale out. Postgres read replicas no longer need to maintain their own copy of the data. They can read pages from the single copy maintained by the storage layer.
Lower latency Write-Ahead Logging (WAL) writes and higher throughput page reads via two purpose-built storage services designed for WAL storage and Page storage.
Durability and high availability responsibilities are shifted from the Postgres primary to the storage layer, allowing Postgres to dedicate more resources to executing transactions and queries. Postgres failovers are faster and more reliable. To understand how Azure HorizonDB delivers these capabilities, let’s look at its high‑level architecture as shown in Figure 1. It follows a log-centric storage model, where the PostgreSQL writeahead log (WAL) is the sole mechanism used to durably persist changes to storage. PostgreSQL compute nodes never write data pages to storage directly in Azure HorizonDB. Instead, pages and other on-disk structures are treated as derived state and are reconstructed and updated from WAL records by the data storage fleet. Azure HorizonDB storage uses two separate storage services for WAL and data pages. This separation allows each to be designed and optimized for the very different patterns of reads and writes PostgreSQL does against WAL files in contrast to data pages. The WAL server is optimized for very low latency writes to the tail of a sequential WAL stream and the Page server is designed for random reads and writes across potentially many terabytes of pages. These two separate services work together to enable Postgres to handle IO intensive OLTP workloads like OpenAI’s. The WAL server can durably write a transaction across 3 availability zones using a single network hop. The typical PostgreSQL replication setup with a hot standby (Figure 2) requires 4 hops to do the same work. Each hop is a component that can potentially fail or slow down and delay a commit. Azure HorizonDB page service can scale out page reads to many hundreds of thousands of IOPs for each Postgres instance. It does this by sharding the data in Postgres data files across a fleet of page servers. This spreads the reads across many high performance NVMe disks on each page server. 
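The log-centric idea above can be sketched in a few lines of Python: pages are treated as derived state, rebuilt by replaying WAL records in LSN order, and each page is owned by one of several page servers. This is a toy model for intuition only. The modulo placement rule, the record shape (lsn, page_id, contents), and the function names are all assumptions for this sketch, not a description of how Azure HorizonDB actually shards or stores pages.

```python
def page_server_for(page_id, num_servers):
    """Pick the page server that owns a page (hypothetical modulo placement)."""
    return page_id % num_servers


def replay(wal_records, num_servers):
    """Rebuild page state on each page server from WAL records.

    Each record is a tuple (lsn, page_id, contents). Records are applied
    in LSN order, so the latest change to a page wins.
    """
    servers = [dict() for _ in range(num_servers)]
    for lsn, page_id, contents in sorted(wal_records, key=lambda r: r[0]):
        owner = page_server_for(page_id, num_servers)
        servers[owner][page_id] = (lsn, contents)
    return servers


if __name__ == "__main__":
    wal = [
        (3, 10, "row v2"),   # later change to page 10
        (1, 10, "row v1"),
        (2, 11, "index entry"),
    ]
    for index, pages in enumerate(replay(wal, num_servers=2)):
        print(index, pages)
```

The point of the model is that the compute node only ever appends to the log; everything a reader needs can be derived from it, which is what lets reads fan out across many servers.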
2 - WAL Writes in HorizonDB

Another key design principle for Azure HorizonDB was to move durability and high availability related work off PostgreSQL compute, allowing it to operate as a stateless compute engine for queries and transactions. This approach gives Postgres more CPU, disk and network to run your application’s business logic. Table 1 summarizes the different tasks that community PostgreSQL has to do, which Azure HorizonDB moves to its storage layer. Work like dirty page writing and checkpointing is no longer done by a Postgres primary. The work for sending WAL files to read replicas is also moved off the primary and into the storage layer, so having many read replicas puts no load on the Postgres primary in Azure HorizonDB. Backups are handled by Azure Storage via snapshots; Postgres isn’t involved.

| Task | Resource Savings | Postgres Process Moved |
| WAL sending to Postgres replicas | Disk IO, Network IO | Walsender |
| WAL archiving to blob storage | Disk IO, Network IO | Archiver |
| WAL filtering | CPU, Network IO | Shared Storage Specific (*) |
| Dirty Page Writing | Disk IO | background writer |
| Checkpointing | Disk IO | checkpointer |
| PostgreSQL WAL recovery | Disk IO, CPU | startup recovering |
| PostgreSQL read replica redo | Disk IO, CPU | startup recovering |
| PostgreSQL read replica shared storage | Disk IO | background, checkpointer |
| Backups | Disk IO | pg_dump, pg_basebackup, pg_backup_start, pg_backup_stop |
| Full page writes | Disk IO | Backends doing WAL writing |
| Hot standby feedback | Vacuum accuracy | walreceiver |

Table 1 - Summary of work that the Azure HorizonDB storage layer takes over from PostgreSQL

The shared storage architecture of Azure HorizonDB is the fundamental building block for delivering exceptional read scalability and elasticity, which are critical for many workloads. Users can spin up read replicas instantly without requiring any data copies. Page Servers are able to scale and serve requests from all replicas without any additional storage costs.
Since WAL replication is entirely handled by the storage service, the primary’s performance is not impacted as the number of replicas changes. Each read replica can scale independently to serve different workloads, allowing for workload isolation.

Finally, this architecture allows Azure HorizonDB to substantially improve the overall experience around high availability (HA). HA replicas can now be added without any data copying or storage costs. Since the data is shared between the replicas and continuously updated by Page Servers, secondary replicas only replay a portion of the WAL and can easily keep up with the primary, reducing failover times. The shared storage also guarantees that there is a single source of truth and that the old primary never diverges after a failover. This removes the need for expensive reconciliation using pg_rewind or other techniques, and further improves availability.

Azure HorizonDB was designed from the ground up, with learnings from large-scale customers, to meet the requirements of the most demanding workloads. The improved performance, scalability, and availability of the Azure HorizonDB architecture make Azure a great destination for Postgres workloads.

From Oracle to Azure: How Quadrant Technologies accelerates migrations
This blog was authored by Manikyam Thukkapuram, Director, Alliances & Engineering at Quadrant Technologies, and Thiwagar Bhalaji, Migration Engineer and DevOps Architect at Quadrant Technologies.

Over the past 20+ years, Quadrant Technologies has accelerated database modernization for hundreds of organizations. As momentum to the cloud continues to grow, a major focus for our business has been migrating on-premises Oracle databases to Azure. We’ve found that landing customers in Azure Database for PostgreSQL has been the best option in terms of both cost savings and efficiency. Azure Migrate is by far the best way to get them there. With Azure Migrate, we’re able to shorten migrations that traditionally took months to weeks.

As a Microsoft solutions partner, we help customers migrate to Azure and develop Azure-based solutions. We’re known as “the great modernization specialists” because many of our customers come to us with complex legacy footprints, outdated infrastructure, and monolithic applications that can be challenging to move to the cloud. But we excel at untangling these complex environments. And with our Q-Migrator tool, which is a wrapper around Azure Migrate, we’re able to automate and accelerate these kinds of migrations.

Manual steps slowed down timelines

In general, each migration we lead includes a discovery phase, a compatibility assessment, and the migration execution. In discovery, we identify every server, database, and application in a customer’s environment and map their interactions. Next, we assess each asset’s readiness for Azure and plan for optimal cloud configurations. Finally, we bring the plan to life, integrating applications, moving workloads, and validating performance.

Before adopting Azure Migrate, each of these phases involved manual tasks for our team. During our discovery process, we manually collected inventory and wrote custom scripts to track server relationships and database dependencies.
Our engineers also had to dig through configuration files and use third-party assessment tools for aspects like VM utilization and Oracle schemas. When we mapped compatibility, we worked from static data rather than real-time telemetry to estimate costs and sizing. By the time we reached the migration phase, fragmented tooling and inconsistent assessments made it difficult to maintain accuracy and efficiency. Hidden dependencies sometimes surfaced late in the process, causing unexpected rework and delays.

Streamlining migrations with Azure Migrate

To automate and streamline these manual tasks, we developed Q-Migrator, our in-house framework built around Azure Migrate. Now we can offer clients an efficient, agentless approach to discovery, assessment, and migration. As part of our on-premises database migration initiatives, we rely on Azure Migrate to migrate a wide range of structured databases (including MySQL, Microsoft SQL Server, PostgreSQL, and Oracle) from on-premises environments to Azure IaaS and PaaS.

For instance, for an on-premises PostgreSQL migration, we begin by setting up an Azure Migrate appliance in the client’s environment to automatically discover servers, databases, and applications. That generates a complete inventory and a dependency map that identifies every relationship between servers and databases. From there, we run an assessment through Azure Migrate to check compatibility, identify blockers, and right-size target environments for Azure Database for PostgreSQL. By integrating Azure Database Migration Service (DMS), we can replicate data continuously until cutover, ensuring near-zero downtime. In addition, Azure DMS provides robust telemetry and analytics for deep visibility into every stage of the process. This unified and automated workflow not only replaces manual steps but also increases reliability and accelerates delivery.
Teams benefit from a consolidated dashboard for planning, execution, and performance tracking, driving efficiency throughout the migration lifecycle.

75% faster deployment, 60% cost savings

Since implementing Azure Migrate, which now facilitates discovery and assessment for on-premises PostgreSQL workloads, we’ve accelerated deployment by 75% compared to traditional migration methods. We’ve also reduced costs for our clients by up to 60%. Automated discovery alone reduces that phase by nearly 40%, and dependency mapping now takes a fraction of the effort. With the integrated dashboard in Azure Migrate, we can also track progress across discovery, assessment, and migration in one place. This eliminates the need for multiple third-party tools. These efficiencies allow us to deliver complex migrations on tighter timelines without sacrificing quality or reliability.

Rounding out the modernization journey with AKS

As “the great modernization specialists,” we’re often asked which database is best for landing Oracle workloads in the cloud. From our experience, Azure Database for PostgreSQL is ideal for enterprises seeking cost-efficient and secure PostgreSQL deployments. Its managed services reduce operational overhead while maintaining high availability, compliance, and scalability. Plus, seamless integration with Azure AI services allows us to innovate for clients and keep them ahead of the curve.

We also recognize that database migration is only the first step for many clients. Modernizing the application layer delivers even greater scalability, security, and manageability. When clients come to Quadrant for a broader modernization strategy, we often use Azure Kubernetes Service (AKS) to containerize their applications and break monoliths into microservices. AKS delivers a cloud-native architecture alongside database modernization.
This integration supports DevOps practices, simplifies deployments, and allows customers to take full advantage of elastic cloud infrastructure.

More innovation to come

Overall, Azure Migrate, together with Azure Database for PostgreSQL, Azure Database for MySQL, and Azure SQL Database, has redefined how we deliver database modernization, and our close collaboration with Microsoft has made it possible. By engaging early with Microsoft, we can validate migration architectures and gain insights into best practices for high-performance and secure cloud deployments. Access to Microsoft experts helps us fine-tune our designs, optimize performance, and resolve complex issues quickly.

We’re also investing in AI-driven automation using Azure OpenAI in Foundry Models to analyze migration data, optimize queries, and predict performance outcomes. These innovations allow us to deliver more intelligent, adaptive solutions tailored to each customer’s unique environment.