Supporting ChatGPT on PostgreSQL in Azure
Affan Dar, Vice President of Engineering, PostgreSQL at Microsoft
Adam Prout, Partner Architect, PostgreSQL at Microsoft
Panagiotis Antonopoulos, Distinguished Engineer, PostgreSQL at Microsoft

The OpenAI engineering team recently published a blog post describing how they scaled their databases by 10x over the past year to support 800 million monthly users. To do so, OpenAI relied on Azure Database for PostgreSQL to support important services like ChatGPT and the Developer API. Collaborating with a customer experiencing such rapid user growth has been a remarkable journey. One key observation is that PostgreSQL works out of the box even at very large scale. As many in the public domain have noted, ChatGPT grew to 800M+ users before OpenAI started moving new and shardable workloads to Azure Cosmos DB. Nevertheless, supporting the growth of one of the largest Postgres deployments was a great learning experience for both of our teams. Our OpenAI friends did an incredible job of reacting fast and adjusting their systems to handle the growth. Similarly, the Postgres team at Azure worked to further tune the service to support the increasing OpenAI workload. The changes we made were not limited to OpenAI, so all our Azure Database for PostgreSQL customers with demanding workloads have benefited. A few of the enhancements, and the work that led to them, are described below.

Changing the network congestion protocol to reduce replication lag

Azure Database for PostgreSQL used the default CUBIC congestion control algorithm for replication traffic to replicas both within and outside the region. Leading up to one of the OpenAI launch events, we observed that several geo-distributed read replicas occasionally experienced replication lag. Replication from the primary server to the read replicas would typically operate without issues; however, at times, the replicas would unexpectedly begin falling behind the primary for reasons that were not immediately clear.
This lag would not recover on its own and would grow until, eventually, automation would restart the read replica. Once restarted, the read replica would catch up again, only to repeat the cycle within a day or less. After an extensive debugging effort, we traced the root cause to how the TCP congestion control algorithm handled a higher rate of packet drops. These drops were largely a result of high point-to-point traffic between the primary server and its replicas, compounded by the existing TCP window settings. Packet drops across regions are not unexpected; however, the default congestion control algorithm (CUBIC) treats packet loss as a sign of congestion and backs off aggressively. In comparison, the Bottleneck Bandwidth and Round-trip propagation time (BBR) congestion control algorithm is less sensitive to packet drops. Switching to BBR, adding SKU-specific TCP window settings, and switching to the fair queuing network discipline (which can control pacing of outgoing packets at the hardware level) resolved this issue. We’ll also note that one of our seasoned PostgreSQL committers provided invaluable insights during this process, helping us pinpoint the issue more effectively.

Scaling out with Read replicas

PostgreSQL primaries, if configured properly, work amazingly well in supporting a large number of read replicas. In fact, as noted in the OpenAI engineering blog, a single primary has been able to power 50+ replicas across multiple regions. However, going beyond this increases the chance of impacting the primary. For this reason, we added cascading replica support to scale out reads even further. But this brings in a number of additional failure modes that need to be handled. The system must carefully orchestrate repairs around lagging and failing intermediary nodes, safely repointing replicas to new intermediary nodes while performing catch-up or rewind in a mission-critical setup.
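For readers curious what the congestion-control changes described in the previous section look like in practice: on Linux, BBR and fair queuing are typically enabled through kernel settings such as the following. This is an illustrative sketch only; the values shown (and the file name) are assumptions, not the actual configuration Azure uses, and the SKU-specific TCP window sizes in particular would differ per machine class.

```
# /etc/sysctl.d/99-replication-tuning.conf -- illustrative values only
net.ipv4.tcp_congestion_control = bbr     # BBR instead of the default CUBIC
net.core.default_qdisc = fq               # fair queuing qdisc, enables packet pacing
net.ipv4.tcp_rmem = 4096 131072 33554432  # example receive-window bounds (SKU-specific in practice)
net.ipv4.tcp_wmem = 4096 131072 33554432  # example send-window bounds
```

Settings like these are applied with `sysctl --system`; BBR additionally requires the `tcp_bbr` kernel module to be available.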
Furthermore, disaster recovery (DR) scenarios can require a fast rebuild of a replica, and because data movement across regions is a costly and time-consuming operation, we developed the ability to create a geo replica from a snapshot of another replica in the same region. This feature avoids the traditional full data copy, which may take hours or even days depending on the size of the data, by leveraging data for that cluster that already exists in that region. This feature will soon be available to all our customers as well.

Scaling out Writes

These improvements solved the read replica lag problems and read scale but did not address the growing write scale for OpenAI. At some point, the balance tipped and it was obvious that the IOPS limits of a single PostgreSQL primary instance would not cut it anymore. As a result, OpenAI decided to move new and shardable workloads to Azure Cosmos DB, which is our default recommended NoSQL store for fully elastic workloads. However, some workloads, as noted in the OpenAI blog, are much harder to shard. While OpenAI is using Azure Database for PostgreSQL flexible server, several of the write scaling requirements that came up have been baked into our new Azure HorizonDB offering, which entered private preview in November 2025. Some of the architectural innovations are described in the following sections.

Azure HorizonDB scalability design

To better support more demanding workloads, Azure HorizonDB introduces a new storage layer for Postgres that delivers significant performance and reliability enhancements:
- More efficient read scale-out. Postgres read replicas no longer need to maintain their own copy of the data. They can read pages from the single copy maintained by the storage layer.
- Lower latency Write-Ahead Logging (WAL) writes and higher throughput page reads via two purpose-built storage services designed for WAL storage and Page storage.
- Durability and high availability responsibilities are shifted from the Postgres primary to the storage layer, allowing Postgres to dedicate more resources to executing transactions and queries.
- Postgres failovers are faster and more reliable.

To understand how Azure HorizonDB delivers these capabilities, let’s look at its high-level architecture as shown in Figure 1. It follows a log-centric storage model, where the PostgreSQL write-ahead log (WAL) is the sole mechanism used to durably persist changes to storage. PostgreSQL compute nodes never write data pages to storage directly in Azure HorizonDB. Instead, pages and other on-disk structures are treated as derived state and are reconstructed and updated from WAL records by the data storage fleet. Azure HorizonDB storage uses two separate storage services for WAL and data pages. This separation allows each to be designed and optimized for the very different patterns of reads and writes PostgreSQL performs against WAL files as opposed to data pages. The WAL server is optimized for very low latency writes to the tail of a sequential WAL stream, while the Page server is designed for random reads and writes across potentially many terabytes of pages. These two separate services work together to enable Postgres to handle IO-intensive OLTP workloads like OpenAI’s. The WAL server can durably write a transaction across 3 availability zones using a single network hop. The typical PostgreSQL replication setup with a hot standby (Figure 2) requires 4 hops to do the same work. Each hop is a component that can potentially fail or slow down and delay a commit. The Azure HorizonDB page service can scale out page reads to many hundreds of thousands of IOPS for each Postgres instance. It does this by sharding the data in Postgres data files across a fleet of page servers. This spreads the reads across many high-performance NVMe disks on each page server.
Figure 2 - WAL Writes in HorizonDB

Another key design principle for Azure HorizonDB was to move durability and high availability work off the PostgreSQL compute, allowing it to operate as a stateless compute engine for queries and transactions. This approach gives Postgres more CPU, disk, and network to run your application’s business logic. Table 1 summarizes the tasks that community PostgreSQL has to do which Azure HorizonDB moves to its storage layer. Work like dirty page writing and checkpointing is no longer done by a Postgres primary. The work of sending WAL files to read replicas is also moved off the primary and into the storage layer; having many read replicas puts no load on the Postgres primary in Azure HorizonDB. Backups are handled by Azure Storage via snapshots; Postgres isn’t involved.

| Task | Resource Savings | Postgres Process Moved |
| --- | --- | --- |
| WAL sending to Postgres replicas | Disk IO, Network IO | walsender |
| WAL archiving to blob storage | Disk IO, Network IO | archiver |
| WAL filtering | CPU, Network IO | Shared Storage Specific (*) |
| Dirty page writing | Disk IO | background writer |
| Checkpointing | Disk IO | checkpointer |
| PostgreSQL WAL recovery | Disk IO, CPU | startup recovering |
| PostgreSQL read replica redo | Disk IO, CPU | startup recovering |
| PostgreSQL read replica shared storage | Disk IO | background writer, checkpointer |
| Backups | Disk IO | pg_dump, pg_basebackup, pg_backup_start, pg_backup_stop |
| Full page writes | Disk IO | Backends doing WAL writing |
| Hot standby feedback | Vacuum accuracy | walreceiver |

Table 1 - Summary of work that the Azure HorizonDB storage layer takes over from PostgreSQL

The shared storage architecture of Azure HorizonDB is the fundamental building block for delivering exceptional read scalability and elasticity, which are critical for many workloads. Users can spin up read replicas instantly without requiring any data copies. Page Servers are able to scale and serve requests from all replicas without any additional storage costs.
Since WAL replication is entirely handled by the storage service, the primary’s performance is not impacted as the number of replicas changes. Each read replica can scale independently to serve different workloads, allowing for workload isolation. Finally, this architecture allows Azure HorizonDB to substantially improve the overall experience around high availability (HA). HA replicas can now be added without any data copying or storage costs. Since the data is shared between the replicas and continuously updated by Page Servers, secondary replicas only replay a portion of the WAL and can easily keep up with the primary, reducing failover times. The shared storage also guarantees that there is a single source of truth and that the old primary never diverges after a failover. This avoids the need for expensive reconciliation using pg_rewind or other techniques and further improves availability. Azure HorizonDB was designed from the ground up with learnings from large-scale customers to meet the requirements of the most demanding workloads. The improved performance, scalability, and availability of the Azure HorizonDB architecture make Azure a great destination for Postgres workloads.

Announcing Azure HorizonDB
Affan Dar, Vice President of Engineering, PostgreSQL at Microsoft
Charles Feddersen, Partner Director of Program Management, PostgreSQL at Microsoft

Today at Microsoft Ignite, we’re excited to unveil the preview of Azure HorizonDB, a fully managed Postgres-compatible database service designed to meet the needs of modern enterprise workloads. The cloud native architecture of Azure HorizonDB delivers highly scalable shared storage, elastic scale-out compute, and a tiered cache optimized for running cloud applications of any scale.

Postgres is transforming industries worldwide and is emerging as the foundation of modern data solutions across all sectors at an unprecedented pace. For developers, it is the database of choice for building new applications with its rich set of extensions, open-source API, and expansive ecosystem of tools and libraries. At the same time, at the opposite end of the workload spectrum, enterprises around the world are increasingly turning to Postgres to modernize their existing applications. Azure HorizonDB is designed to support applications across the entire workload spectrum, from the first line of code in a new app to the migration of large-scale, mission-critical solutions. Developers benefit from the robust Postgres ecosystem and seamless integration with Azure’s advanced AI capabilities, while enterprises gain a secure, highly available, and performant cloud database to host their business applications. Whether you’re building from scratch or transforming legacy infrastructure, Azure HorizonDB empowers you to innovate and scale with confidence, today and into the future.

Azure HorizonDB introduces new levels of performance and scalability to PostgreSQL. The scale-out compute architecture supports up to 3,072 vCores across primary and replica nodes, and the auto-scaling shared storage supports up to 128TB databases while providing sub-millisecond multi-zone commit latencies.
This storage innovation enables Azure HorizonDB to deliver up to 3x more throughput when compared with open-source Postgres for transactional workloads.

Azure HorizonDB is enterprise ready on day one. With native support for Entra ID, Private Endpoints, and data encryption, it provides compliance and security for sensitive data stored in the cloud. All data is replicated across availability zones by default, and maintenance operations are transparent with near-zero downtime. Backups are fully automated, and integration with Azure Defender for Cloud provides additional protection for highly sensitive data. All up, Azure HorizonDB offers enterprise-grade security, compliance, and reliability, making it ready for business use today.

Since the launch of ChatGPT, there has been an explosion of new AI apps being built, and Postgres has become the database of choice due in large part to its vector index support. Azure HorizonDB extends the AI capabilities of Postgres further with two key features. We are introducing advanced filtering capabilities to the DiskANN vector index which enable query predicate pushdowns directly into the vector similarity search. This provides significant performance and scalability improvements over pgvector HNSW while maintaining accuracy and is ideal for similarity search over transactional data in Postgres. The second feature is built-in AI model management that seamlessly integrates generative, embedding, and reranking models from Microsoft Foundry for developers to use in the database with zero configuration. In addition to enhanced vector indexing and simplified model management to build powerful new AI apps, we’re also pleased to announce the general availability of Microsoft’s PostgreSQL Extension for VS Code that provides the tooling for Postgres developers to maximize their productivity.
Using this extension, GitHub Copilot is context-aware of the Postgres database, which means less prompting and higher quality answers. In the Ignite release, we’ve added live monitoring with one-click GitHub Copilot debugging, where Agent mode can launch directly from the performance monitoring dashboard to diagnose Postgres performance issues and guide users to a fix.

Alpha Life Sciences is an existing Azure customer:

“I’m truly excited about how Azure HorizonDB empowers our AI development. Its seamless support for Vector DB, RAG, and Agentic AI allows us to build intelligent features directly on a reliable Postgres foundation. With Azure HorizonDB, I can focus on advancing AI capabilities instead of managing infrastructure complexities. It’s a smart, forward-looking solution that perfectly aligns with how we design and deliver AI-powered applications.” Pengcheng Xu, CTO, Alpha Life Sciences

For enterprises that are modernizing their applications to Postgres in the cloud, the security and availability of Azure HorizonDB make it an ideal platform. However, these migrations are often complex and time-consuming for large legacy codebase conversions. To simplify this and reduce risk, we’re pleased to announce the preview of GitHub Copilot-powered Oracle migration built into the PostgreSQL Extension for VS Code. Within VS Code, teams of engineers can work with GitHub Copilot to automate the end-to-end conversion of complex database code using rich code editing, version control, text authoring, and deployment in an integrated development environment.

Azure HorizonDB is the next generation of fully managed, cloud native PostgreSQL database service. Built on the latest Azure infrastructure with state-of-the-art cloud architecture, Azure HorizonDB is ready for the most demanding application workloads.
In addition to our portfolio of managed Postgres services in Azure, Microsoft is deeply invested in the open-source Postgres project and is one of the top corporate upstream contributors and sponsors of the PostgreSQL project, with 19 Postgres project contributors employed by Microsoft. As a hyperscale Postgres vendor, it’s critical to actively participate in the open-source project. It enables us to better support our customers down to the metal in Azure, and to contribute our learnings from running Postgres at scale back to the community. We’re committed to continuing our investment to push the Postgres project forward, and the team is already active in making contributions to Postgres 19, to be released in 2026.

Ready to explore Azure HorizonDB? Azure HorizonDB is initially available in the Central US, West US3, UK South, and Australia East regions. Customers are invited to apply for early preview access to Azure HorizonDB and get hands-on experience with this new service. Participation is limited; apply now at aka.ms/PreviewHorizonDB

Map only local drives and default printer from client's computer when logging into 365 Desktop?
Hello, I have gone into Intune and created a new config profile and have set Windows Components > Remote Desktop Services > Remote Desktop Session Host > Printer Redirection \ Device and resource redirection to let users map drives and printers that are on their laptop into the Windows 365 Desktop. However, how can we set it so that: 1. The only printer mapped to the 365 desktop from the client's device is the client's default printer, and not any network printers that are installed on the laptop. 2. The only drives it maps into the 365 Desktop are the client's local drives, like their SSD drive and any plugged-in USB drives, and not any network drives that are on the laptop.

Windows 365 Enterprise Cloud PC Connection Fails - VM Unavailable (Code 10012)
We are facing a critical and persistent connection failure for a Windows 365 Enterprise Cloud PC that appears to be stuck in a state where the VM is not available to the RDP client.

Provisioning Policy Configuration:
- Cloud PC Type: Windows 365 Enterprise
- Experience: Access a full Cloud PC desktop
- Use Microsoft Entra single sign-on: Yes
- Join type: Microsoft Entra Join
- Geography: Canada
- Region: Automatic (Recommended)
- Network: Microsoft hosted network
- Current MDM: Microsoft Intune

Checked logs and found that the RDP client connection attempts are consistently failing with the same error, Disconnected: reason = 10012
[Telemetry :: Event] Type: RDPClient Details: DisconnectReason Subdetails: SessionHostResourceNotAvailable Code: 10012

Troubleshooting steps taken so far:
- Restarted the Cloud PC.
- Initiated a Reprovision action.
- Tried the web version, but that didn't help either.

Since simple restarts and reprovisions have failed to resolve the SessionHostResourceNotAvailable (10012) error, the current VM instance is unusable. Any guidance on resolving this definitive Code 10012 error is highly appreciated.

Save the date: Windows 365 AMA - What’s new from Microsoft Ignite
Tune in on December 3 for a special Windows 365 AMA. Catch up on the latest capabilities for Windows 365 announced at Microsoft Ignite! Host Christian Montoya and members of the product team will answer your questions live and offer insights to help you configure, deploy, and manage Windows in the cloud with ease. Save the date and post your questions early at aka.ms/Windows365AMA!

Expanded TURN relay regions for Windows 365 and Azure Virtual Desktop
We’re excited to share that the rollout of expanded TURN relay regions for Windows 365 and Azure Virtual Desktop is now complete. TURN relay is available in all regions listed below. The new range, 51.5.0.0/16, enhances RDP Shortpath connectivity and delivers faster, more reliable performance for Azure Virtual Desktop and Windows 365 users in 39 regions worldwide.

What is TURN?

TURN (Traversal Using Relays around NAT) enables devices behind firewalls to establish reliable UDP connections. With RDP Shortpath for public networks, TURN acts as a fallback when a direct UDP-based connection isn’t possible, ensuring low-latency, high-reliability remote desktop sessions. This new TURN relay range is part of the ‘WindowsVirtualDesktop’ service tag in Azure, making it easier for you to manage access and security configurations at scale.

Benefits of the new TURN relay

This change isn’t just a technical update; it’s a regional expansion. We’re scaling from 14 to 39 regions globally, bringing the TURN relay infrastructure closer to users, reducing latency, and improving connection reliability. Combined with a dedicated IP range for Azure Virtual Desktop and Windows 365 traffic, this initiative offers you more control, optimized routing, and a higher success rate for UDP-based communications. Here are the benefits in more detail:

Expanding regional coverage

By expanding from 14 to 39 regions globally, organizations will benefit from:
- Lower latency: Data travels shorter distances, resulting in faster connections and reduced lag.
- Improved reliability: Fewer dropped connections and more stable sessions, especially for real-time applications.
- Higher UDP success rates: Better performance for voice, video, and real-time data, even under variable network conditions.

Dedicated IP range for Azure Virtual Desktop and Windows 365 traffic

This rollout introduces a dedicated IP range tailored for Azure Virtual Desktop and Windows 365 traffic, distinct from the ACS TURN relay.
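When auditing firewall or allow-list coverage for the new range, it can help to check programmatically whether observed endpoint addresses fall inside 51.5.0.0/16. A minimal sketch using Python's standard `ipaddress` module; the helper name `is_turn_relay` and the sample addresses are ours, not part of the service.

```python
import ipaddress

# TURN relay range announced for Windows 365 / Azure Virtual Desktop,
# included in the 'WindowsVirtualDesktop' service tag.
TURN_RELAY_RANGE = ipaddress.ip_network("51.5.0.0/16")

def is_turn_relay(ip: str) -> bool:
    """Return True if the address falls inside the announced TURN relay range."""
    return ipaddress.ip_address(ip) in TURN_RELAY_RANGE

# Classify a couple of example endpoints, e.g. from firewall logs.
for addr in ("51.5.12.34", "20.50.1.1"):
    print(addr, "TURN relay" if is_turn_relay(addr) else "other")
```

The same check can be folded into log-processing scripts to confirm that TURN traffic is egressing on the expected path.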
Benefits of this improvement include:
- Optimized traffic flow for Azure Virtual Desktop and Windows 365.
- Improved control over network security configurations. Customers can navigate restrictive security setups without compromising performance.
- Enhanced quality and speed for traffic, free from generic filtering.

Supported regions

Here is a list of supported regions with the new TURN relay. A TURN relay is selected based on the physical endpoints, not the Cloud PC or session host. For example, a user physically located in the UK will use a relay in the UK South or UK West regions. If the user is far from a supported region, the connection may fall back to TCP, potentially impacting performance.

Accessible

Your environment should have this subnet accessible from all networks used for Windows 365 or Azure Virtual Desktop connectivity, on both the physical network and cloud side. For Microsoft Hosted Network deployments in Windows 365, this underlying connectivity is already in place. For Azure Virtual Desktop and Windows 365 Azure network connection (ANC) deployments, the ‘WindowsVirtualDesktop’ service tag contains this subnet, so connectivity may already be in place.

Optimized

The subnet should also be optimized to ensure this critical, latency-sensitive traffic has the most performant path available. This means:
- No TLS inspection on the traffic. This traffic is TLS-encrypted transport with a nested TLS-encrypted tunnel. TLS inspection yields no benefit but carries a high risk of performance and reliability impact and puts significant additional load on the inspecting device.
- Locally egressed, meaning traffic is sent to Microsoft via the most direct and efficient path.
In Azure this means directly routed onto Microsoft’s backbone, and for customer-side networks, directly to the internet, where it will be picked up by Microsoft’s infrastructure locally.
- Bypassed from VPN, proxy, and Secure Web Gateway (SWG) tunnels and sent directly to the service, as demonstrated in the example here. On the cloud side this may involve using a User Defined Route (UDR) to send the Windows Virtual Desktop traffic directly to ‘internet’ instead of traversing a virtual firewall, as can be seen in the example here.

Learn more

To learn more about RDP Shortpath and how to configure it for public networks, see our documentation on RDP Shortpath for Azure Virtual Desktop.

Windows 365 Watermarking - QR Codes Missing in Screenshots/Teams from Within Session?
Hi all, I've implemented watermarking on our Windows 365 setup using the official Microsoft guide, and I'm seeing behaviour that I'd like to confirm is expected.

Current Situation:
- Watermarking is enabled and working (QR codes appear when I screenshot from my local client PC)
- However, when taking screenshots FROM WITHIN the Cloud PC session itself, no QR codes appear
- Similarly, when screen sharing via Teams from within the Cloud PC session, participants don't see the QR codes

My Question: Is this the intended behaviour? Should QR codes only appear when capturing externally (from the client device) but not when capturing internally (from within the Windows 365 session itself)? I've read through the Microsoft documentation but can't find explicit clarification on whether internal screenshots should show watermarks or if the protection is specifically designed for external capture attempts. Can anyone confirm this behaviour or point me to official documentation that explains the internal vs external capture distinction? Thanks in advance!

Edge Dev/Canary enterprise extension sync broken?
I noticed on my test system that setting up a new profile in Dev/Canary sync doesn't work correctly - specifically, extensions do not install properly. In my stable channel build everything is working; however, when it attempts to sync to Dev/Canary, only the "forced" extension I have enabled via policy shows/works, whereas the others are not in the extension list but are present in the AppData user data folder. I've attempted to reset sync and have everything get pulled down again, but the issue persists. I've tried clearing out the profile and re-installing Dev/Canary to see if having it force re-sync would help, but it does not. TLDR - syncing extensions to Dev/Canary with enterprise sync breaks any extension that is not 'forced' to install via policy.