hyperscale
6 TopicsLessons Learned #540:Bulk Insert Throughput in Azure SQL Hyperscale with Partitioned Heap Tables
In this lesson learned, I would like to share an interesting scenario working on a service request where our customer was running a high-volume data load process in Azure SQL Database Hyperscale. The workload was based on a common pattern: Recreate a staging table. Load a large number of rows using bulk insert. The bulk insert showed unstable execution times and became the main area to investigate. The process was loading a very large number of rows into an Azure SQL Database Hyperscale database. The process used a staging table that was initially loaded as a heap. The main concern was the inconsistent execution time during the load process. Why Manually Adding Data Files Was Not the Right Direction In Azure SQL Database Hyperscale, the storage architecture is different from a traditional SQL Server deployment. The data layout and storage management are handled internally by the service. Because of this architecture, manually creating or pre-allocating multiple data files is not the same tuning option that we may consider in SQL Server on-premises or SQL Server running on Azure Virtual Machines. For this reason, the troubleshooting focus moved from manual file layout configuration to the actual workload pattern, waits, concurrency, batch size, and staging table design. What We Observed During the bulk insert phase, waits such as PAGELATCH_EX were observed. Since the staging table was loaded as a heap and the clustered primary key was created only after the bulk insert completed, OPTIMIZE_FOR_SEQUENTIAL_KEY was not directly applicable to the bulk insert phase. This changed the direction of the investigation. Instead of focusing on last-page insert contention on an existing clustered index, the analysis moved toward heap insert behavior, allocation contention, concurrency, batch size, and whether a different staging table design could help. First Recommendation: Start with Low-Impact Changes Before changing the table design, the first recommendation was to test the least intrusive changes: Reduce the number of concurrent bulk insert sessions. Increase the batch size, for example from 10,000 rows to 50,000 or 100,000 rows. Test TABLOCK on the dedicated heap staging table. The goal was to avoid assuming that more concurrency would always reduce the total execution time. In some high-volume load scenarios, excessive concurrency may increase contention and make the process less stable. The Interesting Design Option: Partitioned Heap Staging Table One of the most interesting design options was to evaluate a partitioned heap staging table. The idea is simple: instead of loading all rows into a non-partitioned heap staging table, the staging table can be created on the same partition scheme used by the target table, using the same partitioning column. This does not mean that a partitioned heap will always be faster. However, it can be a useful design option when: The bulk load phase is affected by allocation or latch contention. Concurrent load processes can naturally distribute rows across different partition ranges. The staging table is used only as an intermediate structure.Lessons Learned The main lessons from this scenario were: In Azure SQL Database Hyperscale, manually managing multiple data files is not the right tuning direction. PAGELATCH_EX during heap loading may point to concurrency or allocation-related contention. Reducing concurrency can sometimes improve total throughput. Larger batch sizes may provide better results than many small batches. TABLOCK on a dedicated heap staging table is a low-impact test worth evaluating. A partitioned heap staging table can be a valid second-phase design option when the load can be distributed across partition ranges. The best approach is to test small, measurable changes before introducing architectural redesigns. Final Thoughts A partitioned heap staging table can be a powerful option, but only when it is tested carefully and when the workload pattern can benefit from partition distribution.Unexpected PITR Charges from restorableDroppedDatabases After BC → Hyperscale Migration
Why This Behavior Is by Design When migrating an Azure SQL Database from Business Critical (BC) to Hyperscale using a manual cutover, some customers notice unexpected Point-in-Time Restore (PITR) backup storage charges appearing under the following resource: /Microsoft.Sql/servers/<server>/restorableDroppedDatabases/<database> At first glance, this can be confusing—especially when: No customer-initiated drop or delete was performed The database is online and healthy post-migration Test migrations may not have shown similar charges This post explains why this happens, why it is expected by design, and how these charges naturally expire. The Observed Scenario After a BC → Hyperscale manual cutover, customers may see PITR charges tied to: restorableDroppedDatabases/<database-name> Despite the database being active and available in Hyperscale, these charges start appearing immediately after the migration cutover and gradually decrease over time. Why Does the Database Appear as “Dropped”? During a manual cutover migration, Azure SQL performs an internal platform-driven workflow to complete the transition between architectures. From a control-plane perspective: The source Business Critical logical database is internally dropped This drop is not initiated by the customer It is a required system step to complete the Hyperscale migration Telemetry confirms that the migration workflow transitions through states such as: Internal drop of the source physical and logical database Cleanup of metadata and completion of the migration This entire sequence completes within seconds and is fully platform managed. Why Are Backup Charges Generated? Although the source BC database is internally dropped, its pre-migration PITR backups are still retained according to the configured backup retention period. Here’s the key point: Backups taken before upgrading to Hyperscale are retained and billed using the dropped-database backup billing model. Because the source database is now considered dropped (from the BC perspective): The 1× database-size discount no longer applies The full data file size is added to the billable backup size Charges appear under restorableDroppedDatabases This behavior is explicitly documented as expected in internal Azure SQL billing guidance. Why Do Charges Decrease Over Time? These charges are not permanent. They: Decrease daily Continue only while the pre-migration PITR backups are retained Automatically stop once the retention window expires In practical terms: Charges stop when: days_since_migration > configured_backup_retention_days No cleanup action is required from the customer—the platform handles this automatically. Why Didn’t Test Migrations Show Similar Charges? In many reported cases, test or smaller databases migrated using the same method did not generate noticeable charges. This can be explained by two documented optimizations: Backup size threshold – very small backup footprints are not charged Low activity optimization – inactive or low-change databases generate fewer snapshots As a result, smaller or lightly used test databases may fall below the billing threshold, while larger production databases do not. Is This a Billing Error or Credit Scenario? No. Although the operation is platform-driven: The behavior is by design The charges are for temporary retention of valid PITR backups They naturally expire based on retention Therefore, this scenario is not considered a billing defect and does not typically warrant credits. How Can Customers Reduce Charges Faster? If needed, customers can: Reduce the PITR backup retention period (minimum is 1 day) Wait up to 24 hours for billing to reflect the change This shortens how long the pre-migration backups are retained and billed. FAQ – restorableDroppedDatabases Charges After BC → Hyperscale Migration Q1: Why am I seeing PITR charges for restorableDroppedDatabases when my database is still online? A: During a Business Critical → Hyperscale manual cutover, Azure SQL internally drops the source BC database as part of the migration workflow. While the Hyperscale database is active and healthy, the pre‑migration BC backups are retained and billed under restorableDroppedDatabases. Q2: Did the customer initiate a drop or delete operation? A: No. This drop is platform‑driven and required to complete the migration. It is not initiated by the customer. Q3: What exactly is being billed? A: The charges are for Point‑in‑Time Restore (PITR) backups taken before the migration. These backups are retained according to the configured backup retention period and are billed using the dropped database billing model. Q4: Why does the cost appear higher than expected? A: Once a database is considered “dropped” (from the BC perspective), the 1× database-size discount no longer applies, and the full data file size is included in the billable backup size. Q5: Will these charges continue indefinitely? A: No. The charges decrease daily and automatically stop once the pre‑migration backups expire based on the configured PITR retention period. Q6: Why didn’t this happen with smaller or test databases? A: Smaller or low‑activity databases may fall below the backup billing threshold, or benefit from low‑activity snapshot optimizations, resulting in no visible charges. Q7: Is this a billing bug or credit-worthy scenario? A: No. This behavior is by design and expected. The charges reflect valid backup retention and do not typically qualify for credits. Q8: Can the customer reduce these charges sooner? A: Yes. The customer can reduce the PITR backup retention period (minimum 1 day). Billing changes usually reflect within up to 24 hours. Key Takeaways The behavior is expected and by design Charges come from pre-migration BC backups, not the active Hyperscale database The database was internally dropped as part of migration, not by the customer Charges decrease daily and stop automatically No action is required unless the customer wants to reduce retention early Final Note As of the time of writing, this behavior is not clearly described in public customer-facing documentation, which explains why it often appears unexpected. Awareness of this mechanism can help set correct expectations when planning BC → Hyperscale manual cutover migrations.Azure SQL Hyperscale: Understanding PITR Retention vs Azure Portal Restore UI
Overview Customers using Azure SQL Database – Hyperscale may sometimes notice a discrepancy between the configured Point-in-Time Restore (PITR) retention period and what the Azure Portal displays as available restore points. In some cases: PITR retention is configured (for example, 7 days), Yet the Azure Portal only shows restore points going back a shorter period (for example, 1–2 days), And the restore UI may allow selecting dates earlier than the configured retention window without immediately showing an error. This post explains why this happens, how to validate backup health, and what actions to take. Key Observation From investigation and internal validation, this behavior is not indicative of backup data loss. Instead, it is related to Azure Portal UI behavior, particularly for Hyperscale databases. The backups themselves continue to exist and are managed correctly by the service. Important Distinction: Portal UI vs Actual Backup State What the Azure Portal Shows The restore blade may show fewer restore points than expected. The date picker may allow selecting dates outside the PITR retention window. No immediate validation error may appear in the UI. What Actually Happens Backup retention is enforced at the service layer, not the portal. If a restore is attempted outside the valid PITR window, the operation will fail during execution, even if the UI allows selection. Hyperscale backup metadata is handled differently than General Purpose or Business Critical tiers. Why This Happens with Hyperscale There are a few important technical reasons: Hyperscale backup architecture differs Hyperscale uses a distributed storage and backup model optimized for scale and fast restore, which affects how metadata is surfaced. Some DMVs are not supported Views like sys.dm_database_backups, commonly used for backup visibility, do not support Hyperscale databases. Azure Portal relies on metadata projections The portal restore experience depends on backend projections that may lag or behave differently for Hyperscale, leading to UI inconsistencies. How to Validate Backup Health (Recommended) Instead of relying solely on the Azure Portal UI, use service-backed validation methods. Option 1: PowerShell – Earliest Restore Point You can confirm the earliest available restore point directly from the service: # Set your variables $resourceGroupName = "RG-xxx-xxx-1" $serverName = "sql-xxx-xxx-01" $databaseName = "database_Prod" # Get earliest restore point $db = Get-AzSqlDatabase -ResourceGroupName $resourceGroupName -ServerName $serverName -DatabaseName $databaseName $earliestRestore = $db.EarliestRestoreDate Write-Host "Earliest Restore Point: $earliestRestore" Write-Host "Days Available: $([math]::Round(((Get-Date) - $earliestRestore).TotalDays, 1)) days" This reflects the true PITR boundary enforced by Azure SQL. Option 2: Internal Telemetry / Backup Events (Engineering Validation) Internal monitoring confirms: Continuous backup events are present. Coverage aligns with configured PITR retention. Backup health remains ✅ Healthy even when the portal UI appears inconsistent. Key takeaway: Backup data is intact and retention is honored. Is There Any Risk of Data Loss? No. There is no evidence of backup loss or retention policy violation. This is a visual/UX issue, not a data protection issue. Recommended Actions For Customers ✅ Trust the configured PITR retention, not just the portal display. ✅ Use PowerShell or Azure CLI to validate restore boundaries. ❌ Do not assume backup loss based on portal UI alone. For Support / Engineering Capture a browser network trace when encountering UI inconsistencies. Raise an incident with the Azure Portal team for investigation and fix. Reference Hyperscale-specific behavior during troubleshooting. Summary Topic Status PITR retention enforcement ✅ Correct Backup data integrity ✅ Safe Azure Portal restore UI ⚠️ May be misleading Hyperscale backup visibility ✅ Validate via service tools Final Thoughts Azure SQL Hyperscale continues to provide robust, reliable backup and restore capabilities, even when the Azure Portal UI does not fully reflect the underlying state. When in doubt: Validate via service APIs Rely on enforcement logic, not UI hints Escalate portal inconsistencies appropriately146Views0likes0CommentsWhy PITR Restore for Azure SQL Hyperscale Can Take Longer Than Expected
Azure SQL Database Hyperscale is designed to deliver fast, storage‑optimized backup and restore operations. According to Microsoft documentation, most Point‑in‑Time Restore (PITR) operations for Hyperscale should complete within 10 minutes, regardless of database size, because the service uses metadata-based restore techniques rather than copying full data files. However, some real‑world scenarios can lead to unexpectedly long restore times, even when the source database is Hyperscale and even when no obvious configuration changes are made. This article explains one such scenario, outlines what happened, and provides guidance to avoid similar delays in the future. Expected Behavior for Hyperscale PITR Hyperscale databases use a unique architecture that separates compute and storage. Backups are taken from storage snapshots and do not require data copying during a typical PITR restore. From Microsoft Learn: “Most restores complete within minutes, even for large databases.” Ref: https://learn.microsoft.com/azure/azure-sql/database/hyperscale-automated-backups-overview?view=azuresql#backup-and-restore-performance This performance expectation applies as long as the Backup Storage Redundancy remains the same between the source DB and the target restore. Customer Scenario Overview A customer initiated PITR restore operations for Hyperscale DB: Source DB: Hyperscale (SLO: HS_PRMS_64) Target DB: Hyperscale (SLO: HS_PRMS_128) Same logical server Source DB Backup Storage Redundancy: Standard_RAGRS Customer enabled Zone Redundancy for the target database during restore The customer therefore expected the restore to finish within the normal Hyperscale window (~10 minutes). Instead, the restore took significantly longer. Why the Restore Took Longer Although the source database used Standard_RAGRS, enabling Zone Redundancy at restore time introduced a configuration change that affected the underlying Backup Storage Redundancy (BSR) for the newly restored database. 🔍 Key Point: Changing BSR Creates a Full "Size-of-Data" Restore When the target DB uses a different BSR type than the source DB, Azure SQL Database cannot perform a fast metadata-based restore. Instead, it must perform a full data copy, and the restore becomes proportional to the database size: More data → longer restore Effectively behaves like a physical data movement operation This overrides Hyperscale’s normally fast PITR workflow This behavior is documented here: https://learn.microsoft.com/azure/azure-sql/database/hyperscale-automated-backups-overview?view=azuresql#backup-and-restore-performance In the customer’s case: Customer-enabled Zone Redundancy changed the restore workflow. As a result, the system selected a backup storage redundancy configuration different than the source: Restore workflow chose: Standard_RAGZRS Source database actually used: Standard_RAGRS (non‑zone‑redundant) This mismatch triggered a size-of-data restore, leading to the observed delay. Summary of Root Cause ✔ Hyperscale PITR is fast only when BSR is unchanged ✔ Customer enabled Zone Redundant configuration during restore ✔ This resulted in a different Backup Storage Redundancy from the source ✔ Target restore had to perform a full data copy, not metadata-based restore ✔ Restore time scaled with database size → leading to long restore duration Key Takeaways 1. Do not change Backup Storage Redundancy during PITR unless necessary Any change (e.g., RAGRS → RAGZRS) converts the restore into a size‑of‑data operation. 2. Restores that involve cross‑region or cross‑redundancy conversions always take longer This applies equally to: PITR restore Restore to another server Restore with SLO changes Restore involving ZRS/RA‑GZRS transitions 3. Hyperscale PITR is extremely fast—when configuration is unchanged If the source and target BSR match, Hyperscale restores usually complete in minutes. 4. Enabling Zone Redundancy is valid, but do it after the restore If the customer wants ZRS for the restored DB: Perform PITR first (fast restore) Then update redundancy post‑restore (online operation) Conclusion While Hyperscale PITR restores are typically very fast, configuration changes during the restore—especially related to Backup Storage Redundancy—can trigger a full data copy and significantly increase restore duration. To get the best performance: Keep the same BSR when performing PITR Apply redundancy changes after the restore completes Use metadata-based restores whenever possible Understanding these nuances helps ensure predictable recovery times and aligns operational processes with Hyperscale’s architectural design.449Views0likes0CommentsLesson Learned #317: Update to service objective for Hub DBs
Today, we worked on a service request where our customer got the following error message scaling database error from General Purpose to Hyperscale "Error message: 'Update to service objective 'xyz' for Hub DBs' is not supported for entity 'DatabaseName'. Following, I would like to share with you our findings.1.9KViews1like0Comments