staging
2 TopicsLessons Learned #540:Bulk Insert Throughput in Azure SQL Hyperscale with Partitioned Heap Tables
In this lesson learned, I would like to share an interesting scenario working on a service request where our customer was running a high-volume data load process in Azure SQL Database Hyperscale. The workload was based on a common pattern: Recreate a staging table. Load a large number of rows using bulk insert. The bulk insert showed unstable execution times and became the main area to investigate. The process was loading a very large number of rows into an Azure SQL Database Hyperscale database. The process used a staging table that was initially loaded as a heap. The main concern was the inconsistent execution time during the load process. Why Manually Adding Data Files Was Not the Right Direction In Azure SQL Database Hyperscale, the storage architecture is different from a traditional SQL Server deployment. The data layout and storage management are handled internally by the service. Because of this architecture, manually creating or pre-allocating multiple data files is not the same tuning option that we may consider in SQL Server on-premises or SQL Server running on Azure Virtual Machines. For this reason, the troubleshooting focus moved from manual file layout configuration to the actual workload pattern, waits, concurrency, batch size, and staging table design. What We Observed During the bulk insert phase, waits such as PAGELATCH_EX were observed. Since the staging table was loaded as a heap and the clustered primary key was created only after the bulk insert completed, OPTIMIZE_FOR_SEQUENTIAL_KEY was not directly applicable to the bulk insert phase. This changed the direction of the investigation. Instead of focusing on last-page insert contention on an existing clustered index, the analysis moved toward heap insert behavior, allocation contention, concurrency, batch size, and whether a different staging table design could help. First Recommendation: Start with Low-Impact Changes Before changing the table design, the first recommendation was to test the least intrusive changes: Reduce the number of concurrent bulk insert sessions. Increase the batch size, for example from 10,000 rows to 50,000 or 100,000 rows. Test TABLOCK on the dedicated heap staging table. The goal was to avoid assuming that more concurrency would always reduce the total execution time. In some high-volume load scenarios, excessive concurrency may increase contention and make the process less stable. The Interesting Design Option: Partitioned Heap Staging Table One of the most interesting design options was to evaluate a partitioned heap staging table. The idea is simple: instead of loading all rows into a non-partitioned heap staging table, the staging table can be created on the same partition scheme used by the target table, using the same partitioning column. This does not mean that a partitioned heap will always be faster. However, it can be a useful design option when: The bulk load phase is affected by allocation or latch contention. Concurrent load processes can naturally distribute rows across different partition ranges. The staging table is used only as an intermediate structure.Lessons Learned The main lessons from this scenario were: In Azure SQL Database Hyperscale, manually managing multiple data files is not the right tuning direction. PAGELATCH_EX during heap loading may point to concurrency or allocation-related contention. Reducing concurrency can sometimes improve total throughput. Larger batch sizes may provide better results than many small batches. TABLOCK on a dedicated heap staging table is a low-impact test worth evaluating. A partitioned heap staging table can be a valid second-phase design option when the load can be distributed across partition ranges. The best approach is to test small, measurable changes before introducing architectural redesigns. Final Thoughts A partitioned heap staging table can be a powerful option, but only when it is tested carefully and when the workload pattern can benefit from partition distribution.Scenario: New AADconnect server in new Forest - All mailboxes in EXO O365
Hi all. A challenging one, even though I've done dozens are of similar complex migrations, but this one is slightly different… Scenario: Customer has moved IT suppliers. Existing environment is: AADConnect server syncing all users from Forest A into Azure. All mailboxes have been migrated to Exchange Online (approx. 500 mailboxes) All Distribution Groups are created and managed in Office 365 Source of Authority of users is Forest A's Active Directory All new mailboxes are created direct in Exchange Online by assigning a license MX Records and Autodiscover DNS records all point to Exchange Online (autodiscover.outlook.com) There is no mail routing to/from on-premises Exchange environment, although Hybrid is still available (not yet decommissioned by current supplier). The current Exchange environment is only being used for any Administration/Management purposes only (as opposed to using ADSIEdit etc.) It is worth mentioning that although all AD user objects are in Forest A, the Exchange hybrid server and the AADConnect server are in Forest B (they were linked mailboxes before they got migrated to Exchange Online (If that is relevant)) There is a non-transitive AD Trust between Forest A and Forest B End Target goal: All mailboxes to remain in Exchange Online MX Records and Autodiscover to remain pointing to Exchange Online Source of Authority to be Forest C Active Directory instead of Forest A (by this, we will have soft or hard match immutable ID for set-msoluser) - not an issue When new users join the business, they're given a new AD account in Forest C, which is synced via AADConnect. Their mailbox is created in Exchange Online New environment: New Active Directory Forest created (Forest C) Instead of migrating the AD user objects from Forest A to C, brand new AD user objects have been created in Forest C There is a two-way non-transitive trust between Forest A ('old' AD User objects) and Forest C ('new AD User objects) Robocopy has been used to copy over any file shares etc. Users are given new laptops in the new Forest C. They log in using their new Forest C credentials. Outlook clients are manually created in Forest C pointing Autodiscover to autodiscover.outlook.com We have now installed AADConnect in Forest C as Staging Mode. We have added Forest C (new users) OU's in Scope to sync Old supplier has added their Forest A credentials into our Azure AD Connect server and added OU's in Scope to sync We have now also installed Exchange 2016 server to extend schema for mail attributes Straight after installation, configured ServiceConnectionPoint to $null to ensure Outlook clients that are internal on network do not query Active Directory for Autodiscover (although they shouldn't do because their Outlook profiles are manually created to point to Exchange Online for Autodiscover) As it stands, we have not yet ran the Hybrid Configuration Wizard (and I'm looking for the best way to achieve our end target goal without adding complexities of configuring HCW Plan of action is: As the new AADconnect server in Forest C has the existing synced Forest A users and the new Forest C users in scope to sync, we will make the new AADConnect server as the primary AADconnect server (by asking the old supplier to enable their AADconnect server to Staging Mode) We will then set-msoluser accounts of all users to $null and then set to the immutable ID's of the Forest C synced AD Users Identity - Query: Forest A and Forest C both have the same domain suffix of externaldomain.com - this may cause issues with 2 Forests syncing to the same Azure AD? Or will there be no issue with this? Externaldomain.com is the UPN which a user logs in, which also matches their primarysmtpaddress (as per common practice) - the existing AD users from Forest A use externaldomain.com as per their UPN.. And Forest C accounts will also use this too. So we're thinking of keeping the Forest C users with their @domain.LOCAL account so by default they pick up a @tenantname.onmicrosoft.com UPN which we can manually change on a per user basis? Exchange - Query: We have recently only installed Exchange into the Forest C environment. We have not performed any sort of Hybrid config, because we believe we don't need a hybrid organizational relationship between new Exchange and O365, as all mailboxes are in O365, no mail is flowing via Exchange etc. As it stands, the on-premise Exchange environment doesn't know anything about the O365 tenant - ECP is empty; as we now have Exchange in the Forest, what would you guys suggest we configure for Exchange on-premise to see the O365 mailboxes in on-premise ECP (as maybe O365 mailboxes or as Contacts) without having to run a Full HCW? AFAIK a minimal Hybrid config is not recommended because it's for much smaller organisations who don't use AADC, so their SoA is not Active Directory. We also don't want to go down the route of procuring a new cert, having external FW and DNS entries implemented, if we won't use any of the hybrid features. We just want to be able to manage O365 mailboxes I guess, although the mailboxes hosted in O365 are using their AD accounts synced from Forest A, and although Hybrid is not being used, the user accounts still much have links to some sort of settings from Forest A's ADSIedit environment, such as the Email Address Policy and Accepted Domains list that is in the Exchange Organization? One of the big hurdles will be the fact that we're using the same externaldomain.com on-premises across both Exchange forests. Maybe we need to take a calculated risk of this? Any thoughts or questions for more clarity? It's a good one (challenging!) but will be good to get your ideas and concerns to anything that I haven't considered? Thanks in advance! Ron P.S - apologies if I have posted this in more than one forum.1.8KViews0likes1Comment