We are thrilled to announce that the MERGE T-SQL command for Azure Synapse Dedicated SQL pools is now Generally Available! MERGE has been a highly requested addition to the Synapse T-SQL library that encapsulates INSERTs/UPDATEs/DELETEs into a single statement, drastically easing migrations and making table synchronization a breeze.
A key scenario common to many data warehousing workloads is updating Slowly Changing Dimension tables. Previously, in Synapse Dedicated SQL pools, one table could only be synced with another through manual INSERT / UPDATE / DELETE operations. In addition, migrations from other codebases required a breakdown of each individual MERGE statement into these comprising commands, making the transition less than optimal.
Today, users of Azure Synapse can leverage the all-encompassing MERGE T-SQL statement to make the most out of their data processing operations. Let’s take a look at an example below to demonstrate the power and simplicity of MERGE.
Syncing HackneyLicense of the New York Taxicab dataset with new data
In this example, we’ll be using the New York Taxicab dataset, which you can load following this tutorial. In particular, we will show how MERGE can help sync our current working dimension table HackneyLicense with data from our daily refresh of the table (we call HackneyLicenseNew, a new table I have created based on HackneyLicense).
The above shows how our modified HackneyLicenseNew compares to the original dimension table. All records up to ID 42954 are the same, but beyond that we have a few differences:
Record with ID 42955 in HackneyLicenseNew has different Key and Code values than HackneyLicense. Perhaps the Key and Code was incorrectly inserted in our original dimension table.
HackneyLicenseNew has two new records for IDs 42959 and 42960. This means we have 2 new Taxi license IDs to begin tracking in Prod.
HackneyLicenseNew is missing rows for IDs 42956, 42957, and 42958. The Taxi licenses for these IDs may have expired, and we need to update our records.
We need to modify HackneyLicense (our working “Target” table) based on the new data in HackneyLicense (our “Source”). In most scenarios, I won’t know the exact differences between the Target and Source, and have to use LEFT OUTER JOINs, INNER JOINs, or other subqueries to identify the differences and sync our Target. With MERGE, we no longer need to manually specify that logic:
MERGE mergeDemo.HackneyLicense AS Target
USING mergeDemo.HackneyLicenseNew AS Source
ON Source.HackneyLicenseID = Target.HackneyLicenseID
-- Updates: For shared IDs, update Key and Code in Target with Source values
AND Source.HackneyLicenseBKey <> Target.HackneyLicenseBKey
AND Source.HackneyLicenseCode <> Target.HackneyLicenseCode
THEN UPDATE SET
Target.HackneyLicenseBKey = Source.HackneyLicenseBKey,
Target.HackneyLicenseCode = Source.HackneyLicenseCode
-- Inserts: Source has 2 new licenses we need to begin tracking
WHEN NOT MATCHED BY Target THEN
INSERT (HackneyLicenseID, HackneyLicenseBKey, HackneyLicenseCode)
VALUES (Source.HackneyLicenseID, Source.HackneyLicenseBKey, Source.HackneyLicenseCode)
-- Deletes: Target has 3 IDs that Source doesn't have
WHEN NOT MATCHED BY Source THEN
OPTION ( LABEL = 'MERGE test - Update 1, Insert 2, Delete 3)' );
First, we set our Production table HackneyLicense as the Target and HackneyLicenseNew as the Source, and join on the ‘HackneyLicenseID’ column to begin comparing.
On Line 5, we use the ‘WHEN MATCHED’ clause to find cases where Target and Source have matching IDs, then update our Target’s Key and Code columns to the values of Source. We also chose to purposely filter on cases where the Key and Code columns differ, to avoid extra work (but we could have also omitted this filter).
On Line 12, we use the ‘WHEN NOT MATCHED [BY Target]’ clause to find cases where IDs exist in Source, but don’t match to anything in Target. In this case, we choose to INSERT those rows from Source into Target.
Lastly, on Line 16, we use the ‘WHEN NOT MATCHED BY SOURCE’ clause to find cases where IDs exist in Target, but don’t match to anything in Source. This means we want to remove those records from Target, so we execute a DELETE statement on Target.
Now we look at the tailend of HackneyLicense and HackneyLicenseProd again. Without using a left outer join or writing even one subquery, we’ve effectively synced our Target working table based on the state of a Source table – all within a single, maintainable statement. Simplifying migrations onto Synapse and improving code readability, we hope you give the powerful MERGE statement a try.
To ensure you are using the official GA-supported version of MERGE, check that the `@@version` of your Synapse Dedicated SQL pool is on '10.0.17829.0' or beyond. To learn more about the MERGE statement in Synapse Dedicated SQL pools, check out MERGE (Transact-SQL).