Update! Azure Synapse Link for SQL is now generally available (GA) with the release of SQL Server 2022 announced at the PASS Data Community Summit. Learn more >
Data-driven, quality insights are increasingly critical for companies to stay competitive, and the speed to achieve those insights can make all the difference. The costly and time-consuming nature of traditional ETL and ELT pipelines are no longer enough. At Microsoft Build 2022, we’re announcing the public preview availability of Azure Synapse Link for SQL, for both SQL Server 2022 and Azure SQL Database. With this release, you can now take advantage of low- and no-code, near real-time data replication from your SQL-based operational stores into Azure Synapse Analytics. This makes it easier to run BI reporting on operational data in near real-time, with minimal impact on your operational store.
Synapse Link for SQL Server 2022 helps us to seamlessly replicate operational data in near real time to be able to have more powerful analytics. –Javier Villegas, IT Director for DBA and BI Services, Mediterranean Shipping Company
What is Azure Synapse Link for SQL?
Azure Synapse Link for SQL is an automated system for replicating data from your transactional databases (both SQL Server 2022 and Azure SQL Database) into a dedicated SQL pool in Azure Synapse Analytics. The process of setting up a link from your SQL data to Azure Synapse takes just a few clicks and a matter of minutes rather than hours or days for traditional ETL processes. Once configured, your initial data is replicated into the target dedicated SQL pool. After the initial table seeding, changes made to your source data are replicated in near real-time.
You control the size of the Azure Synapse Analytics dedicated SQL pool as well as the number of cores used to ingest the data, and your source system handles making the data available for ingestion.
Why use it?
Azure Synapse Link for SQL helps streamline data replication into Azure Synapse Analytics for any organization with data in SQL Server 2022 or Azure SQL Database. A few example scenarios where this applies:
Near real-time analytics: Traditional ETL system run on a periodic schedule. Once the initial data seeding (which is automated) has completed, Azure Synapse Link for SQL replicates your transactional data into Azure Synapse Analytics in near real-time.
Low-code/no-code solutions: Azure Synapse Link for SQL provides a low- or no-code solution to replicating data. There’s no need to build and manage ETL packages or warehouse schemas—choose which tables to replicate, provide a distribution method and storage architecture (heap/columnstore), and start the connection.
Data consolidation: Azure Synapse Link for SQL lets you easily combine data from multiple operational databases (both cloud-based and on-premises) into a single cloud-based analytical solution that you can use for workloads including—but not limited to—reporting and machine learning modeling.
Minimal impact on source systems: Batch-based ETL and ELT systems can put extra load on an operational system when they query for data changes. Azure Synapse Link for SQL introduces a new change feed processor that watches for changes and efficiently moves them to a temporary landing zone for processing into the target dedicated SQL pool, thereby minimizing the impact of extracting changes from your source system.
What makes it work?
Azure Synapse Link for SQL takes advantage of the latest technology to make data movement simple and efficient:
Change feed: Change feed is a new feature—in both SQL Server 2022 and Azure SQL Database—built to support data synchronization between the source transactional system and Azure Synapse Analytics.
Azure Data Lake Storage Gen2: Azure Synapse Link for SQL takes advantage of Azure Data Lake Storage Gen2 as a landing zone for the data coming from your source systems. This “buffer” helps minimize the impact of the connection on your source database systems.
Azure Synapse Analytics Dedicated SQL Pool: The ultimate target for the data coming from your source systems is a dedicated SQL pool in Azure Synapse Analytics—which you can size to meet the needs of both your ingestion data volume and your query workloads.
Ingestion Service: When you enable Azure Synapse Link for SQL, behind the scenes we deploy a cloud-based ingestion service to move data from the landing zone into the target dedicated SQL pool. This is fully managed for you and works in the background to make your data available.
Getting up and running
To use Azure Synapse Link for SQL, you first need to create the following elements in your Azure Synapse Analytics environment:
Linked Service to Source Database: This is a standard Azure Synapse Analytics linked service and can be created for either Azure SQL Database or SQL Server 2022.
Azure Synapse Analytics Dedicated SQL Pool: This will be used as the target for your replicated transactional data.
If you are using SQL Server 2022 as your source database, you also need to create the following (these are automatically created for Azure SQL Database):
Azure Data Lake Storage Gen2: This is the storage account that is used as the landing zone. While this account is managed by you, the files in the landing zone can only be used for Azure Synapse Link for SQL, to allow for changing the retention policy or file format as needed.
Linked Service to Landing Zone: In the Azure Synapse Analytics workspace, you’ll need to create a linked service pointing to the storage account created above.
Self-Hosted Integration Runtime: You’ll also need to configure and install a self-hosted integration runtime that can talk to both the SQL Server 2022 database and the Azure Synapse Analytics environment. This is used to broker commands between the two environments.
Once you have the prerequisites in place, you can create an Azure Synapse Link Connection in Azure Synapse Analytics. When you create the connection, you specify:
The source database.
Which tables you’d like to replicate from the source.
The target dedicated SQL pool.
For each table you set up in the connection, you can specify:
Table and schema names in the target dedicated SQL pool (these don’t need to be the same as the source table and schema names).
Distribution type—you can either use round robin, hash-distributed, or replicated tables.
Structure type—you can organize your table as either a heap, a clustered columnstore index, or a clustered index.
Once you have completed the initial setup, you can start the connection. At this point, here’s what happens:
An initial export of the schema and data from the source tables is executed. This work is actually done by the source database (either Azure SQL DB or SQL Server 2022), and the data is placed into the landing zone.
The ingestion service picks up the snapshots from the landing zone, creates the target tables in the dedicated SQL pool, and then loads the initial data.
Once the initial load is complete, the source database continually publishes changes for each table to the landing zone, where they are picked up by the ingestion service and applied to the target dedicated SQL pool.
While you cannot make modifications to the configuration of tables in a running link, you can add tables to and remove tables from it.
“With SQL Server 2022 and the arrival of Synapse Link, we can finally achieve near real-time analytics over our most important data, all without the need to build and manage complex packages and pipelines” – Nic Hopper, Data Architect, Atrium Underwriters
Want more information?
To learn more, sign up for our free webinar where the Azure Synapse Link engineering team will demo the new functionality and answer your questions live.