Fully automate DR instances of your SQL Pools and reduce the RPO in case of a failure.
This article shows how to use Azure Data Factory (or Azure Synapse Pipelines) to build a pipeline that fully automates the restore of DR SQL Pools after each automatic snapshot; the restored pools can then be paused to save on costs. Because the pipeline is completely generic and parameterized, it is ready to replicate more than one SQL Pool.
What is needed
- Azure Key Vault x1 - new or existing
- Azure Data Factory v2 (or Azure Synapse Pipelines) x1 - new or existing
- Azure Synapse Analytics workspace x2 - new or existing, to be used as source and target/DR
- SQL Pool table x1 - metadata table to store the execution runs
- SQL Pool stored procedure x1 - to register each full run
Azure Key Vault secret
Create a new Azure Key Vault, or use an existing one, and create a new secret using:
- Name: <workspace_name>-<sql_pool_name>, e.g. mySynapseWorkspace-MySQLPool
- Value: <Connection_String>, as found in the Azure Synapse SQL Pool connection string
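The naming convention matters because the linked service created later reconstructs this exact name at runtime. As a quick sanity check, it can be sketched like this (the function name is illustrative; the example values come from the article):

```python
# Sketch of the secret naming convention: <workspace_name>-<sql_pool_name>.
# One secret per SQL Pool makes the whole process generic.
def secret_name(workspace_name: str, sql_pool_name: str) -> str:
    return f"{workspace_name}-{sql_pool_name}"

print(secret_name("mySynapseWorkspace", "MySQLPool"))  # mySynapseWorkspace-MySQLPool
```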
Screenshots of creating a new secret and getting the connection string from a SQL Pool:
Note: replace User ID and Password as needed.
You need to grant the Azure Data Factory Managed Identity access to your Azure Key vault. Learn more here.
ADF linked service to Azure Key vault
In ADF, create a new linked service to the Azure Key Vault instance where the SQL Pool connection strings are saved. This avoids storing credentials in the ADF pipelines and allows us to build a generic process.
ADF linked service to Azure Synapse Analytics
In ADF, create a new linked service to Azure Synapse Analytics. This linked service will use the secrets in the Azure Key Vault referred to earlier, making this a generic connector that can reach any Azure Synapse SQL Pool, as long as there's a corresponding secret containing a valid connection string.
Here are the steps:
To configure the Secret name parameter, select Add dynamic content and, in the new window, click the plus (+) sign to create a new parameter for the SQL Pool workspace name.
Click Save to return to the Add dynamic content window. Add another new parameter for the SQL Pool name:
@concat( linkedService().LS_SYNAPSE_WORKSPACE_NAME, '-', linkedService().LS_SYNAPSE_SQL_POOL_NAME )
Now click Finish to accept the dynamic content and then Test connection to confirm that everything is OK. For that you need to provide values for the 2 parameters of this linked service:
This is a dynamic connection to an Azure Synapse SQL Pool, meaning that we can use this linked service to connect to SQL Pools from different workspaces, as long as we have a defined connection string in the form of a secret in the above Azure Key vault.
Note: don’t forget to grant the ADF Managed Identity access to your Azure Synapse Analytics workspace. Learn more here.
Create metadata table and registration stored procedure
In each of the SQL Pools you want to automatically restore, create a new table. It will be used to store all the restores done by this ADF process and drive new ones.
CREATE TABLE [dbo].[restored_backups]
(
    [run_id] [int] NOT NULL,
    [name] [varchar](100) NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    HEAP
)
GO
In addition, create a new stored procedure. This process will use it to register full runs and drive new executions.
CREATE PROC [dbo].[p_register_restored_backup]
    @p_restore_id [INT],
    @p_name [varchar](100)
AS
BEGIN
    INSERT INTO dbo.restored_backups (run_id, name)
    VALUES (@p_restore_id, @p_name)
END
GO
ADF Dataset for metadata
Before we create our first pipeline, we need to create a new Azure Synapse Analytics ADF dataset to read the metadata.
Name the dataset “LastRestorePoint”, select the previously created linked service for the connection (SynapseFullParams) and click OK to save it.
Note that the dataset uses the previously configured generic Azure Synapse Analytics linked service, which needs 2 parameters (LS_SYNAPSE_WORKSPACE_NAME and LS_SYNAPSE_SQL_POOL_NAME). We now also need to create 2 similar parameters in the dataset (DS_SYNAPSE_WORKSPACE_NAME and DS_SYNAPSE_SQL_POOL_NAME):
Then, in the Connection tab, pass them to the linked service:
Create first pipeline
The first pipeline will check whether a SQL Pool restore is needed, by comparing the latest automatic SQL Pool backup record (in sys.pdw_loader_backup_runs) with our metadata table.
Create a new pipeline, name it AutoDr and create the following parameters:
These will be used later and will help keep this process as generic as possible.
During testing, you can assign some default values to the parameters, to save some typing when you manually trigger the pipelines.
Now add a Lookup activity and configure as shown:
- Source dataset: the name of the previously created dataset for the metadata, in our case “LastRestorePoint”
- First row only: checked
- All other settings: as default
Now publish your changes and test the pipeline (Add trigger / Trigger now). If you didn’t add default values to the pipeline parameters, know that for now only P_SOURCE_WORKSPACE_NAME and P_SOURCE_SQL_POOL_NAME are needed.
Note: The pipeline should run successfully no matter if there’s a restore needed or not. For now, it is only executing the above SQL query against a generically configured Azure Synapse Analytics SQL Pool. We will return to this pipeline later.
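The exact SQL behind the Lookup activity lives in the article's screenshots and is not reproduced above. A query along these lines (an assumed sketch, not the article's verbatim SQL) would return the two run ids that the rest of the process compares:

```python
# Assumed sketch of the Lookup query: fetch the newest automatic backup run id
# from the system view and the last run id this process registered in metadata.
lookup_query = """
SELECT
    (SELECT MAX(run_id)
       FROM sys.pdw_loader_backup_runs
      WHERE operation_type = 'BACKUP') AS LastBackupRunId,
    (SELECT ISNULL(MAX(run_id), -1)
       FROM dbo.restored_backups)      AS LastRestoredRunId
"""
```

The ISNULL default covers the first run, when dbo.restored_backups is still empty, so the comparison in the If Condition later always has a value to work with.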
Create second pipeline
The goal for this pipeline is:
- delete the target DR SQL Pool, if any exists
- restore a running SQL Pool from the source workspace into the target DR workspace
- wait for the new SQL Pool to be created and restored
- pause the target DR SQL Pool, as an optional step to save costs if you don’t need to use this pool.
Create a new pipeline named "AutoRestoreSQLPool" and create the same parameters as for the first pipeline:
Add a Web activity, name it “Delete SQL Pool” and configure as follows:
This will call the SQL Pool REST API and send a DELETE request. For more information visit Sql Pools – Delete.
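The URL the Web activity targets follows the standard management-plane pattern for Synapse SQL Pools. A sketch of how it is composed (the api-version shown is an assumption; use a current one from the REST reference):

```python
# Sketch: build the management-plane URL for the "Delete SQL Pool" Web activity.
# The DELETE verb on this resource path removes the SQL Pool.
def sql_pool_url(subscription_id: str, resource_group: str,
                 workspace: str, sql_pool: str,
                 api_version: str = "2021-06-01") -> str:  # assumed api-version
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.Synapse"
        f"/workspaces/{workspace}"
        f"/sqlPools/{sql_pool}"
        f"?api-version={api_version}"
    )
```

In the pipeline, the subscription, resource group, workspace and pool segments come from the pipeline parameters, which is what keeps the process generic.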
Note: the Azure Data Factory Managed Service Identity must have authorization to perform these types of requests, in this case to delete SQL Pools, on the involved Azure Synapse Analytics workspaces.
To grant the ADF MSI the Contributor role on a workspace using the Azure portal, open the “Access control (IAM)” of that workspace, click on “+ Add” / Add role assignment and populate as necessary. Here’s an example:
Note: don’t forget to select the MSI entry and click Save.
The pipeline can now be published and executed.
This pipeline will not abort if there’s no SQL Pool with the given name and workspace (P_DR_SQL_POOL_NAME and P_DR_WORKSPACE_NAME). Because we will create a new SQL Pool using the same parameters, this doesn’t matter here; but if the goal were only to delete the target SQL Pool, a verification activity should be added after the delete is called.
Now, add a second Web activity and connect the output of the above created Web activity (Delete SQL Pool) to this one and name it “Restore SQL Pool”. To learn more about the method used in this call visit this link.
Configure the settings as:
Note: due to presentation limitations, some lines were split (recoverableDatabaseId and dependsOn). Either merge them or use the file attached at the end of this page ("Restore SQL Pool - body request.zip")
This activity will call the Azure Resource Manager REST API and request the incremental deployment of an Azure resource, in this case an Azure Synapse Analytics SQL Pool that will be recovered from another existing SQL Pool. More options can be found in the online documentation here.
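The request body deployed by this activity is provided in the attached zip; a sketch of its core shape is below. The property names follow the Microsoft.Synapse sqlPools ARM schema, but the exact template, SKU and ids here are assumptions for illustration:

```python
import json

# Assumed sketch of the restore request body: createMode "Recovery" tells ARM to
# build the new pool from an existing one, referenced by recoverableDatabaseId.
def restore_sql_pool_body(location: str, source_pool_resource_id: str) -> str:
    return json.dumps({
        "location": location,               # region of the target DR workspace
        "sku": {"name": "DW100c"},          # hypothetical performance level
        "properties": {
            "createMode": "Recovery",
            "recoverableDatabaseId": source_pool_resource_id,
        },
    }, indent=2)
```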
Running this pipeline will trigger the deletion of the target DR SQL Pool and the creation of a new one by recovering from a source and existing SQL Pool.
At this point, the pipeline should look like this:
The next step is to pause the newly created target DR SQL Pool. We can’t do that immediately after calling the restore because it’s an asynchronous call that takes some time, depending on the size of the source SQL Pool. We need to wait and that’s exactly what the next activity will do.
Create 2 new variables that will be used to check if the target DR SQL Pool is created and restored. Notice the Boolean type for both:
Now, add a new Until activity and connect the output of the “Restore SQL Pool” activity to the input of this one. Name it “Wait for Creation and Restore”.
The Expression inside the Settings tab should evaluate both variables, along the lines of: @and(variables('V_DR_CREATED'), variables('V_DR_RESTORED'))
This means that this activity will loop until the newly restored target DR SQL Pool is both created and fully restored.
Click on the pencil to design the activities for the loop.
To better understand all the activities and how they are connected, let’s have a look at how it will look like at the end:
Start by adding a new Web activity, name it “Get SQL Pool Information” and configure as follows:
- All other settings: as default
This activity sends a request for information on a SQL Pool to the Azure Synapse SQL Pool REST API. For more information visit this page.
If the request fails, we assume the target DR SQL Pool is not yet created (bottom part of the flow); otherwise, we look at the status property, which will be Online when the SQL Pool is fully restored and ready to be used.
Add 2 new Set variable activities, connect them as in the previous picture and rename them accordingly (“Set V_DR_CREATED to true” and “Set V_DR_CREATED to false”).
Note: to change a connection from green (Success) to red (Failure), select the connection you want to change, right click it and change its “Change To” value as desired.
The settings for these 2 activities should be:
To finish the bottom part of the flow (Failure), add a new Wait activity, name it “Wait For Creation - 30 secs”, connect it as depicted above and set its Wait time in seconds to 30:
This activity will wait 30 seconds before allowing the Until activity to continue.
Next, add an If Condition activity, name it “Check If Restored”, connect it as shown before and change its Expression to:
@equals( activity('Get SQL Pool Information').output.properties.status, 'Online' )
This activity will read the value of the status property of the newly created target DR SQL Pool and check if it is “Online”. If not, we need to wait some time before checking again, otherwise the SQL Pool is ready to be used and we can continue with the pipeline.
To achieve this, we configure the If Condition activity as follows.
a) When true, we add a new Set Variable activity, name it “Set V_DR_RESTORED to true” and configure as:
In this case we signal that the SQL Pool is fully restored. This part of the pipeline will look like:
b) When false, we configure as follows.
Add a Set Variable activity, name it “Set V_DR_RESTORED to false” and set the Variables tab as:
Then, add a new Wait activity, name it “Wait For Restore - 30 secs”, connect it to the previous activity and configure its Wait time in seconds to 30.
This part of the pipeline will look like:
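Put together, the Until loop implements a classic poll-until-ready pattern. This local simulation (plain Python, not the ADF activities themselves; get_sql_pool stands in for the “Get SQL Pool Information” Web activity) shows the logic the activities above encode:

```python
import time

# Simulation of the Until loop: poll until the GET succeeds (pool created)
# AND properties.status is 'Online' (pool fully restored).
def wait_until_online(get_sql_pool, poll_seconds: int = 30, max_polls: int = 120) -> bool:
    dr_created = False
    dr_restored = False
    polls = 0
    while not (dr_created and dr_restored) and polls < max_polls:
        polls += 1
        try:
            info = get_sql_pool()   # "Get SQL Pool Information" Web activity
            dr_created = True       # request succeeded -> the pool exists
            dr_restored = info["properties"]["status"] == "Online"
        except Exception:
            dr_created = False      # request failed -> pool not created yet
        if not (dr_created and dr_restored):
            time.sleep(poll_seconds)  # the Wait activities' 30-second pause
    return dr_created and dr_restored
```

The max_polls guard is an addition not present in the pipeline; in ADF the equivalent safety net is the Until activity's Timeout setting.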
If there’s no need to pause the target DR SQL Pool, then this second pipeline is finished and we can jump to Complete the first pipeline.
If, instead, the SQL Pool needs to be paused, then we need to add that functionality as described in this next step.
At the top level of the pipeline (outside the If Condition and Until activities), add a new Web activity, name it “Pause SQL Pool”, connect it to the end of the Until activity and configure as follows:
Note: when the Method is POST (or PUT), the Body cannot be empty. One way to avoid getting an error when publishing the pipeline is to add a space as the Body content.
This is the error when the Body is left empty:
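A sketch of the pause request the Web activity sends (the api-version is an assumption), including the single-space body workaround from the note above:

```python
# Sketch: the pause call is a POST to the pool's 'pause' endpoint. ADF rejects
# an empty Body on POST, so a single space is used as the payload.
def pause_sql_pool_request(subscription_id: str, resource_group: str,
                           workspace: str, sql_pool: str,
                           api_version: str = "2021-06-01") -> dict:  # assumed api-version
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Synapse/workspaces/{workspace}"
        f"/sqlPools/{sql_pool}/pause?api-version={api_version}"
    )
    return {"method": "POST", "url": url, "body": " "}
```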
Complete the first pipeline
At this point, the second pipeline is finished. What is left is to invoke it from the first pipeline and write back the current restore id in the metadata table.
Back to the first pipeline, add a new If Condition activity, name it “Last Backup different from Last Restored”, connect it after the existing lookup and configure as follows:
This activity will compare the last backup run id (from the source SQL Pool system tables) with the last restore run id (from our metadata table, also in the source SQL Pool). If they are not the same (which is what the If Condition activity checks), a snapshot has happened since the last check and a restore needs to be done. If they are the same, the last snapshot taken was already restored by this process.
In the “True” definition of the If Condition add a new Execute Pipeline activity, name it “Execute AutoRestoreSQLPool” and configure as follows:
- Invoked pipeline: AutoRestoreSQLPool, which is the name of the second pipeline
- Wait on completion: checked
Then add a new Stored Procedure activity, name it “Register Restore ID”, connect it after the Execute Pipeline and configure as follows:
- Linked service: SynapseFullParams, which is the fully parameterized linked service we created at the beginning (its properties will show up)
- Stored procedure name: [dbo].[p_register_restored_backup]; click Refresh for the stored procedures to show up in the selection box
- Stored procedure parameters: click Import for the stored procedure parameters to show up
This part of the pipeline and the settings for the stored procedure call will look like:
And this completes the first pipeline and achieves the goal for this process.
As shown, it's very easy to put in place a process to automatically restore Azure Synapse Analytics SQL Pools to serve different use cases, one of them being the reduction of the RPO in case of a failure.
Although this is a fully functional process, some suggestions for improvement include:
In the future, look for another post where I will show how to fully export SQL Pools into partitioned areas in a Data Lake storage, that can be used to cover compliance regulations for long periods of backup retention and, on top, apply some backup storage lifecycle rules to take advantage of different types of storage with the goal of reducing storage costs.