Author(s): Freddie Santos is a Program Manager in Azure Synapse Customer Success Engineering (CSE) team.
In my previous post, I discussed the basics of disaster recovery and high availability and how they can be implemented on Azure Synapse. You can revisit that post directly here: Creating a custom disaster recovery plan for your Synapse workspace Part 1.
This post will focus specifically on one of the engines in the Synapse workspace, the dedicated SQL Pools, and explore options for creating a custom disaster recovery plan for our databases. The aim is to help you create a plan that aligns with your business needs, which may require a more granular RPO and/or RTO than what is currently available as part of Azure Synapse Dedicated Pools.
To begin, we will delve into the connectivity endpoints and understand how the way we create our Synapse Workspace will impact our Disaster Recovery Plans.
In Azure, a database connection endpoint is a unique address used to connect to a database hosted on the Azure cloud platform. Endpoints are critical for allowing clients to interact with databases on Azure, and in summary, database connection endpoints act as the front door for connecting to your database.
Here are some key points to understand about database connection endpoints on Azure:
Now, let's explore the three different ways to create and ultimately connect to a dedicated SQL Pool.
The Dedicated SQL Pools was initially a separate service called Azure SQLDW, and it is still accessible as a standalone Dedicated SQL Pool.
When a Dedicated SQL Pool is created using this option, the connectivity endpoint that directs the connections to the logical database is sqlservername.database.windows.net
By examining the endpoint address, we can see that the redirection for the logical database begins with the name of the logical server that was assigned when the service was initially set up. To illustrate this visually:
SQLDB Endpoints support DNS Alias connections, so if DNS Switch Over is a requirement, we need to plan and use this method of connection. However, this does not mean that you should give up on using Synapse workspace. In this blog post, we will explore the alternative solutions.
As an alternative, we have the option to create the Dedicated SQL Pools through the Synapse Analytics service, as illustrated in the following picture.
If you create your dedicated SQL Pool using Synapse Analytics service, there will be a significant difference in the connectivity aspect compared to the Stand-alone SQLDW. Instead of using the sqlservername.database.windows.net endpoint to route connections to the logical database, a different connectivity gateway is utilized to direct connections to the Synapse Workspace. This means that a different connection endpoint is required to establish a database connection. However, there are other operational considerations associated with this choice that go beyond the scope of this post.
Another major difference is that currently, when connecting through the workspace endpoint, DNS alias is not supported. This limitation prevents us from using DNS Switchover in our disaster recovery plans, as we discussed briefly in the first part of this series. However, having the Dedicated Pools in the Synapse workspace does not necessarily exclude the usage of DNS Alias. We have an alternative that combines both worlds, allowing users to have a SQL Endpoint and our Dedicated Pool "inside" a Synapse Workspace.
As mentioned before, we still can create and connect on the Dedicated Pools by using the “old” method of creating a SQL DB Endpoint to resolve the connections, without using a Workspace.
To implement DNS Switch Over using DNS Alias, we can create a "Connected Workspace." This involves creating the Dedicated Pools using the "old" method, that is, by creating it using a SQL Endpoint, and then moving that server into a workspace. The name resolution in this case would be as follows:
To enable this feature, you just need to create the workspace on top of your SQL Endpoint. This allows you to combine the features of the Synapse Workspace with the ability to resolve connections using SQL Endpoints. This ultimately provides you with the capability of using DNS Alias and enables you to use DNS Switch Over for your Disaster Recovery Plan.
Now that we have a basic understanding of the connection endpoints, we can begin to discuss various scenarios for our disaster recovery (DR) plans.
As we discussed in a previous post, we must first determine our Recovery Point Objective (RPO) and Recovery Time Objective (RTO) based on our business requirements. Additionally, we must assess whether DNS Switchover is a technical requirement for our DR plan. With these factors in mind, we can start to draft our custom plan.
Let's consider the standard architecture of modern data warehouses:
The architecture shown above is composed of four stages and relies on Azure Data Factory (which can be interchanged with Synapse Pipelines) to ingest data. The data is then stored and transformed across various layers or zones within our Data Lake, utilizing Azure Data Lake Gen 2. Finally, the data is loaded into a dimensional model, either for reporting purposes or for other engines to consume the data through Power BI.
For the remainder of this article, we will concentrate on the Serve/Report stage, specifically on the Dedicated SQL Pool.
As mentioned earlier, the Synapse Native workspace utilizes its own connection endpoint (xxx.azuresynapse.com), which currently does not support DNS Alias. This is an important factor to consider when developing a disaster recovery plan. In order to create an effective plan, it is crucial to determine the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) for your dedicated SQL Pools.
If DNS is not a requirement, and you have a DR plan in place that can accommodate manual steps, then the focus should be on the RPO and RTO for your dedicated SQL Pools. These are critical factors to consider when creating a custom DR plan for your native workspace.
Considering the impact of the database size on data transfer and restoration time, it is crucial to carefully plan a DR strategy for the dedicated SQL Pools with respect to RTO and RPO.
With that in mind the suggested approach on creating a custom DR plan for a native workspace will be to replicate or restore your data in a pair region. The intention here is to have your dedicated SQL Pool on the same region as your secondary region for your storage (ADLS Gen 2) account, meaning that if you need access your external tables it will be local, assuming that your DR plan is happening because the entire Azure Region goes down.
To achieve the same, assuming that we are creating these resources under the same resource group and Vnets, consider the following steps:
This architecture has the following advantages:
However, there are some disadvantages:
Implementing a custom DR plan can provide greater flexibility for RPO and RTO compared to the built-in DR provided by the service. However, it requires customization and a clearly documented process to redirect applications to the new gateway.
Additionally, if the ADLS storage account is not configured with Hierarchical Namespace, customers can test the DR solution using RA-GRS/RA-GZRS and manual database restoration. It is recommended to enable Hierarchical Namespace for better performance, but doing so eliminates the possibility of customer managed failovers. To learn more, check out Data redundancy - Azure Storage.
The connected workspace approach follows a similar approach to the "Native" workspace approach. However, it has an advantage over the Native workspace as it allows us to use Azure SQL endpoints, which in turn allows us to redirect DNS requests using DNS Alias.
The architecture for the connected workspace approach would be the same as that of the Native workspace approach:
One of my colleagues, Reshan Popli, has written a helpful guide on how to configure DNS aliases for dedicated SQL pools in Synapse workspaces to support disaster recovery. Check out the guide here: Create DNS alias for dedicated SQL pool in Synapse workspace for disaster recovery
This architecture has the following advantages:
But also have some disadvantages:
Additionally, using the connected workspace approach allows for the use of automation to streamline the disaster recovery process. The steps for setting up this automation can be found at Azure Synapse SQL Pools Auto DR.
To summarize, the custom plans outlined above offer greater flexibility in terms of RPO and RTO than the built-in disaster recovery options provided by the service. The next posts will cover disaster recovery aspects for Spark and Serverless pools.
Stay tuned for more information.
Our team publishes blog(s) regularly and you can find all these blogs here: https://aka.ms/synapsecseblog
For deeper level understanding of Synapse implementation best practices, please refer our Success by Design (SBD) site: https://aka.ms/Synapse-Success-By-Design
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.