“We would like to have the data flow to our clients like ‘water from the faucet’ stated the customer for which this solution was designed. This on-premises (on-prem) SQL Server based application contains crucial financial data for the external clients of this customer. External clients using this solution subscribe to this data and there is a need to deliver a subset of this data (8 Terabytes [TB] and growing) to them based on their subscription levels and requirements. A significant number of challenges are present with the current on-prem solution to meet these needs.
Currently, a custom solution is in place to manually export the subset of the data that each external client needs and to “ship” it to the individual client via File Transfer Protocol (FTP). It currently takes 3 to 4 weeks for onboarding a new external client with this process. This custom solution is also meant to identify and extract on-going changes and manually push these changes to the individual customers, which takes 1 to 2 days. These external clients are obviously not happy with the latency in data delivery.
Multiple cases of data loss and corruption have also been experienced with this solution. Moreover, the entire solution is disconnected as there is no control over the data sets that are being delivered which poses security concerns, not to mention the monitoring and maintenance challenges.
A cloud-based solution using SQL Server in the Azure layers, Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), was proposed, utilizing a well-known and built-in SQL Server feature named “Transactional Replication.”
So, how does this solution work? Below is a picture of the solution built in the customer’s Azure subscription. Multiple SQL Server Virtual Machines (VMs) in the Azure IaaS layer were built to receive data from the on-prem SQL Server database. These VMs were built in an Availability Set with different Fault Domain and Update Domain IDs to increase availability. Transactional Replication was setup with the on-prem database server as Publisher/Distributor with the Azure VMs as individual Subscribers.
Following that, multiple Azure SQL DBs in the PaaS layer were built to receive data from the individual Azure VMs. These Azure SQL DBs were built in multiple tiers (Standard and Premium) and at multiple levels (S3, S5, P1, P11, P15 etc.) to compare/contrast features and performance metrics. Transactional Replication was setup with SQL Server VMs as Re-Publishers/Distributors and Azure SQL DBs as Subscribers. The filtered replication feature was used to ensure that the appropriate dataset is received by the individual Azure SQL databases.
An initial Snapshot with all the appropriate tables in the on-prem database was created and applied to each of the Azure SQL VMs. Ongoing changes to these tables in the on-prem database is continuously replicated via Transactional Replication to the databases in Azure SQL VMs. This ensures that the databases in the Azure VMs are kept up to date with data from the on-prem database.
A new external customer can be on-boarded by taking an initial snapshot of all appropriate tables (based on customer subscription requirements), applying this snapshot to a new Azure SQL DB and providing access to the external customer. All of this can be done anywhere from couple of hours to a day, based on the snapshot size. The tables in these Azure DBs will be kept continuously in-sync with the source tables via replication. Multiple Filtered conditions configured within the replication between Azure VM and Azure SQL DB will ensure that only the necessary data is received by the SQL DB and hence will be accessible to the external client. TDE (Transparent Data Encryption) has been enabled to secure the data at rest in Azure SQL DB.
This solution has proven to meet and exceed customer expectations and resolve/minimize multiple challenges with the current on-prem solution. The external customer on-boarding process now takes less than a day compared to multiple weeks. On-going data changes will be delivered in near-real time with very minimal latency. Built-in resiliency of the Transactional Replication feature ensures no data loss or corruption in transit. The entire solution is well connected and is in the control of the customer, which ensures better security and maintainability. Built-in Availability and Recoverability features of Azure improve HA/DR capabilities over the current on-prem solution. Monitoring and Altering can be improved with the built-in monitoring capabilities of the SQL Server feature set and Azure services. The fanout architecture of the solution ensures better performance and scalability as the replication workload for different Azure SQL DBs can be balanced between multiple Azure VMs and additional VMs can be added to share the workload as the client base increases. The elastic nature of Azure VMs also provides the flexibility to start small and scale up as the resource demand increases.
While Transparent Data Encryption feature takes care of securing the data at rest; firewall rules, role-based access and Azure Active Directory (AAD) integration ensures no unauthorized access to the database.
Finally, growing demands of external clients can also be met by the seamless scalability of Azure SQL DBs. Automation of scaling can scale/up Azure DBs as per the client needs, providing that perfect balance between price vs performance, without any service interruptions.
What are the customer’s final comments?
“You may have not realized this Microsoft, but you actually made my dream come true.”
The Data Migration Team would like to thank Kal Yella, Associate Architect with the Data Insights Center of Excellence team, for authoring this post, and the extended Jumpstart Program team members for serving as Technical Reviewers.