SQL 2019
SQL Backup using Azure File Share
SQL Server limits the maximum backup size supported using a page blob to 1 TB. The maximum backup size supported using block blobs is limited to approximately 200 GB (50,000 blocks * 4 MB MAXTRANSFERSIZE). To get around these limitations, we can opt for an Azure file share: if your SQL backup is larger than 12 TiB, you can choose to back up using an Azure file share.

Standard file shares can span up to 100 TiB; however, this feature is not enabled by default. If you need a file share that is larger than 5 TiB, you will need to enable the large file share feature for your storage account. Premium file shares can span up to 100 TiB without any special setting, but premium file shares are provisioned rather than pay-as-you-go like standard file shares, so provisioning a file share much larger than what you need will increase the total cost of storage. In locally redundant and zone-redundant storage accounts, Azure file shares can span up to 100 TiB; in geo-redundant and geo-zone-redundant storage accounts, they can span only up to 5 TiB.

Prerequisites
Storage Account
Azure PowerShell Storage Module: https://www.powershellgallery.com/packages/Az.Storage/3.7.0

Steps
1. Create a storage account with a premium file share, or enable Large File Share on a standard storage account.
2. Register for the SMB Multichannel preview with the following commands:
Connect-AzAccount
# Set your active subscription to the one you want to register for the preview.
# Replace the <subscription-id> placeholder with your subscription id.
$context = Get-AzSubscription -SubscriptionId <your-subscription-id>
Set-AzContext $context
Register-AzProviderFeature -FeatureName AllowSMBMultichannel -ProviderNamespace Microsoft.Storage
Register-AzResourceProvider -ProviderNamespace Microsoft.Storage
You can also verify that the feature registration is complete:
Get-AzProviderFeature -FeatureName AllowSMBMultichannel -ProviderNamespace Microsoft.Storage
3. Enable SMB Multichannel. Once you have created a FileStorage account, you can follow the instructions to update the SMB Multichannel settings for your storage account. Note: if the SMB Multichannel option is not visible under File share settings, or you get a "failed to update setting" error while updating the configuration, make sure that your subscription is registered and that your account is in one of the supported regions with a supported account type and replication.
4. Create a file share. You can set the maximum capacity up to 100 TiB.
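If you prefer to create the share with PowerShell rather than the portal, a minimal sketch using the Az.Storage module could look like the following (the resource group name is a placeholder; the account and share names match the ones used later in this walkthrough):
New-AzRmStorageShare -ResourceGroupName "myResourceGroup" -StorageAccountName "testsmbfileshare" -Name "sqlbackup" -QuotaGiB 102400
# -QuotaGiB 102400 requests a 100 TiB share; this requires a premium FileStorage account
# or a standard account with the large file share feature enabled.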
5. Connect to the file share from Windows, using either Active Directory or the storage account key. Connecting to a share using the storage account key is only appropriate for admin access; mounting the Azure file share with the Active Directory identity of the user is preferred.
$connectTestResult = Test-NetConnection -ComputerName testsmbfileshare.file.core.windows.net -Port 445
if ($connectTestResult.TcpTestSucceeded) {
    # Save the password so the drive will persist on reboot
    cmd.exe /C "cmdkey /add:`"testsmbfileshare.file.core.windows.net`" /user:`"localhost\testsmbfileshare`" /pass:`"SxbRsNuwc1*******/8lk1TyUkqC+2+************==`""
    # Mount the drive
    New-PSDrive -Name Z -PSProvider FileSystem -Root "\\testsmbfileshare.file.core.windows.net\sqlbackup" -Persist
} else {
    Write-Error -Message "Unable to reach the Azure storage account via port 445. Check to make sure your organization or ISP is not blocking port 445, or use Azure P2S VPN, Azure S2S VPN, or Express Route to tunnel SMB traffic over a different port."
}
Copy this script and run it in PowerShell to map the file share as a network drive locally. The script checks whether the storage account is accessible via TCP port 445, which is the port SMB uses. If port 445 is available, your Azure file share will be persistently mounted. Note: the script only works on Windows Server 2012 and above. Once the script has executed, you will see the Z drive as a network drive in My Computer / This PC.
6. On the SQL Server, first enable xp_cmdshell so we can configure backups to this file share. Enable the advanced SQL configuration options:
EXECUTE sp_configure 'show advanced options', 1;
GO
-- To update the currently configured value for advanced options.
RECONFIGURE;
GO
-- To enable the feature.
EXECUTE sp_configure 'xp_cmdshell', 1;
GO
-- To update the currently configured value for this feature.
RECONFIGURE;
GO
We must also map the Z drive inside SQL Server so that it is available for backups. Map it using the script below, with the storage account name, username, and access key from step 5:
xp_cmdshell 'net use Z: \\testsmbfileshare.file.core.windows.net\sqlbackup /u:localhost\testsmbfileshare SxbRsNuwc1*******/8lk1TyUkqC+2+************=='
7. Back up the database to the file share using the command below:
BACKUP DATABASE [AdventureWorks2019] TO DISK = 'Z:\AdventureWorks2019.bak' WITH STATS = 5
Reference (optional)
SMB file shares in Azure Files | Microsoft Docs
Create an Azure file share - Azure Files | Microsoft Docs
https://technet.microsoft.com/en-us/library/dn435916(v=sql.120).aspx#limitations
Polybase for beginners
Regarding Polybase, I have seen some confusion from customers about what it is, the problems it solves, and when to use it, so I decided to write a brief blog post explaining Polybase in a few short paragraphs. Hopefully, after reading it you will walk away with a good enough grasp of the topic, even if you are completely new to it.
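To make the idea concrete before diving in, here is a minimal, hypothetical T-SQL sketch of Polybase querying CSV files in Azure Blob Storage as an external table; the storage account, container, secret, and column list are illustrative placeholders, not from the original post:
-- Enable Polybase on the instance
EXEC sp_configure 'polybase enabled', 1;
RECONFIGURE;
GO
-- Credential holding the storage account key (placeholder values)
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword>';
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
    WITH IDENTITY = 'user', SECRET = '<storage-account-key>';
-- Hadoop-type external data source pointing at a blob container
CREATE EXTERNAL DATA SOURCE MyAzureStorage
    WITH (TYPE = HADOOP,
          LOCATION = 'wasbs://data@mystorageaccount.blob.core.windows.net',
          CREDENTIAL = AzureStorageCredential);
-- File format for comma-delimited text
CREATE EXTERNAL FILE FORMAT CsvFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = ','));
-- The rows live in the blob files; SQL Server reads them at query time
CREATE EXTERNAL TABLE dbo.SalesExternal
    (SaleID INT, Amount DECIMAL(10,2), SaleDate DATE)
    WITH (LOCATION = '/sales/', DATA_SOURCE = MyAzureStorage, FILE_FORMAT = CsvFormat);
GO
SELECT TOP (10) * FROM dbo.SalesExternal;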
Effectively troubleshoot latency in SQL Server Transactional replication: Part 1
Are you struggling with latency issues in SQL Server transactional replication? This article provides clear, step-by-step instructions to effectively troubleshoot and resolve these challenges. Dive into proven techniques and best practices that will help you enhance your SQL Server's performance and ensure seamless data replication. Don't let latency slow you down; master the art of SQL Server troubleshooting today! Thanks to Collin Benkler, Senior Escalation Engineer at Microsoft for SQL Server, for his valuable feedback.
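A standard first step when chasing transactional replication latency, and a technique worth knowing before reading on, is posting a tracer token to measure how long a change takes to travel from publisher to distributor to subscriber. A minimal sketch, assuming a publication named MyPublication (run at the publisher, in the publication database):
-- Post a tracer token into the transaction log stream
EXEC sys.sp_posttracertoken @publication = N'MyPublication';
GO
-- List the tokens posted for this publication and note the tracer_id
EXEC sys.sp_helptracertokens @publication = N'MyPublication';
GO
-- Show publisher-to-distributor and distributor-to-subscriber latency for one token
-- (1 is a placeholder; use a tracer_id returned by the previous call)
EXEC sys.sp_helptracertokenhistory @publication = N'MyPublication', @tracer_id = 1;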
Common Causes and Troubleshooting Solutions for SQL AG Data Synchronization Latency
This article summarizes the common causes, solutions, and troubleshooting mechanisms for SQL Availability Group (AG) data synchronization latency between primary and secondary replicas, for both synchronous-commit and asynchronous-commit modes. The latency commonly happens at the log harden phase or the log redo phase, and sometimes both. If it happens at the log harden phase, you are likely to see the HADR_SYNC_COMMIT wait type on your primary, waiting for your synchronous-commit secondary replica's acknowledgement; because of this wait, your primary is also likely to see latency committing the transactions sent from the application. If the latency happens purely at the log redo phase, you are likely to see a relatively low redo rate on the affected secondary replica together with a high redo queue value. The key point is to first narrow down which phase of the data synchronization workflow (attached at the bottom of this article, and also covered in a few other blogs thanks to the great work of Simon Su) induces the latency. For experienced DBAs: if you have already narrowed the latency down to extensive HADR_SYNC_COMMIT waits on your primary for your synchronous-commit secondary replica, you can refer to this article for resolving the bottleneck: https://techcommunity.microsoft.com/t5/sql-server/troubleshooting-high-hadr-sync-commit-wait-type-with-always-on/ba-p/385369
If you wish to capture detailed traces to narrow down the bottleneck, then in addition to performance monitor counters (where you can check log send and redo efficiency) and the blocking chain and its block header on the primary replica, the SQL AG dedicated extended events (X-events) on both the primary and the affected secondary are required to see the data synchronization workflow latency. We have an automatic tool, the "AGLatency Report Tool", to help you analyze the AG X-event data, thanks to the great work of Simon Su. You can find the tool with its user manual in this link, and the AG X-event script in this link.
Below are the common causes and their solutions/troubleshooting mechanisms for SQL AG data synchronization latency:
Long-running active transactions in a read-only secondary replica ----> When there is a long-running transaction in a readable secondary replica, the local redo thread on the secondary can be blocked when it accesses the same data.
High network latency / low network throughput ----> When the network transmission between the primary and the affected secondary is slow or unstable, the log send rate will be low, limiting the amount of data transmitted to the secondary (refer to https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2012/dn135335(v=sql.110)?redirectedfrom=MSDN).
Flow-control-induced high log send queue size and low log send rate ----> When your SQL AG's performance triggers a flow control gate, either for specific AG databases or at the AG replica level, sending log data from primary to secondary is temporarily stopped or limited. Flow control is commonly caused by server overloading or a slow network. You need to resolve the conditions in your environment that cause the flow control if you consistently observe the flow control gates being entered by your AG.
You can check the performance monitor counters below to see whether your SQL AG database or AG replica has ever triggered a flow control gate:
AG replica level: Availability Replica: Flow Control Time, Availability Replica: Flow Control/sec
AG database level: Database Replica: Database Flow Control Delay, Database Replica: Database Flow Controls/Sec
(For the flow control gate details discussed below, please refer to Monitor Performance for AlwaysOn Availability Groups | Microsoft Docs.)
AlwaysOn Availability Groups is designed with flow control gates on the primary replica to avoid excessive resource consumption, such as network and memory resources, on all availability replicas. These flow control gates do not affect the synchronization health state of the availability replicas, but they can affect the overall performance of your availability databases, including RPO. After the logs have been captured on the primary replica, they are subject to two levels of flow control, as shown in the table below.
FLOW CONTROL GATES
Level: Transport | Number of gates: 1 per availability replica | Number of messages: 8192 | Useful metrics: extended event database_transport_flow_control_action
Level: Database | Number of gates: 1 per availability database | Number of messages: 11200 (x64), 1600 (x86) | Useful metrics: DBMIRROR_SEND, extended event hadron_database_flow_control_action
Once the message threshold of either gate is reached, log messages are no longer sent to that replica or for that database. They can be sent again once acknowledgement messages are received for the sent messages, bringing the number of outstanding messages below the threshold. In addition to the flow control gates, there is another factor that can prevent log messages from being sent: the synchronization of replicas ensures that messages are sent and applied in log sequence number (LSN) order. Before a log message is sent, its LSN is also checked against the lowest acknowledged LSN to make sure the gap is less than a threshold (which depends on the message type). If the gap between the two LSNs is larger than the threshold, the messages are not sent; once the gap falls below the threshold again, the messages are sent.
Blocked redo thread ----> When there are other issues or performance bottlenecks on the secondary replica, the redo thread can be blocked, causing data synchronization latency. Refer to https://techcommunity.microsoft.com/t5/sql-server/alwayson-minimizing-blocking-of-redo-thread-when-running/ba-p/383963
Shared redo target ----> This issue only happens when you have multiple secondary replicas. When one secondary replica is very slow in finishing the redo process, it limits how far the other secondary replicas can proceed toward the maximum allowable LSN of the redo process. Trace flag 9559 can be used to mitigate the issue (refer to https://docs.microsoft.com/en-us/archive/blogs/alwaysonpro/recovery-on-secondary-lagging-shared-redo-target).
Running out of parallel redo threads on the secondary replica ----> When the secondary replica runs out of parallel redo threads, you can use trace flag 3478 to allow the maximum number of parallel redo threads to scale with the total number of CPUs. By default, a SQL Server instance uses up to 100 threads for parallel redo for secondary replicas. Each database uses up to one-half of the total number of CPU cores, but not more than 16 threads per database. If the total number of required threads for a single instance exceeds 100, SQL Server uses a single redo thread for every remaining database.
Disk or I/O subsystem latency ----> When the disk or I/O subsystem on the secondary replica has a critical bottleneck, the redo rate on the secondary will be impacted and will stay quite low. You are less likely to see this bottleneck on the secondary before it hits the primary if the secondary replica is not readable, has comparable hardware for the SQL database files' storage as the primary, and is dedicated to this SQL AG's usage. However, if your secondary replica handles a high volume of read-only workload, that workload can add extra I/O overhead to the I/O subsystem. Frequent database backups and VM snapshots can be other potential sources of extra I/O overhead.
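Whichever cause you suspect, a quick way to confirm which phase the latency sits in, before collecting X-events, is to query the AG DMVs on the primary. A minimal sketch (the columns are standard DMV columns; what counts as "large" depends on your workload):
-- Accumulated HADR_SYNC_COMMIT waits point to the log harden phase
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'HADR_SYNC_COMMIT';
GO
-- A large log_send_queue_size suggests send/network/flow-control latency;
-- a large redo_queue_size with a low redo_rate suggests redo-phase latency.
SELECT ar.replica_server_name,
       drs.database_id,
       drs.synchronization_state_desc,
       drs.log_send_queue_size,  -- KB of log not yet sent to the replica
       drs.log_send_rate,        -- KB/sec
       drs.redo_queue_size,      -- KB of log not yet redone on the replica
       drs.redo_rate             -- KB/sec
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar ON drs.replica_id = ar.replica_id;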
Configure multiple-subnet AlwaysOn Availability Group by modifying CIB
In the Windows world, a Windows Server Failover Cluster (WSFC) natively supports multiple subnets and handles multiple IP addresses via an OR dependency on the IP address resource. On Linux, there is no OR dependency, but there is a way to achieve a proper multi-subnet configuration natively with Pacemaker, as shown in the following. You cannot do this simply by using the normal Pacemaker command line to modify a resource; you need to modify the cluster information base (CIB), which is an XML file holding the Pacemaker configuration. Here is an example that creates a SQL Server on Linux availability group across 4 nodes in 3 subnets on RHEL 7.6. If you are already familiar with the AG setup process, feel free to jump to step 16.
1. Register your subscription on all servers (red1, red2, red3 and red4 in this case):
subscription-manager register
2. List all available subscriptions, pick the one with High Availability, and note down the pool id:
subscription-manager list --available --all
3. Attach the subscription on all nodes (red1, red2, red3 and red4 in this case):
sudo subscription-manager attach --pool=xxxxx
4. Enable the repository on all nodes (red1, red2, red3 and red4 in this case):
sudo subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
5. Install the Pacemaker packages on all nodes (red1, red2, red3 and red4 in this case):
sudo yum install pacemaker pcs fence-agents-all resource-agents
6. Install the SQL Server resource agent on all nodes (red1, red2, red3 and red4 in this case):
sudo yum install mssql-server-ha
7. Set the password for the default user that is created when installing the Pacemaker and Corosync packages. The password should be exactly the same on all nodes (red1, red2, red3 and red4 in this case):
sudo passwd hacluster
8. Update the /etc/hosts file on all servers, adding the IP addresses and node names. All the servers should have the same entries:
192.168.2.103 red1
192.168.2.104 red2
192.168.4.100 red3
192.168.5.101 red4
9. Run the following commands to enable and start the pcsd service and Pacemaker on all nodes
(red1, red2, red3 and red4 in this case):
sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker
10. Run the following commands to create the cluster on the primary replica node (red1 in this case):
sudo pcs cluster auth red1 red2 red3 red4 -u hacluster -p YouPasswordUsedinStep7
sudo pcs cluster setup --name sqlcluster1 red1 red2 red3 red4
sudo pcs cluster start --all
sudo pcs cluster enable --all
11. Run the following commands to enable the Always On feature on all nodes (red1, red2, red3 and red4 in this case):
sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server
Create AG and Listener
1. Run the following queries on red1 to create a certificate:
use master
go
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
go
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
go
BACKUP CERTIFICATE dbm_certificate
TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    ENCRYPTION BY PASSWORD = '**<Private_Key_Password>**'
);
2. Run the following commands on red1 to copy the certificate to the rest of the servers (red2, red3 and red4 in this case):
cd /var/opt/mssql/data
scp dbm_certificate.* root@red2:/var/opt/mssql/data/
scp dbm_certificate.* root@red3:/var/opt/mssql/data/
scp dbm_certificate.* root@red4:/var/opt/mssql/data/
3. Give the mssql user permission to access the certificate files on the rest of the servers (red2, red3 and red4 in this case):
cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*
4. Run the following T-SQL queries to create the certificate on the rest of the nodes by restoring the certificate backup file (red2, red3 and red4 in this case):
use master
go
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**'
go
CREATE CERTIFICATE dbm_certificate
FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '**<Private_Key_Password>**'
)
5. Create the endpoint on all servers (red1, red2, red3 and red4):
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE dbm_certificate,
    ENCRYPTION = REQUIRED ALGORITHM AES
);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
6. Run the following query on the primary replica (red1) to create the availability group. (Please note, this works for SQL Server 2019.
If you are using SQL Server 2017, you need to change the AVAILABILITY_MODE of one of the replicas to ASYNCHRONOUS_COMMIT.)
CREATE AVAILABILITY GROUP [ag1]
WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
FOR REPLICA ON
N'red1' WITH (
    ENDPOINT_URL = N'tcp://red1:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = EXTERNAL,
    SEEDING_MODE = AUTOMATIC),
N'red2' WITH (
    ENDPOINT_URL = N'tcp://red2:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = EXTERNAL,
    SEEDING_MODE = AUTOMATIC),
N'red3' WITH (
    ENDPOINT_URL = N'tcp://red3:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = EXTERNAL,
    SEEDING_MODE = AUTOMATIC),
N'red4' WITH (
    ENDPOINT_URL = N'tcp://red4:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = EXTERNAL,
    SEEDING_MODE = AUTOMATIC)
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE; --grant create any database permission
7. Join the AG. Run the following T-SQL queries on all the secondary servers (red2, red3 and red4 in this case):
ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE
8. Run the following T-SQL queries on the primary replica (red1 in this case) to create a database and add it to the AG:
CREATE DATABASE [db1];
ALTER DATABASE [db1] SET RECOVERY FULL;
BACKUP DATABASE [db1] TO DISK = N'/var/opt/mssql/data/db1.bak';
BACKUP log [db1] TO DISK = N'/var/opt/mssql/data/db1.trn';
GO
ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];
9. Create the SQL login for Pacemaker on all servers (red1, red2, red3 and red4 in this case):
CREATE LOGIN [pacemakerLogin] with PASSWORD = N'ComplexP@$$w0rd!'
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]
10. Run the following bash command on red1 to disable fencing (STONITH) for now:
sudo pcs property set stonith-enabled=false
11. On all SQL Server on Linux servers (red1, red2, red3 and red4), run the following bash commands to save the credentials for the SQL Server login (the password is the same as the one used in step 9):
echo 'pacemakerLogin' >> ~/pacemaker-passwd
echo 'ComplexP@$$w0rd!' >> ~/pacemaker-passwd
sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 400 /var/opt/mssql/secrets/passwd # Only readable by root
12. Create the availability group resource at the cluster level. Run the following command on any one of the nodes (just one server, and just one time):
sudo pcs resource create ag_cluster1 ocf:mssql:ag ag_name=ag1 meta failure-timeout=60s master notify=true ##check the status
13. Run the following bash command on the primary replica red1 to create the virtual IP resource. The resource name is 'vip1', and the IP address is 192.168.2.111:
sudo pcs resource create vip1 ocf:heartbeat:IPaddr2 ip=192.168.2.111 ##check the status
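The "##check the status" notes above do not show a command; a common way to check (assuming standard pcs tooling) is:
sudo pcs status            # overall cluster, node, and resource state
sudo pcs status resources  # just the resources, e.g. the ag_cluster1 roles and vip1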
14. Create the availability group listener for availability group ag1. Run the following T-SQL query on the primary replica (red1 in this case):
ALTER AVAILABILITY GROUP [ag1] ADD LISTENER 'aglistener' (
WITH IP (
    ('192.168.2.111','255.255.255.0'),
    ('192.168.4.111','255.255.255.0'),
    ('192.168.5.111','255.255.255.0')
), PORT = 1433);
15. Run the following bash commands to create the constraints:
sudo pcs constraint colocation add vip1 ag_cluster1-master INFINITY with-rsc-role=Master
sudo pcs constraint order promote ag_cluster1-master then start vip1
16. Run the following bash command to export the CIB (you can run the command on any node):
sudo pcs cluster cib <filename>
17. You will find entries similar to the following:
<primitive class="ocf" id="vip1" provider="heartbeat" type="IPaddr2">
  <instance_attributes id="vip1-instance_attributes">
    <nvpair id="vip1-instance_attributes-ip" name="ip" value="192.168.2.111"/>
  </instance_attributes>
  <operations>
    <op id="vip1-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
    <op id="vip1-start-interval-0s" interval="0s" name="start" timeout="20s"/>
    <op id="vip1-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
  </operations>
</primitive>
18. Here is the modified version:
<primitive class="ocf" id="vip1" provider="heartbeat" type="IPaddr2">
  <instance_attributes id="vip1-instance_attributes">
    <rule id="Subnet1-IP" score="INFINITY" boolean-op="or">
      <expression id="Subnet1-Node1" attribute="#uname" operation="eq" value="red1"/>
      <expression id="Subnet1-Node2" attribute="#uname" operation="eq" value="red2"/>
    </rule>
    <nvpair id="vip1-instance_attributes-ip" name="ip" value="192.168.2.111"/>
  </instance_attributes>
  <instance_attributes id="vip1-instance_attributes2">
    <rule id="Subnet2-IP" score="INFINITY">
      <expression id="Subnet2-Node1" attribute="#uname" operation="eq" value="red3"/>
    </rule>
    <nvpair id="vip1-instance_attributes-ip2" name="ip" value="192.168.4.111"/>
  </instance_attributes>
  <instance_attributes id="vip1-instance_attributes3">
    <rule id="Subnet3-IP" score="INFINITY">
      <expression id="Subnet3-Node1" attribute="#uname" operation="eq" value="red4"/>
    </rule>
    <nvpair id="vip1-instance_attributes-ip3" name="ip" value="192.168.5.111"/>
  </instance_attributes>
  <operations>
    <op id="vip1-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
    <op id="vip1-start-interval-0s" interval="0s" name="start" timeout="20s"/>
    <op id="vip1-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
  </operations>
</primitive>
19. Run the following command to import the modified CIB and reconfigure Pacemaker:
sudo pcs cluster cib-push <filename>
Here are the takeaway points:
1) All nodes in the same subnet should be in the same <instance_attributes> element.
2) If there is more than one server in a subnet, the keyword boolean-op="or" is a must.
3) The IP address of the AlwaysOn listener is specified in the <nvpair> element.
4) The value of the id attribute does not matter; you can specify any value as long as it is unique.
Optionally, you can create three entries for the three IP addresses in the DNS server.
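The original post closes with a screenshot of sqlcmd connecting through the listener. A hedged command-line equivalent (the password is a placeholder; pacemakerLogin was created in step 9) would be:
sqlcmd -S aglistener,1433 -U pacemakerLogin -P '<password>' -M -Q "SELECT @@SERVERNAME"
The -M switch enables multi-subnet failover behavior, so the client does not stall waiting for a connection timeout on the listener IPs that are inactive.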
Create a hybrid Distributed Availability Group Between SQL Windows and SQL Linux
This article describes how to create a hybrid SQL Server distributed availability group (DAG) between SQL Server on Linux and SQL Server on Windows. Here is the topology of the DAG environment:
Data Center 1: Red1 192.168.3.102 (Primary), Red2 192.168.3.103 | Availability group: AGLinux | Listener: AGLinux-Lis, IP 192.168.3.111 | SQL Server 2019 CU2 on RHEL 8
Data Center 2: Node1 192.168.2.101 (Forwarder), Node2 192.168.2.102 | Availability group: AGWindows | Listener: AGWindows-Lis, IP 192.168.2.111 | SQL Server 2019 CU2 on Windows 2016
Distributed availability group: DAGHybrid
Please make sure the following ports are opened on all the servers in both datacenters: TCP 5022, 2224, 3121, 21064; UDP 5405.
For RHEL, you need to have the High Availability subscription enabled. Please refer to https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-availability-group-cluster-rhel?view=sql... for details.
Section 1: Create the availability group in datacenter1
1. Update /etc/hosts on all the nodes in datacenter1, adding the IPs, replica names, and AG listeners. All servers in datacenter1 should have the same entries. (Although the IP resources of the listeners will be created in later steps, it is fine to add them at the very beginning.)
2. Run the following bash commands on all servers in datacenter1 to enable the Always On feature:
sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server
3. Run the following T-SQL to enable the Always On health check event session on all the servers in datacenter1:
ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON);
4. Create a certificate. Run this query on the primary replica in datacenter1 (red1 in this case):
use master
go
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
go
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
go
BACKUP CERTIFICATE dbm_certificate
TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    ENCRYPTION BY PASSWORD = '**<Private_Key_Password>**'
);
5. Copy the certificate to the rest of the servers in datacenter1 (red2 in this case):
cd /var/opt/mssql/data
scp dbm_certificate.* root@**<nodeName>**:/var/opt/mssql/data/
6. Give the mssql user permission to access the certificate files on the rest of the servers (red2 in this case). Run the following bash commands on red2:
cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*
7. Create the certificate on the rest of the servers by restoring the certificate backup (red2 in this case):
use master
go
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**'
go
CREATE CERTIFICATE dbm_certificate
FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '**<Private_Key_Password>**'
)
8. Create the endpoint on all servers in datacenter1. Port 5022 is used:
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE dbm_certificate,
    ENCRYPTION = REQUIRED ALGORITHM AES
);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
9. Create the AG with 2 synchronous replicas in datacenter1.
Run the following query on the primary replica (red1 in this case):
CREATE AVAILABILITY GROUP [AGLinux]
WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
FOR REPLICA ON
N'red1' WITH (
    ENDPOINT_URL = N'tcp://red1:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = EXTERNAL,
    SEEDING_MODE = AUTOMATIC),
N'red2' WITH (
    ENDPOINT_URL = N'tcp://red2:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = EXTERNAL,
    SEEDING_MODE = AUTOMATIC)
go
ALTER AVAILABILITY GROUP [AGLinux] GRANT CREATE ANY DATABASE; --grant create any database permission
10. Join the AG. Run the following T-SQL queries on all the secondary servers in datacenter1 (red2 in this case):
ALTER AVAILABILITY GROUP [AGLinux] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
ALTER AVAILABILITY GROUP [AGLinux] GRANT CREATE ANY DATABASE
11. Create a database and add it to the AG on the primary replica (red1 in this case):
CREATE DATABASE [db2];
ALTER DATABASE [db2] SET RECOVERY FULL;
BACKUP DATABASE [db2] TO DISK = N'db2.bak';
BACKUP log [db2] TO DISK = N'db2.trn';
GO
ALTER AVAILABILITY GROUP [AGLinux] ADD DATABASE [db2];
12. Create the SQL login for Pacemaker on all servers in datacenter1:
CREATE LOGIN [pacemakerLogin] with PASSWORD = N'ComplexP@$$w0rd!'
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]
GO
13. Install Pacemaker. Run the following bash command on all servers in datacenter1:
sudo yum install pacemaker pcs fence-agents-all resource-agents
14. Install the SQL Server resource agent for SQL Server. Run the following bash command on all servers in datacenter1:
sudo yum install mssql-server-ha
15. Set the password for the default user that is created when installing the Pacemaker and Corosync packages. Run the following command on all servers in datacenter1 (the command asks for a password, which should be the same on all servers in the same datacenter):
sudo passwd hacluster
16. To allow nodes to rejoin the cluster after a reboot, enable and start the pcsd service and Pacemaker. Run the following commands on all servers in datacenter1:
sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker
17. Create the cluster in datacenter1. Run the following commands on any one of the nodes in datacenter1 (just on one server, and just one time). You will be asked to provide the username and password you specified in step 15:
sudo pcs host auth red1 red2
sudo pcs cluster setup cluster1 red1 red2
sudo pcs cluster start --all
sudo pcs cluster enable --all
18. Because node-level fencing configuration depends heavily on your environment, disable it for this tutorial (it can be configured later; in my test, if this value is not set to false, you cannot fail over, and the IP resource may fail). The following command disables node-level fencing (run it on any node in datacenter1):
sudo pcs property set stonith-enabled=false
20. On all SQL Server on Linux servers in datacenter1, save the credentials for the SQL Server login (use the same user name and password created in step 12):
echo 'pacemakerLogin' >> ~/pacemaker-passwd
echo 'ComplexP@$$w0rd!' >> ~/pacemaker-passwd
sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 400 /var/opt/mssql/secrets/passwd # Only readable by root
21. Create the availability group resource at the OS level in datacenter1. Run the following command on any one of the nodes in datacenter1 (just on one node, and just one time):
sudo pcs resource create ag_cluster ocf:mssql:ag ag_name=AGLinux meta failure-timeout=60s promotable notify=true
22. Create the virtual IP resource. The resource name is 'virtualip1', and the IP address is 192.168.3.111 in this demonstration. This is the IP address for the AG listener:
sudo pcs resource create virtualip1 ocf:heartbeat:IPaddr2 ip=192.168.3.111
23. Create the availability group listener for availability group AGLinux. Run the following T-SQL query on the primary replica (red1 in this case). The IP address should be the same as the IP in step 22:
ALTER AVAILABILITY GROUP [AGLinux] ADD LISTENER 'AGLinux-Lis' (WITH IP(('192.168.3.111','255.255.255.0')),PORT = 1433);
a) It is fine if you run step 23 before step 22; however, you cannot use the IP address to connect to SQL Server until the resource in Pacemaker is created.
b) There is a bug in the DMV sys.availability_group_listener_ip_addresses showing the state as offline.
24. Add the colocation constraint. Please note: without the constraint, the IP resource does not fail over when the AG resource fails over, so we need a constraint to guarantee that the IP fails over with the cluster resource. Run the following command on any server in datacenter1 (just on one server, and just one time):
sudo pcs constraint colocation add virtualip1 with master ag_cluster-clone INFINITY with-rsc-role=Master
25. Add the ordering constraint:
sudo pcs constraint order promote ag_cluster-clone then start virtualip1
26. To update the cluster recheck property value to 2 minutes, run the following bash commands on any node (just one time):
sudo pcs property set cluster-recheck-interval=2min
sudo pcs property set start-failure-is-fatal=true
pcs resource update ag_cluster meta failure-timeout=60s
Section 2: Create the availability group in datacenter2 (actually, you can run this section first, as long as the certificate is the same one used in Section 1)
The steps in Section 2 are exactly the same as for a regular SQL Server AG on Windows. The only thing I would like to highlight is that you have to create the endpoint authenticated by a certificate, and it must be the same certificate you used on SQL Linux.
1. Copy the certificate backup from SQL Linux to the two Windows servers.
2. Run the following queries to create the master key, certificate, and endpoint:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**'
go
CREATE CERTIFICATE dbm_certificate
FROM FILE = 'C:\Backup\dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = 'C:\Backup\dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '**<Private_Key_Password>**'
)
go
create ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE dbm_certificate,
    ENCRYPTION = REQUIRED ALGORITHM AES
);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
3. Create a login and grant it permission to access the endpoint:
CREATE LOGIN InstanceA_Login WITH PASSWORD = 'Strong Passworsdfg1#d';
CREATE USER InstanceA_User FOR LOGIN InstanceA_Login;
go
alter authorization on CERTIFICATE::dbm_certificate to InstanceA_User
GO
GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO InstanceA_Login;
4. Create the availability group:
CREATE AVAILABILITY GROUP [AGWindows]
FOR REPLICA ON
N'node1' WITH (
    ENDPOINT_URL = N'TCP://node1:5022',
    FAILOVER_MODE = AUTOMATIC,
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    SEEDING_MODE = AUTOMATIC),
N'node2' WITH (
    ENDPOINT_URL = N'TCP://node2:5022',
    FAILOVER_MODE = AUTOMATIC,
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    SEEDING_MODE = AUTOMATIC)
5. Create the listener:
ALTER AVAILABILITY GROUP [AGWindows] ADD LISTENER 'AGWindows-Lis' (WITH IP(('192.168.2.111','255.255.255.0')),PORT = 1433);
6. Run the following T-SQL on all secondary replicas (node2 in this case) to join them to the AG:
ALTER AVAILABILITY GROUP [AGWindows] JOIN
GO
ALTER AVAILABILITY GROUP [AGWindows] GRANT CREATE ANY DATABASE;
7. Edit the hosts file on the two Windows servers and make sure they are able to reach the listener AGLinux-Lis.
Section 3: Create the hybrid distributed availability group
1. Create the distributed availability group in datacenter1. Run the following query on the primary replica in datacenter1 (red1 in this case):
CREATE AVAILABILITY GROUP [DAGHybrid]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
'AGLinux' WITH (
    LISTENER_URL = 'tcp://AGLinux-lis:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
),
'AGWindows' WITH (
    LISTENER_URL = 'tcp://AGWindows-Lis:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
);
2. Run the following query on the forwarder to join the distributed AG:
ALTER AVAILABILITY GROUP [DAGHybrid]
JOIN AVAILABILITY GROUP ON
'AGLinux' WITH (
    LISTENER_URL = 'tcp://AGLinux-lis:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
),
'AGWindows' WITH (
    LISTENER_URL = 'tcp://AGWindows-Lis:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
);
The article for RHEL 7.6 is here: https://sqlserver.code.blog/2020/02/14/create-a-hybrid-distributed-availability-group-between-sql-windows-and-sql-linux/
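As a quick sanity check of the finished distributed AG, you can query the standard AG DMVs on the global primary (red1); a minimal sketch using only documented columns:
-- The distributed AG appears alongside the local AGs with is_distributed = 1
SELECT ag.name,
       ag.is_distributed,
       drs.database_id,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.dm_hadr_database_replica_states AS drs ON ag.group_id = drs.group_id
WHERE ag.is_distributed = 1;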
SQL Server Database Mail Failure Troubleshooting & Two Common Issues
This blog discusses troubleshooting steps to investigate failures of the SQL Server Database Mail service when sending email. It also discusses two special scenarios with SQL Server Database Mail that commonly cause mail sending failures:
General Troubleshooting Steps to Narrow Down RCA
Special Scenarios:
Scenario 1: Implicit SSL/TLS mode is not supported for SQL Server DB Mail. If your SMTP server requires implicit TLS, SQL Server DB Mail will not be able to send email using this SMTP server.
Scenario 2: DB Mail failure caused by the SMTP mail server requesting TLS 1.2. In this case, TLS 1.2 needs to be enabled on the DB Mail server.
Commonly Used Protocols and Ports
========================
Protocol | No encryption (plain port) | TLS/SSL (explicit port) | TLS/SSL (implicit port)
FTP | 21 | 21 | 990
SMTP | 25 or 587 | 25 or 587 | 465
IMAP | 143 | 143 | 993
POP3 | 110 | 110 | 995
Telnet | 23 | 23 | 992
HTTP | 80 | - | 443
General Troubleshooting Steps to Narrow Down RCA
=====================================
1. Check the DB Mail errors from the DMVs and logs:
/*List all DB Mail event log entries*/
SELECT * FROM msdb.dbo.sysmail_event_log ORDER BY log_date DESC
/*List all failed items*/
SELECT er.log_id AS [LogID], er.event_type AS [EventType], er.log_date AS [LogDate],
       er.description AS [Description], er.process_id AS [ProcessID], er.mailitem_id AS [MailItemID],
       er.account_id AS [AccountID], er.last_mod_date AS [LastModifiedDate], er.last_mod_user AS [LastModifiedUser],
       fi.send_request_user, fi.send_request_date, fi.recipients, fi.subject, fi.body
FROM msdb.dbo.sysmail_event_log er
LEFT JOIN msdb.dbo.sysmail_faileditems fi ON er.mailitem_id = fi.mailitem_id
ORDER BY [LogDate] DESC
2. Check and ensure DB Mail has been enabled:
sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO
sp_configure 'Database Mail XPs', 1;
GO
RECONFIGURE
GO
3. Check and ensure the SQL Server service account has permission to call the Database Mail executable (DatabaseMail.exe). Example install path: \Program Files\Microsoft SQL Server\MSSQL1.SQL2019\MSSQL\Binn\DatabaseMail.exe
4. Check the DB Mail account profile's security configuration options and the port number you have chosen, for further troubleshooting purposes:
- The port number you have chosen for this account
- Whether or not you have checked the box "This server requires a secure connection (SSL)"
- The type of SMTP authentication you have chosen
Steps: open SSMS, select Management, right-click Database Mail, and select Configure Database Mail -> Manage Database Mail accounts and profiles -> Next.
5. Telnet the port and IP of the SMTP server. On the DB Mail server, telnet the SMTP server to see whether the SQL machine can communicate with the IP and port of your SMTP server (the port number configured in your profile). Most SMTP servers use port 25. If the telnet is not successful, network communication from the DB Mail server to the SMTP server is the problem; if the DB Mail server can communicate with other servers without issue, the problem lies in the network traffic with the SMTP server specifically. Open CMD as administrator and run the following command to telnet your target server IP and port:
telnet [domain name or ip] [port]
For example, to verify the connection to 192.168.0.10 on port 25, issue the command:
telnet 192.168.0.10 25
To install the Telnet client on your server:
Windows 7, 8, 10: Open the Windows Start menu > type "Control Panel" > press Enter > "Programs" > "Programs and Features" > "Turn Windows features on or off" > select "Telnet Client" > press "OK".
Windows Server 2012, 2016: Open "Server Manager" > "Add roles and features" > click "Next" until reaching the "Features" step > tick "Telnet Client" > click "Install" > when the feature installation finishes, click "Close".
6. Run a testing script to send email. If the telnet to the IP and port from the DB Mail server to the SMTP server is successful, the network from the DB server to the SMTP server is good. Next, test whether you can send mail via PowerShell or other scripts (to rule out the DB Mail service itself and see whether you can send email directly to your SMTP server, isolating the issue further). Also, try switching to a public SMTP server for the same DB Mail profile, as well as switching to a different DB Mail profile, to narrow down whether it is a profile issue, an SMTP server issue, or a SQL Server DB Mail service issue. Below is a sample PowerShell script to send mail:
$UserCredential = Get-Credential
# enter this: user: test@chimex.onmicrosoft.com, password: yourpassword
Send-MailMessage -to "youremail" -from "test@chimex.onmicrosoft.com" -subject "powershell" -body "powershell" -BodyAsHtml -smtpserver smtp.office365.com -usessl -Credential $UserCredential
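To exercise the same path from inside SQL Server rather than from PowerShell, you can send a test message with msdb's stored procedure; a minimal sketch (the profile name and recipient are placeholders for your own values):
-- Send a test message through a specific Database Mail profile
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'MyMailProfile',
    @recipients = N'recipient@contoso.com',
    @subject = N'Database Mail test',
    @body = N'Test message from SQL Server Database Mail.';
GO
-- Check whether the item was sent, failed, or is still queued
SELECT TOP (10) mailitem_id, recipients, subject, sent_status, sent_date
FROM msdb.dbo.sysmail_allitems
ORDER BY mailitem_id DESC;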
7. If step 6 cannot be tested directly in your environment, collect a network monitor trace while you reproduce the issue (the failure to send DB Mail).
8. If you do not see clear evidence of the root cause in the network trace (such as a reset from the client or the SMTP server), then based on the symptom you may need to collect a TTT debug trace for DatabaseMail.exe and work with Microsoft support.
Special Scenarios:
===============
Scenario 1: Implicit TLS mode is not supported for SQL Server DB Mail. If your SMTP server requires implicit TLS, SQL Server DB Mail will not be able to send email with this SMTP server.
You can consider switching to a different SMTP server (such as a public one) that supports STARTTLS, or modifying your SMTP server to support STARTTLS. If your DB Mail profile has SSL encryption enabled, STARTTLS is required in the secured communication between your DB Mail server and the SMTP server. In the SQL Server Database Mail architecture, the service relies on .NET System.Net.Mail (the SmtpClient class), and since System.Net.Mail does not support "implicit SSL/TLS" mode, SQL Server Database Mail does not support it either. In the past, port 465 could be used to support "implicit SSL/TLS" mode (SMTP over SSL); it is no longer supported. (For more details, refer to SmtpClient.EnableSsl Property (System.Net.Mail) | Microsoft Docs.)
More details on STARTTLS and "implicit SSL/TLS"
---------------------------------------------------------------------
"STARTTLS" is an email protocol command that turns an insecure network connection into a secured one. If the email client server has enabled or requests SSL or TLS encryption, STARTTLS is required for the SQL Server DB Mail service to successfully send email. "Implicit SSL/TLS" is another mode for secured client-to-server communication. The two major differences between "STARTTLS" and "implicit SSL/TLS" are summarized below (there are a couple of other differences):
1. With the "implicit SSL/TLS" mode, the email client server connects to the SMTP server and TLS/SSL encryption is switched on implicitly as soon as the connection is established, while under "STARTTLS" mode the client explicitly requests TLS/SSL encryption to be switched on after the initial TCP handshake.
2. With "implicit SSL/TLS" mode, if the connection cannot be established with encryption, the email is prevented from being sent. With "STARTTLS", if the SMTP mail server is not able to support TLS encryption as requested by the email client server, the client negotiates with the SMTP server and agrees to downgrade to an unencrypted connection. Thus, with "STARTTLS", you can use the same port for encrypted and plain-text mail.
Scenario 2: DB Mail failure caused by the SMTP mail server requesting TLS 1.2. In this case, TLS 1.2 needs to be enabled on the DB Mail server.
In multiple DB Mail cases, the failure turned out to be caused by the client DB Mail server not supporting TLS 1.2 while the SMTP mail server requested TLS 1.2. Unfortunately, the error observed in the Database Mail log viewer (sysmail_event_log) is most often very generic, as below. In a network monitor trace, you may not see any STARTTLS traffic, as the connection can be reset by the DB Mail server after the initial handshake.
"The mail could not be sent to the recipients because of the mail server failure. (Sending Mail using Account 2 (2022-03-14T16:58:54). Exception Message: Cannot send mails to mail server. (Failure sending mail.). )"
If Schannel errors are detected in the Windows system event logs around the time the DB Mail failure happens, use the troubleshooting steps below to see whether the failure is related to TLS 1.2 being requested by the SMTP server:
1. Use the first 7 steps mentioned earlier to rule out the other issues that can play a role here.
2. Check the system event log for any Schannel errors (or any errors) from the same time the Database Mail failure occurred. Below are a few examples, but there could be others; any Schannel error occurring at the same time as the Database Mail failure should be checked carefully:
{timestamp},Error,{servername},36887,Schannel,A fatal alert was received from the remote endpoint. The TLS protocol defined fatal alert code is 70.
{timestamp},Error,0,36871,Schannel,{servername},A fatal error occurred while creating a TLS client credential. The internal error state is 10013.
3. If Schannel errors are spotted at the same time the DB Mail failure reproduces, check and ensure your SQL Server and Windows have the necessary hotfixes to support TLS 1.2 (refer to https://support.microsoft.com/en-us/topic/kb3135244-tls-1-2-support-for-microsoft-sql-server-e4472ef8-90a9-13c1-e4d8-44aad198cdbe).
4. Check the registry settings on the DB Mail server and make sure the needed entries are present. A reboot is required for the change to take effect. (Please first take a backup of your registry keys and store them on another machine before you make any change!)
[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v2.0.50727]
"SystemDefaultTlsVersions"=dword:00000001
"SchUseStrongCrypto"=dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Microsoft\.NETFramework\v4.0.30319]
"SystemDefaultTlsVersions"=dword:00000001
"SchUseStrongCrypto"=dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v2.0.50727]
"SystemDefaultTlsVersions"=dword:00000001
"SchUseStrongCrypto"=dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319]
"SystemDefaultTlsVersions"=dword:00000001
"SchUseStrongCrypto"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2]
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Client]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Server]
"DisabledByDefault"=dword:00000000
"Enabled"=dword:00000001
DISCLAIMER: THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
Tutorial: Create SQL Cluster (FCI) on RHEL
In the Windows world, SQL Server integrates into Windows Server Failover Cluster (WSFC) natively, and there is a dedicated installation option for a SQL Server cluster. On Linux, however, you need to install a standalone SQL Server instance on each node first, and then configure the instances as a SQL Server failover cluster instance (FCI). I use SQL Server 2019 with RHEL 8.x in this tutorial, but it is also possible to use SQL Server 2017 on RHEL 7.x or RHEL 8 to configure an FCI.
Topology
1. iSCSI target server configuration
Two disks will be used as shared storage: sdc is for the database files in /var/opt/mssql/data, and sdd is for the user database files. If all your databases are stored in /var/opt/mssql/data, feel free to ignore all the steps related to device sdd.
2. Run fdisk to create the partitions, then run lsblk again to verify:
fdisk /dev/sdc
fdisk /dev/sdd
3. Install the targetcli package:
yum -y install targetcli
4. Run targetcli to create the iSCSI target server. I created two ACL entries, for node1 and node2:
iqn.2020-08.com.contoso:node1
iqn.2020-08.com.contoso:node2
I am not going to dig into the targetcli commands; please review this article for details.
5. Run the following bash commands to expose TCP port 3260, then enable and restart the target service:
firewall-cmd --add-port=3260/tcp --permanent
firewall-cmd --reload
systemctl enable target.service
systemctl restart target.service
2. iSCSI initiator configuration
Choose one of the servers that will participate in the FCI configuration; it does not matter which one. I use node1 in this tutorial. Please note: all the steps in this section are performed on node1, unless stated otherwise.
1. Install iscsi-initiator-utils on all nodes:
sudo yum install iscsi-initiator-utils -y
2. Edit /etc/iscsi/initiatorname.iscsi and replace the existing value with the initiator name used in step 4 of the iSCSI target server configuration:
InitiatorName=iqn.2020-08.com.contoso:node1
3. Discover the iSCSI disks of the target:
iscsiadm -m discovery -t st -p <ip of iscsi target server>
Here is the command in this tutorial:
iscsiadm -m discovery -t st -p 10.1.0.8
(Optional step) After the discovery, the database below is updated:
ls -l /var/lib/iscsi/nodes
6. Make the connection to the iSCSI target:
iscsiadm --mode node --targetname iqn.2020-08.com.contoso:servers --login
(Optional step) After logging in, a session with the iSCSI target is established:
iscsiadm --mode node -P 1
(Optional step) If you review the messages file, you will see the following keywords:
sudo grep "Attached SCSI" /var/log/messages
Aug 6 01:38:21 localhost kernel: sd 3:0:1:0: [sdb] Attached SCSI disk
Aug 6 01:38:21 localhost kernel: sd 2:0:0:0: [sda] Attached SCSI disk
Aug 6 04:26:01 localhost kernel: sd 6:0:0:0: [sdc] Attached SCSI disk
Aug 6 04:26:01 localhost kernel: sd 6:0:0:1: [sdd] Attached SCSI disk
9. Create physical volumes on the iSCSI disks:
sudo pvcreate /dev/sdc
sudo pvcreate /dev/sdd
10. Create the volume groups 'FCIDataVG1' and 'FCIDataVG2' on the iSCSI disks:
sudo vgcreate FCIDataVG1 /dev/sdc
sudo vgcreate FCIDataVG2 /dev/sdd
11. Create a logical volume in each of the two groups. Run the following commands on node1, then check the devices:
sudo lvcreate -L599G -n FCIDataLV1 FCIDataVG1
sudo lvcreate -L499G -n FCIDataLV2 FCIDataVG2
12. Format the logical volumes with a supported filesystem:
sudo mkfs.xfs /dev/FCIDataVG1/FCIDataLV1
sudo mkfs.xfs /dev/FCIDataVG2/FCIDataLV2
13. Repeat steps 1~6 on the rest of the nodes.
Please note:
1) Do not do step 6 on node2 (or the other nodes) before step 12 is completed on node1, or you may not be able to fail over.
2) All the steps except step 2 are exactly the same. The value for node2 is: InitiatorName=iqn.2020-08.com.contoso:node2
3) After step 6 is executed on the rest of the nodes, you will see the same devices as on node1.
3. SQL Server configuration
Please note: all the steps in this section are performed on node1, unless stated otherwise.
1. Run the following queries to create the login used by Pacemaker:
CREATE LOGIN [sqlpackmaker] with PASSWORD = N'YourStrongP@ssword1'
ALTER SERVER ROLE [sysadmin] ADD MEMBER [sqlpackmaker]
2. Drop the default server name and create a new server name. The new server name is the SQL virtual name:
exec sp_dropserver node1
go
exec sp_addserver 'sqlvirtualname1','local'
3. Restart SQL Server for the change to take effect:
sudo systemctl stop mssql-server
sudo systemctl restart mssql-server
4. Run the following query to check the change:
select @@servername, SERVERPROPERTY('ComputernamephysicalNetBIOS')
5. Stop SQL Server on all nodes (node1, node2):
sudo systemctl stop mssql-server
6. Copy /var/opt/mssql/secrets/machine-key from node1 to node2.
7. Create temporary directories to store the SQL Server data and log files:
mkdir /var/opt/mssql/tempdir
mkdir /var/opt/mssql/tempuserdata
8. Copy the SQL Server data and log files to the temporary directories:
cp /var/opt/mssql/data/* /var/opt/mssql/tempdir/
cp /var/opt/mssql/userdata/* /var/opt/mssql/tempuserdata/
9. Delete the files from the existing SQL Server data directories on node1:
rm -f /var/opt/mssql/data/*
rm -f /var/opt/mssql/userdata/*
10. Mount the iSCSI logical volumes in the SQL Server data folders:
mount /dev/<VolumeGroupName>/<LogicalVolumeName> <FolderName>
Here are the commands in this tutorial:
mount /dev/FCIDataVG1/FCIDataLV1 /var/opt/mssql/data
mount /dev/FCIDataVG2/FCIDataLV2 /var/opt/mssql/userdata
11. Change the owner of the mounts to mssql:
chown mssql:mssql /var/opt/mssql/data
chown mssql:mssql /var/opt/mssql/userdata
12. Change the group ownership of the mounts to mssql:
chgrp mssql /var/opt/mssql/data
chgrp mssql /var/opt/mssql/userdata
13. Copy the files from the temp folders back to /var/opt/mssql/data and /var/opt/mssql/userdata:
cp /var/opt/mssql/tempdir/* /var/opt/mssql/data
cp /var/opt/mssql/tempuserdata/* /var/opt/mssql/userdata
14. Check the two temp folders and make sure the files were copied.
15. Change ownership of the files to mssql:
chown mssql:mssql /var/opt/mssql/data/*
chown mssql:mssql /var/opt/mssql/userdata/*
16. Configure auto mount so the OS mounts the devices automatically:
1) Get the UUIDs. Please note down the UUID, TYPE, and directory for each volume:
blkid /dev/FCIDataVG1/FCIDataLV1
blkid /dev/FCIDataVG2/FCIDataLV2
2) Edit /etc/fstab to configure auto mount on node1 and node2 (see the sketch below). Please review this article for more detail.
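The original post shows the resulting /etc/fstab entries only as a screenshot. A hedged example of what the two lines could look like (the UUIDs are placeholders taken from the blkid output above; _netdev is commonly added for iSCSI-backed mounts so they wait for the network):
UUID=<uuid-of-FCIDataLV1> /var/opt/mssql/data xfs defaults,_netdev 0 0
UUID=<uuid-of-FCIDataLV2> /var/opt/mssql/userdata xfs defaults,_netdev 0 0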
4. Cluster configuration
1. Edit /etc/hosts to specify the node names and IPs. Do the same thing on node1 and node2.
2. Create a file to store the SQL Server username and password for the Pacemaker login. Run the following commands on node1 and node2 (using the same login name and password specified in step 1 of the SQL Server configuration section):
sudo touch /var/opt/mssql/secrets/passwd
sudo echo 'sqlpackmaker' >> /var/opt/mssql/secrets/passwd
sudo echo 'YourStrongP@ssword1' >> /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 600 /var/opt/mssql/secrets/passwd
3. On both cluster nodes, open the Pacemaker firewall ports. To open these ports with firewalld, run the following commands on node1 and node2:
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --reload
4. Install the Pacemaker packages on node1 and node2:
sudo yum install pacemaker pcs fence-agents-all resource-agents
5. Set the password for the default user that is created when installing the Pacemaker and Corosync packages. Use the same password on node1 and node2:
sudo passwd hacluster
6. To allow nodes to rejoin the cluster after a reboot, enable and start the pcsd service and Pacemaker. Run the following commands on node1 and node2:
sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker
7. Create the cluster. Run the following commands on node1. The password should be the same as the one in step 5:
sudo pcs host auth node1 node2
sudo pcs cluster setup sqlcluster node1 node2
sudo pcs cluster start --all
sudo pcs cluster enable --all
8. Disable stonith-enabled for test purposes. Run the following command on node1:
sudo pcs property set stonith-enabled=false
9. Install the FCI resource agent for SQL Server. Run the following command on node1 and node2:
sudo yum install mssql-server-ha
10. Create the disk resources; they belong to a resource group (fci in this demo). Run the following commands on node1:
sudo pcs resource create iSCSIDisk1 Filesystem device="/dev/FCIDataVG1/FCIDataLV1" directory="/var/opt/mssql/data" fstype="xfs" --group fci
sudo pcs resource create iSCSIDisk2 Filesystem device="/dev/FCIDataVG2/FCIDataLV2" directory="/var/opt/mssql/userdata" fstype="xfs" --group fci
11. Create the IP resource that will be used by the FCI; it belongs to the same resource group created in the previous step:
sudo pcs resource create vip2 ocf:heartbeat:IPaddr2 ip=10.1.0.111 nic=eth0 cidr_netmask=24 --group fci
12. Create the FCI resource. The resource name should be exactly the same as the SQL virtual name created in step 2 of the SQL Server configuration section:
sudo pcs resource create sqlvirtualname1 ocf:mssql:fci --group fci
Failover
===
sudo pcs resource move sqlvirtualname1 <NodeName>
More: Add a node to an existing cluster
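After the move, you can confirm that the whole group (disks, IP, and the SQL resource) came online on the target node; a minimal check, assuming the setup above:
sudo pcs status            # node membership plus the state of the fci resource group
sudo pcs status resources  # iSCSIDisk1, iSCSIDisk2, vip2, sqlvirtualname1
You can then connect with sqlcmd against the virtual IP (10.1.0.111) or the virtual server name to verify that SQL Server is serving from the new node.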