High Availability
Troubleshooting REDO queue build-up (data latency issues) on AlwaysOn Readable Secondary Replicas using the WAIT_INFO Extended Event
First published on MSDN on Jan 06, 2015. PROBLEM: You have confirmed an excessive build-up of the REDO queue on an AlwaysOn Availability Group secondary replica by one of the following methods: querying sys.

Configure multiple-subnet AlwaysOn Availability Group by modifying CIB
In the Windows world, a Windows Server Failover Cluster (WSFC) natively supports multiple subnets and handles multiple IP addresses via an OR dependency on the IP address resource. On Linux there is no OR dependency, but there is a way to achieve a proper multi-subnet configuration natively with Pacemaker, as shown below. You cannot do this simply by using the normal Pacemaker command line to modify a resource; you need to modify the cluster information base (CIB), an XML file that holds the Pacemaker configuration. Here is an example that creates a SQL Server on Linux availability group across four nodes in three subnets on RHEL 7.6. If you are already familiar with the AG setup process, jump to step 16.

1. Register your subscription on all servers (red1, red2, red3 and red4 in this case):

```bash
subscription-manager register
```

2. List all available subscriptions, pick the one with High Availability, and note down the pool ID:

```bash
subscription-manager list --available --all
```

3. Attach the subscription on all nodes (red1, red2, red3 and red4 in this case):

```bash
sudo subscription-manager attach --pool=xxxxx
```

4. Enable the repository on all nodes (red1, red2, red3 and red4 in this case):

```bash
sudo subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
```

5. Install the Pacemaker packages on all nodes (red1, red2, red3 and red4 in this case):

```bash
sudo yum install pacemaker pcs fence-agents-all resource-agents
```

6. Install the SQL Server resource agent on all nodes (red1, red2, red3 and red4 in this case):

```bash
sudo yum install mssql-server-ha
```

7. Set the password for the default user that is created when the Pacemaker and Corosync packages are installed. The password must be exactly the same on all nodes (red1, red2, red3 and red4 in this case):

```bash
sudo passwd hacluster
```

8. Update the /etc/hosts file on all servers, adding each node's IP address and name. All servers should have the same entries:

```
192.168.2.103 red1
192.168.2.104 red2
192.168.4.100 red3
192.168.5.101 red4
```

9. Run the following commands to enable and start the pcsd service and Pacemaker on all nodes (red1, red2, red3 and red4 in this case):

```bash
sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker
```

10. Run the following commands to create the cluster on the primary replica node (red1 in this case):

```bash
sudo pcs cluster auth red1 red2 red3 red4 -u hacluster -p <YourPasswordUsedInStep7>
sudo pcs cluster setup --name sqlcluster1 red1 red2 red3 red4
sudo pcs cluster start --all
sudo pcs cluster enable --all
```

11. Run the following commands to enable the cluster feature on all nodes (red1, red2, red3 and red4 in this case):

```bash
sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server
```

Create AG and Listener

1. Run the following queries on red1 to create a certificate:

```sql
USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
GO
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
GO
BACKUP CERTIFICATE dbm_certificate
TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    ENCRYPTION BY PASSWORD = '**<Private_Key_Password>**'
);
```

2. Run the following commands on red1 to copy the certificate to the rest of the servers (red2, red3 and red4 in this case):

```bash
cd /var/opt/mssql/data
scp dbm_certificate.* root@red2:/var/opt/mssql/data/
scp dbm_certificate.* root@red3:/var/opt/mssql/data/
scp dbm_certificate.* root@red4:/var/opt/mssql/data/
```

3. Give the mssql user permission to access the certificate files on the rest of the servers (red2, red3 and red4 in this case):

```bash
cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*
```

4. Run the following T-SQL queries to create the certificate on the remaining nodes by restoring the certificate backup file (red2, red3 and red4 in this case):

```sql
USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**'
GO
CREATE CERTIFICATE dbm_certificate
FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '**<Private_Key_Password>**'
)
```

5. Create the endpoint on all servers (red1, red2, red3 and red4):

```sql
CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
        ROLE = ALL,
        AUTHENTICATION = CERTIFICATE dbm_certificate,
        ENCRYPTION = REQUIRED ALGORITHM AES
    );
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
```

6. Run the following query on the primary replica (red1) to create the availability group. (Please note, this works for SQL Server 2019. If you are using SQL Server 2017, you need to change the AVAILABILITY_MODE of one of the replicas to ASYNCHRONOUS_COMMIT.)

```sql
CREATE AVAILABILITY GROUP [ag1]
    WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
    FOR REPLICA ON
    N'red1' WITH (
        ENDPOINT_URL = N'tcp://red1:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,
        SEEDING_MODE = AUTOMATIC),
    N'red2' WITH (
        ENDPOINT_URL = N'tcp://red2:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,
        SEEDING_MODE = AUTOMATIC),
    N'red3' WITH (
        ENDPOINT_URL = N'tcp://red3:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,
        SEEDING_MODE = AUTOMATIC),
    N'red4' WITH (
        ENDPOINT_URL = N'tcp://red4:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,
        SEEDING_MODE = AUTOMATIC)
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE; -- grant CREATE ANY DATABASE permission
```

7. Join the AG: run the following T-SQL queries on all the secondary servers (red2, red3 and red4 in this case):

```sql
ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE
```

8. Run the following T-SQL queries on the primary replica (red1 in this case) to create a database and add it to the AG:

```sql
CREATE DATABASE [db1];
ALTER DATABASE [db1] SET RECOVERY FULL;
BACKUP DATABASE [db1] TO DISK = N'/var/opt/mssql/data/db1.bak';
BACKUP LOG [db1] TO DISK = N'/var/opt/mssql/data/db1.trn';
GO
ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];
```

9. Create a SQL login for Pacemaker on all servers (red1, red2, red3 and red4 in this case):
```sql
CREATE LOGIN [pacemakerLogin] WITH PASSWORD = N'ComplexP@$$w0rd!'
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]
```

10. Run the following bash command on red1:

```bash
sudo pcs property set stonith-enabled=false
```

11. On all SQL Server on Linux servers (red1, red2, red3 and red4), run the following bash commands to save the credentials for the SQL Server login (the password is the same as the one used in step 9):

```bash
echo 'pacemakerLogin' >> ~/pacemaker-passwd
echo 'ComplexP@$$w0rd!' >> ~/pacemaker-passwd
sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 400 /var/opt/mssql/secrets/passwd # Only readable by root
```

12. Create the availability group resource at the cluster level. Run the following command on any one of the nodes (on just one server, and only once):

```bash
sudo pcs resource create ag_cluster1 ocf:mssql:ag ag_name=ag1 meta failure-timeout=60s master notify=true
## check the status
```

13. Run the following bash command on the primary replica red1 to create one virtual IP resource. The resource name is 'vip1' and the IP address is 192.168.2.111:

```bash
sudo pcs resource create vip1 ocf:heartbeat:IPaddr2 ip=192.168.2.111
## check the status
```

14. Create the availability group listener for availability group ag1. Run the following T-SQL query on the primary replica (red1 in this case):
```sql
ALTER AVAILABILITY GROUP [ag1]
ADD LISTENER 'aglistener' (
    WITH IP (
        ('192.168.2.111','255.255.255.0'),
        ('192.168.4.111','255.255.255.0'),
        ('192.168.5.111','255.255.255.0')
    ), PORT = 1433);
```

15. Run the following bash commands to create the constraints:

```bash
sudo pcs constraint colocation add vip1 ag_cluster1-master INFINITY with-rsc-role=Master
sudo pcs constraint order promote ag_cluster1-master then start vip1
```

16. Run the following bash command to export the CIB (you can run the command on any node):

```bash
sudo pcs cluster cib <filename>
```

17. You will find entries similar to the following:

```xml
<primitive class="ocf" id="vip1" provider="heartbeat" type="IPaddr2">
  <instance_attributes id="vip1-instance_attributes">
    <nvpair id="vip1-instance_attributes-ip" name="ip" value="192.168.2.111"/>
  </instance_attributes>
  <operations>
    <op id="vip1-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
    <op id="vip1-start-interval-0s" interval="0s" name="start" timeout="20s"/>
    <op id="vip1-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
  </operations>
</primitive>
```

18. Here is the modified version:

```xml
<primitive class="ocf" id="vip1" provider="heartbeat" type="IPaddr2">
  <instance_attributes id="vip1-instance_attributes">
    <rule id="Subnet1-IP" score="INFINITY" boolean-op="or">
      <expression id="Subnet1-Node1" attribute="#uname" operation="eq" value="red1"/>
      <expression id="Subnet1-Node2" attribute="#uname" operation="eq" value="red2"/>
    </rule>
    <nvpair id="vip1-instance_attributes-ip" name="ip" value="192.168.2.111"/>
  </instance_attributes>
  <instance_attributes id="vip1-instance_attributes2">
    <rule id="Subnet2-IP" score="INFINITY">
      <expression id="Subnet2-Node1" attribute="#uname" operation="eq" value="red3"/>
    </rule>
    <nvpair id="vip1-instance_attributes-ip2" name="ip" value="192.168.4.111"/>
  </instance_attributes>
  <instance_attributes id="vip1-instance_attributes3">
    <rule id="Subnet3-IP" score="INFINITY">
      <expression id="Subnet3-Node1" attribute="#uname" operation="eq" value="red4"/>
    </rule>
    <nvpair id="vip1-instance_attributes-ip3" name="ip" value="192.168.5.111"/>
  </instance_attributes>
  <operations>
    <op id="vip1-monitor-interval-10s" interval="10s" name="monitor" timeout="20s"/>
    <op id="vip1-start-interval-0s" interval="0s" name="start" timeout="20s"/>
    <op id="vip1-stop-interval-0s" interval="0s" name="stop" timeout="20s"/>
  </operations>
</primitive>
```

19. Run the following command to import the modified CIB and reconfigure Pacemaker:

```bash
sudo pcs cluster cib-push <filename>
```

Here are the takeaway points:
1) All nodes in the same subnet should be in the same <instance_attributes> element.
2) If there is more than one server in a subnet, the attribute boolean-op="or" on the rule is a must.
3) The IP address of the Always On listener for that subnet is specified in the <nvpair> element.
4) The value of the id attribute does not matter; you can specify any value as long as it is unique.

Optionally, you can create three entries for the three IP addresses on the DNS server. Here is a screenshot of using SQLCMD to connect to the AG listener.

Recommendations for Index Maintenance with AlwaysOn Availability Groups
First published on MSDN on Mar 03, 2015. SYMPTOMS: Consider the following scenario: the database is part of an AlwaysOn Availability Group, and you run long, log-intensive transactions such as index maintenance/rebuilds. You observe one or more of the following symptoms: poor-performing DML operations in availability databases on the primary replica when synchronous secondary replicas are present.

What is causing the Always On AG issue? Is it cluster, AD, DNS, or SQL?
AG (Always On Availability Groups) is a logical container that holds one or more user databases that fail over together. So, if there is an issue with an AG, where do you look? This blog post is an index map of the resources that are currently available and of those to come. It also provides pointers on which support topic to select if you choose to open a support case with CSS; but why open a CSS case if you can quickly resolve the issue with minimal business impact using the helpful resources included here?

LooksAlive and IsAlive Implementation of Availability Groups failure_condition_level
First published on MSDN on Sep 12, 2013. AlwaysOn availability groups introduce a new flexible failover policy for SQL Server instance health monitoring of the AlwaysOn availability group resource.

Connection Timeouts in Multi-subnet Availability Group
First published on MSDN on Jun 03, 2014. THE DEFINITION: One of the issues that generates a lot of call volume on the AlwaysOn team is connectivity problems to the availability group listener in multi-subnet environments.

How It Works: Always On–When Is My Secondary Failover Ready?
First published on MSDN on Apr 22, 2013. I keep running into the question: "When will my secondary allow automatic failover?" Based on that question, I did some extended research, which I will try to summarize in this blog post.
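The question in that last entry has a crisp rule behind it: a secondary is a valid automatic-failover target only when it is a synchronous-commit replica and every availability database on it is in the SYNCHRONIZED state (SQL Server surfaces this per database as is_failover_ready in sys.dm_hadr_database_replica_cluster_states). The following sketch illustrates that decision logic; the replica rows are hypothetical sample values, not output from a real DMV query.

```python
# Decide whether a secondary replica would allow automatic failover.
# Rule: the replica must be configured for SYNCHRONOUS_COMMIT, and every
# availability database on it must report synchronization_state_desc =
# 'SYNCHRONIZED'. The sample data below is hypothetical.

def is_failover_ready(availability_mode, db_sync_states):
    """True if the secondary is failover-ready (no data loss on failover)."""
    return (
        availability_mode == "SYNCHRONOUS_COMMIT"
        and len(db_sync_states) > 0
        and all(state == "SYNCHRONIZED" for state in db_sync_states)
    )

# replica name -> (availability mode, per-database synchronization states)
replicas = {
    "secondary_a": ("SYNCHRONOUS_COMMIT", ["SYNCHRONIZED", "SYNCHRONIZED"]),
    "secondary_b": ("SYNCHRONOUS_COMMIT", ["SYNCHRONIZING"]),   # still catching up
    "secondary_c": ("ASYNCHRONOUS_COMMIT", ["SYNCHRONIZED"]),   # wrong commit mode
}

for name, (mode, states) in replicas.items():
    print(name, "failover ready:", is_failover_ready(mode, states))
```

In practice you would populate the per-database states by querying the DMVs on the primary; the point of the sketch is that a single database still in SYNCHRONIZING, or an asynchronous-commit configuration, is enough to block automatic failover.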