Create a Distributed Availability Group in SQL Server on Linux (RHEL) - step by step
Published Nov 14 2019 10:33 AM
Microsoft

This article describes, step by step, how to create a SQL Server Distributed Availability Group (DAG) for high availability on RHEL 7.6.

Here is the topology of the DAG environment

Data Center 1

  • Cen1: 192.168.1.61 (primary replica)
  • Cen2: 192.168.1.62
  • Cen3: 192.168.1.63
  • Availability group name: AG1
  • ag1-listener: 192.168.1.65
  • SQL Server: 2017 CU17

Data Center 2

  • Cen4: 192.168.1.71 (forwarder)
  • Cen5: 192.168.1.72
  • Availability group name: AG2
  • ag2-listener: 192.168.1.75
  • SQL Server: 2017 CU17

Please make sure the following ports are open on all servers in both datacenters.

TCP: 5022, 2224, 3121, 21064

UDP: 5405
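As one way to open these ports on RHEL (a sketch assuming firewalld is the active firewall; adjust for your environment), the following loop prints the firewall-cmd commands to run on each server:

```shell
# Print the firewall-cmd commands for the required DAG/Pacemaker ports.
# Pipe the output to a shell (or run the lines manually) to apply them.
for p in 5022/tcp 2224/tcp 3121/tcp 21064/tcp 5405/udp; do
  echo "sudo firewall-cmd --zone=public --add-port=${p} --permanent"
done
echo "sudo firewall-cmd --reload"
```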

 

For RHEL, you need to have the High Availability subscription enabled. Please refer to https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-availability-group-cluster-rhel?view=sql... for details.

 

Reference:

1. Some commands are different in RHEL 8; please refer to this article if you'd like to create an AG in RHEL 8.

2.Create Hybrid DAG in RHEL 7.6

 

Section 1: Create availability group AG1 in datacenter1

1. Update /etc/hosts on all nodes in datacenter1; add the IP address, replica name, and AG listener name for each server.

Here is a screenshot. All servers in datacenter1 should have the same settings. (Note: the IP resources for the listeners will be created in later steps; it is fine to add the entries at the very beginning.)

clipboard_image_0.png
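For reference, the topology above can be written as plain hosts lines. The sketch below writes them to a hypothetical demo file (/tmp/hosts.demo); on a real node you would append these lines to /etc/hosts as root:

```shell
# Demo: the /etc/hosts entries for datacenter1 from the topology above.
# /tmp/hosts.demo is a scratch path used here so nothing on the system is touched.
cat > /tmp/hosts.demo <<'EOF'
192.168.1.61 cen1
192.168.1.62 cen2
192.168.1.63 cen3
192.168.1.65 ag1-listener
EOF
cat /tmp/hosts.demo
```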

2. Run the following bash commands on all servers in datacenter1 to enable the Always On feature:

sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server    

clipboard_image_1.png

3. Run the following T-SQL to enable the Always On health-check event session on all servers in datacenter1:

ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON);

4. Create a certificate. Run this query on the primary replica in datacenter1 (cen1 in this case):

use master
go
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
go
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
go
BACKUP CERTIFICATE dbm_certificate TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
ENCRYPTION BY PASSWORD = '**<Private_Key_Password>**'
);

5. Copy the certificate to the rest of the servers in datacenter1 (cen2, cen3 in this case):

cd /var/opt/mssql/data
scp dbm_certificate.* root@**<nodeName>**:/var/opt/mssql/data/    
clipboard_image_2.png

6. Give the mssql user permission to access the certificate files on the rest of the servers (cen2, cen3 in this case):

cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*    

7. Create the certificate on the rest of the servers by restoring the certificate backup (cen2, cen3 in this case):

use master
go
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**'
go
CREATE CERTIFICATE dbm_certificate
FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
DECRYPTION BY PASSWORD = '**<Private_Key_Password>**'
)

8. Create the endpoint on all servers in datacenter1. Port 5022 is used.

CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE dbm_certificate,
    ENCRYPTION = REQUIRED ALGORITHM AES
);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
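As a quick sanity check (a sketch using standard catalog views; not part of the original steps), you can confirm on each server that the endpoint exists, is started, and listens on port 5022:

```sql
-- Verify the database mirroring endpoint state and port
SELECT e.name, e.state_desc, te.port
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS te ON te.endpoint_id = e.endpoint_id;
```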

9. Create the AG with three synchronous replicas in datacenter1. Run the following query on the primary replica (cen1 in this case):

CREATE AVAILABILITY GROUP [ag1]
    WITH (DB_FAILOVER = ON, CLUSTER_TYPE = EXTERNAL)
    FOR REPLICA ON
    N'cen1'
        WITH (
        ENDPOINT_URL = N'tcp://cen1:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,
        SEEDING_MODE = AUTOMATIC) ,
    N'cen2'
        WITH (
        ENDPOINT_URL = N'tcp://cen2:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,
        SEEDING_MODE = AUTOMATIC),
    N'cen3'
        WITH (
        ENDPOINT_URL = N'tcp://cen3:5022',
        AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
        FAILOVER_MODE = EXTERNAL,
        SEEDING_MODE = AUTOMATIC)
GO

ALTER AVAILABILITY GROUP
[ag1] GRANT CREATE ANY DATABASE;--grant create any database permission
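To verify the AG was created (a sketch using standard DMVs), you can check the replica roles and connection states on the primary. At this point only cen1 will show as connected, since the secondaries have not joined yet:

```sql
-- Check replica roles and connection state for ag1 (run on the primary)
SELECT ar.replica_server_name, ars.role_desc, ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.availability_replicas AS ar
JOIN sys.dm_hadr_availability_replica_states AS ars
    ON ars.replica_id = ar.replica_id;
```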

Please make sure port 5022 is in the firewall exception list, or you may run into the following error messages:

2019-09-11 16:10:18.560        spid40s        Error: 8471, Severity: 16, State: 2.

2019-09-11 16:10:18.560        spid40s        An SNI call failed during a Service Broker/Database Mirroring transport operation. SNI error '10022(An invalid argument was supplied.)'.

2019-09-11 16:10:18.570        spid40s        Error: 9642, Severity: 16, State: 1.

2019-09-11 16:10:18.570        spid40s        An error occurred in a Service Broker/Database Mirroring transport connection endpoint, Error: 8471, State: 2. (Near endpoint role: Initiator, far endpoint address: '1')

10. You will see the following messages in the error log file of the primary replica. The timeout happens because the secondary replicas have not yet joined the AG:

2019-09-05 20:00:44.530 spid62       The state of the local availability replica in availability group 'ag1' has changed from 'NOT_AVAILABLE' to 'RESOLVING_NORMAL'. The state changed because the local availability replica is joining the availability group. For more information, see the SQL

2019-09-05 20:00:44.570 spid62       Always On: The local replica of availability group 'ag1' is preparing to transition to the primary role. This is an informational message only. No user action is required.

2019-09-05 20:00:44.610 spid62       The state of the local availability replica in availability group 'ag1' has changed from 'RESOLVING_NORMAL' to 'PRIMARY_PENDING'. The state changed because the availability group is coming online. For more information, see the SQL Server error log or clu

2019-09-05 20:00:44.620 Server       The state of the local availability replica in availability group 'ag1' has changed from 'PRIMARY_PENDING' to 'PRIMARY_NORMAL'. The state changed because the local replica has completed processing Online command from Windows Server Failover Clustering (WS

2019-09-05 20:00:54.630 spid39s     A connection timeout has occurred while attempting to establish a connection to availability replica 'cen2' with id [6C1CD374-CB97-477A-B8DB-589D94A0FCB6]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is no

2019-09-05 20:00:54.650 spid40s     A connection timeout has occurred while attempting to establish a connection to availability replica 'cen3' with id [D8E1CE2A-10CF-409B-98B1-9D3B35FEE91A]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is no

11. Join the AG. Run the following T-SQL queries on all secondary servers in datacenter1 (cen2, cen3 in this case):

ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE

clipboard_image_3.png

12. You will see the following messages in the error log file of the primary replica once connections are established with the two secondary replicas:

2019-09-05 20:01:23.820 spid11s     A connection for availability group 'ag1' from availability replica 'cen1' with id [03D58956-AAC2-4A1D-975E-3F3B788752A2] to 'cen2' with id [6C1CD374-CB97-477A-B8DB-589D94A0FCB6] has been successfully established. This is an informational message only. N

2019-09-05 20:02:04.950 spid31s     A connection for availability group 'ag1' from availability replica 'cen1' with id [03D58956-AAC2-4A1D-975E-3F3B788752A2] to 'cen3' with id [D8E1CE2A-10CF-409B-98B1-9D3B35FEE91A] has been successfully established. This is an informational message only. N

13. Create a database and add it to the AG on the primary replica (cen1 in this case):

CREATE DATABASE [db1];
ALTER DATABASE [db1] SET RECOVERY FULL;
BACKUP DATABASE [db1] TO DISK = N'/var/opt/mssql/data/db1.bak';
BACKUP log [db1] TO DISK = N'/var/opt/mssql/data/db1.trn';
GO
ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db1];

The database db1 will be created on the rest of the servers by automatic seeding.
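Automatic seeding can take a while for large databases. One way to monitor its progress (a sketch based on the standard seeding DMVs) is:

```sql
-- Monitor automatic seeding progress (run on the primary replica)
SELECT ag.name AS ag_name, s.current_state, s.performed_seeding, s.failure_state_desc
FROM sys.dm_hadr_automatic_seeding AS s
JOIN sys.availability_groups AS ag ON ag.group_id = s.ag_id;
```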

clipboard_image_4.png

14. Create the SQL login for Pacemaker on all servers in datacenter1:

CREATE LOGIN [pacemakerLogin] with PASSWORD= N'ComplexP@$$w0rd!'
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]
GO

The availability group has been created. Unlike SQL Server on Windows, SQL Server on Linux does not create any cluster resources for you; you need to create these resources manually. Steps 15 to 29 cover the details:

15. Install the SQL Server resource agent. Run the following bash command on all servers in datacenter1:

sudo yum install mssql-server-ha

16. Install Pacemaker. Run the following bash command on all servers in datacenter1:

sudo yum install pacemaker pcs fence-agents-all resource-agents

17. Set the password for the default user that is created when the Pacemaker and Corosync packages are installed. Run the following command on all servers in datacenter1. (The command prompts for a password; use the same password on all servers in the same datacenter.)

sudo passwd hacluster

clipboard_image_5.png

18. To allow nodes to rejoin the cluster after a reboot, enable and start the pcsd service and Pacemaker. Run the following commands on all servers in datacenter1:

sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker

19. Create a cluster in datacenter1. Run the following commands on any one node in datacenter1 (on one server only, and only once). Replace 'YourStrongPassword' with the password specified in step 17.

pcs cluster auth cen1 cen2 cen3 -u hacluster -p YourStrongPassword
sudo pcs cluster setup --name cluster1 cen1 cen2 cen3
sudo pcs cluster start --all
sudo pcs cluster enable --all

clipboard_image_7.png

20. Check the cluster status

pcs cluster status

 

clipboard_image_8.png

21. Because node-level fencing configuration depends heavily on your environment, disable it for this tutorial (it can be configured later; in my test, if this value is not set to false, failover does not work and the IP resource may fail). The following command disables node-level fencing (run it on any node in datacenter1):

sudo pcs property set stonith-enabled=false

22. On all SQL Server servers in datacenter1, save the credentials for the SQL Server login (use the same user name and password created in step 14):

echo 'pacemakerLogin' >> ~/pacemaker-passwd 
echo 'ComplexP@$$w0rd!' >> ~/pacemaker-passwd 
sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd 
sudo chown root:root /var/opt/mssql/secrets/passwd 
sudo chmod 400 /var/opt/mssql/secrets/passwd # Only readable by root
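The credentials file format is simply two lines: the login name, then the password. The sketch below stages it at a hypothetical scratch path (/tmp/pacemaker-passwd); on a real node, move it to /var/opt/mssql/secrets/passwd owned by root with mode 400, as in step 22:

```shell
# Demo: build the Pacemaker credentials file at a scratch path.
# Line 1 = SQL login name, line 2 = password; the real path is
# /var/opt/mssql/secrets/passwd (root:root, mode 400).
rm -f /tmp/pacemaker-passwd
printf '%s\n' 'pacemakerLogin' 'ComplexP@$$w0rd!' > /tmp/pacemaker-passwd
chmod 400 /tmp/pacemaker-passwd   # readable by the owner only
head -n 1 /tmp/pacemaker-passwd   # shows the login name
```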

 

23. Create the availability group resource at the cluster level in datacenter1. Run the following command on any one node in datacenter1 (on one node only, and only once).

Here are the resources and constraints before creating the resource (no resources or constraints yet):

clipboard_image_9.png

Here is the command to create the cluster resource:

sudo pcs resource create ag_cluster1 ocf:mssql:ag ag_name=ag1 meta failure-timeout=60s master notify=true

-------------note begin------------------

The command creates a resource called ag_cluster1-master (Pacemaker appends '-master' to the name automatically) and binds the resource to availability group 'ag1'.

It sets the current primary replica as Master and the remaining nodes as Slaves (it checks the status of the AG and makes the decision).

Please note that the command does not return an error if you supply an incorrect availability group name (ag1 is used in this case; make sure you supply the correct name). Internally, Pacemaker issues T-SQL queries to SQL Server to check whether the AG exists.

-----------note end------------------

clipboard_image_10.png

Please note: Pacemaker connects to the SQL Server instance using the login 'pacemakerLogin' created in step 14. If step 14 or step 22 is skipped, the AG resource will fail in both SQL Server and the cluster.

24. Create the virtual IP resource. The resource name is 'virtualip1', and the IP address is 192.168.1.65 in this demonstration. This is the IP address for the AG listener.

sudo pcs resource create virtualip1 ocf:heartbeat:IPaddr2 ip=192.168.1.65

The resource will start automatically after it is created.

clipboard_image_11.png

25. Create the availability group listener for availability group ag1. Run the following T-SQL query on the primary replica (cen1 in this case). The IP address must be the same as the IP in step 24.

ALTER AVAILABILITY GROUP [ag1] 
ADD LISTENER 'ag1-listener' (WITH IP(('192.168.1.65','255.255.255.0')),PORT = 1433);

a) It is fine to run step 25 before step 24; however, you cannot use the IP address to connect to SQL Server until the resource is created in Pacemaker.

b) There is a bug in the DMV sys.availability_group_listener_ip_addresses that shows the state as offline.
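To inspect what the listener looks like from SQL Server's point of view (a sketch using standard catalog views; per the bug noted above, the IP state may incorrectly report offline):

```sql
-- Inspect the listener definition and its IP addresses
SELECT l.dns_name, l.port, lip.ip_address, lip.state_desc
FROM sys.availability_group_listeners AS l
JOIN sys.availability_group_listener_ip_addresses AS lip
    ON lip.listener_id = l.listener_id;
```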

 

26. Add a colocation constraint. Please note: without the constraint, the IP resource does not fail over when the AG resource fails over, so we need a constraint to guarantee that the IP fails over together with the cluster resource. Run the following command on any server in datacenter1 (on one server only, and only once):

sudo pcs constraint colocation add virtualip1 ag_cluster1-master INFINITY with-rsc-role=Master

clipboard_image_12.png

27. Add an ordering constraint:

sudo pcs constraint order promote ag_cluster1-master then start virtualip1

clipboard_image_13.png

28. Update the cluster properties (set the recheck interval to 2 minutes). Run on any node:

sudo pcs property set cluster-recheck-interval=2min
sudo pcs property set start-failure-is-fatal=true
pcs resource update ag_cluster1 meta failure-timeout=60s  

29. Note: the order in the ordering constraint matters.

One customer had a wrong ordering constraint: it promoted virtualip1 first, then started the AG resource. This makes the resource fail (and may cause failover issues):

sudo pcs constraint order promote virtualip1 then start ag_cluster1-master ##wrong one

 

The availability group ag1 and its listener in datacenter1 are now fully functional. The following section covers the forwarder.

 

Section 2: Create availability group AG2 in datacenter2

1. Update /etc/hosts on all nodes in datacenter2; add the IP address, replica name, and AG listener name for each server.

Here is a screenshot. All servers in datacenter2 should have the same settings.

clipboard_image_14.png

2. Run the following commands on all servers in datacenter2 to enable the Always On feature:

 

sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo systemctl restart mssql-server

clipboard_image_15.png

3. Run the following T-SQL to enable the Always On health-check event session on all servers in datacenter2:

ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON);

4. Remember the certificate backup files created in step 4 of section 1? Copy them to all servers in datacenter2 (cen4, cen5 in this case):

cd /var/opt/mssql/data
scp dbm_certificate.* root@**<nodeIPAddress>**:/var/opt/mssql/data/

clipboard_image_16.png

5. Give the mssql user permission to access the certificate files on all servers in datacenter2 (cen4, cen5 in this case):

cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*

clipboard_image_17.png

6. Create the certificate on all servers in datacenter2 by restoring the certificate backup (cen4, cen5 in this case):

USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
GO
CREATE CERTIFICATE dbm_certificate
FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
        FILE = '/var/opt/mssql/data/dbm_certificate.pvk'
        ,DECRYPTION BY PASSWORD = '**<Private_Key_Password>**'
        );

7. Create the endpoint on all servers in datacenter2 (cen4, cen5 in this case):

CREATE ENDPOINT [Hadr_endpoint] AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING(ROLE = ALL, AUTHENTICATION = CERTIFICATE dbm_certificate, ENCRYPTION = REQUIRED ALGORITHM AES);
GO
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
GO

8. Create the AG with two synchronous replicas in datacenter2. Run the following T-SQL query on the primary replica (cen4 in this case):

CREATE AVAILABILITY GROUP [ag2]
WITH (
        DB_FAILOVER = ON
        ,CLUSTER_TYPE = EXTERNAL
        )
FOR REPLICA ON 
     N'cen4'
WITH (
        ENDPOINT_URL = N'TCP://cen4:5022'
        ,FAILOVER_MODE = EXTERNAL
        ,AVAILABILITY_MODE = SYNCHRONOUS_COMMIT
        ,SEEDING_MODE = AUTOMATIC
        )
    ,N'cen5'
WITH (
        ENDPOINT_URL = N'TCP://cen5:5022'
        ,FAILOVER_MODE = EXTERNAL
        ,AVAILABILITY_MODE = SYNCHRONOUS_COMMIT
        ,SEEDING_MODE = AUTOMATIC
        );
GO

ALTER AVAILABILITY GROUP [ag2] GRANT CREATE ANY DATABASE;

9. Join the AG. Run the following queries on all secondary servers in datacenter2 (cen5 in this case):

ALTER AVAILABILITY GROUP [ag2] JOIN WITH (CLUSTER_TYPE = EXTERNAL);
GO
ALTER AVAILABILITY GROUP [ag2] GRANT CREATE ANY DATABASE;
GO

The availability group has been created. Next, create the resources in Linux.

 

10. Install the SQL Server resource agent. Run the following command on all servers in datacenter2:

 

sudo yum install mssql-server-ha

11. Install pacemaker. Run the following commands on all servers in datacenter2.

 

sudo yum install pacemaker pcs fence-agents-all resource-agents

12. Set the password for the default user that is created when the Pacemaker and Corosync packages are installed. Run the following command on all servers in datacenter2. (The command prompts for a password; use the same password on all servers in the same datacenter.)

sudo passwd hacluster

clipboard_image_18.png

13. To allow nodes to rejoin the cluster after the reboot, enable and start pcsd service and Pacemaker. Run the following commands on all servers in datacenter2(cen4,cen5).

sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker

clipboard_image_19.png

14. Create the cluster. Run the following commands on one of the nodes (on one node only). Replace 'YourStrongPassword' with the password specified in step 12.

sudo pcs cluster auth cen4 cen5 -u hacluster -p YourStrongPassword
sudo pcs cluster setup --name cluster2 cen4 cen5
sudo pcs cluster start --all
sudo pcs cluster enable --all

clipboard_image_20.png

15. Because node-level fencing configuration depends heavily on your environment, disable it for this tutorial (it can be configured later; in my test, if this value is not set, failover does not work and the IP resource may fail). The following command disables node-level fencing:

sudo pcs property set stonith-enabled=false

Reference: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-availability-group-cluster-rhel?view=sql...

16. Create the availability group resource at the cluster level. Run on one of the nodes in DC2 (only once):

sudo pcs resource create ag_cluster2 ocf:mssql:ag ag_name=ag2 meta failure-timeout=60s master notify=true

clipboard_image_21.png

17. Create the virtual IP resource. The resource name is 'virtualip2'. Replace the IP address with a valid value for your environment:

sudo pcs resource create virtualip2 ocf:heartbeat:IPaddr2 ip=192.168.1.75

clipboard_image_22.png

18. Create the availability group listener for availability group ag2. Run the following T-SQL query on the primary replica (cen4 in this case). The IP address must be the same as the IP in step 17.

ALTER AVAILABILITY GROUP [ag2]
ADD LISTENER 'ag2-listener' (WITH IP(('192.168.1.75','255.255.255.0')), PORT = 1433);

19. Add a colocation constraint. Please note: without the constraint, the IP resource does not fail over when the AG resource fails over, so we need a constraint to guarantee that the IP fails over together with the cluster resource. Run the following command on any server in datacenter2 (on one server only, and only once):

sudo pcs constraint colocation add virtualip2 ag_cluster2-master INFINITY with-rsc-role=Master

clipboard_image_23.png

20. Add an ordering constraint. Run the following command on any server in datacenter2 (on one server only, and only once):

sudo pcs constraint order promote ag_cluster2-master then start virtualip2

clipboard_image_24.png

21. Update the cluster properties (set the recheck interval to 2 minutes). Run on any node:

sudo pcs property set cluster-recheck-interval=2min
sudo pcs property set start-failure-is-fatal=true
pcs resource update ag_cluster2 meta failure-timeout=60s

Section 3: Create the distributed availability group

I suggest testing the listener connections on both the primary replica and the forwarder. Do not proceed until the connections are good on both sides. If you proceed while the connection fails, the following steps will appear to succeed, but the DAG will be 'Not Healthy'.

 

1. Create the distributed availability group in datacenter1. Run the following query on the primary replica in datacenter1 (cen1 in this case):

CREATE AVAILABILITY GROUP [distributedag]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON 'ag1'
WITH (
        LISTENER_URL = 'tcp://ag1-listener:5022'
        ,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
        ,FAILOVER_MODE = MANUAL
        ,SEEDING_MODE = AUTOMATIC
        )
    ,'ag2'
WITH (
        LISTENER_URL = 'tcp://ag2-listener:5022'
        ,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
        ,FAILOVER_MODE = MANUAL
        ,SEEDING_MODE = AUTOMATIC
        );
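Before joining the forwarder, you can confirm the distributed AG was created on cen1 (a sketch; sys.availability_groups flags distributed AGs with is_distributed = 1):

```sql
-- Verify the distributed AG exists (run on cen1)
SELECT name, is_distributed, cluster_type_desc
FROM sys.availability_groups
WHERE is_distributed = 1;
```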

2. Join the distributed availability group. Run the following query on the forwarder (cen4 in this case):

ALTER AVAILABILITY GROUP [distributedag]
JOIN
AVAILABILITY GROUP ON 'ag1'
WITH (
        LISTENER_URL = 'tcp://ag1-listener:5022'
        ,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
        ,FAILOVER_MODE = MANUAL
        ,SEEDING_MODE = AUTOMATIC
        )
    ,'ag2'
WITH (
        LISTENER_URL = 'tcp://ag2-listener:5022'
        ,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
        ,FAILOVER_MODE = MANUAL
        ,SEEDING_MODE = AUTOMATIC
        );

The databases in the AG in datacenter1 will be propagated to the servers in datacenter2 by automatic seeding after step 2 completes.

dagview.png

11 Comments
Copper Contributor

Hello,

In step 7, you wrote that the certificate is restored, but shouldn't it use FROM FILE xxx.cer? The query syntax is wrong.

Microsoft

Hi 佳易 楊,

 

Thanks for your comment.

I don't speak Japanese, can you write in English?

 
Copper Contributor

Hi Liwei,

This is about section 1, DC1, step 7: the certificate is not restored there; the step contains instructions for backing up the certificate instead. I think the content is incomplete.

The correct version is below.

USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**<Master_Key_Password>**';
GO
CREATE CERTIFICATE dbm_certificate
FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
        FILE = '/var/opt/mssql/data/dbm_certificate.pvk'
        ,DECRYPTION BY PASSWORD = '**<Private_Key_Password>**'
        );

 

Copper Contributor

Hi Liwei,

I have a warnings

Resource is master/slave but has not been promoted to master on any node.

I want your help..

Which settings should I check?

Thanks.

clipboard_image_0.png

Microsoft

Thanks for pointing out,

 

The step7 has been revised.

Microsoft

Hi 佳易 楊 ,

 

Which step you run into the issue?

please show me the resources and constraints status.

 

Copper Contributor

Hi Liwei,

Reached the section1 DC1 last step..

I pressed the Cleanup button.

clipboard_image_0.png

clipboard_image_1.png

clipboard_image_2.png

clipboard_image_3.png

From now on, we will make the DC2 part, including the failover test. 

I'll ask again if I don't know otherwise.

I am Chinese. I live in Tokyo, Japan.

https://blog.jssug.jp

 

Thank you for your support!

Copper Contributor

Hi LiWei

The DAG configuration is ready. Thank you very much.

clipboard_image_0.png

Microsoft

You are welcome:)

Copper Contributor

Hi Liwei,

clipboard_image_0.png

I configured it with eight SQL Server 2019 CU1 instances. Pacemaker recommends a minimum of three nodes per configuration.

The high availability group is divided into two groups of four units.

Copper Contributor

hi,

This is my memo..

Share to here!

--Notes before building on Azure
/****************************************
To build the AG listener, the backend VMs must be configured as an availability set from the very beginning.
Standard SKU is recommended for the VMs' public IPs.
Standard SKU is also recommended for the load balancer.
Peer the two virtual networks.
*****************************************/

--DAG configuration
/****************************************
DC1 (referred to as AG1 below)
SERVER:sql001.azurevm.net
IP:192.168.1.190
SERVER:sql002.azurevm.net
IP:192.168.1.191
SERVER:sql003.azurevm.net
IP:192.168.1.192
SERVER:sql004.azurevm.net
IP:192.168.1.193

LB IP:192.168.1.50

DC2 (referred to as AG2 below)
SERVER:sql003.azurevm.net
IP:192.168.1.194
SERVER:sql004.azurevm.net
IP:192.168.1.195
SERVER:sql005.azurevm.net
IP:192.168.1.196
SERVER:sql006.azurevm.net
IP:192.168.1.197

LB IP:192.168.1.62
*****************************************/

--★ (On Azure, change the root password first)
--Later configuration steps copy files to the replicas with scp, so change the root password in advance.
--Also bring RHEL up to date.
/****************************************
sudo su
passwd
sudo yum update -y
*****************************************/

--★
--Install SQL Server 2019 + GDR1 and the SQL Server HA package on all RHEL servers in AG1 and AG2.
--Then enable SQL Server Agent.
/****************************************
sudo curl -o /etc/yum.repos.d/mssql-server.repo https://packages.microsoft.com/config/rhel/7/mssql-server-2019.repo
sudo yum install -y mssql-server
sudo /opt/mssql/bin/mssql-conf setup

sudo yum install mssql-server-ha -y

sudo /opt/mssql/bin/mssql-conf set hadr.hadrenabled 1
sudo /opt/mssql/bin/mssql-conf set sqlagent.enabled true
sudo /opt/mssql/bin/mssql-conf set network.tcpport 2433
sudo systemctl restart mssql-server

sudo firewall-cmd --zone=public --add-port=2433/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5022/tcp --permanent
sudo firewall-cmd --zone=public --add-port=2224/tcp --permanent
sudo firewall-cmd --zone=public --add-port=3123/tcp --permanent
sudo firewall-cmd --zone=public --add-port=21064/tcp --permanent
sudo firewall-cmd --zone=public --add-port=5405/udp --permanent
sudo firewall-cmd --permanent --add-service=high-availability
sudo firewall-cmd --reload

systemctl status mssql-server
*****************************************/

--★
--Edit /etc/hosts on the RHEL servers in AG1 and AG2. Register the IPs for each AG's server names and listener names.
/****************************************
vi /etc/hosts
192.168.1.190 SQL001.azurevm.net
192.168.1.191 SQL002.azurevm.net
192.168.1.192 SQL003.azurevm.net
192.168.1.193 SQL004.azurevm.net
192.168.1.194 SQL005.azurevm.net
192.168.1.195 SQL006.azurevm.net
192.168.1.196 SQL007.azurevm.net
192.168.1.197 SQL008.azurevm.net
192.168.1.190 SQL001
192.168.1.191 SQL002
192.168.1.192 SQL003
192.168.1.193 SQL004
192.168.1.194 SQL005
192.168.1.195 SQL006
192.168.1.196 SQL007
192.168.1.197 SQL008
192.168.1.50 ag1.azurevm.net
192.168.1.62 ag2.azurevm.net
:wq

*****************************************/

--★
--Install pacemaker on each server in AG1 and AG2.
--Change the hacluster password on each server in AG1 and AG2.
--Enable the pcsd service and pacemaker on each server in AG1 and AG2.
/****************************************
sudo yum install pacemaker pcs fence-agents-all resource-agents -y
sudo passwd hacluster

sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker

*****************************************/

--★
--Before configuring Pacemaker, create pacemakerLogin on the SQL servers in AG1 and AG2, then start the Always On XEvent session.
USE [master]
GO
CREATE LOGIN [pacemakerLogin] with PASSWORD= N'<yourpassword>'
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin]
GO

ALTER EVENT SESSION AlwaysOn_health ON SERVER WITH (STARTUP_STATE=ON)
GO

--★
--Pacemaker accesses SQL Server system objects when creating resources, so save the credentials of the SQL user for Pacemaker.
--chmod 400 is read-only; chmod 777 is writable.
/****************************************
echo 'pacemakerLogin' >> ~/pacemaker-passwd
echo '<yourpassword>' >> ~/pacemaker-passwd
sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 400 /var/opt/mssql/secrets/passwd
cat /var/opt/mssql/secrets/passwd
*****************************************/

--★
--Run on the AG1 servers to build the cluster, then check the cluster status.
/****************************************
pcs cluster auth sql001.azurevm.net sql002.azurevm.net sql003.azurevm.net sql004.azurevm.net -u hacluster -p '<yourpassword>'
sudo pcs cluster setup --name cluster1 sql001.azurevm.net sql002.azurevm.net sql003.azurevm.net sql004.azurevm.net --force
sudo pcs cluster start --all
sudo pcs cluster enable --all
pcs cluster status

--Ref: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-availability-group-cluster-rhel?view=sql-server-ver15
--Run on all AG1 servers
--Shoot The Other Node In The Head
--A mechanism to prevent split brain: if inter-node communication fails, the peer node is forcibly restarted (fenced) so that both nodes cannot become master (and access the resources at the same time).
sudo pcs property set stonith-enabled=false
*****************************************/

--★
--Run on the AG2 servers to build the cluster, then check the cluster status.
/****************************************
pcs cluster auth sql005.azurevm.net  sql006.azurevm.net sql007.azurevm.net  sql008.azurevm.net -u hacluster -p '<yourpassword>'
sudo pcs cluster setup --name cluster2 sql005.azurevm.net  sql006.azurevm.net sql007.azurevm.net  sql008.azurevm.net --force
sudo pcs cluster start --all
sudo pcs cluster enable --all
pcs cluster status

--Ref: https://docs.microsoft.com/en-us/sql/linux/sql-server-linux-availability-group-cluster-rhel?view=sql-server-ver15
--Run on all AG2 servers
sudo pcs property set stonith-enabled=false
--Shoot The Other Node In The Head
--A mechanism to prevent split brain: if inter-node communication fails, the peer node is forcibly restarted (fenced) so that both nodes cannot become master (and access the resources at the same time).
sudo pcs property set stonith-enabled=false
*****************************************/

--★
--Add the resources to the configured cluster and change the property parameters.
--Run on the active server of the AG1 cluster; be careful.
/****************************************
sudo pcs resource create ag_cluster1 ocf:mssql:ag ag_name=ag1 port=2433 meta failure-timeout=60s master notify=true
sudo pcs resource create virtualip1 ocf:heartbeat:IPaddr2 ip=192.168.1.50 cidr_netmask=24
sudo pcs constraint colocation add virtualip1 ag_cluster1-master INFINITY with-rsc-role=Master
sudo pcs constraint order promote ag_cluster1-master then start virtualip1
sudo pcs property set cluster-recheck-interval=2min
sudo pcs property set start-failure-is-fatal=true
sudo pcs resource update ag_cluster1 meta failure-timeout=60s
sudo pcs resource update ag_cluster1 required_synchronized_secondaries_to_commit=2
*****************************************/

--★
--Add the resources to the configured cluster.
--Run on the active server of the AG2 cluster; be careful.
/****************************************
sudo pcs resource create ag_cluster2 ocf:mssql:ag ag_name=ag2 port=2433 meta failure-timeout=60s master notify=true
sudo pcs resource create virtualip2 ocf:heartbeat:IPaddr2 ip=192.168.1.62 cidr_netmask=24
sudo pcs constraint colocation add virtualip2 ag_cluster2-master INFINITY with-rsc-role=Master
sudo pcs constraint order promote ag_cluster2-master then start virtualip2
sudo pcs property set cluster-recheck-interval=2min
sudo pcs property set start-failure-is-fatal=true
sudo pcs resource update ag_cluster2 meta failure-timeout=60s
sudo pcs resource update ag_cluster2 required_synchronized_secondaries_to_commit=2
*****************************************/

--★
--Create the certificate on the AG1 primary replica (SQL001).
USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<yourpassword>';
GO
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
GO
BACKUP CERTIFICATE dbm_certificate TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
ENCRYPTION BY PASSWORD = '<yourpassword>')
GO

--★
--Copy the certificate created on the AG1 primary replica (SQL001) to the other AG1 replicas and to every server in AG2.
--Run these commands on the AG1 primary replica (SQL001).
/****************************************
cd /var/opt/mssql/data
scp dbm_certificate.* root@sql002.azurevm.net:/var/opt/mssql/data/
scp dbm_certificate.* root@sql003.azurevm.net:/var/opt/mssql/data/
scp dbm_certificate.* root@sql004.azurevm.net:/var/opt/mssql/data/
scp dbm_certificate.* root@sql005.azurevm.net:/var/opt/mssql/data/
scp dbm_certificate.* root@sql006.azurevm.net:/var/opt/mssql/data/
scp dbm_certificate.* root@sql007.azurevm.net:/var/opt/mssql/data/
scp dbm_certificate.* root@sql008.azurevm.net:/var/opt/mssql/data/
*****************************************/

--★
--On the AG1 secondary replicas and on every AG2 server, give the mssql user ownership of the copied certificate files.
--Run on every server except the AG1 primary replica (SQL001).
/****************************************
cd /var/opt/mssql/data
chown mssql:mssql dbm_certificate.*
*****************************************/

--★
--Restore the certificate on the AG1 secondary replicas and on every AG2 SQL Server instance.
--Run on every server except the AG1 primary replica (SQL001).
USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<yourpassword>'
GO
CREATE CERTIFICATE dbm_certificate
    FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
    WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '<yourpassword>')
GO
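
-- (Optional verification, not part of the original steps) To confirm the
-- restore succeeded on each server, the certificate can be listed from the
-- standard catalog view:
SELECT name, subject, expiry_date
FROM sys.certificates
WHERE name = 'dbm_certificate'
GO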

--★
--Create the endpoint on every server in AG1 and AG2.
USE master
GO
CREATE ENDPOINT [Hadr_endpoint]
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE dbm_certificate,
    ENCRYPTION = REQUIRED ALGORITHM AES
)
GO
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED
GO
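
-- (Optional verification, not part of the original steps) Before moving on,
-- confirm on each instance that the endpoint is STARTED and listening on
-- port 5022:
SELECT e.name, e.state_desc, te.port
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS te ON e.endpoint_id = te.endpoint_id
GO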

--★
--Create the availability group on SQL001.
USE [master]
GO
CREATE AVAILABILITY GROUP [ag1]
WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY,
DB_FAILOVER = ON,
DTC_SUPPORT = NONE,
CLUSTER_TYPE = EXTERNAL,
REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0)
FOR 
REPLICA ON 
N'sql001' WITH (ENDPOINT_URL = N'TCP://sql001.azurevm.net:5022', 
FAILOVER_MODE = EXTERNAL, 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC, 
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL), SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)),
N'sql002' WITH (ENDPOINT_URL = N'TCP://sql002.azurevm.net:5022', 
FAILOVER_MODE = EXTERNAL, 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC,
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL), SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)),
N'sql003' WITH (ENDPOINT_URL = N'TCP://sql003.azurevm.net:5022', 
FAILOVER_MODE = EXTERNAL, 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC,
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL), SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)),
N'sql004' WITH (ENDPOINT_URL = N'TCP://sql004.azurevm.net:5022', 
FAILOVER_MODE = EXTERNAL, 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC,
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL), SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL))
GO

ALTER AVAILABILITY GROUP [ag1]
ADD LISTENER N'ag1-listener' (
WITH IP
((N'192.168.1.50', N'255.255.255.0')
)
, PORT=2433)
GO

ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE
GO

--★
--Join SQL002, SQL003, and SQL004 to the availability group.
USE [master]
GO
ALTER AVAILABILITY GROUP [ag1] JOIN WITH (CLUSTER_TYPE = EXTERNAL)
GO
ALTER AVAILABILITY GROUP [ag1] GRANT CREATE ANY DATABASE
GO
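
-- (Optional verification, not part of the original steps) On the AG1 primary,
-- confirm that all four replicas are connected and healthy via the standard
-- HADR DMVs:
SELECT ar.replica_server_name, ars.role_desc,
       ars.connected_state_desc, ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar
  ON ars.replica_id = ar.replica_id
GO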

--★
--Confirm the Pacemaker resource status before moving on (a cluster failover may have changed which node is the primary or a secondary replica).

--★
--Create the availability group on SQL005.
USE [master]
GO
CREATE AVAILABILITY GROUP [ag2]
WITH (AUTOMATED_BACKUP_PREFERENCE = SECONDARY,
DB_FAILOVER = ON,
DTC_SUPPORT = NONE,
CLUSTER_TYPE = EXTERNAL,
REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT = 0)
FOR 
REPLICA ON 
N'sql005' WITH (ENDPOINT_URL = N'TCP://sql005.azurevm.net:5022', 
FAILOVER_MODE = EXTERNAL, 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC, 
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL), SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)),
N'sql006' WITH (ENDPOINT_URL = N'TCP://sql006.azurevm.net:5022', 
FAILOVER_MODE = EXTERNAL, 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC,
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL), SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)),
N'sql007' WITH (ENDPOINT_URL = N'TCP://sql007.azurevm.net:5022', 
FAILOVER_MODE = EXTERNAL, 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC,
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL), SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL)),
N'sql008' WITH (ENDPOINT_URL = N'TCP://sql008.azurevm.net:5022', 
FAILOVER_MODE = EXTERNAL, 
AVAILABILITY_MODE = SYNCHRONOUS_COMMIT, 
SESSION_TIMEOUT = 10, BACKUP_PRIORITY = 50, SEEDING_MODE = AUTOMATIC,
PRIMARY_ROLE(ALLOW_CONNECTIONS = ALL), SECONDARY_ROLE(ALLOW_CONNECTIONS = ALL))
GO

ALTER AVAILABILITY GROUP [ag2]
ADD LISTENER N'ag2-listener' (
WITH IP
((N'192.168.1.62', N'255.255.255.0')
)
, PORT=2433)
GO

ALTER AVAILABILITY GROUP [ag2] GRANT CREATE ANY DATABASE
GO

--★
--Join SQL006, SQL007, and SQL008 to the availability group.
USE [master]
GO
ALTER AVAILABILITY GROUP [ag2] JOIN WITH (CLUSTER_TYPE = EXTERNAL)
GO
ALTER AVAILABILITY GROUP [ag2] GRANT CREATE ANY DATABASE
GO

--★
--Confirm the Pacemaker resource status before moving on (a cluster failover may have changed which node is the primary or a secondary replica).

--★
--Verify that ag1-listener and ag2-listener are working; working listeners on both sides are a prerequisite for the distributed AG configuration.
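
-- (Optional verification, not part of the original steps) The listener
-- configuration can also be checked from T-SQL on each primary, using the
-- standard listener catalog views:
SELECT agl.dns_name, agl.port, aglia.ip_address, aglia.state_desc
FROM sys.availability_group_listeners AS agl
JOIN sys.availability_group_listener_ip_addresses AS aglia
  ON agl.listener_id = aglia.listener_id
GO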

--★
--Run on the primary replica of AG1.
USE [master]
GO
CREATE AVAILABILITY GROUP [distributedag]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
'ag1' WITH (
        LISTENER_URL = 'tcp://ag1.azurevm.net:5022'
        ,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
        ,FAILOVER_MODE = MANUAL
        ,SEEDING_MODE = AUTOMATIC)
,'ag2' WITH (
        LISTENER_URL = 'tcp://ag2.azurevm.net:5022'
        ,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
        ,FAILOVER_MODE = MANUAL
        ,SEEDING_MODE = AUTOMATIC)
GO

--Run on the primary replica of AG2 (the forwarder).
USE [master]
GO
ALTER AVAILABILITY GROUP [distributedag]
JOIN
AVAILABILITY GROUP ON
'ag1' WITH (
        LISTENER_URL = 'tcp://ag1.azurevm.net:5022'
        ,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
        ,FAILOVER_MODE = MANUAL
        ,SEEDING_MODE = AUTOMATIC)
,'ag2' WITH (
        LISTENER_URL = 'tcp://ag2.azurevm.net:5022'
        ,AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT
        ,FAILOVER_MODE = MANUAL
        ,SEEDING_MODE = AUTOMATIC)
GO
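
-- (Optional verification, not part of the original steps) Once both sides
-- have run their statements, the distributed AG should appear in the catalog
-- on either primary:
SELECT ag.name, ag.is_distributed, ars.role_desc, ars.synchronization_health_desc
FROM sys.availability_groups AS ag
JOIN sys.dm_hadr_availability_replica_states AS ars
  ON ag.group_id = ars.group_id
WHERE ag.is_distributed = 1
GO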

--★
--Run on the primary replica of AG1.
USE [master]
GO

CREATE DATABASE [db_JSSUG]
GO
ALTER DATABASE [db_JSSUG] SET RECOVERY FULL
GO

USE [db_JSSUG]
GO

CREATE TABLE [tbl_JSSUG] (
    col1 INT PRIMARY KEY
)
GO

INSERT INTO [tbl_JSSUG]
VALUES (1)
GO

BACKUP DATABASE [db_JSSUG] TO DISK = N'/var/opt/mssql/data/db_JSSUG.bak'
GO
BACKUP LOG [db_JSSUG] TO DISK = N'/var/opt/mssql/data/db_JSSUG.trn'
GO

--★
--Add [db_JSSUG] to AG1.
USE [master]
GO
ALTER AVAILABILITY GROUP [ag1] ADD DATABASE [db_JSSUG]
GO
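
-- (Optional verification, not part of the original steps) With automatic
-- seeding, [db_JSSUG] should propagate to the AG1 secondaries and, through
-- the distributed AG, to the forwarder and the AG2 secondaries. Check the
-- database synchronization state on each replica:
SELECT DB_NAME(database_id) AS database_name,
       synchronization_state_desc, synchronization_health_desc
FROM sys.dm_hadr_database_replica_states
GO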

--★
--Verify [db_JSSUG] across the distributed availability group.
--End

 

Version history
Last update:
‎Aug 17 2020 02:31 PM