One common comment I've heard at various conferences and customer interactions is, "setting up Always On availability groups (AGs) on pacemaker takes very long, has multiple steps and commands that need to be run, so sometimes it can be error prone, can you folks make it easier and automate the AG setup for SQL Server on Linux?"
Well, we went back and collaborated with the RHEL Engineering team to create a SQL Server role capable of creating the pacemaker cluster, configuring the SQL Server Always On availability groups (AG), adding databases to the AG, and creating the pacemaker resources with fencing agents. To top it all off, after creating the necessary infrastructure, I was able to deploy SQL Server Always On availability groups (AG), with pacemaker cluster and AG specific resources all in under 8 minutes!! Yes, eight minutes:). If you want to give it a shot, keep reading!
To know more about the Ansible based SQL Server roles, please refer to this latest blog: Preview the new system role support for Microsoft SQL Server Always On availability groups (redhat.c... from the Red Hat team.
RHEL 8.5 added support for Microsoft SQL Server system role, which automates the installation, configuration and tuning of Microsoft SQL Server on RHEL. Please see the RHEL documentation for more information.
For today’s setup, I am going to use the SQL Server role from the Ansible, listed here: Ansible Galaxy.
This is currently in preview and allows you to configure and deploy Always On availability groups (AG). This SQL Server role is provided as a sample and for reference only. It is not supported by Microsoft or Red Hat. However, we welcome your feedback and suggestions for improving the system role here: Issues linux-system-roles/mssql (github.com).
Okay, let's get started. Here are the steps we would take to create the infrastructure required to run the Ansible role to configure AG for SQL Server RHEL based VMs.
#To list all the SQL based RHEL offers, you can run the following command
az vm image list --all --offer "sql2019-rhel8"
# Run the following commands to create the VMs, using the SQL Server PAYG(Pay-as-you-go) # images pre-configured on RHEL 8 base OS image, thus avoiding the steps to install and # configure SQL Server deployments. We are using the Standard_B4ms as an example
az vm create --resource-group testgroup --name sqlrhel1 --size "Standard_B4ms" --location "west india" --image "MicrosoftSQLServer:sql2019-rhel8:sqldev:15.0.220712" --vnet-name amvindomain-vnet --subnet default --admin-username "adminuser" --admin-password "StrongPa$$w0rd" --authentication-type all --generate-ssh-keys
az vm create --resource-group testgroup --name sqlrhel2 --size "Standard_B4ms" --location "west india" --image "MicrosoftSQLServer:sql2019-rhel8:sqldev:15.0.220712" --vnet-name amvindomain-vnet --subnet default --admin-username "adminuser" --admin-password "StrongPa$$w0rd" --authentication-type all --generate-ssh-keys
az vm create --resource-group testgroup --name sqlrhel3 --size "Standard_B4ms" --location "west india" --image "MicrosoftSQLServer:sql2019-rhel8:sqldev:15.0.220712" --vnet-name amvindomain-vnet --subnet default --admin-username "adminuser" --admin-password "StrongPa$$w0rd" --authentication-type all --generate-ssh-keys
Before I get into the specifics of the Ansible role, here's a quick rundown of the steps:
# My controller node is sqlrhel1, that is why on this node I am installing the
# ansible-core package and also updating the RHEL to the latest version, please note that # my controller node is also a target node.
az vm run-command invoke -g testgroup -n sqlrhel1 --command-id RunShellScript --scripts "sudo yum update -y && sudo yum install ansible-core -y"
# On my other target nodes sqlrhel2 and sqlrhel3 I am just updating the RHEL to latest
# version
az vm run-command invoke -g testgroup -n sqlrhel2 --command-id RunShellScript --scripts "sudo yum update -y"
az vm run-command invoke -g testgroup -n sqlrhel3 --command-id RunShellScript --scripts "sudo yum update -y"
[root@sqlrhel1 sql]# ansible-galaxy collection install microsoft.sql
To configure passwordless SSH between the controller and all target nodes, see the section:” Setup passwordless ssh access between the Control node and the managed nodes” in the blog post Deploy SQL Server – The Ansible way! - Microsoft Tech Community. For more information on troubleshooting SSH access errors, please see this Red Hat article: https://access.redhat.com/solutions/9194.
In Ansible, the inventory file defines the target hosts that you want to configure with the Ansible playbook; in this case, my controller node (sqlrhel1) is also a target node. Here is an example of how to define the inventory file for the microsoft.sql role to setup the pacemaker cluster and assign the various roles to each of the availability group replicas.
all:
hosts:
# host1 is defined by a short name
# There is no need to specify ha_cluster names explicitly
host1:
mssql_ha_replica_type: primary
# host2 is defined by FQDN
# You must define ha_cluster names to be in the short name format
host2.example.com:
mssql_ha_replica_type: synchronous
ha_cluster:
node_name: host2
pcs_address: host2
# host3 is defined by an ip address
# You must define ha_cluster names to be in the short name format
# In the case where the default host's IP address differs from the IP
# address that Pacemaker must use to set up cluster, you must define
# ha_cluster corosync_addresses
192.XXX.XXX.333:
mssql_ha_replica_type: witness
ha_cluster:
node_name: host3
pcs_address: host3
corosync_addresses:
- 10.XXX.XXX.333
In my case, because I'm using short names, I don't have to specify the ha_cluster names explicitly, so I use the following inventory, to create this, feel free to use your favorite text editor and copy the content below; I used a command like ‘nano inventory’ and pasted the below content into the file.
all:
hosts:
sqlrhel1:
mssql_ha_replica_type: primary
sqlrhel2:
mssql_ha_replica_type: synchronous
sqlrhel3:
mssql_ha_replica_type: synchronous
If I had decided to define the inventory by IP address, I would have used an inventory as shown below:
all:
hosts:
172.22.0.14:
ansible_user: root
mssql_ha_replica_type: primary
ha_cluster:
node_name: sqlrhel1
pcs_address: sqlrhel1
corosync_addresses:
- 172.22.0.14
172.22.0.15:
ansible_user: root
mssql_ha_replica_type: synchronous
ha_cluster:
node_name: sqlrhel2
pcs_address: sqlrhel2
corosync_addresses:
- 172.22.0.15
172.22.0.16:
ansible_user: root
mssql_ha_replica_type: synchronous
ha_cluster:
node_name: sqlrhel3
pcs_address: sqlrhel3
corosync_addresses:
- 172.22.0.16
Here is an example playbook that you can create, that performs the following tasks, which are not necessarily in the order listed below:
To create the playbook.yaml file, I ran the command ‘nano playbook.yaml’ and then pasted the following content into it,
- hosts: all
vars:
mssql_accept_microsoft_odbc_driver_17_for_sql_server_eula: true
mssql_accept_microsoft_cli_utilities_for_sql_server_eula: true
mssql_accept_microsoft_sql_server_standard_eula: true
mssql_manage_firewall: true
mssql_password: "SQLp@55w0rD1"
mssql_edition: Developer
mssql_ha_configure: true
mssql_ha_listener_port: 5022
mssql_ha_cert_name: ExampleCert
mssql_ha_master_key_password: "p@55w0rD1"
mssql_ha_private_key_password: "p@55w0rD2"
mssql_ha_reset_cert: true
mssql_ha_endpoint_name: ag_endpoint
mssql_ha_ag_name: ag_test
mssql_ha_db_names:
- test
mssql_ha_login: pacemakerLogin
mssql_ha_login_password: "Pacemakerp@55w0rD1"
# Set mssql_ha_virtual_ip to the frontend IP address configured in the Azure
# load balancer
mssql_ha_virtual_ip: 172.22.0.22
mssql_ha_cluster_run_role: true
ha_cluster_cluster_name: "{{ mssql_ha_ag_name }}"
ha_cluster_hacluster_password: "p@55w0rD4"
ha_cluster_extra_packages:
- fence-agents-azure-arm
ha_cluster_cluster_properties:
- attrs:
- name: cluster-recheck-interval
value: 2min
- name: start-failure-is-fatal
value: true
- name: stonith-enabled
value: true
- name: stonith-timeout
value: 900
ha_cluster_resource_primitives:
- id: rsc_st_azure
agent: stonith:fence_azure_arm
instance_attrs:
- attrs:
- name: login
value: <ApplicationID> for your application registration in Azure
- name: passwd
value: <ServiceprincipalPassword> value from the client secret
- name: resourceGroup
value: <resourcegroupname> in Azure
- name: tenantId
value: <tenantID> in Azure
- name: subscriptionId
value: <subscriptionID> in Azure
- name: power_timeout
value: 240
- name: pcmk_reboot_timeout
value: 900
- id: azure_load_balancer
agent: azure-lb
instance_attrs:
- attrs:
# probe port configured in Azure
- name: port
value: 59999
- id: ag_cluster
agent: ocf:mssql:ag
instance_attrs:
- attrs:
- name: ag_name
value: "{{ mssql_ha_ag_name }}"
meta_attrs:
- attrs:
- name: failure-timeout
value: 60s
- id: virtualip
agent: ocf:heartbeat:IPaddr2
instance_attrs:
- attrs:
- name: ip
value: "{{ mssql_ha_virtual_ip }}"
operations:
- action: monitor
attrs:
- name: interval
value: 30s
ha_cluster_resource_groups:
- id: virtualip_group
resource_ids:
- azure_load_balancer
- virtualip
ha_cluster_resource_clones:
- resource_id: ag_cluster
promotable: yes
meta_attrs:
- attrs:
- name: notify
value: true
ha_cluster_constraints_colocation:
- resource_leader:
id: ag_cluster-clone
role: Promoted
resource_follower:
id: azure_load_balancer
options:
- name: score
value: INFINITY
ha_cluster_constraints_order:
- resource_first:
id: ag_cluster-clone
action: promote
resource_then:
id: azure_load_balancer
action: start
# Variables to open the probe port configured in Azure in firewall
firewall:
- port: 59999/tcp
state: enabled
permanent: true
runtime: true
roles:
- fedora.linux_system_roles.firewall
- microsoft.sql.server
When you list the contents of the directory “/root/.ansible/collections/ansible collections/microsoft/sql”, you should see the inventory and the playbook.yaml that were created, as shown below:[root@sqlrhel1 sql]# ll
total 48
-rw-r--r-- 1 root root 6431 Aug 29 21:30 CHANGELOG.md
-rw-r--r-- 1 root root 17218 Aug 29 21:30 FILES.json
-rw-r--r-- 1 root root 175 Aug 29 21:31 inventory
-rw-r--r-- 1 root root 1053 Aug 29 21:30 LICENSE-server
-rw-r--r-- 1 root root 892 Aug 29 21:30 MANIFEST.json
drwxr-xr-x 2 root root 25 Aug 29 21:30 meta
-rw-r--r-- 1 root root 3839 Aug 29 21:34 playbook.yaml
-rw-r--r-- 1 root root 1278 Aug 29 21:30 README.md
drwxr-xr-x 3 root root 20 Aug 29 21:30 roles
drwxr-xr-x 3 root root 20 Aug 29 21:30 tests
To run the playbook, I executed the following command on the controller node (sqlrhel1):
[root@sqlrhel1 sql]# ansible-playbook -i inventory playbook.yaml
This should start the deployment, and you should have the entire setup completed in about 8 minutes or less. When the setup is finished, you should see the sample output shown below.
To view the cluster resources that have been created, use the command:
[root@sqlrhel1 sql]# sudo pcs status
Cluster name: ag_test1
Cluster Summary:
* Stack: corosync
* Current DC: sqlrhel2 (version 2.1.2-4.el8_6.2-ada5c3b36e2) - partition with quorum
* Last updated: Mon Aug 29 21:36:10 2022
* Last change: Mon Aug 29 20:50:36 2022 by root via cibadmin on sqlrhel1
* 3 nodes configured
* 6 resource instances configured
Node List:
* Online: [ sqlrhel1 sqlrhel2 sqlrhel3 ]
Full List of Resources:
* rsc_st_azure (stonith:fence_azure_arm): Started sqlrhel1
* Resource Group: virtualip_group:
* azure_load_balancer (ocf::heartbeat:azure-lb): Started sqlrhel1
* virtualip (ocf::heartbeat:IPaddr2): Started sqlrhel1
* Clone Set: ag_cluster-clone [ag_cluster] (promotable):
* Masters: [ sqlrhel1 ]
* Slaves: [ sqlrhel2 sqlrhel3 ]
all:
hosts:
sqlrhel1:
mssql_ha_replica_type: primary
sqlrhel2:
mssql_ha_replica_type: synchronous
sqlrhel3:
mssql_ha_replica_type: synchronous
You can view the AG and its dashboard in SQL Server Management Studio (SSMS), as shown below:
You can connect using the listener IP Address in this case it is 172.22.0.22 ( mssql_ha_virtual_ip) and test the failover as suggested in this article. Also you can test the fencing as per the steps documented here.
Please let me know if you have questions on any section of this article and I’ll be more than happy to explain this further!
Happy Learning!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.