SQLServerAlwaysOn
57 Topics

SQL Server Licensing: High Availability and Disaster Recovery benefits
Learn how the new SQL Server High Availability and Disaster Recovery Software Assurance benefits can help you architect an optimal business continuity and disaster recovery strategy with the best TCO.

Lift and Shift Always On SQL Server Failover Cluster Instance (SQL FCI) to Azure VMs
Today, we are announcing two new features that enable lift and shift of Always On SQL Server Failover Cluster Instances (SQL FCI) from on-premises to Azure Virtual Machines: Distributed Network Name (DNN) for SQL FCI and Azure Shared Disks for SQL FCI.

How to create multiple listeners for the same availability group (Goden Yao)
First published on MSDN on Feb 03, 2012. Blog: How to create multiple listeners for the same availability group. As the PM who designed the availability group listener feature, I realized while talking with customers that there are two scenarios that may call for multiple listeners on a single availability group.

SQL Server 2019 Intelligent Performance - Worker Migration
SQL Server task scheduling basics

A few basic terms:

Scheduler – each scheduler is mapped to an individual logical processor (DMV sys.dm_os_schedulers)
Worker – each worker is mapped to a thread [1] (DMV sys.dm_os_workers)
Task – each task is mapped to a series of programming instructions that form a unit of work. The task is then bound to a worker in order to execute the instructions on a CPU. (DMV sys.dm_os_tasks)

Please see the SQL Server Thread and Task Architecture Guide for a comprehensive explanation.

[1] Unless the SQL Server instance is set to fiber mode, which is generally not recommended.

The following graph is a high-level life cycle workflow of a task, from the moment it is bound to a worker to the time it is unbound from its worker after task completion.

Mechanism to balance CPU usage in previous versions of SQL Server

SQL Server internally tracks the current task count as a load factor for each scheduler and always assigns a newly created task to the least loaded scheduler for balanced CPU usage. This mechanism works well for typical short-running query tasks.

For long-running tasks, this mechanism may not always sustain evenly distributed CPU usage across schedulers. More than one long-running task can be assigned to the same scheduler if, at assignment time, the other schedulers happen to carry a higher load of short-running tasks. Workers are bound to the same scheduler for the life of the worker, and a task is bound to a worker for its entire execution. This means that when other schedulers become idle or have fewer tasks, long-running tasks cannot be reassigned to them because they remain bound to their originally assigned scheduler. The result can be an unbalanced workload, with multiple runnable workers competing on a single scheduler while CPU cycles on other, idle schedulers go unused.

As hardware has evolved, SQL Server customers have gradually moved to host machines with larger numbers of CPUs, making imbalanced CPU usage in the SQL Server engine a more visible problem.

What is worker migration

Worker migration (also known as "worker stealing") allows an idle SOS scheduler to migrate a worker from the runnable queue of another scheduler on the same NUMA node and immediately resume the migrated worker's task. This enhancement provides more balanced CPU usage and reduces the amount of time long-running tasks spend in the runnable queue. A long-running task that is enabled for worker migration is no longer bound to a fixed scheduler; instead, it moves across schedulers within the same NUMA node toward less loaded schedulers. Together with the existing load factor mechanism, worker migration gives SQL Server a richer solution for balanced CPU usage.
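To see whether long-running tasks are piling up on a particular scheduler, you can inspect the per-scheduler counters exposed by sys.dm_os_schedulers. The query below is only a minimal illustration of the imbalance pattern described above; a persistently high runnable_tasks_count on one scheduler while its siblings sit idle is the symptom worker migration is designed to relieve.

-- Snapshot of per-scheduler load: a scheduler with a persistently high
-- runnable_tasks_count while other schedulers are idle indicates the
-- imbalance described above.
SELECT scheduler_id,
       cpu_id,
       parent_node_id,          -- NUMA node; migration happens within a node
       current_tasks_count,
       runnable_tasks_count,    -- workers waiting for CPU on this scheduler
       active_workers_count,
       load_factor              -- internal load value used for task placement
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE' -- only schedulers that run user work
ORDER BY runnable_tasks_count DESC;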
Availability group parallel redo tasks with worker migration

In SQL Server 2019, workers associated with availability group parallel redo tasks are enabled for worker migration to address a commonly observed scheduler contention issue among redo tasks on secondary replicas. Availability group parallel redo has been the default redo mode since SQL Server 2016 RTM. For each database on an availability group secondary replica, a set of parallel redo worker tasks is created in addition to the database's main redo task. The main redo task is dedicated to analyzing incoming log records and dispatching them to the parallel redo workers. Each parallel redo worker processes its assigned log records and applies the changes to data pages.

Both the main redo task and the parallel redo worker tasks are long-running tasks. Once a database successfully starts its parallel redo worker tasks during the database startup phase, they stay alive until the database is shut down. The main redo task also keeps running as long as there are transaction logs to process and does not stay idle for more than 60 seconds.

For a database with a heavy transaction workload on the availability group primary replica, the main redo task of that database on a secondary replica is generally CPU-intensive. Its parallel redo worker tasks may also have high CPU demand, depending on database configuration, table schema, and transaction types. It is possible for one or more parallel redo worker tasks of a database to be assigned to the same scheduler as its main redo task. If more than one long-running redo task competes for CPU cycles on the same scheduler, scheduler contention may occur, and the overall transaction log redo throughput of the affected database can drop 20-30%. When several availability group databases have heavy redo workloads during the same period, this contention pattern becomes more common because of the increased chance that a database's main redo task lands on the same scheduler as parallel redo worker tasks.

Worker migration avoids this performance degradation and maintains more consistent redo throughput. For concurrent redo workloads across multiple databases, overall redo throughput improves by about 10-15% compared to the situation where some main redo tasks and parallel redo worker tasks contend for the same scheduler.

To monitor worker migration, a new column, worker_migration_count, has been added to the existing DMV sys.dm_os_workers. It is a cumulative count of the migrations a worker has undergone since the last SQL Server service startup. Also, for parallel redo worker task rows (command = 'PARALLEL REDO TASK' or 'PARALLEL REDO HELP TASK') in the output of sys.dm_exec_requests, the scheduler_id values are expected to change frequently while the corresponding database has ongoing redo workloads.

T-SQL query samples:

-- Query worker_migration_count column in DMV sys.dm_os_workers
SELECT worker_address, worker_migration_count
FROM sys.dm_os_workers
ORDER BY worker_migration_count DESC

-- Query parallel redo tasks in DMV sys.dm_exec_requests
-- (scheduler_id values may change for some rows when re-running this query)
SELECT scheduler_id, session_id, task_address, command
FROM sys.dm_exec_requests
WHERE command = 'PARALLEL REDO TASK' OR command = 'PARALLEL REDO HELP TASK'
ORDER BY session_id

Next Steps

In the future, other long-running SQL Server system tasks may be evaluated for worker migration to avoid scheduler contention. One example is the availability group main redo task, which can also suffer from scheduler contention: when the main redo tasks of two databases are assigned to the same scheduler, or a main redo task shares a scheduler with a long-running, CPU-intensive query task, redo performance can drop 30% or more.

Dong Cao
Principal Software Engineer

Configure SQL Server AG (Read-Scale) for SQL Containers deployed on Kubernetes using Helm
If you are trying to set up an Always On availability group between SQL Server instances deployed as containers on Kubernetes, I hope this blog provides the reference you need to set up the environment successfully.

Target: By the end of this blog, we should have three SQL Server instances deployed on the Kubernetes (aka k8s) cluster, with an Always On availability group configured among the three instances in read-scale mode. We will also have the READ_WRITE_ROUTING_URL set up to provide read/write connection redirection.

References: Refer to Use read-scale with availability groups - SQL Server Always On | Microsoft Docs to read more about read-scale mode. To prepare your machine to run helm charts, please refer to this blog, where I discuss how to set up your environment, including AKS and preparing your Windows client machine with helm and the other tools needed to deploy SQL Server instances on AKS (Azure Kubernetes Service).

Environment layout:
1) To set this environment up, in my case I am using Azure Kubernetes Service as my Kubernetes platform.
2) I will deploy three SQL Server container instances using a helm chart in StatefulSet mode; you can also deploy using Deployment mode.
3) I will use T-SQL scripts to set up and configure the Always On availability group.

Let's get the engine started:

Step 1: Using helm, deploy three instances of SQL Server on AKS with Always On enabled, and create external services of type LoadBalancer to access the deployed SQL Servers.

Download the helm chart and all its files to your Windows client, switch to the directory where you downloaded it, and after modifying the chart to match your requirements and customization, deploy SQL Server using the command shown below. You can change the deployment name ("mssql") to anything you'd like.

helm install mssql . --set ACCEPT_EULA.value=Y --set MSSQL_PID.value=Developer

Within a few minutes you should see the pods coming up. The number of pods started depends on the "replicas" value set in the values.yaml file; if you use it as is, you should have three pods starting up, as the replicas value is set to three. So you have three SQL Server instances, each using its own PVCs, up and running as shown below.

We also need a way to access these SQL Servers from outside the Kubernetes cluster. Since I am using AKS as my Kubernetes cluster, we have created three services, one for each SQL Server pod. The yaml file for the services is shared with the helm chart under the folder "external services" and is named "ex_service.yaml". If you are using the sample helm chart, you can create the services using the command shown below:

kubectl apply -f "D:\helm-charts\sql-statefull-deploy\external services\ex_service.yaml"

Apart from the three external services, the pods also need to be able to talk to each other on port 5022 (the default port used by the AG endpoints on all replicas), so we create one ClusterIP service for each pod. The yaml file for this is also available in the sample helm chart under the folder "external services" and is named "ag_endpoint.yaml". If you have not made any changes, you can create the services using the command:

kubectl apply -f "D:\helm-charts\sql-statefull-deploy\external services\ag_endpoint.yaml"
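Once the external services have public IPs, you can optionally connect to each instance (for example from SSMS, using the external IP and the sa credential stored in the secret) and confirm that Always On is enabled on the container. This is only a quick sanity-check sketch; it assumes the helm chart enables the Always On feature on the instances (for example via the MSSQL_ENABLE_HADR container environment variable).

-- Quick sanity check after Step 1 (run against each instance through its external IP):
-- confirm which instance you are connected to and that Always On is enabled.
SELECT SERVERPROPERTY('ServerName')    AS server_name,     -- should match the pod name
       SERVERPROPERTY('IsHadrEnabled') AS is_hadr_enabled; -- 1 = Always On is enabled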
If all the above steps are followed, you should have the following resources in the Kubernetes cluster.

Note: On our cluster, we already have a secret created to store the sa password using the command below; the same sa password is used by all three SQL Server instances. It is always recommended to change the sa password after the SQL container deployment so that the same sa password is not used for all three instances.

kubectl create secret generic mssql --from-literal=SA_PASSWORD="MyC0m9l&xP@ssw0rd"

Step 2: Create certificates on the primary and secondary replicas, followed by creation of endpoints on all replicas.

Now it's time to create the certificate and endpoints on all the replicas. Use the external IP address to connect to the primary SQL Server instance and run the T-SQL below to create the certificate and endpoint.

-- In the context of the master database, create a master key
USE master
GO
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<mycomplexpassword>';

-- Under the master context, create a certificate that will be used by the endpoint
-- for authentication. We then back up the certificate so it can be copied to all
-- the other replicas.
CREATE CERTIFICATE dbm_certificate WITH SUBJECT = 'dbm';
BACKUP CERTIFICATE dbm_certificate
TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    ENCRYPTION BY PASSWORD = '<mycomplexpassword>');

-- Now create the endpoint, authenticating with the certificate created above.
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE dbm_certificate,
    ENCRYPTION = REQUIRED ALGORITHM AES
);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;

On the primary SQL Server instance pod, we should now have the dbm_certificate.pvk and dbm_certificate.cer files at /var/opt/mssql/data, as shown below.

We need to copy these files to the other pods. You can use the kubectl cp commands to copy from the primary pod to your local client and then from the local client to the secondary pods.
Sample commands are shown below.

# Please ensure that on the local machine you create the certificate folder, then run the
# commands below to copy the files from the primary pod to the local machine
kubectl cp mssql-sql-statefull-deploy-0:/var/opt/mssql/data/dbm_certificate.pvk "certificate\pvk"
kubectl cp mssql-sql-statefull-deploy-0:/var/opt/mssql/data/dbm_certificate.cer "certificate\cer"

# Now copy the files from the local machine to the secondary pods
kubectl cp "certificate\cer" mssql-sql-statefull-deploy-1:/var/opt/mssql/data/dbm_certificate.cer
kubectl cp "certificate\pvk" mssql-sql-statefull-deploy-1:/var/opt/mssql/data/dbm_certificate.pvk
kubectl cp "certificate\cer" mssql-sql-statefull-deploy-2:/var/opt/mssql/data/dbm_certificate.cer
kubectl cp "certificate\pvk" mssql-sql-statefull-deploy-2:/var/opt/mssql/data/dbm_certificate.pvk

After this, the files should be available on every pod as shown below.

Please create the certificates and endpoints on the secondary replica pods by connecting to the secondary replicas and running the T-SQL commands below.

-- Run the commands below on secondaries 1 and 2: mssql-sql-statefull-deploy-1 and mssql-sql-statefull-deploy-2
-- Once the cer and pvk files are copied, create the certificate on the secondary and also create the endpoint
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<mycomplexpassword>';
CREATE CERTIFICATE dbm_certificate
FROM FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '<mycomplexpassword>'
);
CREATE ENDPOINT [Hadr_endpoint]
AS TCP (LISTENER_PORT = 5022)
FOR DATABASE_MIRRORING (
    ROLE = ALL,
    AUTHENTICATION = CERTIFICATE dbm_certificate,
    ENCRYPTION = REQUIRED ALGORITHM AES
);
ALTER ENDPOINT [Hadr_endpoint] STATE = STARTED;
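Before creating the availability group, it can be worth confirming on each replica that the certificate exists and that the mirroring endpoint is started and listening on port 5022. The check below is a minimal sketch using the standard catalog views; run it on the primary and on both secondaries.

-- Run on each replica: the certificate should be listed, and the endpoint
-- should report state_desc = 'STARTED' on port 5022.
SELECT name, subject, expiry_date
FROM sys.certificates
WHERE name = 'dbm_certificate';

SELECT e.name, e.state_desc, e.role_desc, te.port
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS te ON e.endpoint_id = te.endpoint_id
WHERE e.name = 'Hadr_endpoint';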
Step 3: Create the AG on the primary replica and then join the secondary replicas using T-SQL.

On the primary replica, run the command below to create the AG. It has READ_ONLY_ROUTING_LIST configured, and it also has READ_WRITE_ROUTING_URL configured so that connections are redirected to the primary regardless of the instance you connect to, provided you pass the name of the database you want to connect to.

-- Run the below T-SQL on the primary SQL Server pod
CREATE AVAILABILITY GROUP MyAg
WITH ( CLUSTER_TYPE = NONE )
FOR DATABASE test
REPLICA ON
N'mssql-sql-statefull-deploy-0' WITH (
    ENDPOINT_URL = 'TCP://mssql-mirror-0:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC,
    SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL,
        READ_ONLY_ROUTING_URL = 'TCP://104.211.231.206:1433'),
    PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE,
        READ_ONLY_ROUTING_LIST = ('mssql-sql-statefull-deploy-1','mssql-sql-statefull-deploy-2'),
        READ_WRITE_ROUTING_URL = 'TCP://104.211.231.206:1433'),
    SESSION_TIMEOUT = 10
),
N'mssql-sql-statefull-deploy-1' WITH (
    ENDPOINT_URL = 'TCP://mssql-mirror-1:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC,
    SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL,
        READ_ONLY_ROUTING_URL = 'TCP://104.211.203.78:1433'),
    PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE,
        READ_ONLY_ROUTING_LIST = ('mssql-sql-statefull-deploy-0','mssql-sql-statefull-deploy-2'),
        READ_WRITE_ROUTING_URL = 'TCP://104.211.203.78:1433'),
    SESSION_TIMEOUT = 10
),
N'mssql-sql-statefull-deploy-2' WITH (
    ENDPOINT_URL = 'TCP://mssql-mirror-2:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC,
    SECONDARY_ROLE (ALLOW_CONNECTIONS = ALL,
        READ_ONLY_ROUTING_URL = 'TCP://104.211.203.159:1433'),
    PRIMARY_ROLE (ALLOW_CONNECTIONS = READ_WRITE,
        READ_ONLY_ROUTING_LIST = ('mssql-sql-statefull-deploy-0','mssql-sql-statefull-deploy-1'),
        READ_WRITE_ROUTING_URL = 'TCP://104.211.203.159:1433'),
    SESSION_TIMEOUT = 10
);
GO

ALTER AVAILABILITY GROUP [MyAg] GRANT CREATE ANY DATABASE;

Note: In the above command, please ensure that you pass the service names you created in Step 1 for ENDPOINT_URL, and that you pass the external IP addresses of the SQL Server pods when configuring the READ_WRITE_ROUTING_URL option. Any error here can result in the secondaries not being able to join the AG.

Now, on the secondary replicas, run the T-SQL commands to join the AG, as shown below.

-- On both secondaries, run the below T-SQL commands
ALTER AVAILABILITY GROUP [MyAg] JOIN WITH (CLUSTER_TYPE = NONE);
ALTER AVAILABILITY GROUP [MyAg] GRANT CREATE ANY DATABASE;

The AG should now be configured, and the dashboard should look as shown below.

Step 4: READ_WRITE_ROUTING_URL in action

You can now try connecting to any of the secondary replicas and provide the AG database as the database context; you will automatically be routed to the current primary, even without a listener. As you can see, we are connecting to 104.211.203.78, which is the external IP address for pod mssql-sql-statefull-deploy-1, a secondary server, but the connection gets re-routed to the current primary, mssql-sql-statefull-deploy-0 at 104.211.231.206.
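One quick way to see the redirection from the client side is to check which instance actually served your session after connecting. The snippet below is only an illustrative sketch: it assumes you connected to a secondary's external IP with the AG database (test) as the initial catalog, using a client driver recent enough to honor server-side read/write redirection.

-- Run after connecting to a secondary's external IP with "test" as the initial catalog.
-- The server name returned should be the current primary (mssql-sql-statefull-deploy-0),
-- not the instance whose IP you specified, confirming that READ_WRITE_ROUTING_URL took effect.
SELECT @@SERVERNAME AS connected_instance,
       DB_NAME()    AS current_database;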
To try a manual failover, please follow the steps documented here. Hope this helps!

Bulk Insert causing duplicates in target table

We use an ETL tool (C# based) that uses the BulkCopy method to bulk insert into the target table. Recently we have been facing two issues in our daily loads:
1. Target table with no primary key - the load reports success, but it inserts the same records five times.
2. Target table with a primary key - the load reports failure with a primary key constraint violation, but it still inserts the rows into the target table.
The DBA team is unable to find anything in their logs, so I am confused about why this error occurs. It happens daily in about 10% of the jobs, and if you rerun the same job after 2-3 hours it finishes successfully without the issues above. Please suggest how I can debug this issue.

Quorum vote configuration check in AlwaysOn Availability Group Wizards (Andy Jing)
First published on MSDN on Mar 13, 2012. Quorum vote configuration check in AlwaysOn Availability Group Wizards, by Andy Jing. In this blog, I delve deeper into the guidelines for adjusting quorum voting in the Windows Server Failover Cluster (WSFC) for availability groups and explain the reasons behind them with a specific example.