First published on MSDN on May 17, 2017
My customer had recently experienced 2 interesting CDC issues and I would like to share the troubleshooting process and root cause for these issues today.
The issue:
During the weekend, my customer tried to promote some changes to production server and while they tried to enable CDC on production server, they first encountered a security error:
#1: Security error:
Msg 22832, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 622 Could not update the metadata that indicates table [dbo].[Table_Name] is enabled for Change Data Capture. The failure occurred when executing the command '[sys].[sp_cdc_add_job] @job_type = N'capture''. The error returned was 916: 'The server principal "S-1-9-3-1293060401-1231192665-3834407059-1208013724." is not able to access the database "msdb" under the current security context.'. Use the action and error to determine the cause of the failure and resubmit the request
My customer soon found a workaround for this issue by manually creating 2 CDC jobs:
EXEC [sys].[sp_cdc_add_job] @job_type = N'capture'; EXEC [sys].[sp_cdc_add_job] @job_type = N'cleanup';
After manually creating these 2 jobs, they encountered 2 nd issue:
#2: Nullable column:
Msg 22832, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 622 Could not update the metadata that indicates table [dbo].[Table_Name] is enabled for Change Data Capture. The failure occurred when executing the command 'insert into [cdc].[change_tables]'. The error returned was 515: 'Cannot insert the value NULL into column 'has_drop_pending', table 'LLCProduction.cdc.change_tables'; column does not allow nulls. INSERT fails.'. Use the action and error to determine the cause of the failure and resubmit the request.
Again, this issue was workaround by updating database default setting ANSI_NULL_DEFAULT to ON:
ALTER DATABASE DB_Name SET ANSI_NULL_DEFAULT ON;
The unanswered questions:
Although this customer now has CDC enabled, the questions remain unanswered – why the issues happened? How can DBAs avoid running into these issues in the future?
The troubleshooting:
We have verified a few things on production server:
And we tried to reproduce these issues in test environment with same build number (10.0.4064) and CDC enabled successfully.
It appears we are running out of leads. I decided to take a look at the CDC creation process, especially agent job creation process and here is what I found out:
Stored Procedure "sys.sp_cdc_enable_table" calls "sys.sp_cdc_enable_table_internal" who then calls a few other stored procedures. During these calls, there are 2 times security context switched.
Below is pseudocode for this process:
"sys.sp_cdc_enable_db"
"sys.sp_cdc_enable_table"
The error raised in "sys.sp_cdc_add_job" and the error message is "not able to access the database 'msdb'". Access to msdb is in "sys.sp_cdc_add_job_internal" and this stored procedure is executed as user "dbo" in user database. Now this issue comes down to why 'dbo' user in user database doesn't have access to msdb database? While reviewing the msdb in the production server, my DBA friend Frank Fan pointed out the "guest" user in msdb is disabled.
And on other instances that CDC enabled successfully, all of which have "guest" user enabled! To be sure about msdb "guest" user could cause CDC failure, I quickly ran below test in my SQL 2016SP1 instance and suceessfully reproduce the same issue:
#1: Diable "guest" user in msdb by revoking connect permission:
REVOKE CONNECT TO guest GO
#2: Create test database and try enable CDC:
CREATE DATABASE cdcdb GO USE cdcdb GO CREATE TABLE t1 ( id INT PRIMARY KEY ,msg NVARCHAR(20) ) GO SET ANSI_NULL_DFLT_OFF OFF GO EXEC sys.sp_cdc_enable_db GO EXEC sys.sp_cdc_enable_table @source_schema = N'dbo' ,@source_name = N't1' ,@role_name = N'cdc_Admin'; GO
CDC failed to enable and gave same error message:
Msg 22832, Level 16, State 1, Procedure sp_cdc_enable_table_internal, Line 639 [Batch Start Line 27] Could not update the metadata that indicates table [dbo].[t1] is enabled for Change Data Capture. The failure occurred when executing the command '[sys].[sp_cdc_add_job] @job_type = N'capture''. The error returned was 916: 'The server principal "S-1-9-3-330084520-1159009536-4224874419-612529348" is not able to access the database "msdb" under the current security context.'. Use the action and error to determine the cause of the failure and resubmit the request.
To fix this error, enabling "guest" user in msdb by grant connect permission to guest user, "sys.sp_cdc_enabled_table" finishes successfully.
Lesson learned : Leave "guest" user enabled in master, msdb and tempdb. See Microsoft recommendation: You should not disable the guest user in the msdb database in SQL Server
Hope this helps you in similar cases. We will discuss 2 nd issue in the next blog.
Special thanks for my SQL DBA friend Frank Fan for pointing the guest user was disabled in msdb, which saved us significant time and effort during this investigation.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.