utilities
96 TopicsSQL Server Database Corruption: Causes, Detection, and some details behind DBCC CHECKDB
SQL Server Database Corruption: Causes, Detection, and some details behind DBCC CHECKDB Database corruption in SQL Server is rare but usually high-impact. When it occurs, it threatens ACID compliance - the foundation of transactional integrity - and can lead to downtime, data loss, and operational risk. This article explores: Common causes of corruption How DBCC CHECKDB works under the hood Performance tuning tips for running CHECKDB Sample error messages and what they mean Best practices for prevention and recovery Why ACID Matters in Corruption Scenarios Before diving into causes and detection, remember that SQL Server guarantees Atomicity, Consistency, Isolation, and Durability: Atomicity: Transactions are all-or-nothing. If any part fails, the whole transaction has to fail. Corruption can break this, leaving partial writes. Consistency: Every transaction moves the database from one valid state to another. Corruption violates this by possibly introducing invalid states. Isolation: Concurrent transactions shouldn’t interfere with each other. Corruption in shared pages can cause phantom reads or deadlocks. Durability: Once committed, data must persist, even in the event of system failure or crash. Disk-level corruption undermines durability guarantees. DBCC CHECKDB exists primarily to validate consistency and durability, ensuring that the logical and physical structures adhere to ACID principles. In any internal decision-making that a database engine needs to do – ACID will be the main motivating factor, overriding concerns about performance, high availability or any other considerations. This is by design – ACID is the single most important principle that the engine cares about. Common Causes of SQL Server Database Corruption Corruption usually originates outside of SQL Server, often in the I/O path. Below I list the main causes, ordered in terms of probability - the higher on the list, the more probable it is: Hardware Failures Disk errors, RAID controller cache issues, or faulty RAM can corrupt pages during writes. Even with write-ahead logging, if the physical medium fails, durability is compromised. I/O Subsystem Issues SAN/NAS instability, outdated drivers, or virtualization misconfigurations can cause torn writes. SQL Server relies on the OS and storage stack for atomic page writes; instability breaks this assumption. Improper Shutdowns Power loss during write operations can leave pages partially written, violating atomicity. Torn-page detection mitigates this, but only if checksums are enabled. OS or SQL Server Bugs Rare, but missing cumulative updates can expose edge cases in buffer pool or checkpoint logic. File System Misconfiguration Compressed/encrypted volumes or sector size mismatches can corrupt allocation maps in edge cases. Human Error Manual deletion of MDF/LDF files or incorrect restore sequences can orphan pages in some unsupported scenarios. Malware Ransomware or malicious scripts altering system tables can break referential integrity. How DBCC CHECKDB Works Under the Hood DBCC CHECKDB is SQL Server’s integrity verification tool, validating both physical and logical consistency: 1. Snapshot Creation Creates a transactionally consistent snapshot using sparse files (page changes that happen at runtime are tracked in the files to ensure consistency). Ensures checks run without blocking user activity, preserving isolation, unless WITH TABLOCK option is used to run on live database taking locks during execution. 2. Phases of Execution Allocation Checks Validates GAM, SGAM, and IAM pages for correct page allocation. Detects orphaned extents or double allocations. System Table Checks Verifies metadata in system catalogs like sysobjects and sysindexes. Ensures schema-level consistency. Table and Index Structure Checks Traverses B-trees, validating key order and linkage. Detects broken pointers or incorrect page splits. Page-Level Validation Reads every page, checks checksums or torn-page bits. Critical for durability verification. LOB Checks Ensures integrity of large object chains in IAM and target pages (text, image, XML). Cross-Object Consistency Confirms referential integrity across tables and indexes. 3. Error Reporting Errors include severity and repair recommendation: REPAIR_REBUILD: Non-destructive, fixes structural issues. REPAIR_ALLOW_DATA_LOSS: Last resort, may delete corrupt pages – the integrity of the “repaired” database is not guaranteed when used. Performance Tuning Tips for DBCC CHECKDB CHECKDB is I/O and CPU intensive. Here are some options you can use to optimize its performance while running: Use PHYSICAL_ONLY for Faster Checks DBCC CHECKDB ('YourDatabase') WITH PHYSICAL_ONLY; Skips logical checks, reducing load. This could be used for daily scans, assuming a full scan still happens, just more rarely. Run on a Restored Copy Offload workload to a non-production server using recent backups. Leverage Availability Groups Execute on a readable secondary replica to protect the primary node. Control Parallelism DBCC CHECKDB ('YourDatabase') WITH MAXDOP = 4; Explicit MAXDOP ensures predictable performance. You can adjust the parallelism in accordance with available CPUs. Schedule During Low Activity Avoid peak hours; combine with Resource Governor for throttling. Break Down Large Databases Use DBCC CHECKTABLE for individual large tables if full CHECKDB is too costly. Sample Error Messages and Interpretation Msg 824 “SQL Server detected a logical consistency-based I/O error: incorrect checksum.” → Page-level corruption, often disk-related. Msg 8905 “Extent (1:12345) in database ID 5 is marked allocated in GAM, but not in SGAM.” → Allocation map inconsistency. Msg 2533 “Table error: Object ID 123456789, index ID 1. Page (1:98765) failed checksum.” → Corruption in index or data page. Msg 8928 “Object ID 123456789, index ID 2: Page (1:54321) could not be processed.” → Structural issues in index B-tree. Apart from the errors above, we also sometimes see issues directly related to our use of sparse files when creating the snapshot: The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x00002a3ef96000 in file 'DBFile.mdf:MSSQL_DBCC18' The operating system returned error 1450 (Insufficient system resources exist to complete the requested service.) to SQL Server during a write at offset 0x00002a3ef96000 in file with handle 0x0000000000000D5C. This is usually a temporary condition and the SQL Server will keep retrying the operation. If the condition persists, then immediate action must be taken to correct it. Those are directly caused by NTFS File Record Segment objects running out of space due to ATTRIBUTE_LIST_ENTRY list growing out of bounds, and not by CHECKDB command itself - for more details check the article https://learn.microsoft.com/en-us/troubleshoot/sql/database-engine/database-file-operations/1450-and-665-errors-running-dbcc-checkdb. Prevention and Recovery Best Practices Run DBCC CHECKDB weekly either directly on the database or on restored backups. This makes sure that if corruption happens you have a recent last known good backup to restore from. Maintain verified backups with WITH CHECKSUM and regular restore tests. It’s very important to make sure a created backup can be fully restored to make sure there’s no unpleasant surprises if an issue happens. Use enterprise-grade hardware (RAID 10, ECC memory, UPS). Apply cumulative updates for SQL Server and Windows. Avoid unsupported storage configurations (compressed/deduplicated volumes). Key Takeaways Corruption is usually hardware or I/O related, not SQL Server bugs. DBCC CHECKDB is your first line of defence - schedule it regularly. Always restore from a clean backup if in any way possible; use repair options only as a last resort. Critical Note on Repair Options What triggered this blog was a number of issues we saw where the plan in case a database goes suspect was essentially to: Switch database to emergency mode Switch to single user mode Run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS options without any previous tests to check if a corruption is present Switching the database back to multi user and using it normally Note that a database can go suspect for multiple reasons, not all related to corruption – basically if we hit something that stops the recovery process and doesn’t allow us to finish. For example – a deadlock with the recovery thread causes the thread to be killed? Database goes to suspect mode. This plan is not the recommended approach - neither in corruption cases nor other reasons for suspect database. Using REPAIR_ALLOW_DATA_LOSS can leave the database logically inconsistent. You always need to validate data post-repair and address root causes (hardware, OS issues) before attempting recovery. For detailed guidance, see https://learn.microsoft.com/en-us/troubleshoot/sql/database-engine/database-file-operations/troubleshoot-dbcc-checkdb-errors. Note that the repair recommendation provided after running is the lowest level of repair that can address all errors reported by CHECKDB. However, “minimum” doesn’t mean it will fix everything it finds - some errors simply can’t be repaired. You might also need to run the repair process more than once, since removing some data might expose additional non-linked pages that themselves needs to be dropped as part of repair. Keep in mind that not every error requires this level of repair and using REPAIR_ALLOW_DATA_LOSS doesn’t always lead to data loss. The only way to know if fixing an error will cause data loss is to actually run the repair and verify the data afterwards. A helpful tip: You can use DBCC CHECKTABLE on any table that shows errors. This will tell you the minimum repair level needed for that specific table. It’s extremely important to remember that after running CHECKDB repair with data loss, you must manually validate your data. The repair process doesn’t guarantee logical consistency. For example, REPAIR_ALLOW_DATA_LOSS might remove entire data pages with inconsistent data. If that happens, tables with foreign key relationships could end up with rows that no longer have matching parent keys. I hope this post was able to bring a bit of clarity to the complicated topic of corruption causes and recovery. It’s important to remember that once corruption happens your options are limited – you have to be prepared before the issue occurs. Further Reading https://learn.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-checkdb-transact-sql https://learn.microsoft.com/en-us/troubleshoot/sql/database-engine/database-file-operations/troubleshoot-dbcc-checkdb-errors https://learn.microsoft.com/en-us/sql/sql-server/failover-clusters/automatic-page-repair-availability-groups-database-mirroring?view=sql-server-ver17108Views3likes0Comments更新プログラム適用後、bcp.exe の出力メッセージが英語になる
English follows Japanese こんにちは。SQL Server サポート チームです。 事象 日本語版の SQL Server をインストールし、利用している環境で、SQL Server へ更新プログラムを適用後、bcp.exe の出力メッセージが英語になる場合があります。 接続先インスタンスを誤った場合の例: 本事象は、ODBCドライバーの修正が含まれた更新プログラムを適用する場合、更新プログラムに含まれるODBCドライバーのインストーラーは各言語版ではなく英語版での提供となり、更新プログラムを適用時にODBCドライバーの日本語リソースファイルが削除されるために発生します。 対処策 SQL Server の更新プログラムを適用後、日本語リソースファイルを再配置します。再配置手順は以下の通りです。 ODBC ドライバー(17.10.6.1)を例としてご説明します。 日本語リソースファイルを再配置方法: 1) 以下のサイトにアクセスし、リリース番号 17.10.6.1 ODBC ドライバー(日本語) をダウンロードします。 ODBC Driver for SQL Server のダウンロード - ODBC Driver for SQL Server | Microsoft Learn 2) ダウンロードしたファイル(msodbcsql.msi) を任意のフォルダーに配置します。 なお、後述の手順はC:\temp フォルダーにインストールしたインストーラーを配置した場合の例です。 3) 管理者権限でコマンドプロンプトを起動し、下記のコマンドを実行し、ダウンロードしたインストーラーを展開します。 msiexec /a "C:\temp\msodbcsql.msi" /qb targetdir="C:\temp\msodbcsql_msi" 4) 展開したフォルダーにある msodbcsqlr17.rll を次の配置先のフォルダーにコピーします。 ※ 配置先のフォルダー C:\Windows\System32\1041 が存在しない場合は、フォルダーを作成したうえで msodbcsqlr17.rll をコピーします。 コピー元: C:\temp\msodbcsql_msi\Windows\System32\1041 配置先: C:\Windows\System32\1041 5) bcp.exe の出力メッセージが日本語となるか確認します。 ーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーーー Hello, this is the SQL Server Support Team. Issue In environments where the Japanese version of SQL Server is installed and used, after applying an update to SQL Server, the output messages from bcp.exe may appear in English. Example when connecting to the wrong instance: This issue occurs when the update includes a fix for the ODBC driver. The installer for the ODBC driver included in the update is provided in English rather than in each localized language version. As a result, the Japanese resource file for the ODBC driver is deleted during the update process. Resolution After applying the SQL Server update, relocate the Japanese resource file. The steps below explain how to do this using ODBC Driver version 17.10.6.1 as an example. Steps to relocate the Japanese resource file: 1) Access the following site and download the ODBC Driver version 17.10.6.1 (Japanese version): Download ODBC Driver for SQL Server - ODBC Driver for SQL Server | Microsoft Learn 2) Place the downloaded file (msodbcsql.msi) in any folder. The following steps assume the installer is placed in the C:\temp folder. 3) Open Command Prompt with administrator privileges and run the following command to extract the installer: msiexec /a "C:\temp\msodbcsql.msi" /qb targetdir="C:\temp\msodbcsql_msi" 4) Copy the msodbcsqlr17.rll file from the extracted folder to the following destination folder: If the destination folder C:\Windows\System32\1041 does not exist, create it before copying the file. Source : C:\temp\msodbcsql_msi\Windows\System32\1041 Destination : C:\Windows\System32\1041 5) Confirm that the output messages from bcp.exe are displayed in Japanese.841Views0likes0CommentsChoosing what SQLIO tests to Run and Automating the Tests
First published on MSDN on Apr 04, 2013 Many companies rely on IO stress-testing tools like SQLIO Disk Subsystem Benchmark Tool (aka SQLIO) to perform an I/O subsystem validation before they deploy SQL Server in production.6.3KViews0likes0Comments