First published on MSDN on Sep 15, 2016 Recently I had a question regarding "What happens to records when they are deleted" – so I wanted to dig into what happens under the covers when just such a thing happens. The answer is somewhat simple – but the explanation is somewhat more complicated. So, I am breaking the answer up into 2 different parts. This first tip will look at how to locate a record from a table in the SQL data file. The next tip will look at what happens when a DELETE occurs. We have to be able to find the records to tell if they're there or not after a DELETE – which is why I'm breaking it up.
To begin I'll create a database for our tests.
IF DB_ID('DeletedDB') IS NOT NULL
ALTER DATABASE DeletedDB
DROP DATABASE DeletedDB
Next, I'll insert 100 records into a table named dbo.Sales. Any numeric columns from AdventureWorks I am going to cast as a character string. This just will make it easier to identify data when we look at it through a hex editor.
CREATE DATABASE DeletedDB
SELECT TOP (100) SalesOrderNumber
,CustomerID = CAST(CustomerID AS VARCHAR(12))
,IDCol = CAST(SalesOrderDetailID AS VARCHAR(10))
FROM AdventureWorks2014.Sales.SalesOrderDetail s
JOIN AdventureWorks2014.Sales.SalesOrderHeader h ON s.SalesOrderID = h.SalesOrderID
ORDER BY SalesOrderDetailID ASC
I'll order the table based on the IDCol column and tell SQL that the column is unique.
ALTER TABLE dbo.Sales ALTER COLUMN IDCol VARCHAR(10) NOT NULL
ALTER TABLE dbo.Sales ADD CONSTRAINT PK_Sales_IDCol PRIMARY KEY CLUSTERED (IDCol)
I can use the handy function sys.fn_physlocformatter to query the table and identify specifically where a row is. The column 'Location(File:Page:Slot)' tells me that the row I have highlighted is on page 330. So, this is where our journey will start.
,'Location(File:Page:Slot)' = sys.fn_PhysLocFormatter(% % physloc % %)
,LockResource = % % lockres % %
CROSS APPLY sys.fn_physLocCracker(% % physloc % %) x
We can use sys.dm_os_buffer_descriptors to view the page in the buffer pool.
WHERE page_id = 330
AND database_id = DB_ID('DeletedDB')
Next I am going to use DBCC PAGE to view the contents of the page. This will give us all of the information we need to find the row on disk. Notice what I have highlighted from DBCC PAGE – the offset for the row (purple) and the offset for the column (yellow). These will be our guide for finding the contents on disk.
DBCC PAGE(DeletedDB, 1, 330, 3) WITH TABLERESULTS
Now that I have found the page (330) and the needed offsets for showing the contents on disk, I am going to detach the database so I can read in the file with a hex editor.
ALTER DATABASE DeletedDB SET OFFLINE
Once the file is detached, I am going to open it with the XVI32.exe hex editor.
OK – time for some math. The DB page I am looking at is 330. Each page is 8192 bytes in size. If I multiply 330 by 8192 I get the starting offset in the file for where that page begins. In this case it is hex value 294000.
DECLARE @PageStart INT
DECLARE @TargetPage INT = 330
DECLARE @PageSize INT = 8192
SELECT @PageStart = CAST(@TargetPage * @PageSize AS VARBINARY(100))
SELECT PageStart = @PageStart
In the hex editor I can go to that specific address.
OK, cool. I have found some data. But, I don't really know where anything starts just yet column-wise. I have highlighted one column since it looks familiar to me from looking the resultset from the table earlier.
We can see here the AccountNumber column from the table is highlighted.
So – where does the first column (IDCOl) start? At this point we have all of the information we need. We just need to do a couple of calculations to find it.
What I have highlighted in purple above is the offset for the first row of the table. The row offset is 0x60. If we add this to the page offset we get: 0x2940000 + 0x60 = 0x2940060
If I lookup that address in the hex editor I am able to find a value – but it isn't quite what I am looking for just yet…
So – we now know where the row starts on disk. How about where the first column starts?
To find the location of the first column I simply add the offset for the column (0x17) to the value calculated above for where the row starts. The math is:
0x2940060 + 0x17 = 0x2940077
If I go to that address, I find the value 1 – which is the character string value of the IDCol in the first row. Very cool!
To verify I can find out where other columns start – let's try the CarrierTrackingNumber column. The offset for this column is 0x6a.
When I add 0x6a to the offset of where the row starts I get 0x2940CA.
0x2940060 + 0x6a = 0x2940CA
When I look up that address I am directed to exactly where the CarrierTrackingNumber column value begins on disk.
I hope this helps a bit with understanding how we are laying data for data pages and rows out on disk. Its literally a series of offsets of values that SQL Server uses to pull pages into memory.