SQL Server Blog

I/O optimizations for tempdb in SQL Server and Azure SQL

bobward
Microsoft
Aug 08, 2023

I have found that almost every day of my career at Microsoft, I learn something new about Microsoft SQL. It is one of the reasons I enjoy my job. In April of 2023, I was speaking at the MVP Summit in Redmond, Washington. One day I spent time in a side conversation with MVP Rob Farley discussing some of his "wish list" items for SQL. One of those wish list items was delayed transaction durability for tempdb.


Delayed transaction durability is the concept of lazily flushing transaction log changes to disk after a transaction commits. It can significantly speed up transaction throughput, at the cost of possible data loss if SQL Server crashes or is shut down before the log records are flushed. But full durability is not needed for tempdb, because tempdb is recreated each time SQL Server starts up.
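
For context, here is a minimal sketch of how delayed durability is normally opted into for a user database (the SalesDemo database and OrderStaging table are placeholders for illustration); as described below, tempdb gets this behavior without setting any option.

ALTER DATABASE [SalesDemo] SET DELAYED_DURABILITY = ALLOWED;  -- or FORCED to apply it to every transaction
GO
USE [SalesDemo];
GO
CREATE TABLE dbo.OrderStaging (OrderId int NOT NULL, Amount money NOT NULL);
GO
BEGIN TRANSACTION;
INSERT INTO dbo.OrderStaging (OrderId, Amount) VALUES (1, 9.99);
-- Request a lazy (asynchronous) log flush for this commit only.
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);
GO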


It seemed to me like a very reasonable request that we could include in updates to Azure SQL or in future versions of SQL Server. So I started talking to several SQL "veterans" on our engineering team about the idea. Well, it turns out that even though this database option is not enabled for tempdb, our code lazily flushes log changes on commit by default. So in all supported versions of SQL Server and in Azure SQL, the behavior for tempdb is effectively delayed transaction durability. You can see where we have amended our documentation to state this behavior.
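
If you want to observe this yourself, here is a minimal sketch that snapshots tempdb's log write counters around a tempdb-heavy statement. It assumes a default layout where file_id 2 is the tempdb log file, and the workload and row count are arbitrary.

-- database_id 2 is tempdb; file_id 2 is conventionally its log file.
SELECT num_of_writes, num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(2, 2);

-- Illustrative tempdb workload: spool rows into a temporary table.
SELECT TOP (100000) o1.object_id, o1.name
INTO #work
FROM sys.objects AS o1 CROSS JOIN sys.objects AS o2;

-- Compare with the first snapshot to see how much log was written.
SELECT num_of_writes, num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(2, 2);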


As part of my investigation, I also found an additional optimization we made starting in SQL Server 2019 (and in Azure SQL) for tempdb disk I/O. SQL Server normally opens database and log files with an option called FILE_FLAG_WRITE_THROUGH to ensure disk consistency in situations like power failures, especially for disk systems with caching. But since tempdb is recreated each time on server startup, we don't need this option for tempdb, which can result in faster I/O performance for tempdb writes. We have also amended our documentation to note this behavior.
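
A quick way to convince yourself of the "recreated on every startup" premise behind both optimizations is to compare tempdb's creation time with the instance start time; a minimal query for that might look like this:

-- tempdb's create_date tracks the most recent instance startup, which is
-- why full durability and write-through are unnecessary for its files.
SELECT d.name,
       d.create_date AS tempdb_created,
       si.sqlserver_start_time AS instance_started
FROM sys.databases AS d
CROSS JOIN sys.dm_os_sys_info AS si
WHERE d.name = N'tempdb';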


These may seem like small optimizations, but they can result in really nice performance gains for your application's use of tempdb.


Bob Ward

Microsoft

Updated Aug 08, 2023
Version 2.0
  • jchang6
    Copper Contributor

    If SQL Server can detect SSD/NAND flash storage, further optimizations are warranted. NAND flash operations are: 1) block erase, 2) program (page), and 3) read. Any write must go to a blank page, so there should be no need for file initialization. A SQL Server file allocation should just reserve the pages. A read of a reserved but unallocated page should return blank (block erase sets the cell voltage high; I am not sure whether that reads back as zero or one).

    Also, some specialty SSD/NAND may have a small page size, but most are 32KB pages? Unless there is a call to change the SQL Server page size, is there any way we could try to issue writes in sets of 4 x 8KB pages as much as possible, or at the block size?

    Unrelated, but I would really like to rethink SQL Server page storage. The 96-byte page header prevents any 64-byte alignment of rows. Could the first 16 row offsets be placed directly after the page header, with the rest staying at the end of the page?

    Is there a way for row offsets to be loaded into AVX 256- or 512-bit (32- and 64-byte) registers? And then have AMD/Intel implement additional instructions so that row/column offset calculations are done with the values loaded in those registers?