The failure patterns exhibited by SSDs are similar to those of spinning media. After reading this paper ( https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf ) it is easy to see the need for a caching controller and for making sure data is flushed properly.
· Bit Corruption, Records exhibit random bit errors
· Flying Writes, Well-formed records end up in the wrong place
· Shorn Writes, Operations are partially done at a level below the expected sector size
· Metadata Corruption, Metadata in FTL is corrupted
· Dead Device, Device does not work at all, or mostly does not work
· Unserializability, Final state of storage does not result from a serializable operation order
Recommendation: Treat SSD storage as you would spinning media, making sure the appropriate safeguards are in place for power failure (e.g., battery-backed cache).
Capacitor Power Up
The power outage testing paper points out many interesting issues that might occur and that systems need to protect against. I found the need for the capacitor to ‘power up’ especially thought provoking. Because the capacitor takes time to charge, a power outage followed 10 or 15 seconds later by another power flicker becomes a very interesting case indeed.
Most SSDs report 512-byte sector sizes but use 4K pages inside 1MB erase blocks. Using 512-byte aligned sectors for the SQL Server log device can generate more read-modify-write (RMW) activity, which could contribute to slower performance and drive wear.
Recommendation: Make sure the caching controller is aware of the correct page size of the SSD(s) and is able to align physical writes with the SSD infrastructure properly.
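To make the alignment point concrete, here is a minimal sketch that counts how many 4K pages a write only partly covers. Each partly covered page is a candidate for a read-modify-write. The function name `partial_pages` and the constant are illustrative, and the 4K page size is simply the figure quoted above, not a property of any particular drive.

```python
PAGE_SIZE = 4096  # flash page size quoted in the text (assumption for this sketch)

def partial_pages(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Count flash pages that a write covers only partially.

    A partially covered page cannot simply be programmed; the existing
    contents must be read, merged with the new bytes and written back.
    """
    if length <= 0:
        return 0
    end = offset + length
    first = offset // page_size          # first page the write overlaps
    last = (end - 1) // page_size        # last page the write overlaps
    head = offset % page_size != 0       # write starts mid-page
    tail = end % page_size != 0          # write ends mid-page
    if first == last:
        return 1 if (head or tail) else 0
    return int(head) + int(tail)
```

A 4096-byte write at offset 0 touches no partial pages, while the same write at a 512-byte aligned offset such as 512 straddles two pages and leaves both partly covered.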
The common view of a newly formatted drive is one holding all zeros. It is interesting to note that an erased block of an SSD is all 1’s making a raw read of an erased block all F’s. It is unexpected for a user to read an erased block during normal I/O operations. However, just last week I reviewed a report that seems to align with this behavior.
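When triaging a suspect page, it can help to check whether the buffer looks like an erased block (all F's) rather than a zeroed one. The following is a small sketch only; `classify_page` is a hypothetical helper and says nothing about how the drive or SQL Server actually reports such a read.

```python
def classify_page(buf: bytes) -> str:
    """Label a raw buffer as all 0xFF, all 0x00, or ordinary data."""
    if not buf:
        return "empty"
    if buf.count(0xFF) == len(buf):
        return "all-ones"    # what a raw read of an erased flash block looks like
    if buf.count(0x00) == len(buf):
        return "all-zeros"   # the common view of a freshly formatted drive
    return "data"
```

For example, `classify_page(b"\xff" * 8192)` returns `"all-ones"` for an 8 KB buffer filled with 0xFF.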
A technique we have used in the past is to write a known pattern to the entire drive. Then as we execute database activity against that same drive we can detect incorrect behavior (stale read / lost write / read of incorrect offset / etc.) when the pattern unexpectedly appears.
This technique does not work well on SSD based drives. The erasure and read-modify-write (RMW) activities triggered by a write destroy the pattern. The SSD GC activity, wear leveling, proportional/set-aside list blocks and other optimizations tend to cause writes to acquire different physical locations, unlike spinning media's sector reuse.
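For reference, the pattern technique described above can be sketched against an ordinary file standing in for the drive. The marker bytes, the 8 KB page size and the function names are all assumptions made for this illustration; this is not the tooling we actually used.

```python
import os

PAGE = 8192                # page size assumed for this sketch
STAMP = b"STAMPED!"        # hypothetical 8-byte marker pattern

def stamp_page() -> bytes:
    """One full page of the known pattern."""
    return STAMP * (PAGE // len(STAMP))

def stamp_file(path: str, pages: int) -> None:
    """Fill the file with the known pattern and flush it to stable media."""
    with open(path, "wb") as f:
        for _ in range(pages):
            f.write(stamp_page())
        f.flush()
        os.fsync(f.fileno())

def find_stamped_pages(path: str, expected_written) -> list:
    """Return pages that should hold new data but still show the pattern.

    A hit suggests a lost write or a stale read for that page.
    """
    suspects = []
    with open(path, "rb") as f:
        for page_no in sorted(expected_written):
            f.seek(page_no * PAGE)
            if f.read(PAGE) == stamp_page():
                suspects.append(page_no)
    return suspects
```

After stamping, run the workload, then pass the set of page numbers the workload is known to have written. Any page returned still carries the original pattern.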
Flying Writes / Incorrect FTL Mapping
To many of us, flying writes seem more like a servo and head movement problem. However, in December I worked on a system where the GPT data (sectors) that should be at the start and end of the volume would show up during a read of the database file. The first part of the database page was all zeros, followed by the GPT information as outlined for the GPT in MSDN. This was occurring without a power outage/cycle, and we continue to investigate FTL mapping bug possibilities.
As you can imagine, non-serialized writes are a database killer. They break the WAL protocol and make it difficult at best to diagnose how the data transitioned to a damaged state.
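The WAL rule at stake is that the log record describing a change must reach stable media before the data page carrying that change. A rough sketch of a post-crash check follows. The function name, the page-to-LSN map and the single `hardened_log_lsn` value are hypothetical simplifications for illustration, not how SQL Server recovery is implemented.

```python
from typing import Dict, List

def wal_violations(page_lsns: Dict[int, int], hardened_log_lsn: int) -> List[int]:
    """Return page ids whose on-disk LSN is ahead of the durable log.

    Such a page reached stable media before the log record that
    describes it, so recovery has no record to redo or undo the change.
    """
    return sorted(
        page_id
        for page_id, lsn in page_lsns.items()
        if lsn > hardened_log_lsn
    )
```

With a durable log ending at LSN 100, a page stamped with LSN 120 would be flagged, which is the kind of state an out-of-order write can leave behind.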
The firmware used in SSDs tends to be complex when compared to its spinning media counterparts. Many drives use multiple processing cores to handle incoming requests and garbage collection activities. Just last week I was made aware of a firmware fix: the cores shared the same memory area, leading to a race condition that corrupted the SQL Server log file (.ldf).
Recommendation: Make sure you keep the firmware up-to-date on the SSDs in order to avoid known problems.
Read Data Damage / Wear Leveling
The various garbage collection (GC) algorithms tend to remain proprietary. However, some common GC approaches are well known. One such activity helps prevent damage from repeated reads. When the same cell is read repeatedly, it is possible for electron activity to leak and damage neighboring cells. The SSDs protect the data with various levels of ECC and other mechanisms.
One such mechanism relates to wear leveling. The SSD keeps track of the write and read activity on the SSD. The SSD GC can determine hot spots, or locations wearing faster than other locations. The GC may determine that a block that has been in a read-only state for a period of time needs to move. This movement is generally to a block with more wear so the original block can be used for writes. This helps even out the wear on the drive but mechanically places read-only data at a location that has more wear and mathematically increases the failure chances, even if slightly.
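The move described above can be sketched as a simple selection rule. Real GC algorithms are proprietary, so the `Block` fields and the `pick_wear_leveling_move` function below are invented purely to illustrate the idea of relocating cold data onto a more worn block.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Block:
    block_id: int
    erase_count: int        # how worn the block is
    holds_cold_data: bool   # read-only for a period of time
    free: bool              # erased and available

def pick_wear_leveling_move(blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Choose (source, destination) for relocating cold data.

    Source: the least worn block holding cold data.
    Destination: the most worn free block.
    Returns None when no move would free a less worn block.
    """
    cold = [b for b in blocks if b.holds_cold_data and not b.free]
    free = [b for b in blocks if b.free]
    if not cold or not free:
        return None
    src = min(cold, key=lambda b: b.erase_count)
    dst = max(free, key=lambda b: b.erase_count)
    if dst.erase_count <= src.erase_count:
        return None  # moving would not release a fresher block for writes
    return (src.block_id, dst.block_id)
```

Note that the destination is deliberately the more worn block, which is why the read-only data ends up somewhere with a slightly higher chance of failure.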
The reason I point this behavior out is not to specifically recommend anything but to make you aware of the behavior. Imagine you execute DBCC and it reports an error, and then you run it a second time and it reports additional errors or a different pattern of errors. It would be unlikely, but the SSD GC activity could make changes between the DBCC executions.
OS Error 665 / Defragmentation
I have started investigations as to what fragmentation means on an SSD. In general, there is not much to do with fragmentation on an SSD. There are some defragmentation and trimming activities that can be of note: http://www.hanselman.com/blog/TheRealAndCompleteStoryDoesWindowsDefragmentYourSSD.aspx
Recommendation(s): Use an appropriate, battery-backed controller designed to optimize write activities. This can improve performance and reduce drive wear and physical fragmentation levels.
Consider ReFS to avoid the NTFS attribute limitations.
Make sure the file growth sizes are appropriately sized.
I am still trying to understand if there are any real impacts from the SSD compression behaviors. Some of the SSD documentation mentions that writes may be compressed by the SSD. The compression performed by the SSD is part of the write operation. As long as the drive maintains the intent of stable media, compression could extend the drive life and may positively impact performance.
Bob Dorr - Principal SQL Server Escalation Engineer