I'm having a problem with Azure VM Premium_LRS data disks and Read/Write caching.



While benchmarking with IOMeter, running 32 concurrent 16 KB-aligned random write I/Os, the I/O freezes after a short time and the VM becomes completely unresponsive.
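
For anyone without IOMeter at hand, the access pattern described above can be approximated with a short Python sketch. This is only an illustration of the pattern (32 writers, 16 KB writes at 16 KB-aligned random offsets), not a replacement for IOMeter: it writes to a file rather than a raw disk, it goes through the operating system's file cache, and the function names and the temporary test file are hypothetical choices made for this example.

```python
import os
import random
import tempfile
import threading

BLOCK_SIZE = 16 * 1024  # 16 KB per write, as in the IOMeter test
WORKERS = 32            # 32 concurrent writers, as in the IOMeter test


def make_test_file(file_size):
    """Create a zero-filled temporary file of file_size bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.truncate(file_size)
    return path


def random_aligned_writes(path, file_size, writes_per_worker, seed=0):
    """Write 16 KB blocks at random 16 KB-aligned offsets from WORKERS threads.

    Returns the list of offsets that were written, for inspection.
    """
    blocks = file_size // BLOCK_SIZE
    offsets = []
    lock = threading.Lock()

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        payload = bytes([worker_id % 256]) * BLOCK_SIZE
        # Each thread uses its own unbuffered file object.
        with open(path, "r+b", buffering=0) as f:
            for _ in range(writes_per_worker):
                offset = rng.randrange(blocks) * BLOCK_SIZE  # always a multiple of 16 KB
                f.seek(offset)
                f.write(payload)
                with lock:
                    offsets.append(offset)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return offsets
```

To exercise a data disk, the test file would need to live on that disk and be much larger than the small sizes used for a quick check.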

This happens *only* on VM sizes that support premium SSD storage (Premium_LRS), e.g. Standard_DS2_v2, and it happens with *both* HDD and SSD data disks.

The problem does *not* happen on VM sizes that offer only standard HDD storage (Standard_LRS), e.g. Standard_A4_v2.

One thing I've noticed is that VM sizes that support only Standard_LRS report a disk block size of 512, while those that support both Standard_LRS and Premium_LRS report a block size of 4096 (4Kn, 512e; see https://www.wikipedia.org/wiki/Advanced_Format). This holds even for HDD data disks, not just SSD data disks.
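
Since 4Kn and 512e are two different Advanced Format layouts, it may help to pin down which one a disk actually reports. The sketch below classifies a disk from its logical and physical sector sizes; the function name is hypothetical, and the two sizes are assumed to have been read separately (for example from `fsutil fsinfo sectorinfo` on Windows, or from `/sys/block/<device>/queue/logical_block_size` and `physical_block_size` on Linux).

```python
def advanced_format_class(logical_bytes, physical_bytes):
    """Name the sector layout for a (logical, physical) sector size pair."""
    if (logical_bytes, physical_bytes) == (512, 512):
        return "512n"  # native 512-byte sectors
    if (logical_bytes, physical_bytes) == (512, 4096):
        return "512e"  # 4096-byte physical sectors, 512-byte emulation
    if (logical_bytes, physical_bytes) == (4096, 4096):
        return "4Kn"   # native 4096-byte sectors
    return "other"


print(advanced_format_class(512, 4096))
```

A disk that shows 512 as the logical size and 4096 as the physical size would be 512e, while one that shows 4096 for both would be 4Kn.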

Because the IOMeter I/Os are aligned to 16 KB addresses, and 16 KB is a multiple of 4096, unaligned read-modify-write cycles should not explain these difficulties.
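
The alignment argument can be written down as a small check. This is a sketch under the assumption that a write avoids read-modify-write exactly when both its starting offset and its length are multiples of the physical sector size; the function name and the 4096 default are illustrative, not taken from any Azure or IOMeter API.

```python
def needs_read_modify_write(offset, length, physical_sector=4096):
    """Return True if a write only partially covers some physical sector."""
    return offset % physical_sector != 0 or length % physical_sector != 0


IO_SIZE = 16 * 1024  # 16 KB writes, as in the IOMeter test

# Every 16 KB-aligned 16 KB write covers whole 4096-byte sectors.
aligned = [needs_read_modify_write(i * IO_SIZE, IO_SIZE) for i in range(1000)]
print(any(aligned))

# A write starting 512 bytes into a sector would need read-modify-write.
print(needs_read_modify_write(512, IO_SIZE))
```

Under that assumption, none of the aligned 16 KB writes touches a partial sector, while a write shifted by 512 bytes does.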

Note that the problem does not occur if the HDD or SSD data disk cache is set to Read-Only or None; only Read/Write caching triggers it.

I originally saw this problem on a Linux VM that was being used by a Windows 2016 VM via iSCSI, but found that it is more easily reproduced by simply adding a data disk to the Windows 2016 VM and running IOMeter against that disk.

Although this is just a benchmark test, the fact that an I/O data pattern can kill the VM is worrying.

Are there any known issues with Premium_LRS-capable VMs and Read/Write caching under random small-block write I/Os?
