Jan 05 2018 - last edited on Jul 31 2018
I'm having a problem with Azure VM Premium_LRS data disks and Read/Write caching.
While benchmarking with IOMeter, issuing 32 concurrent 16 KB aligned random write I/Os, the I/O freezes after a short time and the VM becomes totally non-responsive.
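For anyone who wants to reproduce the workload shape without IOMeter, here is a minimal sketch of the same pattern (16 KB-aligned random writes against a pre-sized file; the file path, file size, and I/O count are arbitrary choices, not IOMeter defaults):

```python
import os
import random

def random_write_test(path, file_size=4 * 1024 * 1024, block=16 * 1024, n_ios=256):
    """Issue n_ios random writes of `block` bytes at block-aligned offsets,
    roughly mimicking the IOMeter 16 KB aligned random-write pattern."""
    buf = os.urandom(block)
    # Pre-create the file at full size so every offset is valid.
    with open(path, "wb") as f:
        f.truncate(file_size)
    fd = os.open(path, os.O_WRONLY)
    try:
        n_blocks = file_size // block
        for _ in range(n_ios):
            offset = random.randrange(n_blocks) * block  # always 16 KB aligned
            os.pwrite(fd, buf, offset)
        os.fsync(fd)
    finally:
        os.close(fd)
    return n_ios
```

This is serial rather than 32-way concurrent, so it only illustrates the I/O geometry; reproducing the freeze itself still needs a tool like IOMeter or DiskSpd driving real queue depth against the data disk.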
But this happens *only* on VMs on machine sizes that support premium SSD storage, Premium_LRS (e.g. Standard_DS2_v2), and it happens with *both* HDD and SSD data disks.
This problem does *not* happen on machine sizes that only offer standard storage, only HDD storage, Standard_LRS (e.g. Standard_A4_v2).
One thing I've noticed: VM sizes that only support Standard_LRS report a disk block size of 512 bytes, while those that support both Standard_LRS and Premium_LRS report 4096 (4Kn, 512e, https://www.wikipedia.org/wiki/Advanced_Format). The 4Kn, 512e block size applies even to HDD data disks, not just SSD disks.
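For reference, the reported sector sizes can be checked from inside the guest: on Windows, `fsutil fsinfo sectorinfo C:` prints the logical and physical bytes per sector; on Linux, the same information is exposed in sysfs. A small sketch of the Linux side (it simply returns an empty mapping on systems with no block devices):

```python
import glob

def physical_block_sizes():
    """Map each block device name to the physical sector size the kernel reports."""
    sizes = {}
    for p in glob.glob("/sys/block/*/queue/physical_block_size"):
        dev = p.split("/")[3]  # /sys/block/<dev>/queue/physical_block_size
        with open(p) as f:
            sizes[dev] = int(f.read().strip())
    return sizes
```

On a 512e disk you would expect this to show 4096 here while `logical_block_size` still shows 512.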
Because the IOMeter I/Os are 16 KB and address-aligned, there should be no non-aligned read-modify-write behaviour to explain these problems.
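To spell out that alignment argument: every 16 KB write at a 16 KB-aligned offset covers only whole 4 KiB physical sectors, so no sector is partially written and no read-modify-write cycle is needed:

```python
IO_SIZE = 16 * 1024  # IOMeter transfer size
SECTOR = 4096        # 4Kn physical sector size

# Any 16 KB-aligned offset is also 4 KiB-aligned, and the transfer length is a
# whole number of sectors, so each I/O touches only complete physical sectors.
for i in range(1000):
    offset = i * IO_SIZE
    assert offset % SECTOR == 0
assert IO_SIZE % SECTOR == 0
```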
Note that this problem is not seen if the HDD or SSD data disk cache is set to Read-Only, or None -- only Read/Write caching causes problems.
I originally saw this problem on a Linux VM that was being used by a Windows 2016 VM via iSCSI, but found that the problem is more easily reproduced by just adding a data disk to the Windows 2016 VM and running IOMeter against that disk.
Although this is just a benchmark test, the fact that a mere I/O data pattern can kill the VM is worrying.
Are there any known issues with Premium_LRS capable VMs and Read/Write Caching, with random small block write I/Os?