I'm having a problem with Azure VM Premium_LRS data disks and Read/Write caching.

While benchmark testing with IOMeter using 32 concurrent 16 KB-aligned random write I/Os, the I/O freezes after a short time and the VM becomes completely unresponsive.
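
If anyone wants to try a similar access pattern without IOMeter, here is a rough Python approximation. It is only a sketch under several assumptions. It needs a Linux/POSIX host for `os.pwrite`, and it writes to an ordinary file rather than a raw disk. Because it goes through the OS page cache (no `O_DIRECT`), it does not match IOMeter's unbuffered I/O exactly. `random_aligned_writes` is a hypothetical helper, and the file path and size are placeholders.

```python
import os
import random
import tempfile
import threading

BLOCK = 16 * 1024   # 16 KB transfer size and alignment, matching the IOMeter test
WORKERS = 32        # 32 concurrent writers, approximating 32 outstanding I/Os


def random_aligned_writes(path, file_size, writes_per_worker, workers=WORKERS, seed=0):
    """Issue random 16 KB-aligned writes from `workers` threads; return the offsets written.

    Hypothetical helper: uses buffered I/O (no O_DIRECT), so the OS page cache
    sits between these writes and the disk.
    """
    slots = file_size // BLOCK          # number of 16 KB-aligned positions in the file
    payload = os.urandom(BLOCK)
    offsets = []
    lock = threading.Lock()
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, slots * BLOCK)

        def worker(worker_id):
            rng = random.Random(seed + worker_id)       # independent RNG per thread
            for _ in range(writes_per_worker):
                offset = rng.randrange(slots) * BLOCK   # always a multiple of 16 KB
                os.pwrite(fd, payload, offset)          # CPython releases the GIL during the syscall
                with lock:
                    offsets.append(offset)

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        os.fsync(fd)   # flush the dirty pages to the device
    finally:
        os.close(fd)
    return offsets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        written = random_aligned_writes(os.path.join(d, "test.bin"), 8 * 1024 * 1024, 100)
        print(f"{len(written)} writes of {BLOCK} bytes")
```

To target the data disk, point `path` at a file on that disk's volume and raise `file_size` and `writes_per_worker`.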

But this happens *only* on VM sizes that support Premium SSD storage, Premium_LRS (e.g. Standard_DS2_v2), and it happens with *both* HDD and SSD data disks.

This problem does *not* happen on VM sizes that offer only standard (HDD) storage, Standard_LRS (e.g. Standard_A4_v2).

One thing I've noticed is that on VM sizes that support only Standard_LRS the disk block size is 512 bytes, whereas on sizes that support both Standard_LRS and Premium_LRS it is 4096 bytes (4Kn, 512e, https://www.wikipedia.org/wiki/Advanced_Format). The 4096-byte block size applies even to HDD disks, not just SSD disks.
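
On a Linux VM, the logical and physical sector sizes can be read from sysfs (`/sys/block/<device>/queue/logical_block_size` and `physical_block_size`). On Windows, `fsutil fsinfo sectorinfo <drive>` reports similar information. The following is a minimal sketch. `block_sizes` and `sector_format` are hypothetical helpers, the device name is an assumption you would adjust, and the 512n/512e/4Kn labels follow the usual Advanced Format terminology.

```python
from pathlib import Path


def block_sizes(device, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes for a Linux block device, e.g. 'sdc'."""
    queue = Path(sysfs_root) / device / "queue"
    logical = int((queue / "logical_block_size").read_text())    # int() ignores the trailing newline
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


def sector_format(logical, physical):
    """Classify a (logical, physical) pair using Advanced Format terminology."""
    if logical == 512 and physical == 512:
        return "512n"   # legacy native 512-byte sectors
    if logical == 512 and physical == 4096:
        return "512e"   # 4096-byte physical sectors emulating 512-byte logical sectors
    if logical == 4096 and physical == 4096:
        return "4Kn"    # native 4096-byte sectors
    return f"other ({logical}/{physical})"
```

For example, `sector_format(*block_sizes("sdc"))` would label the (hypothetical) data disk `sdc`.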

Because the IOMeter I/Os are 16 KB-aligned, there should be no unaligned read-modify-write penalty that could explain this behavior.
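
The arithmetic: 16 KB is exactly four 4096-byte sectors, so a 16 KB write that starts on a 16 KB boundary also starts and ends on 4096-byte sector boundaries and never covers a sector partially. Here is a minimal sketch of that check. `needs_read_modify_write` is a hypothetical helper name, and 4096 bytes is assumed as the physical sector size.

```python
SECTOR = 4096          # assumed physical sector size on the Premium_LRS-capable sizes
IO_SIZE = 16 * 1024    # IOMeter transfer size and alignment


def needs_read_modify_write(offset, length, physical_sector=SECTOR):
    """True if a write only partially covers its first or last physical sector.

    A partially covered sector has to be read, merged with the new bytes,
    and written back (read-modify-write).
    """
    starts_mid_sector = offset % physical_sector != 0
    ends_mid_sector = (offset + length) % physical_sector != 0
    return starts_mid_sector or ends_mid_sector
```

For example, `needs_read_modify_write(3 * 16384, 16384)` is `False` because both ends fall on 4096-byte boundaries. `needs_read_modify_write(512, 16384)` is `True` because the write starts partway into a sector.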

Note that this problem does not occur when the HDD or SSD data disk cache is set to Read-Only or None; only Read/Write caching triggers it.

I originally saw this problem on a Linux VM that was serving storage over iSCSI to a Windows 2016 VM, but found that it is more easily reproduced by simply adding a data disk to the Windows 2016 VM and running IOMeter against that disk.

Although this is just a benchmark test, the fact that a particular I/O pattern can kill the VM is worrying.

Are there any known issues with Premium_LRS-capable VMs and Read/Write caching under small-block random write I/O?