Forum Discussion
vNVMe on Hyper-V to unlock PCIe 5.0 NVMe performance
On hosts with NVMe PCIe 5.0 (E3.S/U.2), Hyper-V guests still use virtual SCSI and leave a lot of performance on the table.
We are paying for top-tier storage, yet software becomes the limiter.
A virtual NVMe device that preserves checkpoints/Replica/Live Migration would align guest performance with modern hardware without forcing DDA and its operational trade-offs.
4 Replies
- L_Youtell_974 (Iron Contributor)
What did you use to benchmark your virtual disk?
In my case I use CrystalDiskMark and it works fine.
On my lab Hyper-V server I have 8 NVMe drives in RAID 0 (RAID 0, not RAID 5, because it's a lab server) and my VM uses the SCSI controller as always. This is the result.
- festuc (Copper Contributor)
Thanks for the reply and for sharing your CDM settings. I reran everything on my side using the exact same CrystalDiskMark 9.0.1 parameters from your screenshot (5 passes, 1 GiB, R70%/W30%, SEQ1M Q8T64 and RND4K Q32T64). Hardware and topology details below for context.
1) Drop in RND4K write after enabling the Hyper-V role (still testing on the host)
- Host (64 cores) on bare metal vs. the same host with the Hyper-V role enabled.
- Sequential read/write stayed essentially identical (within ~±2%).
- However, RND4K write (Q32T64) on the host fell by ~44% on both drives after enabling Hyper-V.
- This was reproducible on:
- Intel Optane P5800X 800 GB (U.2, PCIe 4.0 x4)
- Solidigm D7-PS1030 3.2 TB (E3.S, PCIe 5.0 x4)
- Same OS, power plan = High Performance, same CDM build, same test size/data pattern, AV exclusions in place.
In short: just turning on the Hyper-V role didn’t touch sequential, but it did hurt 4K random writes on the host by ~44% for me.
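Just to be explicit about how I'm quoting these deltas, here is the trivial calculation behind them; the IOPS figures in the snippet are made-up placeholders, not my actual numbers.

```python
# Percentage change from the bare-metal run to the run with the Hyper-V role on.
# Both IOPS values are hypothetical placeholders for illustration only.
def pct_change(before: float, after: float) -> float:
    return (after - before) / before * 100.0

bare_metal_rnd4k_write_iops = 1_000_000   # hypothetical bare-metal result
hyperv_role_rnd4k_write_iops = 560_000    # hypothetical result with the role enabled

delta = pct_change(bare_metal_rnd4k_write_iops, hyperv_role_rnd4k_write_iops)
print(f"RND4K write delta after enabling Hyper-V: {delta:+.0f}%")   # -> -44%
```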
2) Performance inside the VM
- VM configured with 32 vCPU (I do not want to break NUMA; NUMA Spanning = OFF on the host).
- Sequential (SEQ1M Q8T64) inside the VM is almost identical to the host (within ~1–2%) on both drives.
- RND4K read (Q32T64) inside the VM drops significantly versus the host:
- Optane P5800X: about −55% vs host
- D7-PS1030: about −74% vs host
- RND4K write (Q32T64) inside the VM goes up versus the Hyper-V host run (roughly +30%), but it’s still ~25% below bare metal.
My interpretation: the virtual SCSI + VMBus path and scheduling at T=64 threads inside the VM don’t scale the way the host does (especially for 4K random reads), while writes may benefit from coalescing/scheduling effects but still can’t match bare metal.
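For context on the thread/queue math behind that interpretation, here's a quick back-of-the-envelope sketch; the 500 µs average latency is an assumed value purely for illustration, not a measured one.

```python
# Back-of-the-envelope for the Q32T64 runs: total I/Os in flight and what that
# implies via Little's law (IOPS ~= outstanding I/Os / average latency).
queue_depth = 32
threads = 64
vcpus_in_vm = 32

outstanding = queue_depth * threads            # 2048 I/Os in flight
threads_per_vcpu = threads / vcpus_in_vm       # 2 benchmark threads per vCPU

assumed_avg_latency_s = 500e-6                 # assumed 500 us average completion time
littles_law_iops = outstanding / assumed_avg_latency_s

print(f"Outstanding I/Os:       {outstanding}")
print(f"Threads per vCPU:       {threads_per_vcpu:.1f}")
print(f"Implied IOPS at 500 us: {littles_law_iops:,.0f}")
```

The point being: at Q32T64 there are 2048 I/Os in flight being serviced by 32 vCPUs, so any per-I/O cost added by the guest storage path gets multiplied at that depth.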
3) Test setup (for completeness)
- Host: 64 cores.
- VM: 32 vCPU pinned to a single NUMA node (to keep locality); NUMA Spanning OFF at the host level.
- Drives under test (one at a time, direct to CPU, no RAID):
- Intel Optane P5800X 800 GB, U.2, PCIe 4.0 x4
- Solidigm D7-PS1030 3.2 TB, E3.S, PCIe 5.0 x4
- CrystalDiskMark 9.0.1 x64, Admin, same settings as in your screenshot:
- 5 passes, 1 GiB, R70/W30, SEQ1M Q8T64, RND4K Q32T64
- Note: the “RND4K (µs)” row in CDM is the same Q32T64 test expressed in average µs, not QD1 latency.
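If anyone wants to cross-check these presets with a different tool, below is a rough fio equivalent driven from Python. It is an approximation only: fio and CrystalDiskMark use different I/O engines and pass logic, so absolute numbers won't line up exactly, and the target path, ioengine and 30 s runtime are assumptions for a Linux guest.

```python
# Rough fio equivalents of the two CDM presets used above, run via subprocess.
import subprocess

TARGET = "/mnt/test/fio-testfile.bin"   # assumed test file on the drive under test

COMMON = [
    "fio", "--filename=" + TARGET, "--size=1g", "--direct=1",
    "--ioengine=libaio",                # Linux; use windowsaio on Windows
    "--rwmixread=70", "--time_based", "--runtime=30", "--group_reporting",
]

jobs = {
    # SEQ1M Q8T64: 1 MiB sequential mixed R/W, queue depth 8, 64 workers
    "seq1m_q8t64": ["--name=seq1m_q8t64", "--rw=rw", "--bs=1m",
                    "--iodepth=8", "--numjobs=64"],
    # RND4K Q32T64: 4 KiB random mixed R/W, queue depth 32, 64 workers
    "rnd4k_q32t64": ["--name=rnd4k_q32t64", "--rw=randrw", "--bs=4k",
                     "--iodepth=32", "--numjobs=64"],
}

for name, args in jobs.items():
    print(f"--- {name} ---")
    subprocess.run(COMMON + args, check=True)
```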
Thanks again!
- L_Youtell_974 (Iron Contributor)
Hi,
I don't think you can get 100% of bare-metal performance inside your VM. Don't forget, you're not writing to the drive directly: you go through an interface to a virtual disk, so it takes a little longer to get the data onto it.
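To put some toy numbers on that idea: a rough model where the virtual SCSI/VMBus path adds a fixed cost per I/O, which behaves like an IOPS ceiling that doesn't depend on block size. Every figure in the sketch below is an assumption for illustration, not a measurement.

```python
# Toy model: a fixed per-I/O cost in the virtual path acts as an IOPS ceiling.
SOFTWARE_IOPS_CEILING = 400_000      # assumed max I/Os per second through the virtual path
DRIVE_SEQ_BW_GBPS = 14.0             # assumed PCIe 5.0 x4 drive sequential bandwidth
DRIVE_RND4K_IOPS = 1_500_000         # assumed native 4K random IOPS of the drive

# SEQ1M: at 1 MiB per I/O the path ceiling corresponds to an enormous bandwidth,
# so the drive stays the bottleneck and the guest sees near-native throughput.
seq_path_limit_gbps = SOFTWARE_IOPS_CEILING * 1_048_576 / 1e9
print(f"SEQ1M: drive {DRIVE_SEQ_BW_GBPS} GB/s vs path {seq_path_limit_gbps:.0f} GB/s -> drive-bound")

# RND4K: the drive can complete more 4K I/Os than the virtual path can, so the
# path becomes the bottleneck and random IOPS drop well below bare metal.
vm_rnd4k_iops = min(DRIVE_RND4K_IOPS, SOFTWARE_IOPS_CEILING)
loss_pct = (1 - vm_rnd4k_iops / DRIVE_RND4K_IOPS) * 100
print(f"RND4K: drive {DRIVE_RND4K_IOPS:,} IOPS vs path {SOFTWARE_IOPS_CEILING:,} IOPS -> ~{loss_pct:.0f}% loss")
```

That pattern (sequential barely touched, small random I/O hit hard) matches what festuc measured, which is also why a paravirtual NVMe path with a lower per-I/O cost is an appealing idea.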