Forum Discussion
vNVMe on Hyper-V to unlock PCIe 5.0 NVMe performance
What did you use to benchmark your virtual disk?
In my case I use CrystalDiskMark and it works fine.
On my lab Hyper-V server I have 8 NVMe drives in RAID 0 (RAID 0, not RAID 5, because it's a lab server), and my VMs use the virtual SCSI controller as usual.
This is the result:
- festuc · Aug 27, 2025 · Copper Contributor
Thanks for the reply and for sharing your CDM settings. I reran everything on my side using the exact same CrystalDiskMark 9.0.1 parameters from your screenshot (5 passes, 1 GiB, R70%/W30%, SEQ1M Q8T64 and RND4K Q32T64). Hardware and topology details below for context.
1) Drop in RND4K write after enabling the Hyper-V role (still testing on the host)
- Host (64 cores) on bare-metal vs same host with the Hyper-V role enabled.
- Sequential read/write stayed essentially identical (within ~±2%).
- However, RND4K write (Q32T64) on the host fell by ~44% on both drives after enabling Hyper-V.
- This was reproducible on:
- Intel Optane P5800X 800 GB (U.2, PCIe 4.0 x4)
- Solidigm D7-PS1030 3.2 TB (E3.S, PCIe 5.0 x4)
- Same OS, power plan = High Performance, same CDM build, same test size/data pattern, AV exclusions in place.
In short: just turning on the Hyper-V role didn’t touch sequential, but it did hurt 4K random writes on the host by ~44% for me.
2) Performance inside the VM
- VM configured with 32 vCPU (I do not want to break NUMA; NUMA Spanning = OFF on the host).
- Sequential (SEQ1M Q8T64) inside the VM is almost identical to the host (within ~1–2%) on both drives.
- RND4K read (Q32T64) inside the VM drops significantly versus the host:
- Optane P5800X: about −55% vs host
- D7-PS1030: about −74% vs host
- RND4K write (Q32T64) inside the VM goes up versus the Hyper-V host run (roughly +30%), but it’s still ~25% below bare-metal.
My interpretation: the virtual SCSI + VMBus path and scheduling at T=64 threads inside the VM don’t scale like the host (especially for 4K random reads), while writes may benefit from coalescing/scheduling effects but still can’t match bare-metal.
3) Test setup (for completeness)
- Host: 64 cores.
- VM: 32 vCPU pinned to a single NUMA node (to keep locality); NUMA Spanning OFF at the host level.
- Drives under test (one at a time, direct to CPU, no RAID):
- Intel Optane P5800X 800 GB, U.2, PCIe 4.0 x4
- Solidigm D7-PS1030 3.2 TB, E3.S, PCIe 5.0 x4
- CrystalDiskMark 9.0.1 x64, Admin, same settings as in your screenshot:
- 5 passes, 1 GiB, R70/W30, SEQ1M Q8T64, RND4K Q32T64
- Note: the “RND4K (µs)” row in CDM is the same Q32T64 test expressed in average µs, not QD1 latency.
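For reference, here is a minimal sketch (my own arithmetic, not something CDM exposes) of how that average-µs row relates to the IOPS figure: with Q outstanding I/Os per thread and T threads, Little's law gives average latency ≈ Q × T × 1,000,000 / IOPS, which at Q1T1 reduces to 1,000,000 / IOPS. The IOPS values below are placeholders, not measured results.

```python
def avg_latency_us(iops: float, queue_depth: int, threads: int) -> float:
    """Average per-I/O latency (µs) implied by an IOPS figure at a given
    queue depth and thread count (Little's law: I/Os in flight / throughput)."""
    outstanding = queue_depth * threads        # total I/Os in flight
    return outstanding * 1_000_000 / iops      # convert seconds to microseconds

# Placeholder IOPS numbers, just to show the shape of the conversion:
print(avg_latency_us(1_000_000, 32, 64))  # Q32T64 at 1M IOPS -> ~2048 µs average
print(avg_latency_us(100_000, 1, 1))      # Q1T1 at 100k IOPS -> ~10 µs average
```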
Thanks again!
- L_Youtell_974 · Aug 28, 2025 · Iron Contributor
Hi,
I don't think you can get 100% of the performance in your VM. Don't forget, you are not writing to the physical drive directly; the I/O goes through a virtualization layer to a virtual disk, so each write takes a little longer to land.
- festuc · Aug 28, 2025 · Copper Contributor
Thanks for the follow-up—totally agree that 100% host-class performance in a VM isn’t realistic due to the virtual I/O path. My goal isn’t zero-overhead; it’s to separate what’s expected from what might be avoidable, especially for SQL-like, small-IO latency.
What I’m seeing (CDM 9.0.1, Profile: Default, 5×, 32 GiB, Random; Windows Server 2025):
- Sequential & higher-queue reads are fine in a VM
- SEQ 1MiB Q8T1 (Read): ~parity host ↔ VM on both drives.
- RND4K Q32T1 (Read/Write): ~parity on Optane; small drop on the Solidigm.
- The big gap is low-queue 4K latency (Q1T1)
- Optane P5800X: 8.9 µs → 52.6 µs in VM (~6× slower 😒; 111k → 18.9k IOPS, −83%).
- Solidigm D7-PS1030: 55.1 µs → 119.7 µs in VM (~2.2× slower; 18.1k → 8.3k IOPS, −54%).
That pattern feels heavier than “a little slower”: the vSCSI/VMBus/StorVSP path seems fine for throughput and for moderate queues, but it penalizes Q1T1 latency very strongly, which is precisely the OLTP zone.
A couple of clarifications/questions:
- I’m not using exotic CDM presets (no T=64). This is the Default profile (RND4K Q1T1, RND4K Q32T1, SEQ Q1T1 Write, SEQ Q8T1 Read) to mimic SQL patterns.
- The host shows another oddity: after merely enabling the Hyper-V role, host-side RND4K write dropped ~44% (no VMs running). Is that expected (e.g., storage filter/stack changes when the role is installed)?
If there are recommended tunables (StorVSC/StorVSP queueing, controller settings, best practices for vSCSI) or an ETA/plan for vNVMe / multi-queue improvements, I'm happy to re-run with whatever parameters you suggest (I can also switch to DiskSpd for 8 KiB Q1/T1 with latency percentiles).
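For what it's worth, something along these lines is what I have in mind for the DiskSpd variant. This is only a rough sketch: the diskspd.exe path and the target file are placeholders, and the flags are the usual DiskSpd switches for an 8 KiB, QD1/single-thread, 70/30 random test with latency capture.

```python
import subprocess

# Sketch of the DiskSpd run (paths/target are placeholders):
#   -b8K   8 KiB blocks            -o1 -t1  QD1, single thread
#   -r     random I/O              -w30     70% read / 30% write
#   -Sh    disable OS + HW caching -L       capture latency statistics
#   -d60   60 s duration           -c32G    create a 32 GiB test file
cmd = [
    r"C:\Tools\diskspd.exe",        # placeholder install path
    "-b8K", "-d60", "-o1", "-t1",
    "-r", "-w30", "-Sh", "-L",
    "-c32G",
    r"D:\io_test\testfile.dat",     # placeholder target on the drive under test
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)                # text report includes average and percentile latency
```

With -L the text report includes per-thread latency percentiles, which should show the Q1T1 tail much better than a single CDM average.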