Forum Discussion

festuc
Copper Contributor
Aug 14, 2025

vNVMe on Hyper-V to unlock PCIe 5.0 NVMe performance

On hosts with PCIe 5.0 NVMe drives (E3.S/U.2), Hyper-V guests still go through the virtual SCSI controller and leave a lot of performance on the table.

We are paying for top-tier storage, yet software becomes the limiter.

 

A virtual NVMe device that preserves checkpoints/Replica/Live Migration would align guest performance with modern hardware without forcing DDA and its operational trade-offs.
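
For context on "still use virtual SCSI": from inside a guest you can confirm that the data disk is exposed as a virtual SCSI device rather than as an NVMe controller. A minimal sketch using the third-party WMI package for Python (the strings are what I would expect for a plain VHDX, not something specific to my setup):

  import wmi  # third-party "WMI" package (pip install WMI), Windows only

  c = wmi.WMI()
  for disk in c.Win32_DiskDrive():
      # A VHDX on the virtual SCSI controller typically reports Model
      # "Msft Virtual Disk" with InterfaceType "SCSI" -- no NVMe controller
      # is visible to the guest.
      print(disk.Model, disk.InterfaceType)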

4 Replies

  • What did you use to benchmark your virtual disk?

    In my case I use CrystalDiskMark and it works fine.
    On my lab Hyper-V server I have 8 NVMe drives in RAID 0 (RAID 0, not RAID 5, because it's a lab server), and my VM uses the SCSI controller as always.

    This is the result (CrystalDiskMark screenshot).
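
    A rough cross-check you can run alongside CrystalDiskMark if you want a second opinion on the sequential number (just a sketch: the path, file size and block size below are placeholders, and because it goes through the OS cache it is only a coarse estimate, not a replacement for CDM):

      import os
      import time

      PATH = r"D:\cdm_check.bin"      # hypothetical path on the virtual disk under test
      SIZE_MB = 1024                  # 1 GiB test file
      BLOCK = 1024 * 1024             # 1 MiB writes, roughly comparable to SEQ1M

      buf = os.urandom(BLOCK)
      flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
      fd = os.open(PATH, flags)
      t0 = time.perf_counter()
      for _ in range(SIZE_MB):
          os.write(fd, buf)
      os.fsync(fd)                    # flush so cached writes are included in the timing
      os.close(fd)
      elapsed = time.perf_counter() - t0
      print(f"sequential write: ~{SIZE_MB / elapsed:.0f} MiB/s")
      os.remove(PATH)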

    • festuc
      Copper Contributor

       

      Thanks for the reply and for sharing your CDM settings. I reran everything on my side using the exact same CrystalDiskMark 9.0.1 parameters from your screenshot (5 passes, 1 GiB, R70%/W30%, SEQ1M Q8T64 and RND4K Q32T64). Hardware and topology details below for context.

      1) Drop in RND4K write after enabling the Hyper-V role (still testing on the host)

      • Host (64 cores) on bare-metal vs same host with the Hyper-V role enabled.
      • Sequential read/write stayed essentially identical (within ~±2%).
      • However, RND4K write (Q32T64) on the host fell by ~44% on both drives after enabling Hyper-V.
      • This was reproducible on:
        • Intel Optane P5800X 800 GB (U.2, PCIe 4.0 x4)
        • Solidigm D7-PS1030 3.2 TB (E3.S, PCIe 5.0 x4)
      • Same OS, power plan = High Performance, same CDM build, same test size/data pattern, AV exclusions in place.

      In short: just turning on the Hyper-V role didn’t touch sequential, but it did hurt 4K random writes on the host by ~44% for me.
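
      For anyone comparing their own before/after runs the same way, this is the calculation behind the percentages I'm quoting (the IOPS values below are placeholders, not my actual results):

        # Relative change between a bare-metal run and a Hyper-V-enabled run.
        def pct_change(before: float, after: float) -> float:
            return (after - before) / before * 100.0

        # Hypothetical RND4K Q32T64 write IOPS, for illustration only.
        bare_metal_iops = 1_000_000
        hyperv_host_iops = 560_000
        print(f"RND4K write: {pct_change(bare_metal_iops, hyperv_host_iops):+.0f}%")  # -44%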

      2) Performance inside the VM

      • VM configured with 32 vCPU (I do not want to break NUMA; NUMA Spanning = OFF on the host).
      • Sequential (SEQ1M Q8T64) inside the VM is almost identical to the host (within ~1–2%) on both drives.
      • RND4K read (Q32T64) inside the VM drops significantly versus the host:
        • Optane P5800X: about −55% vs host
        • D7-PS1030: about −74% vs host
      • RND4K write (Q32T64) inside the VM goes up versus the Hyper-V host run (roughly +30%), but it’s still ~25% below bare-metal.

      My interpretation: the virtual SCSI + VMBus path and scheduling at T=64 threads inside the VM don’t scale like the host (especially for 4K random reads), while writes may benefit from coalescing/scheduling effects but still can’t match bare-metal.
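
      A back-of-envelope model for the latency side of that interpretation (the numbers below are invented round figures purely to illustrate the shape of the effect, not measurements from either drive):

        # Little's law: throughput ~ IOs in flight / average per-IO service time.
        def iops(outstanding: int, per_io_us: float) -> float:
            return outstanding / (per_io_us * 1e-6)

        OUTSTANDING = 32 * 64        # Q32T64 -> 2048 IOs in flight
        DEVICE_4K_US = 1000.0        # hypothetical average 4K latency at this depth
        DEVICE_1M_US = 16000.0       # hypothetical average 1 MiB latency at this depth
        VIRT_OVERHEAD_US = 300.0     # hypothetical extra cost per IO on the vSCSI/VMBus path

        for name, dev_us in [("RND4K", DEVICE_4K_US), ("SEQ1M", DEVICE_1M_US)]:
            host = iops(OUTSTANDING, dev_us)
            guest = iops(OUTSTANDING, dev_us + VIRT_OVERHEAD_US)
            print(f"{name}: guest keeps {guest / host:.0%} of host throughput")
        # The same fixed per-IO cost is ~30% of a 4K IO but only ~2% of a 1 MiB one,
        # which is why sequential barely moves while 4K random takes the hit.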

      3) Test setup (for completeness)

      • Host: 64 cores.
      • VM: 32 vCPU pinned to a single NUMA node (to keep locality); NUMA Spanning OFF at the host level.
      • Drives under test (one at a time, direct to CPU, no RAID):
        • Intel Optane P5800X 800 GB, U.2, PCIe 4.0 x4
        • Solidigm D7-PS1030 3.2 TB, E3.S, PCIe 5.0 x4
      • CrystalDiskMark 9.0.1 x64, Admin, same settings as in your screenshot:
        • 5 passes, 1 GiB, R70/W30, SEQ1M Q8T64, RND4K Q32T64
      • Note: the “RND4K (µs)” row in CDM is the same Q32T64 test expressed in average µs, not QD1 latency.
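
      If anyone wants to sanity-check that row, average latency and IOPS at a fixed queue depth are tied together by Little's law, so one converts into the other (the 1,365 µs figure below is just an example, not one of my results):

        # At Q32T64 there are 32 * 64 = 2048 IOs in flight on average.
        outstanding = 32 * 64

        # Example: a hypothetical 1,365 µs average in the "RND4K (µs)" row ...
        avg_latency_us = 1365.0

        # ... corresponds to ~1.5M IOPS -- i.e. the same Q32T64 test, not a QD1 figure.
        iops = outstanding / (avg_latency_us * 1e-6)
        print(f"~{iops / 1e6:.1f}M IOPS")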

      Thanks again!

       

      • L_Youtell_974
        Iron Contributor

        Hi, 

        I don't think you can get 100% of bare-metal performance inside a VM. Don't forget, you are not writing to the physical drive directly; the I/O goes through an interface to a virtual drive, so it takes a little longer for the data to reach the disk.
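
        To put a rough number on "a little longer": from a host/guest IOPS pair you can back out the implied extra time per IO (the IOPS values below are hypothetical, only there to show the calculation):

          # Implied extra time per IO at Q32T64, backed out of a host/guest IOPS pair.
          outstanding = 32 * 64

          def avg_latency_us(iops: float) -> float:
              return outstanding / iops * 1e6

          host_iops = 1_500_000      # hypothetical RND4K read IOPS on the host
          guest_iops = 600_000       # hypothetical RND4K read IOPS inside the guest
          extra_us = avg_latency_us(guest_iops) - avg_latency_us(host_iops)
          # This is average time in the whole stack at this queue depth (it includes
          # queueing), not the cost of a single extra hop.
          print(f"implied extra per-IO time: ~{extra_us:.0f} µs")   # ~2048 µs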
