Forum Discussion

festuc
Copper Contributor
Aug 14, 2025

vNVMe on Hyper-V to unlock PCIe 5.0 NVMe performance

On hosts with NVMe PCIe 5.0 (E3.S/U.2), Hyper-V guests still use virtual SCSI and leave a lot of performance on the table.

We are paying for top-tier storage, yet software becomes the limiter.


A virtual NVMe device that preserves checkpoints/Replica/Live Migration would align guest performance with modern hardware without forcing DDA and its operational trade-offs.
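
For reference, here is a rough sketch of what the DDA route looks like today (Python driving the Hyper-V PowerShell cmdlets; the VM name and PCIe location path are placeholders I made up). This is exactly the operational baggage a paravirtual NVMe device would let us skip: the device disappears from the host, and a VM with an assigned device cannot be checkpointed, replicated, or live-migrated.

    # Sketch only: the DDA steps a virtual NVMe device would make unnecessary.
    # VM name and location path below are placeholders; run elevated on the host.
    import subprocess

    VM_NAME  = "TestVM"                           # hypothetical VM name
    LOCATION = "PCIROOT(0)#PCI(0301)#PCI(0000)"   # placeholder PCIe location path

    def ps(cmd: str) -> None:
        """Run one PowerShell command and fail loudly on error."""
        subprocess.run(["powershell", "-NoProfile", "-Command", cmd], check=True)

    # DDA requires the VM to hard power off instead of saving state.
    ps(f'Set-VM -Name {VM_NAME} -AutomaticStopAction TurnOff')

    # Detach the NVMe device from the host and hand it to the guest.
    ps(f'Dismount-VMHostAssignableDevice -Force -LocationPath "{LOCATION}"')
    ps(f'Add-VMAssignableDevice -LocationPath "{LOCATION}" -VMName {VM_NAME}')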

6 Replies

  • Hi all, I agree that the VMware NVMe controller in a VM delivers better performance than VMware's default SCSI emulation, and I suspect the same would be true for Windows Hyper-V.

    I can even say that the NVMe controller delivers better performance when the datastore sits on a SAN backed by traditional HDDs rather than a single NVMe drive or SSD; this is probably down to the leaner protocol stack.


    There are improvements in the NVMe storage driver stack that the SCSI emulation may not benefit from, but only the WS Storage and Hyper-V team could confirm that.


    What would be great, but hard to test, is a direct comparison of VMware and Hyper-V on Windows Server 2025 on the same hardware. Please make sure to use the latest CrystalDiskMark, as it ships an updated and optimized diskspd.exe.



    • festuc
      Copper Contributor

      Thanks for the insight. I’m using the latest official DiskSpd from Microsoft (v2.2 at the time of testing), with OS cache disabled and latency reporting enabled to keep results clean and comparable.
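
      In case it helps anyone reproduce this, here is a minimal sketch of the kind of DiskSpd run I mean (Python just builds the command line; the diskspd.exe path and the target file are placeholders): -Sh disables software caching and hardware write caching, -L turns on latency statistics.

          # Sketch: one 4K random 70/30 DiskSpd pass with caching disabled and
          # latency statistics enabled. Paths below are placeholders.
          import subprocess

          DISKSPD = r"C:\Tools\diskspd.exe"   # assumed install location
          TARGET  = r"D:\iotest.dat"          # test file on the NVMe under test

          subprocess.run([
              DISKSPD,
              "-b4K",    # 4 KiB blocks
              "-r",      # random I/O
              "-w30",    # 30% writes / 70% reads
              "-o32",    # 32 outstanding I/Os per thread
              "-t8",     # 8 worker threads
              "-d60",    # 60 s measured duration
              "-W10",    # 10 s warm-up
              "-Sh",     # disable software caching and hardware write caching
              "-L",      # collect latency statistics
              "-c16G",   # create a 16 GiB test file if it does not exist
              TARGET,
          ], check=True)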

      Unfortunately I can’t run a side‑by‑side VMware test on the same hardware right now, but I fully agree it would be very valuable to have an apples‑to‑apples comparison (vNVMe on ESXi vs. the current synthetic SCSI path on Hyper‑V) using identical DiskSpd workloads.

      It would be great if someone from the WS Storage & Hyper‑V Team could chime in here—both to confirm current guidance and to share whether a paravirtual NVMe device or further storage‑stack optimizations are on the roadmap for future Windows Server/Hyper‑V releases.

  • What did you use to benchmark your virtual disk?

    In my case I use CrystalDiskMark and it works fine.
    On my lab Hyper-V server I have 8 NVMe drives in RAID 0 (RAID 0, not RAID 5, because it's a lab server), and my VM uses the SCSI controller as always.

    This is the result (screenshot):


    • festuc
      Copper Contributor


      Thanks for the reply and for sharing your CDM settings. I reran everything on my side using the exact same CrystalDiskMark 9.0.1 parameters from your screenshot (5 passes, 1 GiB, R70%/W30%, SEQ1M Q8T64 and RND4K Q32T64). Hardware and topology details below for context.
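
      (For anyone who prefers running DiskSpd directly, this is roughly how I would translate those two CDM profiles; the paths are placeholders and CDM's own engine differs slightly, so treat it as an approximation.)

          # Sketch: approximate diskspd equivalents of the two CrystalDiskMark
          # profiles used here (SEQ1M Q8T64 and RND4K Q32T64, 70% read / 30% write).
          import subprocess

          DISKSPD = r"C:\Tools\diskspd.exe"   # assumed install location
          TARGET  = r"D:\iotest.dat"          # test file on the drive under test

          COMMON = ["-w30", "-Sh", "-L", "-d60", "-W10", "-c16G"]

          PROFILES = {
              # 1 MiB blocks, 8 outstanding I/Os, 64 threads, interlocked sequential
              "SEQ1M_Q8T64":  ["-b1M", "-si", "-o8", "-t64"],
              # 4 KiB blocks, random, 32 outstanding I/Os, 64 threads
              "RND4K_Q32T64": ["-b4K", "-r", "-o32", "-t64"],
          }

          for name, flags in PROFILES.items():
              print(f"--- {name} ---")
              subprocess.run([DISKSPD, *flags, *COMMON, TARGET], check=True)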

      1) Drop in RND4K write after enabling the Hyper-V role (still testing on the host)

      • Host (64 cores) on bare-metal vs same host with the Hyper-V role enabled.
      • Sequential read/write stayed essentially identical (within ~±2%).
      • However, RND4K write (Q32T64) on the host fell by ~44% on both drives after enabling Hyper-V.
      • This was reproducible on:
        • Intel Optane P5800X 800 GB (U.2, PCIe 4.0 x4)
        • Solidigm D7-PS1030 3.2 TB (E3.S, PCIe 5.0 x4)
      • Same OS, power plan = High Performance, same CDM build, same test size/data pattern, AV exclusions in place.

      In short: just turning on the Hyper-V role didn’t touch sequential, but it did hurt 4K random writes on the host by ~44% for me.
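
      (The percentages above are just a straight before/after comparison; a tiny sketch of that arithmetic, with placeholder numbers rather than my actual results:)

          # Sketch: how the before/after percentages are derived. The figures
          # below are placeholders, not the actual measurements.
          def delta_pct(before: float, after: float) -> float:
              """Signed percentage change from 'before' to 'after'."""
              return (after - before) / before * 100.0

          bare_metal_mbps  = 1000.0   # placeholder: RND4K write, host before Hyper-V
          hyperv_role_mbps = 560.0    # placeholder: same host, Hyper-V role enabled

          print(f"RND4K write delta: {delta_pct(bare_metal_mbps, hyperv_role_mbps):+.0f}%")
          # -> about -44% with these placeholder numbers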

      2) Performance inside the VM

      • VM configured with 32 vCPU (I do not want to break NUMA; NUMA Spanning = OFF on the host).
      • Sequential (SEQ1M Q8T64) inside the VM is almost identical to the host (within ~1–2%) on both drives.
      • RND4K read (Q32T64) inside the VM drops significantly versus the host:
        • Optane P5800X: about −55% vs host
        • D7-PS1030: about −74% vs host
      • RND4K write (Q32T64) inside the VM goes up versus the Hyper-V host run (roughly +30%), but it’s still ~25% below bare-metal.

      My interpretation: the virtual SCSI + VMBus path and scheduling at T=64 threads inside the VM don’t scale like the host (especially for 4K random reads), while writes may benefit from coalescing/scheduling effects but still can’t match bare-metal.
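
      If anyone wants to poke at that theory, here is a rough sketch of a thread-count sweep for the 4K random read case (run it inside the guest and on the host and compare where the curve flattens; paths are placeholders):

          # Sketch: sweep the thread count for 4K random reads to see where
          # scaling flattens inside the VM versus on the host. Paths are placeholders.
          import subprocess

          DISKSPD = r"C:\Tools\diskspd.exe"   # assumed install location
          TARGET  = r"D:\iotest.dat"          # test file on the disk under test

          for threads in (1, 2, 4, 8, 16, 32, 64):
              print(f"--- RND4K read, Q32, T{threads} ---")
              subprocess.run([
                  DISKSPD,
                  "-b4K", "-r", "-w0",        # 4 KiB, random, 100% reads
                  "-o32", f"-t{threads}",     # 32 outstanding I/Os per thread
                  "-d30", "-W5",              # 30 s run, 5 s warm-up
                  "-Sh", "-L",                # no caching, latency statistics
                  "-c16G", TARGET,
              ], check=True)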

      3) Test setup (for completeness)

      • Host: 64 cores.
      • VM: 32 vCPU pinned to a single NUMA node (to keep locality); NUMA Spanning OFF at the host level (configuration sketch after this list).
      • Drives under test (one at a time, direct to CPU, no RAID):
        • Intel Optane P5800X 800 GB, U.2, PCIe 4.0 x4
        • Solidigm D7-PS1030 3.2 TB, E3.S, PCIe 5.0 x4
      • CrystalDiskMark 9.0.1 x64, Admin, same settings as in your screenshot:
        • 5 passes, 1 GiB, R70/W30, SEQ1M Q8T64, RND4K Q32T64
      • Note: the “RND4K (µs)” row in CDM is the same Q32T64 test expressed in average µs, not QD1 latency.
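
      As referenced above, this is roughly how the host/VM were configured (Python driving the Hyper-V PowerShell cmdlets; the VM name is a placeholder):

          # Sketch: the NUMA/vCPU settings mentioned in the list above.
          # VM name is a placeholder; run elevated on the Hyper-V host.
          import subprocess

          VM_NAME = "BenchVM"   # hypothetical VM name

          def ps(cmd: str) -> None:
              subprocess.run(["powershell", "-NoProfile", "-Command", cmd], check=True)

          # Keep guests within a single NUMA node (no spanning across nodes).
          ps("Set-VMHost -NumaSpanningEnabled $false")

          # 32 vCPUs for the benchmark VM (half of the 64 host cores).
          ps(f"Set-VM -Name {VM_NAME} -ProcessorCount 32")

          # Sanity check: show the NUMA limits exposed to the VM's virtual processors.
          ps(f"Get-VMProcessor -VMName {VM_NAME} | "
             "Select-Object Count, MaximumCountPerNumaNode, MaximumCountPerNumaSocket")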

      Thanks again!


      • L_Youtell_974
        Iron Contributor

        Hi,

        I don't think you can get 100% of bare-metal performance in a VM. Don't forget that you aren't writing to the physical drive directly; the I/O goes through a virtualization layer to a virtual disk, so each write takes a little longer to land.
