I recently tried to install SQL Server on a WS2022 VM hosted on a WS2022 Hyper-V host, but the installation failed with some cryptic error messages. After two hours of searching online, I finally realized that the installation media had been corrupted while being copied from an SMB share, which is also hosted on WS2022.
I then tried computing the SHA-256 hash of the installation media file over SMB with PowerShell, and, astonishingly, I got a different hash each time (unless, of course, the file was still cached in memory).
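To repeat this comparison outside PowerShell, a minimal Python sketch along these lines hashes the same file several times and compares the digests. It is an illustration, not the command used above; `sha256_of_file` and `repeated_hashes` are hypothetical helper names, and reading in 1 MiB chunks is an assumption to keep large installation images out of memory. Because later reads may be served from the client cache, matching hashes on repeat runs do not by themselves prove the transfer is clean.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so multi-gigabyte media never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_hashes(path: str, runs: int = 3) -> list[str]:
    """Hash the same file `runs` times; differing digests mean the reads are not stable."""
    return [sha256_of_file(path) for _ in range(runs)]
```

Usage, with a placeholder UNC path: `hashes = repeated_hashes(r"\\server\share\setup.iso", 5)`, then `len(set(hashes)) == 1` is true only if every run returned the same digest.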
Running the Get-FileHash cmdlet on files over encrypted SMB shares gives a different hash each time
Looking through the SMBClient logs in Event Viewer, I found many events with ID 31015 indicating that message decryption failed due to "Bad data", each followed immediately by event ID 30804 as the connection was closed (which is expected when decryption fails, according to the MS-SMB2 specification).
SMBClient event logs showing a series of events with ID 31015
After I disabled SMB encryption, the problem went away and I could copy files from SMB shares without any corruption. When I re-enabled encryption, the problem resurfaced.
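To see where a corrupted copy diverges, and not just that it does, one option is to compare a known-good copy (for example, one made with encryption disabled) against a copy made over the encrypted share, chunk by chunk. The sketch below is hypothetical: `differing_chunks` is an illustrative name, the 64 KiB chunk size is an arbitrary assumption, and it only reports which chunk offsets differ, not why.

```python
def differing_chunks(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> list[tuple[int, int]]:
    """Return (offset, length) for each fixed-size chunk whose bytes differ between two files."""
    diffs = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                # Use the longer length so a truncated copy is flagged too.
                diffs.append((offset, max(len(a), len(b))))
            offset += chunk_size
    return diffs
```

If the differing chunks turned out to be few and scattered rather than contiguous, that would at least narrow down how much data each failure affects.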
I reproduced the problem with Wireshark attached to both the server and the client and saved the traces. However, the problem only surfaces during large file transfers, and I don't have the time or patience to examine every TCP packet for corruption in transit. Instead, I ran a series of tests to pinpoint the issue, and the results suggest that the problem is related to the guest's virtual networking stack, though I can't be 100% sure.
UNRELIABLE: Test results suggest that the problem only surfaces when the guest is hosted on Hyper-V running WS2022 and is connected to a virtual switch
UPDATE (1/9/2022 15:50 UTC):
Just discovered that decryption fails even when SR-IOV is enabled. Maybe my tests weren't so reliable after all.
I turned on checksum validation in Wireshark and saw a huge number of checksum errors on both the client side and the server side; in fact, nearly every packet had a bad checksum. Wireshark suggested that this could be caused by checksum offloading, so I retried after disabling checksum offloading on both the server and client VMs with the Disable-NetAdapterChecksumOffload cmdlet. Unfortunately, the problem persisted. Note that this test was performed on a private virtual switch, so external networking infrastructure should play no part in the problem.
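For context on those Wireshark errors: TCP and UDP use the 16-bit Internet checksum defined in RFC 1071, and with checksum offloading the sending host passes the packet to the (virtual) NIC before that field is filled in, so a capture taken on the sender usually shows it as wrong. A minimal sketch of the core computation follows; `internet_checksum` is an illustrative name, and the sketch deliberately omits the pseudo-header (source and destination addresses, protocol, length) that a real TCP checksum also covers, so it is not a drop-in validator.

```python
def internet_checksum(data: bytes) -> int:
    """One's complement of the one's-complement sum of 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    # Fold any carries above bit 15 back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For example, the bytes `00 01 f2 03 f4 f5 f6 f7` give `0x220d`, and appending those two bytes makes the checksum of the whole sequence `0`, which is the property a receiver checks.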
In the tests above, all Windows Server VM instances are freshly installed with the latest cumulative updates applied. All hardware acceleration features on the virtual network adapters are disabled except for SR-IOV when indicated. Virtual TPM and migration traffic encryption are both enabled.
The physical Hyper-V host running WS2022 is a Dell PowerEdge R750 with a pair of Intel Xeon 4316 CPUs and a 4-port Intel X710 10GbE NIC capable of SR-IOV (which, by the way, has a buggy driver that often bugchecks the host when I change the SR-IOV setting on vEthernet adapters, but that's another story). TPM and Intel TME are both enabled. It has the latest firmware, drivers, and OS cumulative updates installed, and a system file check with SFC reported no integrity violations. I also ran Dell's hardware diagnostic utility, which reported no hardware issues.
For now, I am working around the issue by not using SMB encryption. However, that also means losing its security benefits, so I would like to understand the cause and find a more permanent solution.
Thanks a lot in advance, and feel free to ask if more information or troubleshooting is needed.