26063 deduplication data corruption is still there.


From Server 2022 up to this newest 26063 build, they all have the same problem, as described here: https://techcommunity.microsoft.com/t5/windows-server-insiders/server-vnext-26040-and-server-2022-de...

I am out of energy and give up for today. It seems to be impossible to get Microsoft to care about actual OS bugs instead of marketing.

20 Replies
Hi, Joachim. Sorry you are feeling frustrated about this -- we do have a lot of different demands on our time in any given day; we are as human as you are. I work in engineering, not marketing. I'm trying to see if the de-dup team has enough data to diagnose this, given the feedback you've provided.

Thanks,
Michael Bernstein (MSFT)
Hi, is this NTFS or REFS? Or both?
Thank you for your reply! The problematic part is that it actually applies to Server 2022 too. I can reproduce it with both.
If you want to see how it plays out with Server 2022 and vNext: I recorded a video which is on my home page https://www.joumxyzptlk.de/tmp/microsoft/S2022-Nested_Deduplication_VDI-Hyperv_profile_kills_filesys...
The video was recorded on an AMD Ryzen 5950 with Win11 as host, but the behavior is the same on a dual Xeon 6226R with Server 2019 (HP ProLiant DL380) as host.
Thanks, Joachim! I attached that video to the bug report to provide additional context for the engineering team.

Thank you for your update! This is great. Update from my side:
I retested with ReFS on 26063 today and could not reproduce the deduplication corruption with ReFS. I tested all three profiles: "HyperV" (aka VDI in GUI), "Default" (aka Fileserver in GUI) and "Backup". The issue seems to be limited to NTFS deduplication, where I could reproduce it again after those tests in exactly the same way as usual.
If your engineering team is interested in "ready to reproduce" packages for version 26063, exported from a Server 2019 host, just say so. IMHO the existing packages with Server 2022 and Server 26040 should be enough, but if they need it I'll do it. The same goes for retesting Server 2022 + ReFS, which I am skipping today since it is 22:37 local time.
If you look at this thread I started last September: Someone else had a similar problem with Server 2022 https://techcommunity.microsoft.com/t5/windows-server-for-it-pro/server-2022-and-server-vnext-build-...
Additional Information: I've been in contact with Philipp Kuhn (Philipp.Kuhn insert at character here microsoft dot character com), but he has no access to resources to reproduce this. His private machines don't have enough RAM and he really tried, and with Azure Test VMs he provided we could not reproduce since double nested virtualization does not work there.

Could the engineering team reproduce this bug, maybe even as easily as I can? I don't expect a fix so fast, but if they could reproduce it, that would be good feedback.
I checked on the bug report. As far as I can tell, the engineering team is still working to reproduce the issue.
The easiest way is to use a Server 2019, Server 2022 or Windows 11 Hyper-V host (real metal, not virtual). A somewhat current CPU, whether AMD Ryzen 2xxx or newer or Intel Gen 8 or newer (or their respective Threadripper / Epyc / Xeon counterparts), an SSD, and 32 GB RAM. I could reproduce this on my old i7-4960x too, though not every time but rather every third time.
Then use these prepared packages, unpack them on D:, and import them into Hyper-V. They are completely self-contained repros.
The Server 2022 repro package: https://www.joumxyzptlk.de/tmp/microsoft/S2022-nested-2023-09-30-exported-from-S2019-host.7z
The Server 26040 repro package: https://www.joumxyzptlk.de/tmp/microsoft/Server-vNext-26040-nested-dedup-problem-export-from-S2019-h...
Then follow the text file on the desktop.

To give you the full information: These are the contents of the text file on the desktop of that VM:
############################################################
Creation host: Ryzen 5950x, 64 GB ECC RAM, Windows 11 21H2.
----------
This guest: Standard as you can see except for:
Set-VMProcessor -VMName <VmName> -ExposeVirtualizationExtensions:$true
No second VHDX yet.
Server 26040 / Server 2022 VM PREPARATION:
- Standard install, C: = 30720 MB during setup.
As for Server vNext 26063: This is the VHDX download version since the .ISO download does not boot. See here: https://techcommunity.microsoft.com/t5/windows-server-insiders/iso-build-26063-fails-to-boot-in-hype...
- GPEDIT.MSC: Allow empty passwords.
- Windows/Microsoft updates right after installation.
- Activate Role Deduplication, do not configure deduplication.
- Add VHDX for second drive, add folder \Hyper-V.
- copy two test VMs, here those two Server 2012 R2 VMs with update state of 6th June 2022.

This is the export you see now.

These are the steps to reproduce the issue from here on, see screenshots attached / youtube video link:
- Activate Hyper-V in this VM. No virtual switch needed.
- Activate deduplication for second volume, default setting. Template DOES NOT MATTER, all three expose the issue.
If you started the VMs before activating deduplication you will have to set "deduplicate files older than" to "0 days".
- Run from PowerShell: Start-DedupJob -Volume D: -Type Optimization -Full -Wait
- Wait until dedupe is finished
- Check with Get-DedupStatus | fl whether deduplication actually did the job. Expected saving is above 40%, usually above 50%.
- Import the two machines from D:\Hyper-V into Hyper-V Manager. The network cards don't need to be connected.
- Start those two test VMs. Check the folder "Updates-for-offlineinstall-test". It contains updates for offline installation.
Or simply run Windows updates, if you connected the network.
- Run the updates on both machines simultaneously, not one after the other.
- Corruption occurs on deduplicated .vhdx files which then get written to.
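
The deduplication steps above can be sketched in PowerShell roughly as follows. This is only a sketch and not part of the original text file: it assumes the second volume is D: and that the Data Deduplication role is already installed, and it picks the HyperV usage type arbitrarily, since all three templates are reported to expose the issue.

```powershell
# Assumption: the second volume is D: and the Data Deduplication role is installed.
$volume = 'D:'

# Enable deduplication. The usage type (HyperV, Default or Backup) is an arbitrary choice here.
Enable-DedupVolume -Volume $volume -UsageType HyperV

# Equivalent of setting "deduplicate files older than" to 0 days.
Set-DedupVolume -Volume $volume -MinimumFileAgeDays 0

# Run a full optimization and block until it has finished.
Start-DedupJob -Volume $volume -Type Optimization -Full -Wait

# Check that deduplication actually did its job before starting the test VMs.
Get-DedupStatus -Volume $volume | Format-List
```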

My test result (here Server 2022 with VDI template):
- https://joumxyzptlk.de/tmp/microsoft/S2022-Nested_Deduplication_VDI-Hyperv_profile_kills_filesystem_...
- Both machines may blue screen during Windows Updates or show weird errors. Both may run into a recovery boot loop.
- Both have defective filesystems, if they manage to boot instead of getting stuck in boot loop.
- After the blue screen disaster: Start-DedupJob -Volume <volume> -Type Optimization -Full -Wait may not work any more.
- Get-DedupVolume | fl still shows valid statistics if Start-Dedupjob fails.
- CHKDSK /f may show file system damage. Usually <NUMBER>.ccc and <NUMBER>.cd files of the deduplication
chunk store which usually resides in \System Volume Information\Dedup\ChunkStore\{UNIQUE IDENTIFIER}.ddp\Data

Counter verification:
- Run Windows Updates on those VMs without deduplication. It will work fine.

I tried to replicate the issue using PowerShell/.NET methods [System.IO.File]::* to create and modify
30+ GB test files, but that did not provoke the issue. The checksums were always correct.
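
A checksum test of this kind could look roughly like the sketch below. It is not the script that was actually used. The paths, sizes and the helper function Write-Blocks are hypothetical, and it assumes D: is the deduplicated volume while C: is not. The same pseudo-random writes go to a test file on D: and a reference file on C:, and the SHA256 hashes are compared afterwards.

```powershell
# Assumptions: D: is the deduplicated volume, C: is not deduplicated.
# Paths and sizes are hypothetical; the original test used 30+ GB files.
$testFile   = 'D:\dedup-test\testfile.bin'
$refFile    = 'C:\dedup-ref\testfile.bin'
$blockSize  = 1MB
$blockCount = 64

function Write-Blocks([string]$Path, [int]$Seed, [long]$Offset, [int]$Count) {
    # Write $Count pseudo-random blocks starting at $Offset; the same seed yields the same bytes.
    $rng    = [System.Random]::new($Seed)
    $buffer = New-Object byte[] $blockSize
    $stream = [System.IO.File]::Open($Path, [System.IO.FileMode]::OpenOrCreate)
    try {
        $stream.Position = $Offset
        for ($i = 0; $i -lt $Count; $i++) {
            $rng.NextBytes($buffer)
            $stream.Write($buffer, 0, $buffer.Length)
        }
    } finally {
        $stream.Dispose()
    }
}

# Create identical files on both volumes.
foreach ($p in $testFile, $refFile) {
    New-Item -ItemType Directory -Force -Path (Split-Path $p) | Out-Null
    Write-Blocks -Path $p -Seed 1 -Offset 0 -Count $blockCount
}

# Run the optimization job on D: at this point so the test file is deduplicated.

# Modify the same region of both files in place.
foreach ($p in $testFile, $refFile) {
    Write-Blocks -Path $p -Seed 2 -Offset (16 * $blockSize) -Count 4
}

# Compare the checksums of the deduplicated file and the reference file.
$testHash = (Get-FileHash -Path $testFile -Algorithm SHA256).Hash
$refHash  = (Get-FileHash -Path $refFile -Algorithm SHA256).Hash
if ($testHash -eq $refHash) { 'Checksums match' } else { 'MISMATCH: possible corruption' }
```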

How I discovered that bug:
I have a Server 2019 test server with nested VMs, with many test OS variants installed there. I upgraded that VM to Server 2022. This is where I ran into the issues with Windows updates within the nested VMs, and then traced it down to Server 2022 corrupting the filesystem.

Build 26080 : Still as easy to reproduce as ever since Server 2022. Without deduplication those VMs update just fine without blue screen and other weirdness.

If there is interest I'll create a 26080 export of the repro from a Server 2019 machine (which should import in Server 2019, Server 2022 and Windows 11 without problems).


Thanks, Joachim. I updated the bug report to note that 26080 is included.

Michael

A little update on this: For the first time in many years I had a blue screen on my actual server, which was Server 2019 until the middle of last year. The first one ever. BlueScreenView from NirSoft lists "ntfs.sys" as the culprit. I unprofessionally suspect concurrent writes on the deduped volume residing on a tiered mirrored storage space (not S2D). And I unprofessionally suspect that it might be related to the issue with the dedup repro which corrupts data. But only you can tell, not me.

Computer:
i7-4960X, 32 GB RAM (ASRock X79 Fatal1ty). Power supply: a Fujitsu 750 W unit, formerly from a Primergy server.
C: = normal SSD,
D: = Deduped drive. Tiered Storage Space Mirror, 2*SSD, 2*HDD.
E: = Storage Spaces Parity with four drives.
BitLocker active on D: and E:

Using NetQoS the incoming speed is limited to 60 MB/s since the tiered storage space cannot write faster. The blue screen happened during three parallel robocopy jobs over the network, two of them with D: as destination. When the blue screen happened, neither a dedup job nor a re-tiering job was running according to the event logs. After the crash and the filesystem check I continued copying sequentially, not concurrently, and everything seemed fine.
This is the first ntfs.sys crash I have seen since about, maybe, XP SP0.
Here is the dump + the events around that crash.
https://joumxyzptlk.de/tmp/microsoft/NTFS-maybe-dedup-problem-first-server-2022-crash-ever-on-that-c...


Hi Joachim,
Thank you for sharing the dump. It shows some allocation was corrupted, but since this is a mini-dump it is difficult to debug further.

Do you think you can reproduce the bugcheck issue? If yes, could you please configure a complete dump by setting the reg key below, rebooting the machine, and then reproduce the bugcheck, and then share the dump.

reg add "hklm\SYSTEM\CurrentControlSet\Control\CrashControl" /v CrashDumpEnabled /t reg_dword /d 1 /f

Did you notice event ID 55 in the system event log in the past when you originally hit the issue? If you can share the following event log files from the repro machine, that would be great:
%windir%\System32\winevt\Logs\system.evtx
%windir%\System32\winevt\Logs\Microsoft-Windows-Ntfs%4Operational.evtx
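
As a rough sketch, the checks and log files requested above could be gathered from an elevated PowerShell session as shown below. The output folder C:\Temp\dedup-logs is a hypothetical choice and not something named in this thread.

```powershell
# Look for event ID 55 (NTFS file system corruption) in the System log.
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 55 } -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, ProviderName, Id, Message

# Confirm the dump setting; 1 corresponds to a complete memory dump.
(Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl').CrashDumpEnabled

# Export the two requested logs to a folder for sharing (hypothetical path).
$outDir = 'C:\Temp\dedup-logs'
New-Item -ItemType Directory -Force -Path $outDir | Out-Null
wevtutil epl System "$outDir\system.evtx"
wevtutil epl Microsoft-Windows-Ntfs/Operational "$outDir\Microsoft-Windows-Ntfs-Operational.evtx"
```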

In this case the deduped drive (D:) is a Tiered Storage Space Mirror, 2*SSD, 2*HDD. Is this something you configured recently? Asking because in the original report of the issue, this was not mentioned.

We have tried to reproduce the issue internally but have not seen any corruption, although our dedup drive was not a tiered spaces mirror volume.

In your video, the chkdsk output shows cleanup of security descriptor and index files, but that does not indicate corruption. If by chance you have saved the chkdsk output from before, please share that as well.

Thank you.

The original report of that issue was completely encapsulated as an easy-to-reproduce VM on my Ryzen 5950x (SATA SSD), the dual Xeon 6226R (professional RAID controller with 2 SSDs in RAID 1 for that volume), and with lower probability on that i7-4960x (the Tiered Storage Space mentioned above). So the underlying storage did not matter for the repro.
I have activated the full dump now (local time 21:59) and will try to reproduce it during the next days. But it may take time to appear. Wish me luck!
As for NTFS event 55: Never. I am paranoid about noticing such things, especially since a customer had Exchange database corruption several years ago just because that bit was not noticed. This is checked every time the box is booted. Here is that one rare example of an NTFS warning after that crash. I guess you can read the XML filter variant, the rest of the OS is German :D.


PS: If you want a live demonstration of the repro, it can be done as a Teams session with my work account (email address removed for privacy reasons). But be aware, the German Easter holiday starts right now, so my next response there will be on Tuesday, 2 April, since the work laptop is at work and not at home.
I can still perfectly reproduce this with build 26212, with 100% reliability. I suppose I have to recommend not using deduplication in Hyper-V on the local volume hosting the virtual machines with both Server 2022 and Server 2025.
https://techcommunity.microsoft.com/t5/windows-server-insiders/server-2025-hyper-v-deduplication-cor...
Hi, Joachim. MSSWRahman had asked you for a full dump and logs - were you able to hand those off? I understand that you can perfectly reproduce the issue, but we have not been able to reproduce this in-house at all.