[Server 22538] Deduplication gets stuck, refuses to cancel, while fsdmhost.exe spins indefinitely


Feedback Hub link

 

On a brand-new disk (WDC WD60EFZX), I created a storage space, formatted it with ReFS, enabled BitLocker, copied 3 TB of data, and enabled deduplication. The optimization job starts and makes progress, but eventually the threads of the process involved (fsdmhost.exe) appear to start spinning without accomplishing anything. The machine's fans spin up as CPU is consumed, but memory, I/O, and other standard metrics don't move.

 

I’ve left it running for ~24 hours, and during that time, taskmgr/resmon/procmon show that once the I/O stops, it never resumes.

 

After rebooting the server and running 'Start-DedupJob -Type Optimization', dedup appears to start, but it eventually gets stuck again.

 

Having repeated this many times now, I find it always seems to get stuck at the same file or part of the disk, per the stats:

 

PS C:\Users\TReKiE> get-dedupstatus

FreeSpace    SavedSpace   OptimizedFiles     InPolicyFiles      Volume
---------    ----------   --------------     -------------      ------
2.98 TB      689.6 GB     17175              17178              E:
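
To put a number on how close the job gets before it stalls, the two file counters in that output can be subtracted. The sketch below assumes the objects returned by Get-DedupStatus expose `InPolicyFilesCount` and `OptimizedFilesCount` properties (the table headers above being shortened display names); `Get-DedupBacklog` is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Hypothetical helper: count in-policy files that have not been optimized yet.
# Assumption: each input object has Volume, InPolicyFilesCount and
# OptimizedFilesCount properties, as Get-DedupStatus output is expected to.
function Get-DedupBacklog {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $Status
    )
    process {
        [pscustomobject]@{
            Volume         = $Status.Volume
            RemainingFiles = $Status.InPolicyFilesCount - $Status.OptimizedFilesCount
        }
    }
}

# Usage on the affected server (requires the Deduplication cmdlets):
# Get-DedupStatus -Volume 'E:' | Get-DedupBacklog
```

With the counts shown above (17178 in policy, 17175 optimized), this would report 3 remaining files, which fits the job stalling at the same spot each time.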

 

 

Additionally, the spinning continues after Stop-DedupJob. After the stop attempt, Get-DedupJob calls stop returning, and I have to break out with Ctrl-C just to get back to the PowerShell prompt.

 

 

PS C:\Users\TReKiE> Stop-DedupJob -Volume E:
PS C:\Users\TReKiE> Get-DedupJob
 
Type               ScheduleType       StartTime              Progress   State                  Volume
----               ------------       ---------              --------   -----                  ------
Optimization       Manual             1:13 AM                0 %        PendingCancel          E:
PS C:\Users\TReKiE> Get-DedupJob
[stops here, never returns]
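
For anyone trying to reproduce this without losing their console, one option is to run the query in a background job and give up after a timeout. This is only a sketch: `Invoke-WithTimeout` is a hypothetical helper name, the 30-second default is arbitrary, and it is an assumption that the hung call can be cleaned up at all (removing the job may itself block if the underlying call never returns).

```powershell
# Hypothetical helper (not from the original report): run a script block in a
# background job and stop waiting after a timeout instead of blocking the console.
function Invoke-WithTimeout {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $TimeoutSeconds = 30
    )

    $job = Start-Job -ScriptBlock $ScriptBlock
    try {
        # Wait-Job returns the job if it finished in time, otherwise nothing.
        if (Wait-Job -Job $job -Timeout $TimeoutSeconds) {
            Receive-Job -Job $job
        }
        else {
            Write-Warning "No response after $TimeoutSeconds seconds; giving up."
        }
    }
    finally {
        # -Force removes the job even if it is still running.
        Remove-Job -Job $job -Force
    }
}

# Usage on the affected server (requires the Deduplication cmdlets):
# Invoke-WithTimeout -TimeoutSeconds 30 -ScriptBlock { Get-DedupJob -Volume 'E:' }
```

Results come back as deserialized objects because they cross a job boundary, which is fine for reading the job state but not for piping into other dedup cmdlets.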

 

 

I attached minidumps of fsdmhost.exe and server health ETLs to the bug on Feedback Hub. I was unable to attach the full dumps, as they seem to be too big (but they are available on request!). All the event logs (Application, System, Deduplication-Operational, and Deduplication-Diagnostic) say everything is fine.

 

This same hardware previously handled deduplication of a 14 TB ReFS Storage Space on previous versions of Windows Server 2019 and 2022 without issue.

 

Of note, I originally wrote this for build 22526, but it still happens on 22538.

4 Replies

Having the same issue with a 120 TB RAID; it's just like you described.

Thanks Thomas, I'm glad I'm not alone on this one. As a small addition to what I wrote above: over this past weekend, I started all over, copied the same data with all the same settings, but used NTFS instead. That seemed to solve the issue; dedup had no problems and is now working normally.

Hi,

Same for me with all versions... Anything to try or a workaround for this issue? 
@Jonathan Kay 

@LaurentF_CH Hey Laurent, wow, I had forgotten about this issue. My apologies, I'm afraid I don't have a solution for you, as this problem was the end of the road for me and ReFS. Unfortunately, there were too many downsides over the years and not enough benefits. I still have one old disk using ReFS in my personal machine, which doesn't have dedup enabled, but I intend to replace it within the next month or so.

 

However, when you say "all versions", do you mean all Insider versions? If so, I would post a new bug report to hopefully gain more traction than this one did. It might be useful to reference this old one as well.