WVD - almost daily TermService crashes disconnecting all users

Hi team,

 

A client of mine has a WVD environment comprising a single Windows 10 Multisession host. This supports about 15 users, and runs really well the majority of the time. The host VM is a Standard DS13 v2 (8 vCPUs, 56 GiB memory), in the Australia South-East region. FSLogix is handling the user profiles. The host is up to date with all the latest patches and feature releases, has heaps of disk space, is not under CPU or RAM pressure, etc.

 

Lately the server has developed an almost daily instability whereby the TermService service crashes, which instantly disconnects all the sessions. The users can log back in almost straight away (once the TermService process recycles after 6000ms), but that in turn causes a huge CPU spike which log-jams things for a while. Plus generally it's not a great time for the staff (or a great look for me, since I built the silly thing). Client is getting quite frustrated - 100% understandably.

 

When it faults, this is what is caught by WER in the event log:

Faulting application name: svchost.exe_TermService, version: 10.0.19041.546, time stamp: 0x058e175a
Faulting module name: RDPSERVERBASE.dll, version: 10.0.19041.844, time stamp: 0x491ad66a
Exception code: 0xc0000005
Fault offset: 0x0000000000049a4b
Faulting process ID: 0x4e0
Faulting application start time: 0x01d70b5d30318f10
Faulting application path: C:\windows\System32\svchost.exe
Faulting module path: C:\windows\system32\RDPSERVERBASE.dll
Report ID: 1f6a6eab-7cfd-415d-bc6c-8726cf85744b
Faulting package full name: 
Faulting package-relative application ID: 

 

In the Applications and Services Logs\Microsoft\Windows\TerminalServices-LocalSessionManager\Operational\ logs I see a bunch of these:

Session 4 has been disconnected, reason code 1067

 

Of course I cannot find any reference that tells me what reason code 1067 is, but I'm guessing it's because the TermService process has crashed. They happen straight after the first error is logged, and there are two for each logged-in user session.
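
For anyone wanting to check the "two per session" pattern on their own host, here is a minimal Python sketch that tallies disconnect messages by session and reason code. It assumes the messages have already been exported from the LocalSessionManager Operational log as plain text lines; the function name and the sample lines are hypothetical. For what it's worth, 1067 is also the Win32 error ERROR_PROCESS_ABORTED ("The process terminated unexpectedly"), which would fit a service crash, but whether this log reuses Win32 codes is an assumption.

```python
import re
from collections import Counter

# Matches the message text quoted above, e.g.
# "Session 4 has been disconnected, reason code 1067"
DISCONNECT = re.compile(r"Session (\d+) has been disconnected, reason code (\d+)")

# Assumption: the reason code is a Win32 error number. 1067 is
# ERROR_PROCESS_ABORTED in the Win32 error list.
KNOWN_CODES = {1067: "ERROR_PROCESS_ABORTED (process terminated unexpectedly)"}


def tally_disconnects(messages):
    """Count disconnect events per (session id, reason code) pair."""
    counts = Counter()
    for message in messages:
        match = DISCONNECT.search(message)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines in the same shape as the log entry above.
    sample = [
        "Session 4 has been disconnected, reason code 1067",
        "Session 4 has been disconnected, reason code 1067",
        "Session 7 has been disconnected, reason code 1067",
    ]
    for (session, code), n in sorted(tally_disconnects(sample).items()):
        print(session, code, n, KNOWN_CODES.get(code, "unknown"))
```

If every session shows the same reason code within a second or two of the svchost.exe_TermService fault, that supports the guess that the disconnects are a side effect of the crash rather than a separate problem.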

 

I've loaded up the NirSoft AppCrashView app, but I can't see anything useful in its output:

Version=1
EventType=APPCRASH
EventTime=132587022166067043
ReportType=2
Consent=1
UploadTime=132587022454956260
ReportStatus=268435456
ReportIdentifier=f567748d-0e33-482a-97bb-1c00d36ae905
IntegratorReportIdentifier=eb7d197a-204e-4da6-b73c-baa5aab3859e
Wow64Host=34404
NsAppName=svchost.exe_TermService
OriginalFilename=svchost.exe
AppSessionGuid=0000050c-0000-0018-cd61-c1fdab08d701
TargetAppId=W:0000f519feec486de87ed73cb92d3cac802400000000!0000010db07461e45b41c886192df6fd425ba8d42d82!svchost.exe
TargetAppVer=1972//12//14:16:22:50!1c364!svchost.exe
BootId=4294967295
ServiceGroupName=NetworkService
ServiceDllName=termsrv.dll
ServiceSplit=1
TargetAsId=21
IsFatal=1
EtwNonCollectReason=1
Response.BucketId=5244ca240db20558d608f520999effb7
Response.BucketTable=4
Response.LegacyBucketId=1587788389013192631
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=svchost.exe_TermService
Sig[1].Name=Application Version
Sig[1].Value=10.0.19041.546
Sig[2].Name=Application Timestamp
Sig[2].Value=058e175a
Sig[3].Name=Fault Module Name
Sig[3].Value=RDPSERVERBASE.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=10.0.19041.746
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=2af8aaa6
Sig[6].Name=Exception Code
Sig[6].Value=c0000005
Sig[7].Name=Exception Offset
Sig[7].Value=0000000000049a8b
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=10.0.19041.2.0.0.16.175
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=1033
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=c926
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=c926eb8a57c3fe4a73c48eb4e93aff94
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=bb7b
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=bb7bb65c865656cf18d1ad71de2d760d
UI[2]=C:\windows\System32\svchost.exe
UI[5]=Close
UI[8]=Remote Desktop Services stopped working and was closed
UI[9]=A problem caused the application to stop working correctly. Windows will notify you if a solution is available.
UI[10]=&Close
LoadedModule[0]=C:\windows\System32\svchost.exe
LoadedModule[1]=C:\windows\SYSTEM32\ntdll.dll
LoadedModule[2]=C:\windows\System32\KERNEL32.DLL
LoadedModule[3]=C:\windows\System32\KERNELBASE.dll
LoadedModule[4]=C:\windows\System32\sechost.dll
LoadedModule[5]=C:\windows\System32\RPCRT4.dll
LoadedModule[6]=C:\windows\System32\ucrtbase.dll
LoadedModule[7]=C:\windows\System32\combase.dll
LoadedModule[8]=C:\windows\SYSTEM32\kernel.appcore.dll
LoadedModule[9]=C:\windows\System32\msvcrt.dll
LoadedModule[10]=C:\windows\System32\bcryptPrimitives.dll
LoadedModule[11]=C:\windows\System32\user32.dll
LoadedModule[12]=C:\windows\System32\win32u.dll
LoadedModule[13]=C:\windows\System32\GDI32.dll
LoadedModule[14]=C:\windows\System32\gdi32full.dll
LoadedModule[15]=C:\windows\System32\msvcp_win.dll
LoadedModule[16]=c:\windows\system32\termsrv.dll
LoadedModule[17]=C:\windows\System32\WS2_32.dll
LoadedModule[18]=C:\windows\System32\cfgmgr32.dll
LoadedModule[19]=c:\windows\system32\UMPDC.dll
LoadedModule[20]=C:\windows\SYSTEM32\WLDP.DLL
LoadedModule[21]=C:\windows\System32\advapi32.dll
LoadedModule[22]=C:\windows\System32\clbcatq.dll
LoadedModule[23]=C:\windows\system32\lsmproxy.dll
LoadedModule[24]=C:\windows\SYSTEM32\rmclient.dll
LoadedModule[25]=C:\windows\System32\sspicli.dll
LoadedModule[26]=C:\windows\system32\rdsnetfs.dll
LoadedModule[27]=C:\windows\System32\OLEAUT32.dll
LoadedModule[28]=c:\windows\system32\REGAPI.dll
LoadedModule[29]=C:\Program Files\Microsoft RDInfra\StackSxS\1.0.2008.26002\RdpCoreCDV.dll
LoadedModule[30]=C:\windows\System32\CRYPT32.dll
LoadedModule[31]=C:\windows\System32\bcrypt.dll
LoadedModule[32]=C:\windows\System32\SHLWAPI.dll
LoadedModule[33]=C:\windows\System32\SHELL32.dll
LoadedModule[34]=C:\windows\SYSTEM32\pdh.dll
LoadedModule[35]=C:\windows\SYSTEM32\IPHLPAPI.DLL
LoadedModule[36]=C:\windows\SYSTEM32\ncrypt.dll
LoadedModule[37]=C:\windows\SYSTEM32\DEVOBJ.dll
LoadedModule[38]=C:\windows\SYSTEM32\WINHTTP.dll
LoadedModule[39]=C:\windows\SYSTEM32\PROPSYS.dll
LoadedModule[40]=C:\windows\SYSTEM32\SECUR32.dll
LoadedModule[41]=C:\windows\SYSTEM32\d3d11.dll
LoadedModule[42]=C:\windows\SYSTEM32\USERENV.dll
LoadedModule[43]=C:\windows\SYSTEM32\AUTHZ.dll
LoadedModule[44]=C:\windows\SYSTEM32\dxgi.dll
LoadedModule[45]=C:\windows\SYSTEM32\websocket.dll
LoadedModule[46]=C:\windows\SYSTEM32\HTTPAPI.dll
LoadedModule[47]=C:\windows\SYSTEM32\tlscsp.dll
LoadedModule[48]=C:\windows\SYSTEM32\DPAPI.DLL
LoadedModule[49]=C:\windows\SYSTEM32\NTASN1.dll
LoadedModule[50]=C:\windows\System32\OLE32.dll
LoadedModule[51]=C:\Windows\System32\umb.dll
LoadedModule[52]=C:\windows\System32\SETUPAPI.dll
LoadedModule[53]=C:\windows\System32\WINTRUST.dll
LoadedModule[54]=C:\windows\System32\MSASN1.dll
LoadedModule[55]=C:\windows\SYSTEM32\CRYPTBASE.DLL
LoadedModule[56]=C:\windows\SYSTEM32\ntmarta.dll
LoadedModule[57]=C:\windows\System32\mstlsapi.dll
LoadedModule[58]=C:\windows\System32\ACTIVEDS.dll
LoadedModule[59]=C:\windows\System32\NETAPI32.dll
LoadedModule[60]=C:\windows\System32\TlsBrand.dll
LoadedModule[61]=C:\windows\System32\netutils.dll
LoadedModule[62]=C:\windows\System32\adsldpc.dll
LoadedModule[63]=C:\windows\System32\WINBRAND.dll
LoadedModule[64]=C:\windows\System32\WLDAP32.dll
LoadedModule[65]=C:\windows\System32\DSROLE.DLL
LoadedModule[66]=C:\windows\System32\LOGONCLI.DLL
LoadedModule[67]=C:\windows\System32\WKSCLI.DLL
LoadedModule[68]=C:\Program Files\Microsoft RDInfra\StackSxS\1.0.2010.07001\RdpCoreCDV.dll
LoadedModule[69]=C:\Program Files\Microsoft RDInfra\StackSxS\1.0.2011.05001\RdpCoreCDV.dll
LoadedModule[70]=C:\windows\system32\mswsock.dll
LoadedModule[71]=C:\windows\system32\rdpcorets.dll
LoadedModule[72]=C:\windows\System32\shcore.dll
LoadedModule[73]=C:\windows\system32\rfxvmt.dll
LoadedModule[74]=C:\windows\system32\RDPBASE.dll
LoadedModule[75]=C:\windows\system32\RDPSERVERBASE.dll
LoadedModule[76]=C:\windows\system32\CRYPTSP.dll
LoadedModule[77]=C:\windows\System32\vmbuspipe.dll
LoadedModule[78]=C:\windows\System32\winsta.dll
LoadedModule[79]=C:\windows\System32\credssp.dll
LoadedModule[80]=C:\windows\system32\schannel.DLL
LoadedModule[81]=C:\windows\SYSTEM32\mskeyprotect.dll
LoadedModule[82]=C:\windows\system32\tspkg.DLL
LoadedModule[83]=C:\windows\system32\ncryptsslp.dll
LoadedModule[84]=C:\windows\SYSTEM32\WINNSI.DLL
LoadedModule[85]=C:\windows\System32\NSI.dll
LoadedModule[86]=C:\windows\SYSTEM32\dhcpcsvc6.DLL
LoadedModule[87]=C:\windows\SYSTEM32\dhcpcsvc.DLL
LoadedModule[88]=C:\windows\SYSTEM32\webio.dll
LoadedModule[89]=C:\windows\SYSTEM32\DNSAPI.dll
LoadedModule[90]=C:\Windows\System32\rasadhlp.dll
LoadedModule[91]=C:\windows\System32\fwpuclnt.dll
LoadedModule[92]=C:\windows\system32\rsaenh.dll
LoadedModule[93]=C:\windows\System32\profapi.dll
LoadedModule[94]=C:\windows\System32\tssrvlic.dll
LoadedModule[95]=C:\windows\System32\msvcp110_win.dll
LoadedModule[96]=C:\windows\System32\LSCSHostPolicy.dll
LoadedModule[97]=C:\windows\System32\lstelemetry.dll
LoadedModule[98]=C:\windows\system32\MF.dll
LoadedModule[99]=C:\windows\System32\MFCORE.DLL
LoadedModule[100]=C:\windows\SYSTEM32\powrprof.dll
LoadedModule[101]=C:\windows\System32\ksuser.dll
LoadedModule[102]=C:\windows\system32\MFPlat.dll
LoadedModule[103]=C:\windows\System32\RTWorkQ.DLL
LoadedModule[104]=C:\Windows\System32\mfh264enc.dll
LoadedModule[105]=C:\windows\System32\coml2.dll
LoadedModule[106]=C:\windows\SYSTEM32\sxs.dll
LoadedModule[107]=C:\Program Files\Microsoft RDInfra\StackSxS\1.0.2011.05001\rdpnanoTransport.dll
LoadedModule[108]=C:\windows\System32\CompPkgSup.DLL
LoadedModule[109]=C:\Windows\System32\Windows.Media.dll
LoadedModule[110]=C:\Windows\System32\Windows.ApplicationModel.dll
LoadedModule[111]=C:\Windows\System32\twinapi.appcore.dll
LoadedModule[112]=C:\Windows\System32\AppXDeploymentClient.dll
LoadedModule[113]=C:\windows\system32\wbem\wbemprox.dll
LoadedModule[114]=C:\windows\SYSTEM32\wbemcomn.dll
LoadedModule[115]=C:\windows\system32\wbem\wbemsvc.dll
LoadedModule[116]=C:\windows\system32\wbem\fastprox.dll
LoadedModule[117]=C:\windows\SYSTEM32\amsi.dll
LoadedModule[118]=C:\ProgramData\Microsoft\Windows Defender\platform\4.18.2101.9-0\MpOav.dll
LoadedModule[119]=C:\windows\system32\version.dll
LoadedModule[120]=C:\windows\system32\WRusr.dll
LoadedModule[121]=C:\windows\System32\PSAPI.DLL
LoadedModule[122]=C:\windows\System32\MSIMG32.dll
LoadedModule[123]=C:\windows\SYSTEM32\windows.storage.dll
State[0].Key=Transport.DoneStage1
State[0].Value=1
OsInfo[0].Key=vermaj
OsInfo[0].Value=10
OsInfo[1].Key=vermin
OsInfo[1].Value=0
OsInfo[2].Key=verbld
OsInfo[2].Value=19041
OsInfo[3].Key=ubr
OsInfo[3].Value=746
OsInfo[4].Key=versp
OsInfo[4].Value=0
OsInfo[5].Key=arch
OsInfo[5].Value=9
OsInfo[6].Key=lcid
OsInfo[6].Value=1033
OsInfo[7].Key=geoid
OsInfo[7].Value=244
OsInfo[8].Key=sku
OsInfo[8].Value=175
OsInfo[9].Key=domain
OsInfo[9].Value=1
OsInfo[10].Key=prodsuite
OsInfo[10].Value=16
OsInfo[11].Key=ntprodtype
OsInfo[11].Value=3
OsInfo[12].Key=platid
OsInfo[12].Value=10
OsInfo[13].Key=sr
OsInfo[13].Value=0
OsInfo[14].Key=tmsi
OsInfo[14].Value=220975696
OsInfo[15].Key=osinsty
OsInfo[15].Value=2
OsInfo[16].Key=iever
OsInfo[16].Value=11.630.19041.0-11.0.220
OsInfo[17].Key=portos
OsInfo[17].Value=0
OsInfo[18].Key=ram
OsInfo[18].Value=57343
OsInfo[19].Key=svolsz
OsInfo[19].Value=126
OsInfo[20].Key=wimbt
OsInfo[20].Value=0
OsInfo[21].Key=blddt
OsInfo[21].Value=191206
OsInfo[22].Key=bldtm
OsInfo[22].Value=1406
OsInfo[23].Key=bldbrch
OsInfo[23].Value=vb_release
OsInfo[24].Key=bldchk
OsInfo[24].Value=0
OsInfo[25].Key=wpvermaj
OsInfo[25].Value=0
OsInfo[26].Key=wpvermin
OsInfo[26].Value=0
OsInfo[27].Key=wpbuildmaj
OsInfo[27].Value=0
OsInfo[28].Key=wpbuildmin
OsInfo[28].Value=0
OsInfo[29].Key=osver
OsInfo[29].Value=10.0.19041.746.amd64fre.vb_release.191206-1406
OsInfo[30].Key=buildflightid
OsInfo[31].Key=edition
OsInfo[31].Value=ServerRdsh
OsInfo[32].Key=ring
OsInfo[32].Value=Retail
OsInfo[33].Key=expid
OsInfo[34].Key=fconid
OsInfo[35].Key=containerid
OsInfo[36].Key=containertype
OsInfo[37].Key=edu
OsInfo[37].Value=0
FriendlyEventName=Stopped working
ConsentKey=APPCRASH
AppName=Remote Desktop Services
AppPath=C:\windows\System32\svchost.exe
NsPartner=windows
NsGroup=windows8
ApplicationIdentity=097D38C6EE88031AC50A5FF3F9857982
MetadataHash=686507704
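
The report above is a flat list of key=value pairs, so the handful of fields that matter can be pulled out with a few lines of Python. This is only a sketch: the function names are hypothetical, the sample text is a small excerpt of the report above, and it assumes EventTime is a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC), which the magnitude of the value suggests.

```python
from datetime import datetime, timedelta, timezone

# Excerpt of the report above; a real run would read the whole file.
SAMPLE_REPORT = """\
EventType=APPCRASH
EventTime=132587022166067043
Sig[3].Name=Fault Module Name
Sig[3].Value=RDPSERVERBASE.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=10.0.19041.746
Sig[6].Name=Exception Code
Sig[6].Value=c0000005
Sig[7].Name=Exception Offset
Sig[7].Value=0000000000049a8b
"""


def parse_report(text):
    """Turn 'key=value' lines into a dict, skipping lines without '='."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def signature(fields):
    """Pair each Sig[n].Name with its Sig[n].Value."""
    sig = {}
    for key, name in fields.items():
        if key.startswith("Sig[") and key.endswith("].Name"):
            sig[name] = fields.get(key[: -len("Name")] + "Value", "")
    return sig


def filetime_to_utc(filetime):
    """Convert a FILETIME tick count to an aware UTC datetime."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)


if __name__ == "__main__":
    fields = parse_report(SAMPLE_REPORT)
    print(filetime_to_utc(int(fields["EventTime"])).isoformat())
    for name, value in signature(fields).items():
        print(f"{name}: {value}")
```

Comparing the fault module version and exception offset across several reports is a quick way to tell whether each crash lands at the same place in RDPSERVERBASE.dll or whether the offset moves after patching.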

 

I can't see anything interesting in any of the logs (that I've checked) immediately before or after the crash times. 

 

I'm a little lost as to what my next steps need to be here. I've just now loaded the reg settings as described here: Remote desktop services crashes (microsoft.com). Hopefully a proper dump will shed some light on things, but I'm not amazingly hopeful. Plus it's not great knowing that I need it to fault at least one more time.
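
The exact settings from the linked article aren't reproduced in this post. One commonly documented way to collect user-mode crash dumps is the Windows Error Reporting LocalDumps registry key, and the Python sketch below simply prints the `reg add` commands for a per-application LocalDumps key. Treat it as an illustration under assumptions: the helper name, the dump folder C:\CrashDumps and the count of 10 are hypothetical choices, and a key named svchost.exe applies to every svchost.exe instance, not just TermService.

```python
# Registry location used by WER for user-mode dump collection.
LOCAL_DUMPS = r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def local_dump_commands(exe_name="svchost.exe", folder=r"C:\CrashDumps",
                        count=10, dump_type=2):
    """Build 'reg add' commands for a per-application LocalDumps key.

    dump_type follows the WER convention: 1 = mini dump, 2 = full dump.
    The folder and count defaults are illustrative, not recommendations.
    """
    key = LOCAL_DUMPS + "\\" + exe_name
    values = [
        ("DumpFolder", "REG_EXPAND_SZ", folder),
        ("DumpCount", "REG_DWORD", str(count)),
        ("DumpType", "REG_DWORD", str(dump_type)),
    ]
    return [
        f'reg add "{key}" /v {name} /t {kind} /d "{data}" /f'
        for name, kind, data in values
    ]


if __name__ == "__main__":
    # Print the commands for review; run them from an elevated prompt
    # only after checking them against the official guidance.
    for command in local_dump_commands():
        print(command)
```

Full dumps of a process this size can be large, so the dump folder should sit on a volume with plenty of free space.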

 

Having played with Terminal Services before (right back to NT and Citrix Metaframe days), I'm immediately suspicious of client machine graphics drivers and client printers. But I've got nothing to back that up at this point.

 

The system started off very stable, and ran for months without issue. Then the odd "hiccup" started to creep in - low frequency enough that I put it down to just needing a regular reboot cadence. But we've had three outages this week alone - I feel it's getting worse. 

 

What else have I done:

  • I've run sfc /scannow - no problems found.
  • I've run DISM /Online /Cleanup-Image /RestoreHealth - no issues.
  • I've run back through the entire Microsoft guide on deploying WVD, including the Deployment Best Practices and configuration of FSLogix components. I've crossed t's and dotted i's - I can't see where I've deviated from the recommended process (not to say I haven't, but I certainly can't see it if I did)
  • I thought about setting up Insights, Alerts and Logs on the Azure side for the VM, but not sure that will give me anything over and above what I already have? The VM isn't falling over. Just that one process.
  • All users are E5 licensed. AzureADDS is working correctly, people log in with their Office365 accounts, Azure File Shares are secured with their AzureADDS credentials - it all works a treat. 
  • Users are geographically close to the Azure region (all users are in Melbourne CBD. Azure resources all in AU South East). 

 

Does anyone have any advice on where I should be looking next with this? 

 

My plan (such as it is):

  • Wait for the application to fault again, grab the minidump and see if it holds any smoking guns
  • Maybe build a second host, so that if/when the first goes down at least it only affects 50% of the staff. Bandaid solution at best, but it might help to ease the pain. 

 

Thoughts? Any and all suggestions will be graciously accepted at this point. 

 

Cheers,

Matt

13 Replies
Hi Matt,

Our WVD environment had the same problem for the past weeks.
One of the users was trying to open mp4-files. Every time she tried that, all the users of that Session Host were disconnected, exactly like in your case. They could connect after that without any trouble.
We changed the program for MP4-files from the 'Windows default program' to a 'third party program'. Since then we have had no more disconnected sessions so far.
I hope this will help you find the solution for your environment.
Hi PetersBrothers,

That is SUPER interesting! Thanks for posting here.

How ever did you track it down to the mp4 file? Did the user complain that every time they tried to play the file it booted them out, or were you able to track it through the logs somehow?

I'll install something like VLC player on the server tonight, and we'll see if that changes anything.

Interestingly, we've not had the problem reoccur for 3 weeks now. Either Microsoft has changed something with a patch, or the user simply gave up trying to play their mp4.

I'll post back with what I find out.

Again - many thanks for your reply.

Cheers,
Matt

@Matt_Ignite We did not track it down to the mp4 file. The user told us that it happened every time she tried to play an MP4 file.

We installed VLC as well.

Closing the loop on this one with a happy ending. After working with Azure Tech Support, we upgraded the session host (Windows 10 Multisession) to the 21H1 feature release as soon as it went GA. This instantly cured the issue - it's been 2 weeks now without a single hint of any interruption and the system has been rock-solid stable.

 

We may not yet be 100% out of the woods, but 2 weeks without an interruption is a massive improvement on the previous symptoms, where we would get a mass-disconnection on average every second day, sometimes experiencing a number of these incidents in a single day.

 

For anyone struggling with the same issue, I'd highly recommend upgrading the session hosts ASAP.

 

Cheers,

Matt

Hi all,

Posting for anyone else who finds this thread. Unfortunately I jumped the gun with this one. While 21H1 has helped, we're averaging one Remote Desktop crash every 2 weeks now, which results in all the active sessions getting booted off into a Disconnected state.

It's better than it was, for sure, but there's still something fundamentally broken here.

Am continuing to work with Azure Tech Support on this one, but so far there's no real idea of the root cause, and no deep troubleshooting has happened. I'll post back if I ever get to the bottom of it. Suffice to say, pretty disillusioned with both WVD and MSFT/Azure tech support.

@Matt_Ignite I too had seen this behaviour earlier in the year but it also largely disappeared after we upgraded our session host VMs. However, over the past 3 days we have had a Term Service crash (svchost.exe_TermService) on 1 session host every day. We have 5 hosts in the pool and 1 host has been affected each day. This manifests itself in all user sessions on that host being disconnected. As in your scenario, the users can connect back in a few seconds later.

 

Did you ever get to a resolution on this with MS Support?

Hi @tadhgclifford,

 

Yes, we did get a resolution in the end.

  • I went back to Microsoft and the product team did more digging, along with reviewing all the dumps that we had captured (which was a lot, because it was crashing frequently.)
  • Eventually they confirmed that it was due to a previously undiscovered bug (Bug 34382658) in the way the RDP stack handled multiple connections. Specifically "the known issue is influencing AVD connections because the inbox stack for RDP is sharing the same process with AVD SxS stack, and they are conflicting with each other."
  • In late September Microsoft released a patch to fix it - this one: September 30, 2021—KB5005611 (OS Builds 19041.1266, 19042.1266, and 19043.1266) Preview (microsoft.c... 
  • Since applying that patch in early October there's been no further crashes at all - it's been absolutely rock solid. Happy to report that it has resolved the issue completely. 
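
To check a host against that patch level, the OS version string (for example the osver value in the crash report earlier in the thread) can be compared with the builds named in the KB title. A minimal Python sketch follows; the function names are hypothetical, and it assumes that any later cumulative update with a revision of 1266 or higher on builds 19041, 19042 or 19043 also carries the fix.

```python
# Builds and revision taken from the KB5005611 title quoted above.
PATCHED_BUILDS = {19041, 19042, 19043}
FIXED_REVISION = 1266


def parse_os_version(version):
    """Return (build, revision) from a string like '10.0.19041.746...'."""
    parts = version.split(".")
    return int(parts[2]), int(parts[3])


def at_or_above_kb5005611(version):
    """True if the revision is at least the KB5005611 level.

    Raises ValueError for builds the KB title does not mention, since
    this sketch has nothing to say about them.
    """
    build, revision = parse_os_version(version)
    if build not in PATCHED_BUILDS:
        raise ValueError(f"build {build} is not covered by this check")
    return revision >= FIXED_REVISION


if __name__ == "__main__":
    # osver from the crash report in the original post.
    print(at_or_above_kb5005611("10.0.19041.746.amd64fre.vb_release.191206-1406"))
    # A host sitting exactly at the patched level.
    print(at_or_above_kb5005611("10.0.19043.1266"))
```

A host reporting 19041.746, as in the original crash report, falls well below that level.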

 

Give that a shot - I'd be really keen to see if you've got that patch installed on any of the session hosts in your farm. 

 

If it's related, we're on 21H1 with all the latest Windows Updates installed - completely bang up to date. 

 

Good luck - let me know how you go.

 

Cheers,

Matt

Hi @Matt_Ignite 

 

Fair play to you for sticking with it and getting the resolution. Top marks for persistence!

 

As of today, we have applied all outstanding Windows updates to all our session host VMs (they only needed the most recent ones) so, hopefully, that will resolve it for us also.

 

Many thanks for your work on this and also your swift response.

 

Regards

Tadhg

 

Hi @tadhgclifford 

 

Awesome work mate - fingers crossed that it fixes it for you. I'd have given anything to find an answer to it when it first started affecting us, so hopefully this will sort your machines out quickly. 

 

Let me know how you get on with it!

 

Cheers,

Matt

@Matt_Ignite Hi guys, back here as well.. 

 

Since yesterday some or all sessions got disconnected multiple times a day. All of a sudden. 
Session state: Disconnected. 

When reconnecting, users sometimes came back to their 'disconnected' session and could carry on with their work. But sometimes they landed in a new session on a new host (and lost their unsaved work).

Very frustrating!

 

After reading your messages, I applied all the Windows Updates last night (we were on 20H2). All session hosts are now on 21H1. So let's hope this solves the problem...

I'll keep you posted.

@PeterBrothers @tadhgclifford - how have you guys got on over the last couple of days since updating the session hosts? Still happening, or has it got better? 

 

Cheers,

Matt

Touch wood, all has been fine since. No instances of TermService crash at all. Fingers crossed!

@tadhgclifford @Matt_Ignite Same here. Looks stable. Fingers crossed indeed!