Sep 19 2024 09:16 AM
TLDR - Azure Virtual Desktop black screens at logon. Could be 2 minutes long, could be much longer. Tried removing stuck profiles, spun up all-new VMs to see if that would fix it, and finally disabled an application service that was constantly polluting the event logs with AppCrash errors. Hoping that maybe the event logs weren't able to keep up, so we had a black screen while events caught up. Grasping at straws.
We started getting reports of black screens when users log in to one of our AVD host pools. Our users are on FSLogix for profiles, but we've also seen the issue when logging in via RDP with a local admin account. We tested and saw the same behavior: you log in, the FSLogix prompt goes by, then "Preparing Windows", then a black screen.
We saw some profile issues on the VMs in the pool: profiles appeared to be stuck on a VM when FSLogix should have removed them at logoff, and we found some stuck local_username FSLogix profiles still in the Users folder (a quick way to spot these is sketched below). Rather than hunt for the needle in the haystack, we spun up a new group of VMs and put the old ones in drain mode / excluded them.
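For anyone checking their own hosts, here's a minimal sketch of listing leftover local_* FSLogix profile folders on a session host. It assumes the default local_<username> folder naming under C:\Users, so adjust the path and filter to your environment.

# Sketch: list leftover FSLogix "local_" profile folders on a session host.
# Assumes the default local_<username> naming convention under C:\Users.
Get-ChildItem -Path 'C:\Users' -Directory -Filter 'local_*' |
    Select-Object Name, CreationTime, LastWriteTime |
    Sort-Object LastWriteTime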
With the new VMs, logins from the Remote Desktop client were working fine yesterday afternoon, evening, and this AM. But later in the morning, we saw some users getting a black screen lasting 90 seconds to 2 minutes before the desktop loaded. It happened to me when logging in, but it seemed to go away after a couple more attempts. I even RDPd directly into the host where I'd had the 2-minute black screen and was able to get in quickly. So the issue still appears to be showing, just not as bad.
We looked in the event logs and saw that one particular application - the Aspen Multicase Web service - was polluting the event logs with AppCrash errors every few seconds. So we disabled that application service on all the VMs in the pool, and logins have been normal since. We also read Event ID 4625 (failed logon) entries, but the event noted that the log couldn't keep up and had to suppress duplicate events. That got us thinking: if this service is constantly crashing and writing to the event logs, could the slow logons be happening while the service tries to run, fails, and floods the logs, so the logs can't keep up with writing the logon information?
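In case it helps anyone else, here is a hedged sketch of the kind of checks we ran; the service name AspenMulticaseWeb is a placeholder assumption, so substitute the actual service name reported by Get-Service in your environment.

# Sketch: count recent Application Error (AppCrash) events to see how noisy the log is.
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; ProviderName = 'Application Error'; StartTime = (Get-Date).AddHours(-1) } |
    Group-Object { $_.Properties[0].Value } |   # group by faulting application name
    Sort-Object Count -Descending |
    Select-Object Count, Name

# Sketch: stop and disable the offending service ('AspenMulticaseWeb' is a hypothetical name).
Stop-Service -Name 'AspenMulticaseWeb' -Force
Set-Service -Name 'AspenMulticaseWeb' -StartupType Disabled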
But after every other change we made, things seemed fine for a while, and then the black screen came back, lasting at least 90 seconds to 2 minutes.
Any suggestions on things we can try / look at that could be causing this?
Sep 19 2024 11:32 AM - edited Sep 19 2024 11:32 AM
We use App Attach with virtual desktops and are seeing very similar issues, all of which started on Monday at approximately 2pm Central. When did this issue start for you? Your post, coupled with another below from Monday, suggests there is an Azure-wide problem that is not specific to my environment or yours.
We are in the process of destroying and rebuilding our host pools and hosts, but it sounds like you've already gone through that exercise. We did switch to our failover environment yesterday, and it worked fine all day. But this morning, the same problem exists in that environment as well.
We've been engaged with Microsoft support since Monday and they have been absolutely no help at all.
Hoping you can post back here if/when your situation changes for the better!
Sep 19 2024 08:46 PM - edited Sep 19 2024 10:17 PM
We have been facing the exact same problems reported here, also since Monday.
Also, by any chance did you notice that session hosts also lost their assigned ASR rules? (That also occurred for us around the same time.)
Sep 20 2024 01:04 AM - edited Sep 20 2024 01:05 AM
The only time I've seen issues like this, it was caused by not applying the Defender exclusions, and we have them all added and then some. I wonder if there was a definition update on Monday that is causing this. The first few logons are fine, but then a further user may get the black screen, the App Readiness service stalls, and that causes a cascading effect on subsequent logons. You can sometimes free up or complete logons by restarting the App Readiness service (see the sketch after this post).
I have no other workaround; it's just pot luck.
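For anyone who wants to try the same workaround, a minimal sketch of restarting the App Readiness service on a session host (run it elevated; this is only the workaround described above, not a fix).

# Sketch: restart the App Readiness service when logons appear stuck on a black screen.
Restart-Service -Name 'AppReadiness' -Force

# Confirm it came back up.
Get-Service -Name 'AppReadiness' | Select-Object Status, StartType, DisplayName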
Sep 20 2024 10:07 AM
Has anyone in this thread had any change in status? We continue to have the same problem, and we're now in our fourth day of a near-complete outage.
We blew away our host pools and session hosts, then recreated them. On the new hosts, we didn't install ANYTHING (including updates) in hopes that a clean build would fix the problem, but it didn't.
There is no rhyme or reason to when these failures occur. We are capturing traffic via Fiddler in hopes that we can point MS to the exact spot that is failing. Our best guess is that one of the dark blue boxes in this Microsoft diagram is where the problem actually lies: https://learn.microsoft.com/en-us/azure/virtual-desktop/media/service-architecture-resilience/servic...
Sep 20 2024 11:30 AM - edited Sep 20 2024 11:35 AM
We're having the same issue. At the beginning of the week, we saw slow logons (5 to 20 minutes), mostly during the AppX-LoadPackages phase of the logon (according to the ControlUp AnalyseLogonDuration script). The situation escalated over the last two days; we now see black screens after a few users per AVD host are able to log in successfully. Restarting the App Readiness service doesn't help; sometimes disconnecting and moving the logon to another host does.
Users for whom the logon process stalls seem to have issues loading some AppX packages (judging by their Get-AppxPackage count). We had the same issue a few months ago and managed to stabilize by cleaning up unwanted packages from Get-AppxProvisionedPackage -Online. We're trying to do the same right now, per user (see the sketch below).
@JPlendo
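As context for the approach above, a hedged sketch of the checks and cleanup involved; the package name Contoso.SampleApp is purely a hypothetical example, and removing provisioned packages affects new profiles on the host, so review the list carefully first.

# Sketch: compare how many AppX packages the current user has registered.
(Get-AppxPackage).Count

# Sketch: list provisioned packages on the host, then remove an unwanted one.
# 'Contoso.SampleApp' is a hypothetical package name used only for illustration.
Get-AppxProvisionedPackage -Online | Select-Object DisplayName, PackageName

Get-AppxProvisionedPackage -Online |
    Where-Object { $_.DisplayName -like '*Contoso.SampleApp*' } |
    Remove-AppxProvisionedPackage -Online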
Sep 24 2024 06:03 AM - edited Sep 24 2024 06:25 AM
No response from Microsoft. I can see svchost.exe is hammering the CPU, and it's linked with the AppXSvc service:
svchost.exe 8224 AppXSvc
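If others want to check the same thing, a minimal sketch of mapping a busy svchost.exe PID to the services it hosts; 8224 is just the PID from the listing above, so substitute your own (tasklist /svc gives similar output).

# Sketch: find which services run inside a given svchost.exe process.
# Replace 8224 with the PID of the svchost.exe instance consuming CPU.
Get-CimInstance -ClassName Win32_Service -Filter 'ProcessId = 8224' |
    Select-Object Name, DisplayName, State

# Check that process's CPU time as well.
Get-Process -Id 8224 | Select-Object Id, CPU, ProcessName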