Azure Virtual Desktop - Black Screens on logins - What we've tried so far


TLDR - Azure Virtual Desktop black screens. Could be 2 minutes long, could be much longer. We tried removing stuck profiles, spun up all new VMs to see if that would fix it, and finally disabled an application service that was constantly polluting the event logs with appcrash errors. We're hoping that maybe the event logs weren't able to keep up, so we got a black screen while events caught up. Grasping at straws.

 

We started getting reports of black screens when users log in to one of our AVD host pools. Our users are on FSLogix for profiles, but we've also seen the issue when logging in via RDP with a local admin account. We tested and saw similar results: you log in, the FSLogix prompt goes by, then Preparing Windows, then a black screen.

  • In a normal login, this black screen lasts 10-20 seconds before the desktop becomes available and the user can begin their session.
  • With this issue, we were seeing black screens that just stayed there until you forced a logoff of the account.

We saw some profile issues on the VMs in the pool: profiles appeared to be stuck on a VM when FSLogix should have removed them at logoff, and we found some stuck local_username FSLogix profiles still in the Users folder. Instead of hunting for the needle in a haystack, we spun up a new group of VMs and put the others in drain mode / excluded them.
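
For anyone else hunting down the stuck profiles, this is roughly what we ran on each host to find and clear the leftover local_ FSLogix folders. It's just a sketch; the local_ prefix and C:\Users path match our setup, so adjust for yours, and review the list before removing anything:

# List profiles that are not currently loaded and look like leftover FSLogix local profiles
Get-CimInstance Win32_UserProfile |
    Where-Object { -not $_.Loaded -and $_.LocalPath -like 'C:\Users\local_*' } |
    Select-Object LocalPath, LastUseTime

# Once reviewed, remove them so the profile service can rebuild cleanly at the next logon
Get-CimInstance Win32_UserProfile |
    Where-Object { -not $_.Loaded -and $_.LocalPath -like 'C:\Users\local_*' } |
    Remove-CimInstance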

 

With the new VMs, logins from the RD Client were working fine yesterday afternoon, evening, and this morning. But later in the morning we saw some users getting a black screen lasting 90 seconds to 2 minutes before the desktop loaded. It happened to me when logging in, but it seemed to go away after I tried a couple more times. I even RDP'd directly into the host where I'd had the 2-minute black screen and was able to get in quickly. So the issue still appears to be happening, just not as badly.

 

We looked in the event logs and saw that one particular application, the Aspen Multicase Web service, was polluting the event logs with appcrash errors every few seconds. So we've disabled that application service on all the VMs in the pool, and logins have been normal since. We also read event 4625 (failed login) entries saying the event logs couldn't keep up and needed to stop duplicate events. That got us thinking: if this service is constantly failing and writing to the event logs, could the slow logins happen while it's doing so, because the logs can't write the login information in time?
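
In case it helps anyone checking for the same pattern, this is the kind of thing we ran to spot the crash spam and then disable the offending service. The service name below is just a placeholder, not the real Aspen service name, so check Get-Service on your hosts first:

# Count Application Error (appcrash) events from the last hour, grouped by faulting application
Get-WinEvent -FilterHashtable @{LogName='Application'; ProviderName='Application Error'; Id=1000; StartTime=(Get-Date).AddHours(-1)} |
    Group-Object { $_.Properties[0].Value } |
    Sort-Object Count -Descending |
    Select-Object Count, Name

# Stop and disable the crashing service on this host ('AspenMulticaseWeb' is a placeholder name)
Stop-Service -Name 'AspenMulticaseWeb' -Force
Set-Service  -Name 'AspenMulticaseWeb' -StartupType Disabled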

 

But after every other change we've made, things seem fine for a while, and then the black screen comes back for at least 90 seconds to 2 minutes.

 

Any suggestions on things we can try / look at that could be causing this?


We use App Attach with Virtual Desktops and are seeing very similar issues, all of which started on Monday at approximately 2pm Central. When did this issue start for you? Your post, coupled with another below from Monday, suggests that there is an Azure-wide problem that is not specific to my environment or yours.

We are in the process of destroying and rebuilding our host pools and hosts, but it sounds like you've already gone through that exercise. We did switch to our failover environment yesterday, and it worked fine all day. But this morning, the same problem exists in that environment as well.

We've been engaged with Microsoft support since Monday and they have been absolutely no help at all.

Hoping you can post back here if/when your situation changes for the better!

For the issue of black screens on Azure Virtual Desktop, here are a few things you could try:

1) FSLogix profiles: Ensure profiles are properly cleaned up after logoff. You could also clear stuck profiles from the VMs to prevent login delays.

2) GPU-related issues: If your VMs are GPU-enabled, try disabling or adjusting the GPU settings, as these can sometimes cause black screens (see the registry check sketched below).

3) Event logs: Check for applications causing event log bloat. Disabling unnecessary services may reduce login times.
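
For point 2, a quick sanity check is whether the hardware-GPU policy is actually applied on the session host. This is only a sketch assuming the standard "Use hardware graphics adapters for all Remote Desktop Services sessions" policy; the registry path and value name below are what that policy normally writes, so verify against your own GPOs:

# Check whether the RDS "use hardware graphics adapters" policy is set (1 = enabled)
Get-ItemProperty -Path 'HKLM:\SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services' `
    -Name 'bEnumerateHWBeforeSW' -ErrorAction SilentlyContinue

# Confirm the GPU and its driver are actually visible to the session host
Get-CimInstance Win32_VideoController | Select-Object Name, DriverVersion, Status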

We have been facing the exact same problems reported here, also since Monday.
Also, by any chance did you notice that session hosts also lost their assigned ASR rules? (That also occurred for us around the same time.)

We are seeing this with a couple of our clients now; Microsoft needs to get a grip on this!
Hello,

You don't know how "happy" I am to find this post....
We are experiencing exactly the same issue: black screens during login to AVD session hosts.
The problem started this week, on Monday.

Some technical details:
- We have 25 host pools. 24 are located in West Europe and one is in India Central, and both regions are affected by this.
- The VMs are different sizes. We have mostly D4 but also D8, B4, and B8; the black screen problem exists on all of them.
- We have FSLogix configured for multi-session host pools, and all of them are randomly experiencing this black screen. The problem does not seem to happen on personal host pools without FSLogix.

What about your pools? Where are they located, etc.?
Maybe there is some pattern here, or maybe MS is just hiding some service malfunction that happened recently...

We can also confirm that creating a brand new session host stops the black screens for some time, roughly 24 hours or so.

We suspect that it may have something to do with Defender for Endpoint. As soon as the Defender engine updates to the newest version (1.1.24080.9), some of the ASR rules report an OFF status.
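
If you want to check whether you're on that engine build and whether ASR rules have flipped off, something like this should show it; these are just the standard Defender cmdlets, nothing specific to our tenant:

# Current Defender engine / platform / signature versions
Get-MpComputerStatus | Select-Object AMEngineVersion, AMProductVersion, AntivirusSignatureVersion

# ASR rule IDs and their configured actions (0 = Off, 1 = Block, 2 = Audit, 6 = Warn)
$prefs = Get-MpPreference
for ($i = 0; $i -lt $prefs.AttackSurfaceReductionRules_Ids.Count; $i++) {
    '{0} = {1}' -f $prefs.AttackSurfaceReductionRules_Ids[$i], $prefs.AttackSurfaceReductionRules_Actions[$i]
}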

So far we have not found a permanent solution for this. We have a lot of end-user incidents related to this black screen on AVD, and we are shuffling users from one host to another using drain mode.... It's crazy....
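
For what it's worth, we've been toggling drain mode from PowerShell rather than the portal so we can do it quickly across hosts. A sketch using the Az.DesktopVirtualization module; the resource group, pool, and host names below are placeholders:

# Put a session host into drain mode so no new sessions land on it
Update-AzWvdSessionHost -ResourceGroupName 'rg-avd' -HostPoolName 'hp-prod' `
    -Name 'avd-host-01.contoso.com' -AllowNewSession:$false

# ...and take it back out once it has been cleaned up / rebooted
Update-AzWvdSessionHost -ResourceGroupName 'rg-avd' -HostPoolName 'hp-prod' `
    -Name 'avd-host-01.contoso.com' -AllowNewSession:$true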

The only time I've seen issues like this, it was caused by not applying the Defender exclusions; we have them all added and then some. I wonder if there was a definition update on Monday that is causing this. The first few logons are fine, but then a further user may get the black screen, the App Readiness service stalls, and this then causes a cascading effect on subsequent logons. You can sometimes free up or complete logons by restarting the App Readiness service.
I have no other workaround; it's just pot luck.
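
For reference, this is roughly how we push the FSLogix exclusions and bounce App Readiness when a host locks up. The exclusion list below is only a partial sketch of the ones Microsoft documents for FSLogix, and the VHD share path is a placeholder:

# A few of the documented FSLogix driver/file exclusions (partial list; adjust the share path)
Add-MpPreference -ExclusionPath 'C:\Program Files\FSLogix\Apps\frxdrv.sys',
                                'C:\Program Files\FSLogix\Apps\frxdrvvt.sys',
                                'C:\Program Files\FSLogix\Apps\frxccd.sys',
                                '\\fileserver\fslogix-profiles\*.VHDX'

# Restart the App Readiness service when logons are stuck at a black screen
Restart-Service -Name AppReadiness -Force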

We have actually been through two of these three:
1. FSLogix profiles - We started seeing lots of profiles stuck on VMs, both local profiles and the FSLogix local_username profiles. We went through the VMs and removed everything, then started deleting VMs and letting new ones spin up.
2. Event logs - One particular application (the Aspen Multicase Web Service) was causing repeated errors in the event logs every few seconds. We also saw some event log entries about needing to catch up. We disabled that service and have seen much better logins since. We still see a black screen here and there; it lasts 90 seconds to 2 minutes and eventually logs in.

So with brand new VMs, verifying profiles are working properly and being removed after logoff, and disabling that service, we are seeing much better results, but we still get the occasional black screen.
We were on the phone with MS, who are playing dumb and have no answers for us. They wanted FSLogix logs, but came back with nothing. We sent them emails yesterday asking detailed questions. Have they responded? No... because MS support, even at Sev A, is a friggin' joke. It's disgusting that their support is this bad now and that it's considered OK.
Yeah, this could be the culprit. We had a similar issue back in 2020. MS ended up making a patch for us specifically, which we applied, and it cleared up the issues. They eventually released the patch in October 2020. That non-security patch was for Windows 10, so I wonder if they need a new one.

What really ticks me off is that this started after the outages, and when we asked MS if the outages could have an effect on this, they of course said "no, the outages wouldn't do that."

MS support is so incompetent, they didn't announce the outages the last two weeks until hours after people started complaining about them. Then when the outage was cleared, MS support called me to "work on the issue"... work on what issue? The issue was resolved by YOU. How does your support staff not know that? I will tell you how: because MS couldn't care less about giving great support. They have such a hold on so many companies and they know no one is gonna leave. What are you gonna do, make all your non-IT workers learn how to use Linux? Yeah, good luck with that. MS knows it and shows it by just not caring about support levels any longer. They don't stick to their SLAs, and the support staff you do get on the phone are normally not very good. And of course MS is the BEST at asking you questions you've already answered in the ticket you submitted, or asking you to run processes you already ran and showed them, again. It's sickening how little they seem to give a crap about users anymore.

 

Has anyone in this thread had any change in status? We continue to have the same problem, and we're now in our fourth day of a near-complete outage.

 

We blew away our host pools and session hosts, then recreated them. On the new hosts, we didn't install ANYTHING (including updates) in hopes that a clean build would fix the problem, but it didn't.

 

There is no rhyme or reason to when these failures occur. We are capturing traffic via Fiddler in hopes that we can point MS to the spot that is failing.  Our best guess is that one of the dark blue boxes from this Microsoft diagram is where the problem actually lies:  https://learn.microsoft.com/en-us/azure/virtual-desktop/media/service-architecture-resilience/servic...

 

@JPlendo 

We're having the same issue. At the beginning of the week we saw slow logons (5 to 20 minutes), mostly during the AppX-LoadPackages phase of the logon (according to the ControlUp AnalyseLogonDuration script). The situation has escalated over the last two days; we now see black screens after a few users per AVD host are able to log in successfully. Restarting the App Readiness service doesn't help; sometimes disconnecting and moving the logon to another host does.

 

Users whose logon process stalls seem to have issues loading some AppX packages (according to their Get-AppxPackage count). We had the same issue a few months ago; we managed to stabilize by cleaning up unwanted packages listed by Get-AppxProvisionedPackage -Online. We're trying to do the same right now, per user.
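
In case it's useful to others, this is roughly how we check the per-user package count and strip unwanted provisioned packages from the host. A sketch only; the package name filter is just an example, so review the list before removing anything:

# How many AppX packages the stalled user actually has registered (run in their session)
(Get-AppxPackage).Count

# List what's provisioned on the host for new users
Get-AppxProvisionedPackage -Online | Select-Object DisplayName, PackageName

# Example only: remove a specific unwanted provisioned package after reviewing the list
Get-AppxProvisionedPackage -Online |
    Where-Object { $_.DisplayName -like '*ZuneMusic*' } |
    Remove-AppxProvisionedPackage -Online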

@JPlendo 

Any movement on this? I have just had a very angry customer on the phone who has just been hit with the black screens. AVD's credibility is taking a pretty bad hit at the moment.
We have had an open case on this since last week. We provided logs and had a session with people from the FSLogix team. No solution so far; still under investigation.
We have now opened a Category A case with Microsoft.

No response from Microsoft. I can see svchost.exe is hammering the CPU, and it's linked with the AppXSvc service:
svchost.exe 8224 AppXSvc
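
If anyone else wants to confirm which svchost instance is the AppX one, this is what I used to tie the PID to the service and check its CPU; just standard cmdlets, nothing exotic:

# Find the PID of the svchost instance hosting AppXSvc and check its CPU time / memory
$appxPid = (Get-CimInstance Win32_Service -Filter "Name='AppXSvc'").ProcessId
Get-Process -Id $appxPid | Select-Object Id, ProcessName, CPU, WorkingSet

# Or from cmd: tasklist /svc /fi "SERVICES eq AppXSvc"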

It took pretty much all week, but our issue (which may be slightly different because we're using App Attach) seems to be resolved. We ended up blowing away our host pools and all session hosts and recreating them. MS also provided a hotfix related to AppX "over-processing," which creates a race condition that causes a session host to become unresponsive for several minutes. Yesterday was our first day with the updated configuration and a full load, and we were problem-free. I'm not sure if it was the rebuild of the pool/hosts, the hotfix patch from MS, or whether MS fixed something on the backend. I kind of suspect the latter -- probably something in the global broker service, if I had to guess.
Was this something the MS rep used internally to look up the hotfix/patch, or in what context was it used?