Forum Discussion
Azure Virtual Desktop - Black Screens on logins - What we've tried so far
TL;DR - Azure Virtual Desktop black screens. They can last 2 minutes or much longer. We tried removing stuck profiles, spun up all new VMs to see if that would fix it, and finally disabled an application service that was constantly polluting the event logs with appcrashes. We're hoping that maybe the event logs weren't able to keep up, so we had a black screen while events caught up. Grasping at straws.
We started getting reports of black screens when users log in to one of our AVD host pools. Our users are using FSLogix for profiles, but we've also seen the issue when logging in via RDP with a local admin account. We tested and saw similar results: you log in, the FSLogix prompt goes by, then Preparing Windows, then a black screen.
- In a normal login, this black screen will last 10-20 seconds before desktop comes available and user can begin their session.
- With this issue, we were seeing black screens that just stayed there until you forced a logout of your account.
We saw some profile issues in the pool: profiles appeared to be stuck on a VM when FSLogix should have removed them at logoff, and some stuck local_username FSLogix profiles were still in the Users folder. Instead of looking for the needle in a haystack, we spun up a new group of VMs and put the others in drain mode / excluded them.
With the new VMs, logins from the RD Client were working fine yesterday afternoon, yesterday evening and this AM. But later in the morning, we saw some users getting a black screen lasting 90 sec - 2 min before the desktop loaded. It happened to me when logging in, but it seemed to go away once I tried a couple more times. I even RDP'd directly into the host that had given me the 2 min black screen and was able to get in quickly. So the issue still appears to be there, but not as bad.
We looked in the event logs and saw that one particular application, the Aspen Multicase Web service, was polluting the event logs with appcrash errors every few seconds. So we've disabled that application service on all the VMs in the pool, and logins have been normal since. We also found event 4625 (failed login) entries in which the event said the event logs couldn't keep up and needed to stop duplicate events. So our thinking is: if this service was constantly trying to run, failing, and writing to the event logs, could the slow logins happen because the logs weren't able to write the login info?
But with every other change we made, things seemed fine afterward for a while, and then the black screen came back for at least 90 sec - 2 min.
Any suggestions on things we can try / look at that could be causing this?
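One way to test the "event logs are being flooded" theory is to measure how fast each source is writing. Below is a minimal Python sketch, assuming you have already exported events as (ISO timestamp, source name) pairs, for example from a CSV export of the Application log. The function names, the threshold, and the sample timestamps are all made up for illustration and are not taken from the logs described above.

```python
from collections import Counter
from datetime import datetime


def events_per_minute(records):
    """Return {source: average events per minute} for (iso_timestamp, source) pairs."""
    records = list(records)
    if not records:
        return {}
    times = [datetime.fromisoformat(ts) for ts, _ in records]
    # Span of the whole export in minutes; floor at one minute to avoid dividing by zero.
    span_minutes = max((max(times) - min(times)).total_seconds() / 60.0, 1.0)
    counts = Counter(source for _, source in records)
    return {source: n / span_minutes for source, n in counts.items()}


def noisy_sources(records, threshold_per_minute=5.0):
    """List (source, rate) pairs above the threshold, busiest first."""
    rates = events_per_minute(records)
    noisy = [(s, r) for s, r in rates.items() if r > threshold_per_minute]
    return sorted(noisy, key=lambda item: item[1], reverse=True)


# Hypothetical sample: one source logging every 5 seconds, another logging once.
sample = [(f"2024-09-25T09:00:{s:02d}", "Application Error") for s in range(0, 60, 5)]
sample.append(("2024-09-25T09:01:00", "Service Control Manager"))

for source, rate in noisy_sources(sample):
    print(f"{source}: {rate:.1f} events/min")
```

A source that stays far above everything else across several hosts would be a reasonable candidate to disable temporarily, as was done here with the application service.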
281 Replies
- JacobLoweCopper Contributor
JPlendo
We applied the KIR as mentioned previously in this thread. This resolved the black screen issue but resulted in Outlook/OneDrive failing to load for 20-30 users (which was not occurring prior).
I manually uninstalled KB5043064, as recommended at https://www.linkedin.com/posts/jagmeet-singh-69061b23a_avd-knownissues-bug-activity-7246980182027206656-r7Ns?utm_source=share&utm_medium=member_desktop, on two of the session hosts this morning, and both the black screen and Outlook issues are now gone for these servers. I'll repeat the same for the entire farm and update with my results tomorrow. It's looking promising so far!
OS: Microsoft Windows 10 Enterprise for Virtual Desktops
Build: 10.0.19045
- djordan1910Copper Contributor
JacobLowe If people have done the KIR, KB5043064 isn't even visible in Windows Update to uninstall. They'll need to re-enable it so that KB5043064 becomes visible in Windows Update, and THEN uninstall KB5043064.
- dit-chrisBrass Contributor
Interesting, having done the KIR I could still see KB5043064 listed in Programs & Features > Installed Updates. Having uninstalled it, however, it seems to have been replaced in that list with KB5041580, which looks like August's Patch Tuesday update, which I guess figures. So two hosts are absolutely fine, one still looks to have the black screen and now AADbroker Outlook/Teams/OneDrive auth issues, and one completely went berserk at 9am: it even became completely unresponsive to mouse clicks in the console session and had a bunch of sessions stuck in a pending state. Yesterday that one was fine (before uninstalling that KB and rebooting!).
I may have jinxed things, however, by saying to a colleague who came in at 9am that things were looking OK at that point and that we hadn't seen any issues or had any reports of black screens or Outlook issues.
- Oliver_KrageBrass Contributor
So uninstalling this update (KB5043064) alone resolved the issues?
- DanCubedCopper Contributor
I uninstalled the update from the golden image (W10) and reimaged the host pool. Everything seems a lot better at the moment.
- paulosilva_PICopper Contributor
Microsoft has advised that the fix will be released in the October patch. Meanwhile, a KIR (Known Issue Rollback) has been released to fix this Appx Crash issue.
Please download this KIR:
https://download.microsoft.com/download/a7ee9dc7-dfc9-498d-808c-86f2c046a55d/Windows%2010 20H2, 21H1, 21H2 and 22H2 KB5040525 241001_01051 Known Issue Rollback.msi
and follow the instructions at https://learn.microsoft.com/en-us/troubleshoot/windows-client/group-policy/use-group-policy-to-deploy-known-issue-rollback
This will install a new computer Local Group Policy, which then needs to be set to Disabled. Please reboot and you'll see that the Appx Crash will stop.
This has been the solution from Microsoft to deal with the issue until next windows patch release.
Give it a try and let me know if it works.
- AnAverageHumanCopper Contributor
We have a couple hundred Win10 multisession AVD hosts and ~1000 users daily. Black screens were relatively intermittent compared to some of the reports here in this thread, but our support team did have to deal with the few that appeared.
Tuesday this week, updated the VM we use for creating base images and included the KIR via local policy.
Rebuilt all shared host pools from updated image Tuesday evening.
Reports Wednesday: zero black screen tickets or calls.
Will continue to be patient for the actual fix but this mitigation seems to have made a positive impact.
Thanks for the links to the KIR download and instructions, paulosilva_PI!
- JPlendoBrass Contributor
Sorry for the delay in replying... does that KIR work with Windows 11? I see it says Windows 10 in the download.
- chrissmith585Copper Contributor
Thank you, we've been suffering from these issues for WEEKS with no clue what the root cause was.
We've applied the KIR and we will see how our Monday morning rush hour goes!
- borisg1Copper Contributor
Just posting to add to other comments. We're pretty desperate to get a patch for this now; it's been killing us for two weeks. It's definitely the same issue as described: event logs are flooded with AppX errors, black screens across our AVD pools, all aligning with the timelines described. What a mess.
- KevHalIron Contributor
Indeed, the lack of updates from Microsoft, or of any awareness of the issue, has shown a complete disconnect between us (who take the flak from customers) and Microsoft. There is no clear feedback mechanism for these issues when we need it.
The whole saga has been going on far too long now and has left a sour taste.
- djordan1910Copper Contributor
So, we are having the same problem with ours as well. I cannot find a published KB or fix for this.
For a quick workaround, what we HAVE found is this: RDP to the host, open Task Manager, and find the user with the black screen in the 'Details' tab. Look for their explorer.exe process, right-click on it and choose 'Analyze wait chain'. You'll notice there is an associated svchost.exe process. Tick the checkbox and kill just that svchost.exe process, and their desktop will pop right up.
- Casey_GeeCopper Contributor
We're also noticing the same issue and the timeline matches: 4 x Windows 10 session hosts, it's random which host is impacted throughout the day, and we're noticing AppX errors in the event logs.
- I want to clarify whether Windows 11 has the same problem, since I am thinking about switching to it as a solution. However, that is labor-intensive since it is a custom image.
- tristian1250Copper Contributor
- Friends, I have the same problem.
No one has changed the environment since July.
-----------------------------------------------------
These problems started two weeks ago.
We did the following:
1. Updated FSLogix 2210 hotfix 2 (2.9.8612.60056) to FSLogix 2210 hotfix 4 (2.9.8884.27471) - did not solve the problem
1.1 Checked all the defender exceptions - did not solve the problem
2. Sent the logs to Microsoft - did not solve the problem. Now this is an open case (recommendation from MS)
3. Updated VM to the latest OS Win10 - did not solve the problem (recommendation from MS)
4. Removed all VMs from the host pool and added them again - lasted a week, yesterday saw the same problem on one of the VMs 😞
The problem is random and rare: 5 users per day out of 180 (in the morning, when users log in).
We play with drain mode and force logoff.
Does anyone know the solution? I am thinking about making a script to add new VMs to the pool every few days.
- marve435Copper Contributor
We got info that the fix for this will be KB5045594 (which arrives 3rd Oct); otherwise, wait for the October monthly updates.
- tilikumtimBrass Contributor
Same issues here with a Windows 10 host pool. Replaced it with a new Windows 11 host pool (same VM SKUs) and no issues since. It's far from an ideal 'solution', but I have found Windows 11 AVD hosts to be a lot more stable: our customers using Windows 11 host pools report very few issues compared to those on Windows 10 host pools.
Edit: To clarify, none of my hosts are enrolled in Intune. All policies set via GPO.
- KevHalIron Contributor
Spent the night with Microsoft, finally got the hotfix off them, applied it and we are error free. They are adamant about us not using it in production, but I don't think we have a choice. This has been a damaging few weeks for the credibility and reliability of AVD with a lot of our customers.
- tilikumtimBrass Contributor
What is the hotfix for? Defender? Are the Defender settings for your hosts getting applied via Intune or GPO?
- VOatMH1265Copper Contributor
Heads up, as there is a potential issue with a Defender platform update causing exclusions not to apply, and that seems to be isolated to Windows 11 multi-session managed via Intune. It does potentially impact ASR, but reverting the platform version seems to resolve it. All of our AVD Win11 hosts are hit by this other thing now...
https://www.reddit.com/r/Intune/comments/1fp1xa0/latest_ms_defender_platform_update_broke/
- tilikumtimBrass Contributor
Thanks for the heads up. Another dodgy update by MS! I assume this broken update has been released to the broad update channel? I would forgive MS somewhat if the broken update was only released to the pre-release or preview channel, but not the broad channel.
It does feel like the issue I was having on my old Win10 host pool with certain apps not running, black screens, lock ups etc, but I have no hosts enrolled in Intune, AVD can be bad enough without throwing Intune into the mix! All policies for my hosts (Win10 and Win11 host pools), including Defender settings, are set via GPO.
- KevHalIron Contributor
I rebuilt the host pool with new session hosts and thought everything was fine, then got reports of black screens again. My heart sank. I spent so much work rebuilding them and I'm back to square one. Still working with Microsoft, but I'm starting to lose it now...
- KevHalIron Contributor
Any movement on this? I have just had a very angry customer on the phone who has just been hit with the black screens. AVD credibility is getting hit pretty badly at the moment.
- We have had an open case on this since last week. We provided logs and had a session with people from the FSLogix team. No solution so far. Still under investigation.
- KevHalIron Contributor
No response from Microsoft. I can see svchost.exe is hammering the CPU, and it's linked to the AppXSvc service:
svchost.exe 8224 AppXSvc
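For anyone who wants to find that svchost.exe instance across many hosts without clicking through Task Manager, here is a minimal Python sketch that picks the PID out of the text produced by `tasklist /svc /fo csv`. It assumes the usual three-column layout (image name, PID, comma-separated services) and reads columns by position, since header names may be localized. The function name is hypothetical, and every sample row except the 8224 / AppXSvc one is invented for illustration.

```python
import csv
import io


def pids_hosting_service(tasklist_csv, service_name):
    """Return the PIDs of processes whose service list contains service_name."""
    wanted = service_name.strip().lower()
    pids = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Skip the header row and anything that does not look like a process row.
        if len(row) < 3 or not row[1].strip().isdigit():
            continue
        services = [s.strip().lower() for s in row[2].split(",")]
        if wanted in services:
            pids.append(int(row[1]))
    return pids


# Assumed sample output; only the AppXSvc row mirrors the line quoted above.
SAMPLE = '''"Image Name","PID","Services"
"svchost.exe","1012","BrokerInfrastructure,DcomLaunch,PlugPlay"
"svchost.exe","8224","AppXSvc"
"explorer.exe","9100","N/A"
'''

print(pids_hosting_service(SAMPLE, "AppXSvc"))
```

On a live host the input text could come from running the command through `subprocess.run(["tasklist", "/svc", "/fo", "csv"], capture_output=True, text=True)`. Whether to kill the resulting PID is still a manual judgment call.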