Forum Discussion
Azure Virtual Desktop - Black Screens on logins - What we've tried so far
TLDR - Azure Virtual Desktop black screens at login. They can last 2 minutes or much longer. We tried removing stuck profiles, spun up all-new VMs to see if that would fix it, and finally disabled an application service that was constantly flooding the event logs with AppCrash errors. We're hoping the event logs couldn't keep up, so the black screen lasted while the events caught up. Grasping at straws.
We started getting reports of black screens when users log in to one of our AVD host pools. Our users use FSLogix for profiles, but we've also seen the issue when logging in via RDP with a local admin account. In testing we saw the same pattern: you log in, the FSLogix prompt goes by, then "Preparing Windows", then a black screen.
- In a normal login, this black screen lasts 10-20 seconds before the desktop becomes available and the user can begin their session.
- With this issue, the black screen stayed there until you forced a logoff of the account.
We saw some profile issues: profiles appeared to be stuck on a VM when FSLogix should have removed them at logoff, and some stuck local_username FSLogix profiles were still in the Users folder. Instead of looking for a needle in a haystack, we spun up a new group of VMs and put the old ones in drain mode / excluded them.
With the new VMs, logins from the Remote Desktop client worked fine yesterday afternoon, in the evening, and this morning. But later in the morning, some users got a black screen lasting 90 seconds to 2 minutes before the desktop loaded. It happened to me when logging in, but it seemed to go away after a couple more attempts. I even RDP'd directly into the host where I'd had the 2-minute black screen and got in quickly. So the issue is still showing up, just not as badly.
We looked in the event logs and saw that one application, the Aspen Multicase Web service, was flooding the event logs with AppCrash errors every few seconds. We disabled that service on all VMs in the pool, and logins have been normal since. We also read event 4625 (failed logon) entries in which the log said it couldn't keep up and had to suppress duplicate events. So our theory was: if this service is constantly trying to run, failing, and writing to the event log, could the slow logins happen because the log can't write the login information?
But after every other change we made, things seemed fine for a while, and then the black screen came back for at least 90 seconds to 2 minutes.
Any suggestions on things we can try / look at that could be causing this?
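To check the "event log flood" theory on other hosts, it helps to measure how many events per minute each source writes. Below is a minimal Python sketch. It assumes you have already exported events as (timestamp, source name) pairs by whatever method you prefer. That input shape, the function name `noisy_sources`, and the 10-events-per-minute threshold are illustrative assumptions, not anything from the original post.

```python
from collections import Counter
from datetime import datetime, timedelta


def noisy_sources(events, start, end, per_minute_limit=10.0):
    """Return {source: events per minute} for sources above per_minute_limit.

    events: iterable of (timestamp, source_name) pairs. How you export them
    from the event log is up to you; this input shape is an assumption.
    """
    minutes = (end - start) / timedelta(minutes=1)
    if minutes <= 0:
        raise ValueError("end must be after start")
    # Count only events inside the [start, end) window.
    counts = Counter(source for ts, source in events if start <= ts < end)
    rates = {source: n / minutes for source, n in counts.items()}
    return {source: rate for source, rate in rates.items() if rate > per_minute_limit}


if __name__ == "__main__":
    # Simulated data: a service crashing every 3 seconds for 10 minutes.
    t0 = datetime(2024, 10, 1, 8, 0)
    sample = [(t0 + timedelta(seconds=3 * i), "Application Error") for i in range(200)]
    sample.append((t0 + timedelta(minutes=5), "Winlogon"))
    print(noisy_sources(sample, t0, t0 + timedelta(minutes=10)))
```

With the simulated data, only the crashing source exceeds the threshold, at 20 events per minute.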
281 Replies
- parentcharles (Copper Contributor)
On my end, the KIR has fixed the black screen issue, but it hasn't fixed the Office 365 authentication issue, and I'm worried it might have corrupted the user profiles.
It seems to work from time to time, but OneDrive is completely broken.
I'm deploying Windows 11 now to see if it fixes the issue.
I haven't had ANY news from Microsoft yet, and we've kind of lost hope in the AVD solution. We just had the memory leak issue at the start of the year, and it's always caused by Microsoft patches, so to be honest, we don't know what we can do better to make the environment more stable. It's infuriating.
- parentcharles (Copper Contributor)
OneDrive is still completely broken after creating a new host pool on Windows 11... so it seems there are remnants of the issue in the user profile. I'm looking into fixing OneDrive sync.
- Christian_Roy (Copper Contributor)
While troubleshooting logins with the following script https://www.controlup.com/script-library-posts/analyze-logon-duration/ , we noticed that the FSLogix ShellStart time had gone up compared to before, which led us to the InstallAppxPackages setting in FSLogix. Once we disabled that feature, the login delays were gone and logins were back to normal. The setting is enabled by default in FSLogix. Beware that disabling it may affect applications the user installed from the Microsoft Store.
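The ControlUp script prints an "AppX packages loaded during logon" table (Package, Duration (s), Start Time, End Time). When checking many sessions, a small parser can pull out the slow packages. The following is a minimal Python sketch. It assumes each data row is whitespace-separated as `<package> <duration> <start> <end>`, matching the output posted in this thread. The function name and the 30-second default threshold are illustrative assumptions.

```python
from typing import List, Tuple


def slow_packages(table_text: str, threshold_s: float = 30.0) -> List[Tuple[str, float]]:
    """Return (package, duration_s) pairs slower than threshold_s, slowest first.

    Assumes the ControlUp table layout seen in this thread:
    "<package> <duration_s> <start_time> <end_time>" per data row.
    """
    slow = []
    for line in table_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # title row, header row, or a single underline row
        package, duration = parts[0], parts[1]
        try:
            seconds = float(duration)
        except ValueError:
            continue  # the dashed underline row beneath the headers
        if seconds > threshold_s:
            slow.append((package, seconds))
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = (
        "Package Duration (s) Start Time End Time\n"
        "------- ------------ ---------- --------\n"
        "Microsoft.Windows.CloudExperienceHost_cw5n1h2txyewy 157.188 14:50:04.5 14:52:41.6\n"
    )
    print(slow_packages(sample))
```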
- dit-chris (Brass Contributor)
Looking at a few of our sessions with that ControlUp script, they show a delay on the following package load:
AppX packages loaded during logon
---------------------------------
Package                                              Duration (s)  Start Time  End Time
-------                                              ------------  ----------  --------
Microsoft.Windows.CloudExperienceHost_cw5n1h2txyewy  157.188       14:50:04.5  14:52:41.6
- 247ServerAdmin (Copper Contributor)
Christian_Roy, can you be more specific about your resolution?
Was this a GPO change or a registry change?
**Update**
It's a registry change. Trying it now.
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\FSLogix\Profiles
New DWORD
Name: InstallAppxPackages
Value: 0
- Christian_Roy (Copper Contributor)
Yep, that's the key to change:
Computer\HKEY_LOCAL_MACHINE\SOFTWARE\FSLogix\Profiles
New DWORD
Name: InstallAppxPackages
Value: 0
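To push the same change to many hosts or bake it into a golden image, a .reg file is often easier than editing each VM by hand. The sketch below is a minimal Python example that only builds the .reg text for the InstallAppxPackages DWORD quoted above. The helper names are illustrative, and the deployment method (GPO, Intune, image build) is left to you. The "Computer\" prefix seen in Regedit's address bar does not belong in a .reg file, so it is omitted here.

```python
from typing import Dict


def dword_line(name: str, value: int) -> str:
    """Format one REG_DWORD entry; .reg files store DWORD data as 8 hex digits."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("DWORD must fit in 32 bits")
    return f'"{name}"=dword:{value:08x}'


def build_reg(key: str, values: Dict[str, int]) -> str:
    """Return .reg file text for one key with REG_DWORD values (CRLF line endings)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    lines += [dword_line(name, value) for name, value in values.items()]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    text = build_reg(
        r"HKEY_LOCAL_MACHINE\SOFTWARE\FSLogix\Profiles",
        {"InstallAppxPackages": 0},
    )
    print(text)
```

Regedit itself exports .reg files as UTF-16 with a byte-order mark. Saving the text with `encoding="utf-16"` in Python mirrors that.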
- JoeriK1285 (Copper Contributor)
It seems that adding more AVD session hosts has stabilized the situation in our case.
It seems that the "appx stack" starts crashing when it is under load/stress. This happens during logon storms (users all logging on at the same time in the morning) and also when a host has more than roughly 10 users working on it.
To mitigate this, we make sure we have plenty of AVD session hosts so that all of them are online during the logon storm, and we make sure that an AVD session host has at most 10 users connected to it. We also applied the KIR (https://download.microsoft.com/download/a7ee9dc7-dfc9-498d-808c-86f2c046a55d/Windows%2010 20H2, 21H1, 21H2 and 22H2 KB5040525 241001_01051 Known Issue Rollback.msi) in the master image and redeployed all our AVD session hosts from that new master image version.
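The mitigation above comes down to simple capacity arithmetic: with a cap of about 10 users per host, the number of hosts that must be online for the logon storm is the user count divided by 10, rounded up. The following is a minimal Python sketch. The function name and the optional spare-host buffer are illustrative assumptions. In AVD itself, the per-host cap would normally be enforced through the host pool's max session limit.

```python
import math


def hosts_needed(users: int, max_users_per_host: int = 10, spare_hosts: int = 0) -> int:
    """Session hosts to keep online so no host exceeds max_users_per_host."""
    if users < 0 or max_users_per_host <= 0 or spare_hosts < 0:
        raise ValueError("invalid capacity inputs")
    return math.ceil(users / max_users_per_host) + spare_hosts


if __name__ == "__main__":
    for users in (95, 200, 201):
        print(users, "users ->", hosts_needed(users), "hosts")
```

For example, 200 concurrent users at 10 per host needs 20 hosts, and one more user pushes it to 21. This is where the cost impact of the mitigation comes from.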
Hopefully MSFT comes up with a fix soon so we can go back to the previous setup, as this mitigation has some impact on costs.
- JoeriK1465 (Copper Contributor)
It seems to be solved in the November cumulative update. The preview has been available to install since 22/10/24; see https://support.microsoft.com/en-us/topic/october-22-2024-kb5045594-os-build-19045-5073-preview-f307a4b0-f62d-4c28-9062-44207aea55c3
- marve435 (Copper Contributor)
JoeriK1465
We applied:
https://support.microsoft.com/en-us/topic/october-22-2024-kb5045594-os-build-19045-5073-preview-f307a4b0-f62d-4c28-9062-44207aea55c3
svchost.exe_appxsvc is not crashing anymore, but
this entry is still in Event Viewer:
Failure to load the application settings for package Microsoft.AAD.BrokerPlugin_cw5n1h2txyewy. Error Code: -2147024893
which means Teams/Outlook/OneDrive are not doing so well.
- djordan1910 (Copper Contributor)
Another thing to note... I don't know where everybody has their session client timeouts set. We were at logoff after 1 hour of no activity and logoff 15 minutes after disconnect. Even with the KIR in place, we noticed 2 or 3 users daily (out of ~200) who would raise a ticket about a black screen or being unable to log in to the Outlook thick client. This morning we changed to logoff after 2 hours of no activity and 1 hour after disconnect, and we haven't had a complaint all day. I don't know if this is even relevant; I'm just relaying things as we try them.
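If these limits are set through the Remote Desktop session time-limit policies, they are stored as millisecond REG_DWORDs. That makes a quick conversion handy when comparing or scripting them. The following is a minimal Python sketch. It assumes the value names MaxIdleTime and MaxDisconnectionTime under HKLM\SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services. The post does not say how the timeouts were configured, so treat that mapping as an assumption.

```python
from datetime import timedelta

# Assumption: idle/disconnect limits map to the RDS time-limit policy values,
# stored as REG_DWORD milliseconds under
# HKLM\SOFTWARE\Policies\Microsoft\Windows NT\Terminal Services.
BEFORE = {"MaxIdleTime": timedelta(hours=1), "MaxDisconnectionTime": timedelta(minutes=15)}
AFTER = {"MaxIdleTime": timedelta(hours=2), "MaxDisconnectionTime": timedelta(hours=1)}


def to_policy_ms(duration: timedelta) -> int:
    """Convert a duration to whole milliseconds that fit in a REG_DWORD."""
    ms = duration // timedelta(milliseconds=1)
    if not 0 <= ms <= 0xFFFFFFFF:
        raise ValueError("value must fit in a REG_DWORD")
    return ms


if __name__ == "__main__":
    for label, settings in (("before", BEFORE), ("after", AFTER)):
        for name, duration in settings.items():
            ms = to_policy_ms(duration)
            print(f"{label}: {name} = {ms} ms (dword:{ms:08x})")
```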
- shaaric (Copper Contributor)
We have only experienced this issue on our Win 10 AVDs; the Win 11 ones have been OK so far. We rolled back the September update, and the Win 10 AVDs have been behaving normally. Watching for a confirmed fix, not a KIR. Thankful for forums like these for finding out that others are experiencing this.
- henrikmc2 (Copper Contributor)
JPlendo, the buggy component is the App Readiness service, and this is not the first time we've seen black screens with the same service responsible. Kill that service and you fly into the desktop, but then SSO stops working, which causes other problems. They need to fix it.
But it's troublesome that they don't post any public articles on this.
The fix will be KB5045594.
To get a faster logon that doesn't wait for App Readiness, the reg keys below work (but they don't fix the appsvc service crash):
Windows Registry Editor Version 5.00
; --------------------------------------------------------------------------------------
; #Reason: Fix App Readiness with timeout
; --------------------------------------------------------------------------------------
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer]
"AppReadinessPreShellTimeoutMs"=dword:00060000
"AppReadinessGlobalTimeoutMs"=dword:00120000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server]
"fRunAppReadiness"=dword:00000000
; --------------------------------------------------------------------------------------
- KristofH (Brass Contributor)
This one, KB5045594, is now released in preview/beta without any mention of the issue: https://blogs.windows.com/windows-insider/2024/10/14/releasing-windows-10-build-19045-5070-to-beta-and-release-preview-channels/
- JPlendo (Brass Contributor)
henrikmc2, you mentioned KB5045594 being released as a fix. Is that only for Windows 10? Do you know if they will release anything for Windows 11?
What does the reg fix you listed do if it's not helping with the App Readiness service crashing? Does it allow logins even if the service has hung?
- dit-chris (Brass Contributor)
Hi JPlendo,
As I understand it, that just says: if App Readiness has run for 60 seconds at PreShell, kill it; if it runs for 120 seconds in total, kill it again. In theory that should cut the black screen delay by simply stopping the hanging process... but it will then cause issues with AppX package deployment, SSO, etc. as a side effect. Another fix with the same effect is probably to just set the AppReadiness service to Disabled in Services. Yes, you'll probably get much quicker login times, but a load of stuff won't work, which might include the Start menu, as there appears to be an AppX package for that!
- Mario_ZVGL (Copper Contributor)
We are also affected, with 2 host pools in West Europe. We opened a severity A ticket with Microsoft, and after investigating the problem with them (logs, AV exclusions and so on), the workaround is:
Method 1:
Uninstall Windows updates KB5043064 and KB5041580 by going to Settings --> Windows Update --> Update history --> Uninstall updates --> select the problematic update and click Uninstall --> reboot the OS.
Method 2:
If the customer is using FSLogix, ask the customer to set the registry value SOFTWARE\FSLogix\Profiles\DeleteProfileOnLogoff to 0. This temporarily stops FSLogix from deleting the user profile at logoff. Although this is not a scalable option, it should temporarily mitigate the issue.
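With several registry workarounds circulating in this thread, a quick per-host audit can show which ones are actually in place. Below is a minimal Python sketch. The reader uses the standard `winreg` module and only works on Windows. The audit logic is separate, so it can be tested anywhere. The value name `DeleteProfileOnLogoff` is taken verbatim from the support advice quoted above and has not been verified against the FSLogix configuration reference. The function names are illustrative.

```python
import sys
from typing import Dict, Optional

# Workaround values discussed in this thread. "DeleteProfileOnLogoff" comes
# from the support advice quoted above and is unverified.
WORKAROUNDS = {"InstallAppxPackages": 0, "DeleteProfileOnLogoff": 0}
PROFILES_KEY = r"SOFTWARE\FSLogix\Profiles"


def read_profile_values(names) -> Dict[str, int]:
    """Read values under HKLM\\SOFTWARE\\FSLogix\\Profiles (Windows only)."""
    import winreg  # imported lazily so audit() can be used on any OS

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, PROFILES_KEY) as key:
        for name in names:
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value not present on this host
    return values


def audit(current: Dict[str, int]) -> Dict[str, str]:
    """Compare current values against the workaround values."""
    report = {}
    for name, wanted in WORKAROUNDS.items():
        found: Optional[int] = current.get(name)
        report[name] = "ok" if found == wanted else f"expected {wanted}, found {found!r}"
    return report


if __name__ == "__main__" and sys.platform == "win32":
    print(audit(read_profile_values(WORKAROUNDS)))
```

A missing InstallAppxPackages value shows up as "found None". Since the feature is enabled by default, that means the workaround is not applied.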
Method 3:
The product team is currently working on a KIR Group Policy to roll back the code regression. Once the KIR is available, we will provide an update via email.
Let's see... This week we rolled back to KB5039211 on my previous golden image.
Now it flies normally (48h), with ~150-200 active users per day.
One issue with the Outlook connection today: Microsoft.AAD.BrokerPlugin (-21470009096)
- JPlendo (Brass Contributor)
Windows 10 or Windows 11?
What did you do to roll back?
- Used the previous image version and updated FSLogix to FSLogix 2210 hotfix 4 (2.9.8884.27471).
It's too early to rejoice, since last time, in September, the problem returned after three days.
- I don't like this solution, since I have a rule to keep hosts updated via a manual process until the monthly update.
- Another thing to note: updates are always tested on a test host pool with 15% of users before the prod host pools are updated.
- rakim71 (Copper Contributor)
Applying the KIR has not worked for my org. We have tried it with the September CU both installed and uninstalled.
I'd also like to share some odd behavior.
Prior to applying the KIR, we saw crashes with faulting application name: svchost.exe_appXSvc, faulting module name: appxdeploymentserver.dll
After applying the KIR, we now see a crash with faulting application name: svchost.exe, faulting module name: aphostservice.dll.
Additionally we are seeing the following:
- When the issue occurs, users who are already logged onto a host see app instability, for example Outlook or OneDrive crashes.
- If a user logs on for the first time whilst the issue is occurring (i.e. they do not have an existing FSLogix profile), SSO will not work for OneDrive, and 365 shared licensing will be broken.
This has been a complete nightmare: not only can users not log on, but existing sessions are affected too. My org is reevaluating its use of AVD.
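To tell which of the two crash signatures above a host is producing, it can help to classify Application Error (event ID 1000) messages automatically. The following is a minimal Python sketch. It assumes the standard event 1000 message text, which contains lines beginning "Faulting application name:" and "Faulting module name:", each followed by a comma and further details. The two signatures are the ones reported in this post. The labels and function name are illustrative.

```python
import re
from typing import Optional

# Crash signatures as reported in this thread (compared case-insensitively).
SIGNATURES = {
    ("svchost.exe_appxsvc", "appxdeploymentserver.dll"): "AppXSvc crash (reported before the KIR)",
    ("svchost.exe", "aphostservice.dll"): "aphostservice.dll crash (reported after the KIR)",
}

APP_RE = re.compile(r"Faulting application name:\s*([^,\r\n]+)", re.IGNORECASE)
MODULE_RE = re.compile(r"Faulting module name:\s*([^,\r\n]+)", re.IGNORECASE)


def classify_crash(message: str) -> Optional[str]:
    """Return a label if an event 1000 message matches a known signature."""
    app = APP_RE.search(message)
    module = MODULE_RE.search(message)
    if app is None or module is None:
        return None  # not an Application Error message in the expected format
    key = (app.group(1).strip().lower(), module.group(1).strip().lower())
    return SIGNATURES.get(key)
```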
- KevHal (Iron Contributor)
It is about time Microsoft publicly acknowledged this issue. I'm watching more and more host pools get this issue on a daily basis.
The KIR is hit and miss. I think the AppX issue causes some damage to AAD.brokertoken, which in turn breaks authentication to Office/Teams etc. This is host-specific; resetting profiles does not resolve the issue. I have had to rebuild hosts from an older golden image and block updates completely.
Some things really need to happen:
A) Pull the September update, or
B) Publish an OOB hotfix that includes the KIR, or that fixes the AppX issue (end of October is not good enough).
C) Patch Tuesday is tomorrow (8th Oct). I've heard the KIR will get overwritten by this update, which will still include the AppX issue, so we may be back to square one.
D) At least make some kind of public statement that this is being looked at; several Microsoft support engineers have been giving different answers.
- dit-chris (Brass Contributor)
KevHal, I agree the KIR doesn't really seem to work, and some of the mitigations Microsoft have applied have basically wrecked some of our hosts. It got to the point that we went into this morning with a half-deployed new host pool running a minimum viable app stack on a clean image, although I do wonder if that is much better, as it's still got the last 3 months of patches on it along with the KIR. My thinking is that even after applying the KIR and/or rolling back the September update, the AppX deployment state had got totally screwed up with stuff that was previously half-deployed.
A) probably can't happen, as you'd also need to roll everything back to July (if not the late June preview); that is what the KIR targets when you look at the KB number.
B) agreed
C) my understanding is that it shouldn't, due to the way the KIR works and the fact that it relates to a KB from July.
D) that is what really irks me... not getting a straight answer. One even told me there was no known issue because it wasn't on the Windows health dashboard!
- Robert_Hurd (Brass Contributor)
We started having Defender Network Protection and Defender exclusion issues about 3 weeks ago with our Windows 10/11 AVD pools managed with Intune. We opened a ticket with Microsoft and have been working on it ever since. The issue is related to Antimalware Client Version 4.18.24080.9; if you revert to an earlier version, it starts working again. Revert with this command:
"C:\programdata\microsoft\Windows Defender\Platform\4.18.24080.9-0\MpCmdRun.exe" -revertplatform
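Before running the revert on every host, you can check which hosts actually report the affected version. Below is a minimal Python sketch. It assumes the `AMProductVersion` property returned by the Defender PowerShell cmdlet `Get-MpComputerStatus` corresponds to the "Antimalware Client Version" mentioned above. Only the version comparison runs off Windows. The function names are illustrative.

```python
import subprocess
import sys

AFFECTED = "4.18.24080.9"  # Antimalware Client Version reported in this thread


def parse_version(text: str) -> tuple:
    """Turn '4.18.24080.9' into (4, 18, 24080, 9) for exact comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version: str) -> bool:
    return parse_version(version) == parse_version(AFFECTED)


def current_platform_version() -> str:
    """Query Defender via PowerShell (Windows only); assumes AMProductVersion."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command",
         "(Get-MpComputerStatus).AMProductVersion"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__" and sys.platform == "win32":
    version = current_platform_version()
    print(version, "-> affected" if is_affected(version) else "-> not the reported version")
```

Only hosts flagged as affected would then need the `-revertplatform` command shown above.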
The next issue was black screens on login and Outlook client issues on our AVD host pools, which started yesterday. I was able to install the proposed revert patch, and that seems to have solved it.
I have shared this info with MS and they are working with the Product Group to get the Defender issue patched.
Hope this helps some!
- paulcruwys (Copper Contributor)
Removing the KB worked for us: logins were much faster and there was no black screen.
Has anyone else had an issue with Outlook starting and just hanging on "Loading Profile"?
After the KB removal I tested with one user and Outlook opened fine, but now that everyone is logged in, all the Outlooks are frozen, so no one can use it.
I have tried removing local profiles via C:\Users\ by renaming the folder to something like XXX to force a rebuild at the next login, but nothing worked. Users are on 365 webmail for now.
- djordan1910 (Copper Contributor)
Yes, read a few of the responses from the past two days. Outlook is still wonky even after removing the KB or applying the KIR. It's not an issue for everyone, but we see it in about 1 of 20 logins to the VDIs. We simply kick them from the host; they reconnect to a different one and the problem goes away.
- paulcruwys (Copper Contributor)
Is it host-specific, then? I mean, if that user comes back to the same (faulty) host, do they suffer the same issue? Mine is affecting approx. 80% of users, with Outlook not opening right now.