Forum Discussion
Azure Virtual Desktop - Black Screens on logins - What we've tried so far
dit-chris, we applied the KIR last night on the session hosts (make sure to restart the VMs afterwards!) and monitored user feedback today. The issue seems to be resolved. We opened a Microsoft support case last Friday and still have no engineer assigned. Thank you to everyone contributing to this post; we would still be struggling big time without the KIR.
HelenaKohler, we received this fix (KIR) from MS today. My only doubt is how to set the state of this GPO.
The MS guy wrote to set it to ENABLED, but based on what you've all written here it should be set to DISABLED... so which is the correct value?
- dit-chris | Oct 02, 2024 | Brass Contributor
Yep, certainly ack! Trust me, it's not by choice that I'm considering doing that, and unsurprisingly it's not something we have ever tested. But having ended up with most of the other available hosts in drain mode earlier, it's a case of which is the lesser of the possible evils: scale up the seemingly "reliable" machines in case we need them in anger and these issues persist again tomorrow; or end up with hosts at double their designed user density and completely CPU and memory bound, which is where we ended up today; or be stuck with 75 to 80% of users hitting black screens and having to resort to Outlook and Teams in a browser (and I don't fancy having to explain to anyone else how OWA works tomorrow... I did that enough times today 🤦‍♂️). Even if we only get about double the performance from triple the CPU and RAM capacity, that has to be better than the same performance with double the users competing for it.
If we believe them, one user reported a 4-hour login delay stuck on a black screen... although to be fair I am a little dubious. They had apparently tried to log in, got a black screen, left the machine sitting in the session while they went to a 3-hour seminar in a meeting room followed by lunch, and reckoned it was still stuck as they had left it. To be fair it may have been, as the fix was Ctrl+Alt+End and telling it to log off.
Scaling-wise we found the opposite. We were using smaller VMs, and some of the customer LOB apps were bursty, CPU intensive at times and RAM heavy (generating final documents; oh, and opening certain national newspapers' websites during breaks, which I wasn't convinced weren't running JavaScript crypto-mining scripts, the sort of website that can make an iPhone run red hot, though I think they were just autoplaying content and ads). We found that 22-27 users on a 32-core machine worked better than 10 on a 16-core machine, as it spread the peaks and troughs more effectively across a larger user pool (note that those figures assume 100% of people are working, which of course they aren't on any given day with part-time/flexible working, study, holiday, sickness etc.). So I suspect it depends on your specific application stack and what its particular resource utilisation profile looks like. Similarly, we were able to go from E-series to D-series machines, trading off some overprovisioned RAM headroom that apps would burst into for about 60-90 seconds (but up to around 8GB at a time) for some net extra CPU cores in the pool.
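To illustrate the peaks-and-troughs pooling argument above, here is a toy model. All of its numbers are made-up assumptions, not measurements from this thread: each user is independently "bursting" 10% of the time and then needs 4 cores, otherwise 0.5 cores. It computes the exact binomial probability that total CPU demand on a single host exceeds its core count, comparing 10 users on 16 cores with 22 users on 32 cores.

```python
from math import comb


def overflow_probability(users: int, cores: int, p_burst: float,
                         idle_cores: float, burst_cores: float) -> float:
    """Probability that total CPU demand on one host exceeds its cores.

    Hypothetical model: each user independently bursts with probability
    p_burst (needing burst_cores) or otherwise needs idle_cores.
    """
    total = 0.0
    for k in range(users + 1):  # k = number of users bursting at the same moment
        demand = k * burst_cores + (users - k) * idle_cores
        if demand > cores:
            # Binomial probability of exactly k simultaneous bursts
            total += comb(users, k) * p_burst**k * (1 - p_burst) ** (users - k)
    return total


# Illustrative parameters only (assumptions, not measured workload data)
small = overflow_probability(users=10, cores=16, p_burst=0.1, idle_cores=0.5, burst_cores=4.0)
large = overflow_probability(users=22, cores=32, p_burst=0.1, idle_cores=0.5, burst_cores=4.0)

print(f"16 cores / 10 users: {16 / 10:.2f} cores per user, overflow {small:.2%}")
print(f"32 cores / 22 users: {32 / 22:.2f} cores per user, overflow {large:.2%}")
```

Under these assumptions the 32-core host gives each user fewer cores on paper (about 1.45 vs 1.6), yet overflows roughly a third as often (about 0.4% vs 1.3%). Real logon storms and app usage are correlated rather than independent, so the real-world benefit is likely smaller than this model suggests.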
Crossing my fingers for a better day tomorrow.
- djordan1910 | Oct 02, 2024 | Copper Contributor
Ack! Pushing them that large won't do a thing for you. We went down that path too and decided to use the 24 vCPU version with 128GB RAM. It did nothing to pick up performance. This is documented in quite a few places. Instead we instantiated more, smaller VMs across the host pools (limited to 20 users each) and got way better performance. Aside from that, our black screen issues have disappeared; HOWEVER, Outlook (thick client) has been problematic on about a dozen user profiles so far today. It simply stops at "Loading Profile". We can't even delete the profile the normal ways you would on a standard desktop/laptop, so instead we're just trashing the user's FSLogix profile and letting it rebuild. Fingers crossed; at 200+ users it was a pretty quiet day.
- dit-chris | Oct 02, 2024 | Brass Contributor
djordan1910, thanks for confirming. My plan tonight is to push the one "fairly stable" 32-core VM to 96 cores and 384GB RAM... talk about having all your eggs in one basket, but earlier we ended up with nearly 50 users on a D32ads_v5 that is normally scaled for about 25, with VM Insights showing <3GB free of 128GB RAM!
I'm currently waiting on Microsoft to pick up the escalated Severity A case (about why these mitigations don't seem to be working), which 5 hours ago was apparently sitting with someone from the "Critical Situation Management and Escalation Team". At the present rate, if ramping up to a single monster machine gets us through to the weekend, that will buy time to sort out migrating the data from 4TB of FSLogix profiles into clean new ones.
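A quick per-user arithmetic check of the resize discussed above, using only the figures quoted in this thread: 32 vCPU / 128GB normally serving about 25 users, nearly 50 users during the incident, and a planned 96 cores / 384GB. The class and labels are purely illustrative, and raw per-user share ignores other bottlenecks (disk, network, profile load), which is djordan1910's point about bigger VMs not helping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostLoad:
    """Illustrative host sizing snapshot (hypothetical helper, not an Azure API)."""
    label: str
    vcpus: int
    ram_gb: float
    users: int

    def vcpu_per_user(self) -> float:
        return self.vcpus / self.users

    def ram_gb_per_user(self) -> float:
        return self.ram_gb / self.users


# Figures taken from the posts in this thread
scenarios = [
    HostLoad("32 vCPU / 128GB at designed density", 32, 128, 25),
    HostLoad("32 vCPU / 128GB during the incident", 32, 128, 50),
    HostLoad("96 vCPU / 384GB with the same 50 users", 96, 384, 50),
]

for s in scenarios:
    print(f"{s.label}: {s.vcpu_per_user():.2f} vCPU, "
          f"{s.ram_gb_per_user():.2f} GB RAM per user")
```

These come out at 1.28, 0.64 and 1.92 vCPU per user (5.12, 2.56 and 7.68 GB RAM). If the larger VM only delivered about twice the useful throughput rather than three times, 50 users would effectively get around 2 × 0.64 = 1.28 vCPU each, roughly back to the designed density, which is the "lesser of evils" reasoning in the earlier post.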
- djordan1910 | Oct 02, 2024 | Copper Contributor
We're completely deleting the profile and letting it rebuild on login. I don't yet have enough users with this issue (only 3 so far out of 200+), or enough time while they bounce around between hosts through the normal course of the day, to know whether that is the fix for the Outlook and occasional black screen issues. I can update later today once the rebuilt profiles have had more time on them.
- dit-chris | Oct 02, 2024 | Brass Contributor
djordan1910, when you say "For those, we're rebuilding the desktop profile in FSLogix and then they're fine.", rebuilding in what way? Blowing away just the Outlook profile, or blowing away the whole FSLogix virtual disk and starting from scratch?
I have to say I was wondering whether some of the issues we are still seeing could be broken AppX deployment states, either in the machine records (as I think the machine tracks the SIDs of all previous users, not just the current ones) or somewhere in the FSLogix profile disk from previous logins, as there doesn't seem to be a lot of consistency even on hosts with issues.
- djordan1910 | Oct 02, 2024 | Copper Contributor
So here's my 4-hour morning update with 200+ users logged in on our VDIs after applying the KIR.
SO FAR, NO MAJOR ISSUES. However, we've had 3 or 4 random black screens and 2 or 3 complaints about the Outlook desktop app stuck on "Loading Profile". We've had 2 that still had Outlook profile loading issues even once moved to a different host in the pool. For those, we're rebuilding the desktop profile in FSLogix and then they're fine. Could this be oddities/barnacles left in their profiles from the black screen mess yesterday? Other than that, we've not seen anything funky, and checking the logs we're not seeing anything out of the ordinary. Fingers crossed. I'll update again later today.
- paulosilva_PI | Oct 02, 2024 | Copper Contributor
Correct, that was the update that caused all the issues. They were more significant following September patching, and therefore they've released this KIR to remediate it for now. You can always wait for the October patching window for a fix.
- dit-chris | Oct 02, 2024 | Brass Contributor
And the KB number referenced in the title perhaps looks wrong too, as it looks like something from back in June?
- paulosilva_PI | Oct 02, 2024 | Copper Contributor
It's DISABLED. The MS engineer who sent me this also said to change it to Enabled, BUT it's not, if you read the GPO...