Forum Discussion
Azure AD Endpoint Manager User Profile Corruption: Black Screen Flashing Taskbar Explorer Crash Loop
We are in the midst of an Azure/Endpoint Manager (Intune) migration covering 300+ endpoints and are running into a deployment nightmare:
We are experiencing a very odd, seemingly random issue: a previously synced Hybrid Azure AD user logs into their endpoint (which had been working without issue for weeks or months) and the profile suddenly fails to load.
This issue only seems to occur when NEW endpoints are added to the Azure AD tenant/domain.
We know the issue is about to happen when we receive a call from an end user stating that their previously working credentials are "no longer working".
When the user attempts to log in via "Other user", the login proceeds, but the user lands on a black desktop with a flashing taskbar.
Windows Task Manager is not responsive, and Safe Mode does not produce a better result.
Upon reviewing the logs, you will see "explorer.exe" in a crash loop that points to ucrtbase.dll.
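For anyone who wants to confirm the crash loop from the event log, here is a minimal sketch. It assumes the crashes are recorded as Event ID 1000 from the "Application Error" provider in the Application log, which is the usual record for a crashed process. The function names and the one-day window are illustrative and not from the original post.

```powershell
# Sketch only: summarize crash records for one process.
function Measure-ProcessCrash {
    param(
        # Event-like objects exposing TimeCreated and Message (the shape Get-WinEvent returns)
        [object[]] $Record = @(),
        [string] $ProcessName = 'explorer.exe'
    )
    $needle = [regex]::Escape($ProcessName)
    # -match is case-insensitive, so 'Explorer.EXE' is counted too
    $hits = @($Record | Where-Object { $_.Message -match $needle } | Sort-Object TimeCreated)
    [pscustomobject]@{
        ProcessName = $ProcessName
        CrashCount  = $hits.Count
        First       = ($hits | Select-Object -First 1).TimeCreated
        Last        = ($hits | Select-Object -Last 1).TimeCreated
    }
}

# Windows only: collect Application Error events from the last day (arbitrary window).
function Get-RecentAppCrash {
    Get-WinEvent -FilterHashtable @{
        LogName      = 'Application'
        ProviderName = 'Application Error'
        Id           = 1000
        StartTime    = (Get-Date).AddDays(-1)
    } -ErrorAction SilentlyContinue
}

# Usage on an affected machine:
#   Measure-ProcessCrash -Record (Get-RecentAppCrash)
```

A high count within a few minutes of the sign-in would be consistent with the crash loop described here; the faulting module appears in each event's message text.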
- Azure AD homed user accounts and local user accounts are able to login without issue into the endpoint.
- The issue is specific to Hybrid Azure AD user profiles (on-premises cached/homed accounts). I suspect a conflict with the on-premises SAM account name, but I'm not sure why adding new endpoints to the tenant triggers it.
This issue is happening across all makes, models, and Windows image variations. It is specific to Azure AD profiles that attempt to log in to the endpoint.
Precursors:
Incorrect password prompt. Requires users to select "Other user".
After selecting "Other user", the user profile experiences a delayed "Welcome" screen.
Black screen appears with flashing taskbar, rendering the profile useless.
If we attempt a Wipe/Restore the issue will randomly reoccur on another workstation.
I believe the issue is specific to the way Windows tries to load/create the profile for Azure AD users. I'm not sure if Autopilot is attempting to configure these endpoints in hybrid mode. However, we've noticed discrepancies in the naming convention of some profiles and domains. For example:
- AzureAD\FirstLastName
- shortdomain\FLast
I believe the User Profile Service is somehow bugged and causing a mismatch between the registry's SID for the user profile.
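One way to make such a discrepancy visible is to list every profile registered on the machine with its SID, the kind of account the SID belongs to, and the profile folder. This is a minimal sketch that assumes the standard ProfileList registry location. The `S-1-12-1-` prefix identifies Azure AD accounts and `S-1-5-21-` identifies domain and local accounts; the function names are illustrative.

```powershell
# Sketch only: classify a profile SID by its prefix.
function Get-SidKind {
    param([Parameter(Mandatory)] [string] $Sid)
    switch -Wildcard ($Sid) {
        'S-1-12-1-*' { 'AzureAD'; break }        # Azure AD (cloud) identities
        'S-1-5-21-*' { 'DomainOrLocal'; break }  # on-prem domain and local machine accounts
        default      { 'Other' }                 # service and well-known SIDs
    }
}

# Windows only, read-only: pair each registered SID with its profile folder.
function Get-ProfileListEntry {
    $root = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList'
    Get-ChildItem -Path $root | ForEach-Object {
        [pscustomobject]@{
            Sid         = $_.PSChildName
            Kind        = Get-SidKind -Sid $_.PSChildName
            ProfilePath = (Get-ItemProperty -Path $_.PSPath).ProfileImagePath
        }
    }
}

# Usage on an affected machine:
#   Get-ProfileListEntry | Sort-Object ProfilePath | Format-Table -AutoSize
```

Sorting by profile path puts entries for the same user next to each other, so an Azure AD SID and a domain SID with similar folder names are easy to spot.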
Has anyone else experienced this issue? We are desperate for answers; this is worse than any virus because it is random and intermittent, and it returns even after a fresh system restore.
I've received a call from another organization stating they are seeing the same issue occur throughout their deployment. I believe this is now a wide-spread issue.
We have a ticket open with Microsoft on this. The Windows Performance Team is reaching out to the Azure Team.
52 Replies
- Daniel_Gatley (Copper Contributor)
We have been experiencing the same issue and, in our situation, managed to find a fix. On machines with the issue we had two subkeys under the following registry key; on working machines there was only one.
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\IdentityStore\LogonCache\B16898C6-A148-4967-9171-64D755DA8520\SubPkgs
The keys should represent the NetBIOS name of the on-prem domain. In our situation we had ABC and ABC-GLOBAL: ABC was the actual NetBIOS name, while ABC-GLOBAL was the first part of the FQDN (looking at the domain properties in PowerShell, it was its "Name"). Both of these keys had a value for "AuthenticatingAuthorityDns" which matched our domain FQDN, so the assumption here is some kind of conflict. The solution was simply to delete the key that wasn't our domain NetBIOS name (this required taking ownership, etc.). After a reboot the users could log in again.
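To check a machine for this condition without changing anything, a read-only sketch along these lines may help. It assumes "AuthenticatingAuthorityDns" is a value stored directly on each SubPkgs subkey, as described above; the function names are illustrative.

```powershell
# Sketch only: report SubPkgs subkeys that share one AuthenticatingAuthorityDns value.
function Find-DuplicateAuthority {
    param(
        # Objects with Name and AuthenticatingAuthorityDns properties
        [object[]] $SubKey = @()
    )
    $SubKey |
        Where-Object { $_.AuthenticatingAuthorityDns } |
        Group-Object -Property AuthenticatingAuthorityDns |   # grouping is case-insensitive
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [pscustomobject]@{
                Dns     = $_.Name
                SubKeys = @($_.Group | ForEach-Object { $_.Name })
            }
        }
}

# Windows only, read-only: the path is the one quoted in this reply.
function Get-SubPkgEntry {
    $path = 'HKLM:\SOFTWARE\Microsoft\IdentityStore\LogonCache\B16898C6-A148-4967-9171-64D755DA8520\SubPkgs'
    Get-ChildItem -Path $path | ForEach-Object {
        [pscustomobject]@{
            Name                       = $_.PSChildName
            AuthenticatingAuthorityDns = (Get-ItemProperty -Path $_.PSPath).AuthenticatingAuthorityDns
        }
    }
}

# Usage on an affected machine:
#   Find-DuplicateAuthority -SubKey (Get-SubPkgEntry)
```

An empty result would correspond to the working machines described above (one subkey per domain); a result listing two subkey names for one FQDN matches the broken state.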
The value for netBIOSName in Azure AD is populated using %Domain.Netbios% for AD Connect and %DomainNetBios% in Cloud Sync. I've not been able to find any more information about how these values are obtained. I can only assume that Cloud Sync isn't using exactly the same value as AD Connect. Maybe this only has an impact where the "Name" and "NetBIOSName" (from Get-ADDomain) are different.
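A quick way to see whether a domain falls into that category is to compare the two properties directly. This is a minimal sketch; `Test-DomainNameMismatch` is an illustrative name, and it accepts any object with `Name` and `NetBIOSName` properties, such as the output of `Get-ADDomain`.

```powershell
# Sketch only: compare a domain's Name with its NetBIOSName.
function Test-DomainNameMismatch {
    param(
        # Any object with Name and NetBIOSName properties, e.g. the output of Get-ADDomain
        [Parameter(Mandatory)] $Domain
    )
    # String comparison with -ne is case-insensitive, so 'abc' and 'ABC' count as equal
    [pscustomobject]@{
        Name        = $Domain.Name
        NetBIOSName = $Domain.NetBIOSName
        Mismatch    = ($Domain.Name -ne $Domain.NetBIOSName)
    }
}

# Usage on a machine with the ActiveDirectory module (RSAT) installed:
#   Test-DomainNameMismatch -Domain (Get-ADDomain)
```

With the names from this reply (Name ABC-GLOBAL, NetBIOS name ABC), `Mismatch` would be true, which is the situation suspected of triggering the problem.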
Anyway, that was my experience. I hope this helps people in the future or gives people pointers to allow Cloud Sync migrations without this issue.
- jmerwin (Copper Contributor)
Daniel_Gatley I'm trying to take ownership of the key, and I still receive the error that I can't delete it. It's frustrating that MS can't figure this out.
- Rexford_Haugen_COLT (Copper Contributor)
Daniel,
We just discovered the same thing and rolled out a fix for it in our environment. For users with an email address in on-prem AD, Azure AD Connect Sync was creating the accounts in Azure online with the pre-Windows 2000 NetBIOS domain name which matches the pre-Windows 2000 NetBIOS user logon name. However, for those without an email, it was creating the account in Azure with the subdomain of the domain FQDN instead of the pre-Windows 2000 name as specified on the account or in Domains and Trusts. Azure AD Cloud Sync was trying to update all accounts to the subdomain and completely ignoring the pre-Windows 2000 names entirely.
As far as experiencing the taskbar issue, once it occurred for one account on the machine, it would then impact all accounts on the machine both pre-existing and new sign-ins. However, accounts that did not have an AD mail attribute would not experience the issue. We found the same SubPkgs key and those that were in the NetBIOS subkeys would have the taskbar, permission, and general SID mismatch errors but those that were in the subdomain subkey would not.
We shut down our Azure AD Connect and are now relying entirely on Cloud Sync. Then, to fix the machines without a reimage, we performed a full Cloud Sync and then ran the following PowerShell script on Azure AD joined machines to clean up the broken accounts. This allowed users to sign in fresh with the subdomain instead of NetBIOS prefix and the issue has not reoccurred, but it's only been about a week.
$CurrentUserSID = (C:\Windows\System32\whoami.exe /User /Fo CSV | ConvertFrom-Csv).SID
$CachedAccounts = Get-CimInstance -Classname win32_userprofile | where-object { (!$_.Special) -And ($_.SID -like 'S-1-12-1-*') -And ($_.SID -NotLike $CurrentUserSID) }
foreach ($Account in $CachedAccounts) {
    $SIDtoUser = $null
    $SID = New-Object System.Security.Principal.SecurityIdentifier($Account.SID)
    try {
        $SIDtoUser = $SID.Translate([System.Security.Principal.NTAccount])
        Write-Host "Removing $SIDtoUser from list of cached accounts."
        if ($SIDtoUser -ne $null) {
            $CachedAccounts = @($CachedAccounts | Where-Object SID -ne $SID)
        }
    } catch {
        Write-Host "Unable to translate SID ($SID) to user."
    }
}
if ($CachedAccounts.Count -gt 0) {
    Write-Host 'Accounts to be removed:'
    $CachedAccounts | Select LocalPath,SID | ft
    $Confirmation = Read-Host "Do you want to remove those accounts? (Yes or No)"
    if ($Confirmation.ToLower() -like "y*") {
        Write-Host "Removing accounts..."
        $CachedAccounts | Remove-CimInstance -Verbose
    } else {
        Write-Host 'Accounts not removed.'
    }
} else {
    Write-Host 'No accounts to remove.'
}
Hopefully, this script and info help someone else.
Rexford
- slider484 (Copper Contributor)
We have just dealt with this issue and confirmed this is the fix. The Entra ID Cloud Sync attribute mapping for the NetBIOS name is set by default to %netbiosdomain%. Update this attribute mapping to netbiosname and re-sync Entra ID Cloud Sync. Then run the script as a local admin with all domain accounts logged out. Reboot the machine, and users can log back in without the issue. Note: it does look like this re-profiles the users. We are rolling this out currently and will see whether it fixes the issue long term.
- PSKieran (Copper Contributor)
We also ran into this issue and believe that it's a User Profile Service issue too. I don't experience your password/"other user" issue, but since this is the top result when searching for the problem I thought it best to add our solution.
We have this problem after Azure AD joining existing PCs. You need to log in as an admin, open the User Profiles list (shortcut: rundll32.exe sysdm.cpl,EditUserProfiles) and delete BOTH profiles for your problem users: the old local profile and the Azure one. Make sure you have a backup of any data you need first.
Then log off the admin account, and the Azure version of the user profile will be able to log on without problems.
- creatoni41245 (Copper Contributor)
Hello,
thanks a lot for the post. I am glad that we are not alone and that the problem is not on our side. Well, of course, I'm not happy that Microsoft, seeing this problem, cannot even triage it.
Briefly about the problem: a problem with the profile (or it looks like).
Temporary (crooked) solution: delete the profile before adding it to the tenant.
Briefly about my task: we are migrating from one AAD tenant to another. This is my first time participating in a migration project and it may not surprise anyone here, but it seemed to me that ours is not quite ordinary for the following reasons:
- we must disable ADsync (that's what I call the Azure AD Connect utility and will continue to call it that in the text) from the old tenant
- change upn suffixes for all users in the old tenant so that you can unbind the main domain name, let's call it contuso.com
- bind this domain name to a new tenant
- enable ADsync synchronization with the new tenant
As far as I understand, tenant migrations are usually stretched out in time and usually this is a merging/ acquisition with a change in upn suffixes.
In our case, the ground infrastructure remains and, accordingly, "On-premises SAM account name", "On-premises domain name"
So, in order:
- all users are created in the ground domain contuso.com,
- synced to AAD (including the aforementioned SAMAccountName etc. attributes)
- then two options:
1) You add devices to the ground domain (they then sync and become hybrid joined). In this scenario, if you remove the device from the domain and add it to AAD, there will be no problems (according to my tests). I think this is because joining the domain involves the user logon name (pre-Windows 2000) rather than the UPN.
I was able to remove these devices from the ground domain without any problems and add them to the new tenant. As far as I can tell, there are no issues in this migration scenario.
2) You add the device to AAD and use the UPN when you sign in. If you go to the folder with user profiles, you will see that a profile will be created in the SAMAccountName format.
We will be interested in this particular option of adding, since we have more than 600 people connected by this type and this is just the target group that needs to be migrated to the new tenant.
By the way, this problem does not apply to Endpoint Manager (Intune), but is related to AAD in general. Even if you don't have Intune, you will run into this issue.
To test the device migration we did the following:
- deployed new ADsync
- set up intune, aad, licenses, etc. in the new tenant
- defined an OU scope for it, where we threw several accounts for synchronization
- for the convenience of testing set up a password hash sync
- for convenience, added a simple name contuso.io as an upn for users (so as not to print long and uncomfortable contusonew.onmicrosoft.com)
That is, we got a working migration, except that it will be "io" instead of "com"
When adding new "clean" devices to the new tenant, there are no problems, a profile is created with the same SAMaccountName as in AD. Everything works perfectly.
But, in the case where the device was previously added to the old tenant (there is already a profile), I get the same Explorer problem described in this thread.
Funnily enough, in this blinking state you can still open Task Manager and start ProcMon from it, which is what I did to record what happens with Explorer.
Let's imagine that our user is John Snow. Pre-windows 2000 format: Contuso\JSnow.
In the old tenant, your profile will be JSnow. When added to a new tenant, a new JSnow.Contuso folder will be added, then if you have already done this more than once it will be .001...
The reason for the crash is that the system tries to write not to the newly created profile folder but to the already existing JSnow folder, for which it lacks sufficient rights (ACLs, etc.).
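To find machines where this kind of folder collision already exists, the profile folder names can be grouped by their base user name. This is a minimal sketch; `Find-ProfileFolderCollision` is an illustrative name, and it assumes the part before the first dot is the user name, so it will over-report for accounts whose names themselves contain a dot (for example first.last).

```powershell
# Sketch only: group profile folder names that share a base user name.
function Find-ProfileFolderCollision {
    param([string[]] $FolderName = @())
    $FolderName |
        Group-Object -Property { ($_ -split '\.', 2)[0] } |   # text before the first dot
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [pscustomobject]@{
                BaseName = $_.Name
                Folders  = @($_.Group)
            }
        }
}

# Usage on an affected machine (C:\Users is the default profile root):
#   $names = Get-ChildItem -Path 'C:\Users' -Directory | ForEach-Object { $_.Name }
#   Find-ProfileFolderCollision -FolderName $names
```

For the example user, a machine holding both JSnow and JSnow.Contuso would be reported with BaseName JSnow; checking the ACL on the older folder (for example with `Get-Acl`) is then the next step.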
Deleting the profile (through the profiles menu) helps to solve the problem, but in our case this is a potential loss of data / settings, because our migration plan included migrating the profile using a specialized utility.
We also decided to make an unusual workaround. It works, but it doesn't suit us as a company heavily tied to legacy applications.
Workaround: in ADsync, add a rule that appends the letter N to SAMAccountName. In the new tenant the profile will then be JSnowN, so there is no conflict and the user can log in without the blinking Explorer.
Cons of this method: Kerberos and NTLM stop working in the "ground" domain, because the account no longer links to an existing user.
We created a support ticket, but to be honest, looking at this post and understanding how Microsoft works, I do not expect quick solutions.
I'm sure someone will benefit from our experiments and hope that Microsoft can fix this problem. Perhaps, after reading this text, you can offer something as a working workaround.
Thanks to all!
- Edmundo Pena (Brass Contributor)
It's been a few months since we had a recurrence of the problem. We ended up decommissioning our entire on-premises infrastructure and getting rid of Azure AD Connect and Azure AD Cloud Provisioning/Sync.
While Microsoft has not confirmed this, I'm almost positive the UPN mismatch issue is due to having both products installed at the same time or when transitioning from one to the other. Something about having Azure AD Connect and Azure AD Cloud Sync installed messes with Azure and prevents the endpoint from being able to choose which UPN to match the UIDSID when Win Logon is running.
- sjohn777 (Copper Contributor)
Hope this helps some still fighting this issue. We had already reimaged the <20 impacted PCs to restore them to service, but continued to work with MS on troubleshooting one of the devices. Microsoft confirmed the resolution was related to the 'Default Impersonation Level', and we were able to confirm that changing the setting to the recommended value resolved the issue after the device was restarted. Unfortunately, Microsoft was unable to determine what changed the 'Default Impersonation Level' in the first place, or how. Per Microsoft, the only way to change the setting is to perform the steps below:
1. Run dcomcnfg.exe.
2. Expand "Component Services".
3. Expand "Computers".
4. Right-click "My Computer" and select "Properties".
5. Switch to the "Default Properties" tab.
6. Under "Default Impersonation Level" select "Identify".
7. Reboot the machine.
Per Microsoft, this cannot be done outside of changing the setting using dcomcnfg.exe.
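For checking the current value on many machines without opening dcomcnfg.exe on each one, a read-only sketch follows. It rests on an assumption this thread does not confirm: that the machine-wide default is surfaced in the `LegacyImpersonationLevel` value under `HKLM:\SOFTWARE\Microsoft\Ole`. The numeric levels are the standard RPC_C_IMP_LEVEL constants; the function names are illustrative. The sketch only reads the value and does not change it.

```powershell
# Sketch only: translate a DCOM impersonation level number to its name.
function ConvertTo-ImpersonationLevelName {
    param([Parameter(Mandatory)] [int] $Value)
    # RPC_C_IMP_LEVEL_* constants
    switch ($Value) {
        1 { 'Anonymous' }
        2 { 'Identify' }
        3 { 'Impersonate' }
        4 { 'Delegate' }
        default { "Unknown ($Value)" }
    }
}

# Windows only, read-only. Assumption: the default level is stored in the
# LegacyImpersonationLevel value under HKLM:\SOFTWARE\Microsoft\Ole.
function Get-DcomDefaultImpersonationLevel {
    $ole = Get-ItemProperty -Path 'HKLM:\SOFTWARE\Microsoft\Ole'
    if ($null -eq $ole.LegacyImpersonationLevel) { return 'Not set' }
    ConvertTo-ImpersonationLevelName -Value $ole.LegacyImpersonationLevel
}

# Usage on an affected machine:
#   Get-DcomDefaultImpersonationLevel
```

Any result other than "Identify" on an affected machine would be worth comparing against a healthy one before following the dcomcnfg.exe steps.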
- BishopstonIT (Brass Contributor)
Thanks for the info. I just reinstalled the 5 laptops in question. Microsoft (via the O365/Azure support portal) provided no help. I gave them the link to this discussion, and they just closed the case after I said I had reinstalled them! It appears to be related to the new Azure AD Cloud Sync (not AD Connect). My tenant in question used to have AD Connect installed (it was removed a couple of years ago), so I am unsure whether this played a part in the corruption. Anyway, thanks for the workaround. I'm just concerned that MS won't fix the issue, so I will be lumbered with more issues at some of the larger clients where I want to enable Cloud Sync. I imagine I'll just use AD Connect on the larger clients.
- inderlyit (Copper Contributor)
Same issue here with a few users. Trying to summon the will to contact Microsoft support... EDIT: Just to note we're in the EXACT same boat as OP. We switched from azure ad connect (it was broken for some time) to the new azure ad cloud sync found in the Azure AD portal, following that the issues began. No changes whatsoever on our local AD, and everything matched up nicely when it synced again.
We don't have SID mismatch on all users that are affected though. Some had it some don't.
For now we are removing devices from Azure AD. I don't know how much we'll follow up on this with Microsoft support as we've been getting zero results from them lately.
- Colin Kness (Copper Contributor)
We are now starting the long process of finding someone on the sales side for subscription refunds and on-site hr work to repair. It should be fun to find someone who cares. AZURE ALREADY SAYS IT'S NOT THEIR ISSUE! Good luck; it's a Microsoft world out there, we just live in it.
- jmerwin (Copper Contributor)
When we do get signed in to the machine, we are now seeing a SID mismatch error message.
- Edmundo Pena (Brass Contributor)
That’s the canary in the coal mine Microsoft continues to ignore.
- sjohn777 (Copper Contributor)
MS is still troubleshooting a problematic station we have off to the side. Thankfully, after reimaging the ~14 other devices, the issue hasn't returned to those or anywhere else in the enterprise, but we still don't know what occurred or why.
- Edmundo Pena (Brass Contributor)
The issue is still happening. FML.
- BishopstonIT (Brass Contributor)
Happening here too. I just installed Azure AD Cloud Sync, and hybrid users outside of the domain all get "incorrect password", then flashing screens when they log on. No news?
- jmerwin (Copper Contributor)
The other thing we noticed is this.
We have an NPS server to allow users to sign into our WiFi with their Windows creds.
Now when a user signs into their on-prem device the domain name is dlsd0\username
When an AAD device goes to connect to the SSID and the user chooses to use Windows credentials, the domain gets autopopulated as dlsd\username
- Skyisblue (Copper Contributor)
jmerwin Unfortunately, in our organization/tenant the issue appeared two months ago, and only with administrative accounts that had to renew their credentials when their passwords expired.
After this, using those administrative accounts on any endpoint device produces this taskbar flickering issue.
More recently, other administrative accounts were also affected, even without a password change. We have a hybrid setup.
- Edmundo Pena (Brass Contributor)
sjohn777 It will happen again when you add new endpoints to the Azure AD domain/tenant. I believe there is something wrong with Azure endpoint provisioning.
- fourmysir (Copper Contributor)
Yes, we are experiencing the same conditions. The issue is very random. The only consistency is that the Azure AD admin account is never affected.
- Edmundo Pena (Brass Contributor)
I'm over a month into this issue, and the response from Microsoft has amounted to nothing more than log collection and consolidating the various tickets I've opened with the different support teams in an attempt to progress the troubleshooting. Microsoft Support is a **bleep**ing disgrace.
- jmerwin (Copper Contributor)
This started happening to our fleet of Azure devices/users over the past week and into this week. The change we made was moving from Azure AD Connect to the cloud provisioning agent. The only account not affected is the Azure global admin account. Our only fix has been to reset the PC.
- sjohn777 (Copper Contributor)
We have the same situation and have opened an MS case. Has anyone determined the root cause yet?
- Edmundo Pena (Brass Contributor)
I've confirmed the issue occurs when new endpoints are added to the Azure AD tenant. I'm still pushing Microsoft for answers.
- sjohn777 (Copper Contributor)
Edmundo Pena MS is still reviewing our logs. However, we did review WHEN our endpoints were Azure AD joined, and that happened months ago, so we're not certain that's our root cause. We have yet to find a clear root cause, and for now it seems to affect only a subset of our endpoints, about 16.
- Edmundo Pena (Brass Contributor)
More days burned in the deep annals of the logs. They called me again to ask for a status update and to collect more ProcMon boot logs. I'm at a loss for words.
- Colin Kness (Copper Contributor)
It’s very sad to see Microsoft’s support crumple like this. Intune support operates in a bubble: two weeks in, each day brings a new request for logs, with no answers and no clue. We have 100 machines affected to date. Microsoft just does not care about its customers any more. Two weeks! No help from Twitter. sjohn777 @satyanadella
#intunesupport #satyanadella
- Colin Kness (Copper Contributor)
Edmundo Pena https://call4cloud.nl/2022/07/the-incredibly-strange-device-who-stopped-syncing-and-became-certificate-zombies/
Part 2