SOLVED

Azure AD/Endpoint Manager User Profile Corruption: Black Screen, Flashing Taskbar, Explorer Crash Loop


We are in the midst of an Azure/Endpoint Manager (Intune) migration of 300+ endpoints and are running into a deployment nightmare:

We are experiencing a very odd, completely random issue: a previously synced hybrid Azure AD user logs into their endpoint (which had been working without issue for weeks or months), and the profile suddenly fails to load.

This issue only seems to occur when new endpoints are added to the Azure AD tenant/domain.

We know the issue is about to happen when an end user calls to say their previously working credentials are "no longer working".

When the user then signs in via "Other user", the login proceeds, but the user lands on a black desktop with a flashing taskbar.

Windows Task Manager is unresponsive, and Safe Mode does not produce a better result.

The logs show explorer.exe in a crash loop, with the faults pointing to ucrtbase.dll.

  • Azure AD homed user accounts and local user accounts can log in to the endpoint without issue.
  • The issue is specific to hybrid Azure AD user profiles (on-premises cached/homed accounts). I think it has to do with a conflict with the on-premises SAM account name, but I'm not sure why adding new endpoints to the tenant causes the issue.

This issue is happening across all makes, models, and Windows image variations. It is specific to Azure AD profiles that attempt to log in to the endpoint.

Precursors:

An incorrect password prompt, which requires the user to select "Other user"

After selecting "Other user", the profile shows a delayed "Welcome" screen

A black screen appears with a flashing taskbar, rendering the profile useless

If we attempt a wipe/restore, the issue randomly recurs on another workstation.

I believe the issue lies in the way Windows tries to load/create the profile for Azure AD users. I'm not sure whether Autopilot is attempting to configure these endpoints in hybrid mode. However, we've noticed discrepancies in the naming convention of some profiles and domains. For example:

  • AzureAD\FirstLastName
  • shortdomain\FLast

I believe the User Profile Service is somehow bugged and is causing a mismatch with the SID recorded in the registry for the user profile.
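
To look for the kind of SID/profile mismatch suspected above, a read-only sketch like the following lists each subkey under the ProfileList registry key, classifies it as an Azure AD SID (prefix S-1-12-1-, the same prefix the cleanup script in the accepted solution filters on) or a domain/local SID (S-1-5-21-), and flags any ".bak" duplicates, which the User Profile Service commonly leaves behind when it cannot load a profile. Get-SidKind and Get-ProfileListReport are hypothetical helper names, not built-in cmdlets; run it elevated on an affected endpoint and treat the output as a starting point, not a diagnosis.

```powershell
# Read-only sketch: report User Profile Service entries from the registry.
# Get-SidKind and Get-ProfileListReport are hypothetical helpers, not built-in cmdlets.

function Get-SidKind {
    param([Parameter(Mandatory)][string]$KeyName)
    # Strip a trailing ".bak" so the SID itself can be classified.
    $sid = $KeyName -replace '\.bak$', ''
    if ($sid -like 'S-1-12-1-*') { 'AzureAD' }          # Azure AD account SIDs
    elseif ($sid -like 'S-1-5-21-*') { 'Domain/Local' } # AD domain or local account SIDs
    else { 'Other' }                                    # well-known SIDs such as S-1-5-18
}

function Get-ProfileListReport {
    $profileList = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList'
    if (-not (Test-Path $profileList)) {
        Write-Warning 'ProfileList key not found (not running on Windows?).'
        return
    }
    Get-ChildItem -Path $profileList | ForEach-Object {
        $name = $_.PSChildName
        [pscustomobject]@{
            Key         = $name
            Kind        = Get-SidKind -KeyName $name
            IsBak       = $name.EndsWith('.bak')
            ProfilePath = $_.GetValue('ProfileImagePath')
        }
    }
}

Get-ProfileListReport | Sort-Object Kind, Key | Format-Table -AutoSize
```

Two entries for the same SID, one with ".bak", or two profile paths that map to the same person (for example AzureAD\FirstLastName and shortdomain\FLast) would be worth comparing against the failing sign-in.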

 

Has anyone else experienced this issue? We are desperate for answers. This is worse than any virus: because of its random, intermittent nature, it returns even after a fresh system restore.

I've received a call from another organization stating they are seeing the same issue throughout their deployment. I believe this is now a widespread issue.

We have a ticket open with Microsoft on this. The Windows Performance team is reaching out to the Azure team.

49 Replies

Hey @Edmundo Pena,

Have you had any updates regarding this issue? I have been experiencing a similar problem with an account.

I'm still waiting on debug analysis from the Windows Performance team. I am trying to push them for answers.

@Edmundo Pena, any updates? We defederated a few … the saga continues … I understand Microsoft may change its name to “Take It Or Leave It”, or at least rename its support model!

@Colin Kness After weeks of “deep log analysis”, they called me today with an off-the-cuff workaround that had nothing to do with the detailed logs or with what my prior troubleshooting efforts had determined. This is low-effort, low-quality support, and I'm very disappointed with the response. Microsoft continues to fail partners and customers with this support model and its test-everything-in-production development model.

We have the same situation and have opened an MS case. Has anyone determined the root cause yet?

It’s very sad to see Microsoft’s support crumble to this. Intune support operates in a bubble. Two weeks in, each day is a new day with a new log, no answers, and no clue, and we have 100 machines affected to date. Microsoft just does not care about its customers any more. Two weeks!!! No help from Twitter @sjohn777 @satyanadella

#intunesupport #satyanadella

More days burned in the deep annals of the logs. They called me again to ask for a status update and to collect more Procmon boot logs. I'm at a loss for words.
I've confirmed the issue occurs when new endpoints are added to the Azure AD tenant. I'm still pushing Microsoft for answers.

@Edmundo Pena MS is still reviewing our logs. However, we did review WHEN our endpoints were Azure AD joined, and that occurred months ago, so we're not certain that's our root cause. We have yet to find a clear root cause, and (for now) it seems to affect only a subset of our endpoints, about 16 so far.

They won't find anything. Procmon won't load on the affected profiles, so they don't have the logs they need to troubleshoot the problem effectively. They are looking at this from the wrong perspective. The issue is specific to hybrid accounts on Azure AD joined endpoints. I think it has to do with a mismatch between the Azure and on-premises SAM account names when the User Profile Service tries to create the profile on the local workstation, but no one at Microsoft is paying attention. 10% of one of our customers' users are down this morning because of it.

This particular issue has now affected 25% of the 350+ users we have in production, and it recurred for 15 users after we added 20 endpoints to the Azure AD tenant over the weekend.

I am fighting to have this Microsoft bug escalated to a more capable team than the one currently provided through professional partner support (while being chastised for it).

Steps we’ve taken while waiting to hear from Microsoft Support:

1. Disabled directory sync (hopefully clearing the on-premises attributes from current AD users)
2. Renamed all affected user accounts to email address removed for privacy reasons
3. Recreated the Azure AD users with their original UPNs
4. Migrating Exchange, OneDrive, and Teams data from the old users to the newly recreated users

@sjohn777 @ElliotStewart @Colin Kness 

Yes, we are experiencing the same conditions. The issue is very random. The only consistent factor is that the Azure AD admin account is never affected.

I'm over a month into this issue, and Microsoft's response has been nothing but log collection and consolidating the various tickets I've opened with the different support teams in an attempt to move the troubleshooting forward. Microsoft Support is a **bleep**ing disgrace.

This started happening to our fleet of Azure devices/users this past week and into this week. The change we made was moving from Azure AD Connect to the cloud provisioning agent. The only account not affected is the Azure global admin account. Our only fix has been to reset the PC.

Incredible. The issue self-healed on one laptop and then hit another Azure AD joined endpoint. This is worse than any virus. Microsoft has not been able to produce any results.

MS is still troubleshooting a problematic workstation we have set aside. Thankfully, after reimaging the ~14 other devices, the issue hasn't returned on those or anywhere else in the enterprise, but we still don't know what occurred or why.

Accepted Solutions
best response confirmed by Edmundo Pena (Brass Contributor)

Daniel,

We just discovered the same thing and rolled out a fix for it in our environment. For users with an email address (the mail attribute) in on-premises AD, Azure AD Connect Sync was creating the accounts in Azure AD with the pre-Windows 2000 NetBIOS domain name, which matches the pre-Windows 2000 user logon name. For users without an email address, however, it was creating the account in Azure AD with the subdomain of the domain FQDN instead of the pre-Windows 2000 name specified on the account or in Active Directory Domains and Trusts. Azure AD Cloud Sync, meanwhile, was trying to update all accounts to the subdomain, ignoring the pre-Windows 2000 names entirely.
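
The split described above only matters when the domain's pre-Windows 2000 (NetBIOS) name differs from the subdomain part of its FQDN. A rough check, assuming "subdomain" here means the leftmost label of the DNS domain name (for example corp in corp.contoso.com), is to compare the two. This is a sketch, not a Microsoft diagnostic: Test-NetBiosLabelMismatch is a hypothetical helper, and the live lookup assumes the RSAT ActiveDirectory module (Get-ADDomain, with its NetBIOSName and DNSRoot properties) is installed on a domain-joined machine.

```powershell
# Sketch: does the pre-Windows 2000 (NetBIOS) domain name differ from the leftmost
# label of the DNS domain name? Test-NetBiosLabelMismatch is a hypothetical helper;
# treating "subdomain of the FQDN" as the leftmost label is an assumption.

function Test-NetBiosLabelMismatch {
    param(
        [Parameter(Mandatory)][string]$NetBIOSName,
        [Parameter(Mandatory)][string]$DnsRoot
    )
    $leftmostLabel = $DnsRoot.Split('.')[0]
    [pscustomobject]@{
        NetBIOSName   = $NetBIOSName
        LeftmostLabel = $leftmostLabel
        # -ne on strings is case-insensitive in PowerShell (CONTOSO equals contoso).
        Mismatch      = ($NetBIOSName -ne $leftmostLabel)
    }
}

# Live check only when the RSAT ActiveDirectory module is available.
if (Get-Module -ListAvailable -Name ActiveDirectory) {
    Import-Module ActiveDirectory
    $domain = Get-ADDomain
    $result = Test-NetBiosLabelMismatch -NetBIOSName $domain.NetBIOSName -DnsRoot $domain.DNSRoot
    if ($result.Mismatch) {
        Write-Warning "NetBIOS name '$($result.NetBIOSName)' differs from DNS label '$($result.LeftmostLabel)'; the same user may appear under both prefixes."
    }
    $result
}
```

When Mismatch is true, accounts synced under the two schemes would show up with different prefixes (like the AzureAD\FirstLastName and shortdomain\FLast examples in the original post), which is consistent with, though not proof of, the failure mode described here.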

 

As for the taskbar issue, once it occurred for one account on a machine, it would then affect all accounts on that machine, both pre-existing accounts and new sign-ins. However, accounts that did not have an AD mail attribute did not experience the issue. We found the same SubPkgs key: accounts under the NetBIOS subkeys had the taskbar, permission, and general SID mismatch errors, but those under the subdomain subkey did not.

We shut down Azure AD Connect and now rely entirely on Cloud Sync. Then, to fix the machines without a reimage, we performed a full Cloud Sync and ran the following PowerShell script on the Azure AD joined machines to clean up the broken accounts. This allowed users to sign in fresh with the subdomain prefix instead of the NetBIOS prefix, and the issue has not recurred, although it has only been about a week.

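# Run elevated on an Azure AD joined machine. Finds cached Azure AD profiles (SIDs starting
# with S-1-12-1-) whose SID no longer translates to an account name, then removes them
# after confirmation.
$CurrentUserSID = (C:\Windows\System32\whoami.exe /User /Fo CSV | ConvertFrom-Csv).SID

# Candidates: non-special Azure AD profiles other than the current user that are not loaded
# (a loaded profile cannot be deleted).
$CachedAccounts = @(Get-CimInstance -ClassName Win32_UserProfile | Where-Object {
    (-not $_.Special) -and ($_.SID -like 'S-1-12-1-*') -and ($_.SID -ne $CurrentUserSID) -and (-not $_.Loaded)
})

$OrphanedAccounts = @()
foreach ($Account in $CachedAccounts) {
    $SID = New-Object System.Security.Principal.SecurityIdentifier($Account.SID)
    try {
        # Translate() throws IdentityNotMappedException when the SID cannot be resolved.
        $SIDtoUser = $SID.Translate([System.Security.Principal.NTAccount])
        Write-Host "Keeping $SIDtoUser (SID still resolves to an account)."
    } catch {
        Write-Host "Unable to translate SID ($SID) to user; marking it for removal."
        $OrphanedAccounts += $Account
    }
}

if ($OrphanedAccounts.Count -gt 0) {
    Write-Host 'Accounts to be removed:'
    # Out-Host forces the table to render before the Read-Host prompt appears.
    $OrphanedAccounts | Select-Object LocalPath, SID | Format-Table | Out-Host
    $Confirmation = Read-Host "Do you want to remove those accounts? (Yes or No)"
    if ($Confirmation -like 'y*') {
        Write-Host 'Removing accounts...'
        # Deleting a Win32_UserProfile instance removes the profile folder and its registry entry.
        $OrphanedAccounts | Remove-CimInstance -Verbose
    } else {
        Write-Host 'Accounts not removed.'
    }
} else {
    Write-Host 'No accounts to remove.'
}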

Hopefully, this script and info help someone else.

Rexford
