Session Hosts Hanging Frequently


Wondering if anyone else is seeing regular VM hangs with the Windows 10 Enterprise for Virtual Desktops image, or has any advice on troubleshooting the issue we're experiencing?

 

In our tenant we have 14 session hosts (each has 16 vCPU, 64GB RAM, 256GB Premium SSD) in a single host pool. FSLogix Apps is used for profiles and they're stored on a Premium Azure Files Storage Account (5TB Quota, 5000 allowed IO/s, 15000 burst IO/s) in the same region as the hosts. There are 225 users that use WVD for a full desktop environment (no RemoteApps).
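For a rough sense of scale, the figures above can be turned into per-host and per-user numbers. This is only back-of-the-envelope arithmetic, assuming users are spread evenly across the hosts and all 225 are signed in at once (the post does not say either); the variable names are illustrative.

```powershell
# Figures taken from the post above
$sessionHosts = 14
$users        = 225
$baselineIops = 5000
$burstIops    = 15000

# Assumption: even spread across hosts, everyone signed in at the same time
$usersPerHost        = [math]::Round($users / $sessionHosts, 1)
$baselineIopsPerUser = [math]::Round($baselineIops / $users, 1)
$burstIopsPerUser    = [math]::Round($burstIops / $users, 1)

"Users per host:         $usersPerHost"
"Baseline IO/s per user: $baselineIopsPerUser"
"Burst IO/s per user:    $burstIopsPerUser"
```

That works out to roughly 16 users per host and about 22 IO/s per user at the baseline limit. Comparing those numbers against the storage account's own metrics during peak hours might help rule profile storage throttling in or out.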

 

Average CPU and RAM usage during peak time is less than 50% per VM.

 

Almost every day, usually during peak hours, at least one of the VMs hangs and needs to be restarted from the Azure portal. Users connected to the affected VM report that none of their open applications are responsive, and that they can’t launch or close any applications, even using Task Manager. The Start menu also becomes unresponsive.

 

Any new connections (via Remote Desktop client or directly via RDP) fail.

 

If we try to log off users using the "Invoke-RdsUserSessionLogoff" cmdlet, their session hangs at the "Signing you out" screen indefinitely.

 

If we try to kill any of their processes using Task Manager (as an admin), we either get an Access Denied error message or the process doesn’t get killed.

 

Typically in the event logs, about 30 minutes before the VM hangs, we start to see a few of the following events, but there is no commonality between applications, servers or users in the event descriptions.

 

1002 – Application Hang
7036 – Services entering a stop state
7011 – A timeout was reached while waiting for a response from a service
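To pull these three event IDs from a host in one go, something like the following could be used. It is a minimal sketch: the function names and the -HoursBack parameter are made up for illustration, and it assumes 1002 is logged to the Application log by the "Application Hang" provider and 7011/7036 to the System log by "Service Control Manager".

```powershell
# Build a Get-WinEvent filter for the three event IDs listed above.
# -HoursBack is an illustrative parameter, not something from the post.
function New-HangEventFilter {
    param([int]$HoursBack = 2)
    @{
        LogName   = 'Application', 'System'
        Id        = 1002, 7011, 7036
        StartTime = (Get-Date).AddHours(-$HoursBack)
    }
}

# Windows-only: return the matching events, oldest first.
# Other providers can reuse these IDs, so filter on the provider name as well.
function Get-HangEvent {
    param([int]$HoursBack = 2)
    Get-WinEvent -FilterHashtable (New-HangEventFilter -HoursBack $HoursBack) -ErrorAction SilentlyContinue |
        Where-Object { $_.ProviderName -in 'Application Hang', 'Service Control Manager' } |
        Sort-Object TimeCreated |
        Select-Object TimeCreated, Id, ProviderName, Message
}
```

Running Get-HangEvent -HoursBack 1 on a host that has just been restarted after a hang would list the events leading up to it in time order, which makes it easier to compare one hang against another.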

4 Replies

@DanRobb 

 

I have seen this exact behavior, but the culprit was storing the profiles in blob storage. After moving to SMB, our lockups went away.

 

One question: did you enable Known Folders for OneDrive? I've been testing that option and think there may be issues with it locking up the box, but I'm not certain.


@evgaff 

 

We're using Azure Files (not blob storage) and connecting to it over SMB. Are your profiles still stored in an Azure storage account, or did you move them to a file server?

 

Yes, we've enabled Known Folders with OneDrive.


@DanRobb did you ever figure this out?


@Robin_Kinetix 

 

We were never able to figure out a proper solution despite months of back and forth with MS Support.

 

I ended up creating a scheduled task to reboot the hosts every 8 hours (if nobody is logged in), which has reduced the frequency of the hangs from once every few days to once every few weeks/months.

 

Here's the PowerShell script that is called by the scheduled task. The task runs every 5 minutes.

$minHoursBetweenReboots = 8 # Minimum number of hours between reboots

function Write-Log($logMessage)
{
    # Build the line once so the console and the log file get the same timestamp
    $logFile = "$($PSScriptRoot)\ScheduledRestart.log"
    $line = "[LOG] [$(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')] $logMessage"
    Write-Output $line
    $line | Out-File $logFile -Append
}

Write-Log "Start of script"

function End-Script
{
    Write-Log "End of script"
    exit 0
}

# Check for running installations. If any, then exit
if((Get-Process).Name -match "msiexec|setup|wusa")
{
    Write-Log "Detected software installation in progress. Exiting..."
    End-Script
}

# Check for active sessions (disconnected sessions are not counted). If any, then exit
[array]$activeSessions = & query session | Select-String -SimpleMatch "Active"
if($activeSessions.Count -gt 0)
{
    Write-Log "There are $($activeSessions.Count) active sessions. Exiting..."
    End-Script
}
else
{
    Write-Log "There are no active sessions"
}

# Get last boot time
try
{
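    # -ErrorAction Stop turns a failed WMI query into a terminating error so the catch block runs
    $osInfo = Get-WmiObject -Class Win32_OperatingSystem -ErrorAction Stop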
    [datetime]$lastBootTime = $osInfo.ConvertToDateTime($osInfo.LastBootUpTime)
    Write-Log "Last boot time: $lastBootTime"
}
catch
{
    Write-Log "ERROR: Unable to get last boot time. Exiting..."
    End-Script
}

# If more than $minHoursBetweenReboots since last boot time then reboot
if($lastBootTime -lt (Get-Date).AddHours(-$minHoursBetweenReboots))
{
    Write-Log "Last boot time was more than $minHoursBetweenReboots hours ago. Restarting..."
    & shutdown -r -t 0 -f -c "Restart initiated by scheduled task/powershell script"
}
else
{
    Write-Log "Last boot time was less than $minHoursBetweenReboots hours ago. Exiting..."
    End-Script
}
End-Script
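
The post doesn't show how the scheduled task itself was created. One way to register it with the built-in ScheduledTasks cmdlets is sketched below. The script path, the task name and both function names are assumptions made for illustration, as is running the task as SYSTEM; on older Windows versions a -RepetitionDuration may also be needed on the trigger.

```powershell
# Hypothetical path; the post does not say where the script is stored
$scriptPath = 'C:\Scripts\ScheduledRestart.ps1'

# Build the powershell.exe argument string for the task action
function Get-TaskArgument {
    param([string]$Path)
    "-NoProfile -ExecutionPolicy Bypass -File `"$Path`""
}

# Windows-only: register a task that runs the script every 5 minutes as SYSTEM
function Register-RestartCheckTask {
    param(
        [string]$Path,
        [string]$TaskName = 'ScheduledRestartCheck'   # illustrative task name
    )
    $action    = New-ScheduledTaskAction -Execute 'powershell.exe' -Argument (Get-TaskArgument -Path $Path)
    $trigger   = New-ScheduledTaskTrigger -Once -At (Get-Date) -RepetitionInterval (New-TimeSpan -Minutes 5)
    $principal = New-ScheduledTaskPrincipal -UserId 'SYSTEM' -LogonType ServiceAccount -RunLevel Highest
    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger -Principal $principal
}
```

Calling Register-RestartCheckTask -Path $scriptPath from an elevated prompt on each session host (or baking it into the image) would set the task up. The script's own 8-hour check is what keeps the 5-minute schedule from causing a reboot loop.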