How to handle ConnectionFailedOrchestrationFailureOnUnhealthyHost Errors


Hi, 

We have been using the GA version of WVD in our production environment and have also configured the ARM-based auto-scaling tool, which leverages an Automation account and a Logic App to turn session hosts on and off based on peak time and user connections. We have also configured monitoring in the environment, and we use the workbook released by the engineering team to monitor details such as session host health and session details.

 

Auto scale configuration: 

Host pool mode - shared

Host pool connection threshold - 5

Each session host instance - 2 vCores

Sessions allowed per core - 4
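
For reference, the arithmetic behind these settings can be sketched as below. This is only a rough illustration, assuming the per-core session limit is multiplied by the core count to get a per-host ceiling and that the connection threshold of 5 is the point at which scale-out is triggered. The function names are hypothetical and are not part of the scaling tool.

```python
def max_sessions_per_host(vcores: int, sessions_per_core: int) -> int:
    """Per-host session ceiling, assuming the per-core limit scales linearly."""
    return vcores * sessions_per_core


def should_scale_out(active_sessions: int, threshold: int) -> bool:
    """Hypothetical trigger: start another host once the threshold is reached."""
    return active_sessions >= threshold


# Values taken from the configuration listed above.
VCORES = 2
SESSIONS_PER_CORE = 4
THRESHOLD = 5

ceiling = max_sessions_per_host(VCORES, SESSIONS_PER_CORE)
# Sessions a running host can still absorb after scale-out is triggered,
# i.e. the buffer available while the new host is still booting.
headroom = ceiling - THRESHOLD
print(f"ceiling={ceiling}, threshold={THRESHOLD}, headroom={headroom}")
```

Under these assumptions, each running host could still take a few sessions after the threshold is hit, so new connections would not necessarily need to land on a host that is still starting.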

 

Problem:

It mostly works fine. However, we sometimes see the error below logged repeatedly in the session details, and it appears during peak times when auto-scale kicks in.

 

Error: 

ConnectionFailedOrchestrationFailureOnUnhealthyHost: Failed orchestration on session host <session_host_name> id=d4xyus-fap2-av5c-a321-990g58b411c with state Unhealthy.

 

Potential cause:

We suspect that when the session host connection threshold (5 connections) is hit, auto-scale correctly starts a new session host, but the WVD load balancer does not wait for that host to be fully ready. It redirects new user sessions to the freshly started host, and that causes the error.
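
If that suspicion is right, one possible mitigation would be for the scaling logic to keep a freshly started host out of rotation until it reports healthy. The sketch below only illustrates the polling part of that idea. `get_status` is a hypothetical callable that returns the host's current status string (however you choose to query it), and the "Available" ready value, the timeout, and the poll interval are assumptions, not settings taken from the scaling tool.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    ready_status: str = "Available",   # assumed name of the healthy state
    timeout_s: float = 600.0,          # assumed upper bound on boot time
    poll_s: float = 15.0,              # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a session host's status until it is ready or the timeout expires.

    Returns True if the host reported `ready_status` in time, False otherwise.
    `get_status` is hypothetical: plug in whatever status lookup you use.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == ready_status:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


# Usage with a simulated host that becomes ready on the third check.
simulated = iter(["Unavailable", "Unavailable", "Available"])
ready = wait_until_ready(lambda: next(simulated), sleep=lambda _s: None)
print("host ready:", ready)
```

The scaling logic would only allow new sessions on the host once this returns True. Whether the load balancer can be made to honor that is exactly the open question in this thread.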

 

Can someone share insights on this? If our understanding is correct, what could be a potential fix for this issue?

 

Thanks in advance. 

1 Reply

We tried opening a support ticket with Azure for this, but were informed that the scaling tool is unsupported and is not meant to be used in production environments, since the WVD engineering team is still working on it.

We have reported the same issue on the RDS-Templates GitHub repository at https://github.com/Azure/RDS-Templates/issues/552 for further clarification and assistance.

 

If anyone has any clue, please respond to this thread.