In some situations, Cloud Service role instances might be slow to start, or they might keep recycling or remain stuck in the busy state, so that the role instance fails to start as expected. Two parts of the role application can cause this kind of recycling/busy issue: Startup Tasks and Role code (the implementation of RoleEntryPoint). In this document, I’m going to focus on troubleshooting guidance for Startup Task failures. For the initial configuration, execution and examples of Startup Tasks in Classic Cloud Service, you can refer to the following official documents.
https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-startup-tasks
https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-startup-tasks-common
How to identify whether the issue is caused by a startup task?
If your application uses a startup task, the first thing to do when you encounter a role recycling/busy issue is to identify whether it is caused by the startup task's initialization or execution.
The first step is to RDP into the problematic role instance. If you have already enabled the RDP feature, you can get the RDP file by clicking the Connect button.
If not, you can easily enable it from the Remote Desktop blade and then go back to the step above.
After logging into the role instance, check the process status on the Details tab of Task Manager.
According to the Windows Azure Role Architecture, the WaHostBootstrapper process is responsible for running startup tasks.
Hence, if this process has started but you don’t see the WaIISHost (for WebRole) or WaWorkerHost (for WorkerRole) process start, a startup task is most likely failing. Please note that this applies to the simple and foreground task types, not to background tasks, because background startup tasks are executed asynchronously, in parallel with the startup of the role.
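The check above can be sketched as a small Python function. This is only an illustration: the function name, the returned messages, and the assumption that the processes appear with an ".exe" suffix (as they would in Task Manager or tasklist output) are mine, not part of any Azure tooling.

```python
def diagnose_startup(process_names, role_type):
    """Classify the role state from the names of running processes.

    role_type is "web" or "worker" (hypothetical labels for this sketch).
    """
    # Compare case-insensitively, since process listings vary in casing.
    running = {name.lower() for name in process_names}
    # Host process expected for each role type (assumed ".exe" suffix).
    host = {"web": "waiishost.exe", "worker": "waworkerhost.exe"}[role_type]
    if "wahostbootstrapper.exe" not in running:
        return "bootstrapper not running"
    if host not in running:
        # Bootstrapper is up but the role host is not: suspect the startup task.
        return "startup task likely failing or still running"
    return "role host started"
```

The process names could be copied from the Details tab or collected with the Windows tasklist command, and the result is only a hint that should be confirmed with the log checks below.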
How to troubleshoot a startup task failure?
1. Check the WaHostBootstrapper log
You can find the log file at this path: C:\Resources\WaHostBootstrapper.log, which is listed in the official documentation.
Then search the log for any Error or Exception related to the Startup.cmd execution. In particular, verify that the exit code equals 0; any other value means the startup task completed with errors. If there is no log entry for the exit code at all, the startup task is still running.
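If the log is long, the search can be scripted. The following is a minimal sketch, assuming Python is available and that the log is readable as UTF-8 compatible text. It does not rely on any particular log line format; it only flags lines that mention one of the keywords from the step above. The function names are hypothetical.

```python
# Keywords taken from the troubleshooting step; matched case-insensitively.
KEYWORDS = ("error", "exception", "startup.cmd")


def find_suspect_lines(log_lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines mentioning any keyword."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip()))
    return hits


def scan_log(path=r"C:\Resources\WaHostBootstrapper.log"):
    """Scan the bootstrapper log file at the path given in the article."""
    # Assumption: UTF-8 compatible text; undecodable bytes are replaced.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspect_lines(handle)
```

Calling scan_log() on the role instance returns the matching lines with their line numbers, which makes it easier to find the entries around the Startup.cmd execution and check the exit code manually.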
2. Enable a custom logging mechanism
Sometimes the startup script can’t be rerun freely in the production environment because of the expected business impact. In that case, besides the built-in WaHostBootstrapper log, adding custom logging to the command lines can be even more valuable for troubleshooting.
You can write key information from each command in the script to a log file, like …..>>"%TEMP%\StartupLog.txt".
3. Manually trigger the startup task
If you observe an execution exception in the steps above, the most direct approach is to run the cmd file in the current environment to reproduce the issue. You can navigate to E:\approot\bin\Startup.cmd (for WebRole) or E:\approot\Startup.cmd (for WorkerRole) and run the command line manually to verify whether it executes successfully. This can help narrow down the issue.
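When reproducing the issue, it helps to capture the exit code and the output in one place. Here is a minimal sketch, assuming Python happens to be installed on the role instance (it is not there by default); run_and_report is a hypothetical helper name, and running the script directly from a command prompt works just as well.

```python
import subprocess


def run_and_report(command):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge error output into the same stream
        text=True,
    )
    return result.returncode, result.stdout


# Example call on a WebRole instance (path taken from the article):
# code, output = run_and_report(["cmd", "/c", r"E:\approot\bin\Startup.cmd"])
# print(output)
# print("exit code:", code)
```

A non-zero exit code here corresponds to the failure condition described in step 1, and the captured output usually points at the command that failed.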