Troubleshooting Node down Scenarios in Azure Service Fabric - Part II
Published May 20 2021 11:11 PM

This is a continuation of Part I of Troubleshooting Node down Scenarios in Azure Service Fabric.



Even when the virtual machine associated with the node is healthy, an unhealthy Service Fabric extension can cause the node to go down in the Service Fabric cluster.


RDP into the node that is down. Open Task Manager and observe the Fabric processes.


If Fabric.exe and FabricHost.exe are crashing and restarting often, check Mitigation#1.

If ServiceFabricNodeBootstrapAgent.exe is crashing and restarting often, check Mitigation#2.

If FabricInstallerSvc.exe is crashing and restarting often, check Mitigation#3.
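One way to spot a process that keeps restarting, rather than watching Task Manager by eye, is to compare process IDs between two snapshots: a restarted process comes back with a new PID. The following is a minimal sketch of that idea, not an official tool. It assumes the snapshots are the text output of `tasklist /FO CSV /NH`; the function names are hypothetical, and the process-to-mitigation mapping is taken from the rules above.

```python
import csv
import io

# Process-to-mitigation mapping taken from the rules above (names lowercased).
MITIGATIONS = {
    "fabric.exe": "Mitigation#1",
    "fabrichost.exe": "Mitigation#1",
    "servicefabricnodebootstrapagent.exe": "Mitigation#2",
    "fabricinstallersvc.exe": "Mitigation#3",
}


def parse_tasklist_csv(text):
    """Parse `tasklist /FO CSV /NH` output into {lowercase image name: set of PIDs}."""
    pids = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or not row[1].isdigit():
            continue  # skip blank or malformed lines
        pids.setdefault(row[0].lower(), set()).add(int(row[1]))
    return pids


def suggest_mitigations(before, after):
    """Return mitigations for watched processes whose PIDs differ between snapshots."""
    hits = set()
    for name, mitigation in MITIGATIONS.items():
        if before.get(name, set()) != after.get(name, set()):
            hits.add(mitigation)
    return sorted(hits)
```

On the node, each snapshot could be captured with `subprocess.run(["tasklist", "/FO", "CSV", "/NH"], capture_output=True, text=True).stdout`, taken a minute or so apart. A process that is absent from both snapshots is not flagged by this sketch, so it complements the Task Manager check rather than replacing it.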



Mitigation#1

  • Open ClusterManifest.current.xml on the node (full path given below).
  • Does it match the manifest for the cluster? Compare it with the one shown in Service Fabric Explorer (SFX).
  • If it does not match:
    • Check whether SFX indicates an upgrade in progress.
  • If no upgrade is in progress:
    • Go to <Path>.
    • Open ClusterManifest.current.xml.
    • Replace the contents of ClusterManifest.current.xml with the contents of the manifest in SFX.
    • Save the file.
    • In Task Manager, select Fabric.exe if it is running and click the "End Task" button.
    • If Fabric.exe is not running, reboot the VM.
    • It will take a few minutes for the node to become healthy.
    • If the node does not become healthy, start again from the beginning.

Path: D:\SvcFab\_Nodename_\Fabric\ClusterManifest.current.xml
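Comparing the node's manifest with the one in SFX by eye is error-prone, because two XML documents can differ in whitespace or attribute order and still mean the same thing. The following is a minimal sketch of a comparison that ignores those differences by canonicalizing both documents first. It assumes both manifests have already been saved as text (for example, the SFX manifest copied into a file); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def manifests_match(local_xml, sfx_xml):
    """Compare two manifest documents after canonicalizing them (C14N 2.0).

    Canonicalization sorts attributes and, with strip_text=True, drops
    insignificant whitespace, so only real content differences remain.
    """
    canonical_local = ET.canonicalize(xml_data=local_xml, strip_text=True)
    canonical_sfx = ET.canonicalize(xml_data=sfx_xml, strip_text=True)
    return canonical_local == canonical_sfx
```

If the function returns False, the two manifests differ in content, and the replacement steps above apply only when no upgrade is in progress.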



Mitigation#2

Check whether ServiceFabricNodeBootstrapAgent.exe is listed among the processes in Task Manager.

  • If “Yes”:
    • Wait a while to see if the node heals itself.
    • This process tries to heal the failure at a coarse level by restarting the VM and reinstalling the Service Fabric runtime.
    • After each healing attempt, it waits 15 minutes before taking the next action.
    • Check ServiceFabricNodeBootstrapAgent.InstallLog (see “From the Node”).
      Path: C:\Packages\Plugins\Microsoft.Azure.ServiceFabric.ServiceFabricNode\<version>\Service\ServiceFabricNodeBootstrapAgent.InstallLog
    • If the node does not heal, go to “Event Viewer logs” for error details.


  • If “No”:
    • Go to the Services tab in Task Manager and click the "Open Services" link at the bottom.
    • Check the startup type of the bootstrap service and make sure it is Automatic.
    • Start the service.
    • If it stays running, go to the "Yes" section above.
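When reviewing ServiceFabricNodeBootstrapAgent.InstallLog, it can help to pull out only the most recent lines that look like failures. The following is a rough sketch of such a filter. The log format is not documented here, so the keywords are an assumption rather than known log markers, and the function name is hypothetical.

```python
def find_suspect_lines(log_text, keywords=("error", "fail", "exception"), last=20):
    """Return up to `last` most recent lines containing any keyword.

    The match is case-insensitive. The default keywords are guesses,
    not documented markers of the InstallLog format.
    """
    hits = [
        line.rstrip()
        for line in log_text.splitlines()
        if any(keyword in line.lower() for keyword in keywords)
    ]
    return hits[-last:]
```

The log text would be read from the InstallLog path given above. An empty result does not prove the agent is healthy, so the Event Viewer logs remain the next place to look.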



Mitigation#3

Check whether the node's network connectivity is working.
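A basic connectivity check can be scripted as a TCP connection attempt. The following is a minimal sketch, assuming a plain TCP reachability test is a useful first signal. Which hosts and ports matter depends on the cluster's configuration, so both are left as parameters, and the function name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("10.0.0.5", 19000)` would test a placeholder address on port 19000, a common default for the Service Fabric client connection endpoint; confirm the actual endpoints in the cluster manifest before relying on the result.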

For more details, refer to Part III - Troubleshooting Node down Scenarios.

Last update: May 20 2021 11:15 PM