Application Troubleshooting - Stateless/Stateful Services Cannot Be Started In Service Fabric

Published Mar 01 2021 07:05 PM
Microsoft

This blog post walks through troubleshooting steps for stateless and stateful services that cannot be started in Service Fabric.

Customers can use this information and follow the troubleshooting steps to identify the exception, and the lifecycle event in which it occurs, when a stateless or stateful service tries to start.


Stateless and Stateful Service Lifecycle 

 

The lifecycle of a stateless service is straightforward. Here's the order of events:

   1. The service is constructed.

   2. Then, in parallel, two things happen:

      * StatelessService.CreateServiceInstanceListeners() is invoked and any returned listeners are opened. ICommunicationListener.OpenAsync() is called on each listener.

      * The service's StatelessService.RunAsync() method is called.

   3. If present, the service's StatelessService.OnOpenAsync() method is called. This call is an uncommon override, but it is available. Extended service initialization tasks can be started at this time.

 

Stateful services have a similar pattern to stateless services, with a few changes. For starting up a stateful service, the order of events is as follows:

   1. The service is constructed.

   2. StatefulServiceBase.OnOpenAsync() is called. This call is not commonly overridden in the service.

   3. The following things happen in parallel:

      * StatefulServiceBase.CreateServiceReplicaListeners() is invoked.

         * If the replica is a Primary, all returned listeners are opened. ICommunicationListener.OpenAsync() is called on each listener.

         * If the replica is a Secondary, only those listeners marked as ListenOnSecondary = true are opened. Having listeners that are open on secondaries is less common.

      * If the replica is currently a Primary, the service's StatefulServiceBase.RunAsync() method is called.

   4. After all the replica listeners' OpenAsync() calls finish and RunAsync() is called, StatefulServiceBase.OnChangeRoleAsync() is called. This call is not commonly overridden in the service.
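The role-dependent part of the stateful startup can be modeled in a few lines. The following is only an illustrative Python sketch, not Service Fabric code: ReplicaRole, ReplicaListener, listeners_to_open and runs_run_async are hypothetical names invented for this example. It shows which listeners would be opened, and whether RunAsync() would be called, for a Primary and for a Secondary.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicaRole(Enum):
    PRIMARY = "Primary"
    SECONDARY = "Secondary"


@dataclass(frozen=True)
class ReplicaListener:
    # Hypothetical stand-in for a replica listener registration; only the
    # name and the ListenOnSecondary flag matter for this illustration.
    name: str
    listen_on_secondary: bool = False


def listeners_to_open(role, listeners):
    """Return the listeners that would be opened for a replica in `role`."""
    if role is ReplicaRole.PRIMARY:
        # A Primary opens every listener returned by the service.
        return list(listeners)
    # A Secondary opens only listeners flagged with ListenOnSecondary = true.
    return [item for item in listeners if item.listen_on_secondary]


def runs_run_async(role):
    """RunAsync() is only called on the Primary replica."""
    return role is ReplicaRole.PRIMARY


# Two made-up listeners: one default, one that also listens on secondaries.
registered = [
    ReplicaListener("remoting"),
    ReplicaListener("read-only-api", listen_on_secondary=True),
]
for role in ReplicaRole:
    opened = [item.name for item in listeners_to_open(role, registered)]
    print(role.value, "opens", opened, "| RunAsync called:", runs_run_async(role))
```

When a stateful service fails to start only on some nodes, checking the replica role first narrows down which of these code paths actually ran there.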


Events and Cancellation Token

 

CreateServiceInstanceListeners() supplies the communication listeners for the service instance. It is normally overridden in a stateless service, for example to set up Kestrel, HTTPS, and similar listeners.

 

RunAsync() is executed in its own task, so there is no need to schedule a separate task for your workload; the method body can go straight into its work loop. Cancellation of your workload is a cooperative effort orchestrated by the provided cancellation token. The system will wait for your task to end (by successful completion, cancellation, or fault) before it moves on. It is important to honor the cancellation token, finish any work, and exit RunAsync() as quickly as possible when the system requests cancellation. RunAsync() is called on the primary replica of a stateful service and on every instance of a stateless service, and it is normally overridden in stateful services.
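The cooperative cancellation pattern can be sketched outside of Service Fabric. The example below is a minimal Python analogy and makes several assumptions: an asyncio.Event stands in for the cancellation token, and run_async and host are hypothetical names, with host playing the role of the runtime that requests cancellation and then waits for the workload to end.

```python
import asyncio


async def run_async(cancel_requested, processed):
    """Work loop that checks the token between small units of work."""
    while not cancel_requested.is_set():
        processed.append(len(processed))  # one small unit of work
        await asyncio.sleep(0.01)         # short pause, so cancellation is noticed quickly
    # Cancellation was requested: stop working and return promptly.
    return "cancelled"


async def host():
    """Stand-in for the runtime: start the workload, then request cancellation."""
    cancel_requested = asyncio.Event()    # stand-in for the cancellation token
    processed = []
    task = asyncio.create_task(run_async(cancel_requested, processed))
    await asyncio.sleep(0.05)             # let the workload run for a while
    cancel_requested.set()                # the "system" requests cancellation
    # Wait for the task to end before moving on, with a deadline.
    result = await asyncio.wait_for(task, timeout=1.0)
    return result, len(processed)


result, count = asyncio.run(host())
print(result, "after", count, "units of work")
```

The point of the sketch is the shape of the loop: each unit of work is short, and the token is checked between units. A workload that blocks for a long time without checking the token delays the close.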

 

A cancellation token is provided to coordinate when your service instance needs to be closed. In Service Fabric, this open/close cycle of a service instance can occur many times over the lifetime of the service as a whole. This can happen for various reasons, including:

   * The system moves your service instances for resource balancing.

   * Faults occur in your code.

   * The application or system is upgraded.

   * The underlying hardware experiences an outage.


Troubleshooting 

 

Follow the steps below to identify the method that throws the exception:

   1. RDP to the Service Fabric node (the node hosting the primary replica, if it is a stateful service).

   2. Check the Application event log for exceptions. If there are none, go to step 3.

   3. Check whether the port is occupied by another service.

 

For TCP: Get-Process -Id (Get-NetTCPConnection -LocalPort YourPortNumberHere).OwningProcess
For UDP: Get-Process -Id (Get-NetUDPEndpoint -LocalPort YourPortNumberHere).OwningProcess

 

   4. For non-production environments, remote debugging can provide more insight. For details, see "Debug your application in Visual Studio".

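   5. Download ProcDump and list the underlying first-chance exceptions. With an empty filter (-f ""), ProcDump lists the exceptions without capturing a dump.

      Start-BitsTransfer "https://download.sysinternals.com/files/Procdump.zip"

      Expand-Archive -Path .\Procdump.zip -DestinationPath .

      procdump.exe -accepteula -e 1 -f "" -w "processname"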


   6. Then capture a full dump for the specific exception, for example NullReferenceException:

       procdump.exe -ma -e 1 -f "NullReferenceException" -w "processname"

   7. Use WinDbg or DebugDiag to analyze the dump and get details about the exception, such as the method call stack.
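As a complement to the port check in step 3, the same question ("is this port already taken?") can be answered by trying to bind to the port. The following is a minimal Python sketch, assuming Python is available on the node; is_port_in_use is a hypothetical helper name. Unlike the PowerShell commands, it only reports whether the port is busy, not which process owns it.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if binding a TCP socket to (host, port) fails."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Usually "address already in use"; a permission error on a
            # restricted port also lands here, so treat True as "cannot bind".
            return True
        return False


# Demonstration: occupy an ephemeral port, probe it, release it, probe again.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))    # port 0 lets the OS pick a free port
holder.listen(1)
busy_port = holder.getsockname()[1]
in_use_while_held = is_port_in_use(busy_port)
holder.close()
in_use_after_close = is_port_in_use(busy_port)
print(busy_port, in_use_while_held, in_use_after_close)
```

If the probe reports the endpoint port of the failing service as busy while the service is stopped, another process holds it, and the Get-NetTCPConnection command from step 3 identifies that process.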


 
 
 
 
 
 
Last update: Feb 01 2021 08:00 AM