Azure PaaS Blog
Troubleshooting an Expired Certificate on an Azure Service Fabric Standalone Cluster

mohitkhanna
Microsoft
Jan 06, 2021

Sometimes the primary or secondary cluster certificate expires before you can rotate it to a new certificate, which can leave the cluster inaccessible or unreachable. In that case, you can follow these steps to recover a standalone Service Fabric cluster. If you are looking to rotate a certificate that is near expiry but still valid, refer to the previous article: Certificate rotation Azure Service Fabric Standalone cluster - Microsoft Tech Community

This article assumes the cluster is secured using the certificate thumbprint approach. In general, the common name approach is recommended because it makes certificate management easier. For more information about certificates on a standalone cluster, refer to Secure a cluster on Windows by using certificates - Azure Service Fabric | Microsoft Docs

 

Recover an Azure Service Fabric standalone cluster that is inaccessible or unreachable due to expired cluster certificates:

 

1. Create or obtain the new certificate.
2. Deploy the new certificate to all nodes manually, for example with Import-PfxCertificate: https://docs.microsoft.com/en-us/powershell/module/pkiclient/import-pfxcertificate?view=win10-ps

3. RDP into each VM and make sure the certificate is present and that its private key is ACL'd to 'Network Service':

    a) Run certlm.msc.

    b) Find the new certificate.

    c) Right-click the certificate, select Manage Private Keys, and ensure NETWORK SERVICE has full permissions.
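
Part of this check can be scripted. The following is a minimal sketch, assuming the certificate was imported into the LocalMachine\My store; the function name Test-ClusterCertificate and the placeholder thumbprint are hypothetical and not part of any Service Fabric tooling. It only reports whether the certificate has a private key and is within its validity period. It does not inspect the private key ACL, so the Manage Private Keys check in certlm.msc is still needed.

```powershell
# Hypothetical helper: summarizes whether a certificate object looks usable.
function Test-ClusterCertificate {
    param(
        [Parameter(Mandatory)] $Certificate,   # e.g. an X509Certificate2 from the Cert: drive
        [datetime] $Now = (Get-Date)
    )
    $hasKey  = [bool]$Certificate.HasPrivateKey
    $inRange = ($Certificate.NotBefore -le $Now) -and ($Certificate.NotAfter -gt $Now)
    [pscustomobject]@{
        Thumbprint    = $Certificate.Thumbprint
        HasPrivateKey = $hasKey
        NotAfter      = $Certificate.NotAfter
        Usable        = ($hasKey -and $inRange)   # does NOT cover the NETWORK SERVICE ACL
    }
}

# Usage on a node (the thumbprint below is a placeholder):
# $thumb = '<NEW-THUMBPRINT>'
# Get-ChildItem Cert:\LocalMachine\My |
#     Where-Object Thumbprint -eq $thumb |
#     ForEach-Object { Test-ClusterCertificate -Certificate $_ }
```

If the pipeline in the usage comment returns nothing, the certificate is not in that store on the node.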

 

4. Disable and stop the "Microsoft Service Fabric Host Service" (FabricHostSvc) from a PowerShell session with administrative rights.

    Set-Service -ServiceName FabricHostSvc -StartupType Disabled

    net stop FabricHostSvc

5. Locate ClusterManifest.current.xml in the cluster root folder, for example "C:\ProgramData\SF\Fabric\ClusterManifest.current.xml" (adjust to the data path actually deployed), and copy it to a working location such as C:\Temp\clusterManifest.xml.

   

6. Remove the read-only attribute from C:\Temp\clusterManifest.xml, then edit the file and update it with the new thumbprint.

    a) Replace all occurrences of the old certificate thumbprint with the new thumbprint.

  

7. Locate InfrastructureManifest.xml under .\Fabric\Fabric.Data\InfrastructureManifest.xml in the node folder. In my case it is C:\ProgramData\SF\vm0\Fabric\Fabric.Data\InfrastructureManifest.xml, because the data root is under C:\ProgramData. Copy it to C:\Temp as well.

 

8. Edit C:\Temp\InfrastructureManifest.xml and update it with the new thumbprint.

    a) Replace all occurrences of the old certificate thumbprint with the new thumbprint.
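
The find-and-replace in the manifest copies can also be done from PowerShell instead of a text editor. This is a minimal sketch rather than an official procedure: the function name Update-ThumbprintInFile is hypothetical, the thumbprints in the usage comment are placeholders, and it assumes the manifest copies are UTF-8 text files. It clears the read-only attribute, replaces the old thumbprint case-insensitively, and returns the number of replacements so that a result of zero stands out.

```powershell
# Hypothetical helper: replaces every occurrence of the old thumbprint in a file copy.
function Update-ThumbprintInFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $OldThumbprint,
        [Parameter(Mandatory)] [string] $NewThumbprint
    )
    # Clear the read-only attribute so the copy can be rewritten.
    Set-ItemProperty -LiteralPath $Path -Name IsReadOnly -Value $false

    $text    = Get-Content -LiteralPath $Path -Raw
    $pattern = [regex]::Escape($OldThumbprint)

    # Count matches first, ignoring case (thumbprints are hex strings).
    $count = [regex]::Matches($text, $pattern, 'IgnoreCase').Count

    # Assumption: the file is UTF-8; adjust -Encoding if your copy differs.
    $updated = $text -ireplace $pattern, $NewThumbprint
    Set-Content -LiteralPath $Path -Value $updated -NoNewline -Encoding UTF8

    $count
}

# Usage (placeholder thumbprints), once per copied manifest:
# Update-ThumbprintInFile -Path 'C:\Temp\clusterManifest.xml' -OldThumbprint '<OLD>' -NewThumbprint '<NEW>'
# Update-ThumbprintInFile -Path 'C:\Temp\InfrastructureManifest.xml' -OldThumbprint '<OLD>' -NewThumbprint '<NEW>'
```

Working on the copies in C:\Temp, as the steps describe, keeps the original manifests untouched if something goes wrong.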

 

9. Run the following cmdlet to update the Service Fabric node configuration. Replace the paths (C:\ProgramData\SF in this example) with the actual paths of your deployment.

New-ServiceFabricNodeConfiguration "C:\ProgramData\SF" -FabricLogRoot "C:\ProgramData\SF\log" -ClusterManifestPath "C:\Temp\clusterManifest.xml" -InfrastructureManifestPath "C:\temp\InfrastructureManifest.xml" 

 

10. Open "C:\ProgramData\SF\vm0\Fabric\Fabric.Package.current.xml" and note the configuration version.

 

Change into the Fabric.Config folder that corresponds to that version.

 

Edit Settings.xml in that folder, for example "C:\ProgramData\SF\vm0\Fabric\Fabric.Config.0.131572537807340469\Settings.xml", and replace all occurrences of the old certificate thumbprint with the new thumbprint.
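
To double-check that no reference to the old thumbprint was missed, the node folder can be searched. The sketch below is one possible way to do this; the function name Find-ThumbprintReference and the placeholder thumbprint are hypothetical. It lists every XML file under a folder that still contains the given thumbprint. Hits in Fabric.Config folders for other versions may show up as well, so compare the results against the configuration version noted above.

```powershell
# Hypothetical helper: lists files under a folder that still contain a thumbprint.
function Find-ThumbprintReference {
    param(
        [Parameter(Mandatory)] [string] $Root,
        [Parameter(Mandatory)] [string] $Thumbprint,
        [string] $Filter = '*.xml'
    )
    Get-ChildItem -LiteralPath $Root -Recurse -File -Filter $Filter |
        # -SimpleMatch treats the thumbprint as literal text; matching ignores case by default.
        # -List stops at the first match per file, which is enough to flag it.
        Select-String -Pattern $Thumbprint -SimpleMatch -List |
        Select-Object -ExpandProperty Path
}

# Usage (placeholder thumbprint; adjust the node folder to your data root):
# Find-ThumbprintReference -Root 'C:\ProgramData\SF\vm0\Fabric' -Thumbprint '<OLD-THUMBPRINT>'
```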

 

11. Set the startup type of the "Microsoft Service Fabric Host Service" back to automatic and start the service again.

Set-Service -ServiceName FabricHostSvc -StartupType automatic 

net start FabricHostSvc 

 

12. Repeat the above steps on every cluster node. 

 

13. After step 12, you should be able to reconnect to the cluster through Service Fabric Explorer (SFX) and PowerShell.

 

14. At this point SFX works, and Connect-ServiceFabricCluster run from one of the cluster nodes establishes a secure connection. However, Get-ServiceFabricClusterConfiguration still returns the old, expired cluster certificate thumbprint in the deployment JSON. This is expected.

 

15. Use Set-ServiceFabricUpgradeOrchestrationServiceState to update the cluster state held by the upgrade orchestration service:

  1. Connect-ServiceFabricCluster
  2. Get-ServiceFabricUpgradeOrchestrationServiceState | Out-File .\state.json
  3. Replace the old thumbprint in the state.json file with the new thumbprint.
  4. Set the state back, pointing -StateFilePath at the folder where state.json was saved (c:\60CU2 in my case): Set-ServiceFabricUpgradeOrchestrationServiceState -StateFilePath c:\60CU2\state.json

16. Run the Get-ServiceFabricClusterConfiguration cmdlet again. You should now see the updated certificate information.

 

Updated Jan 06, 2021
Version 2.0