Sometimes the primary or secondary cluster certificate expires before you can rotate it, which can leave the cluster inaccessible or unreachable. In that case, you can follow the steps below to recover a standalone Service Fabric cluster. If you are looking to rotate a certificate that is near expiry, refer to the previous article: Certificate rotation Azure Service Fabric Standalone cluster - Microsoft Tech Community.
This article assumes you are running the cluster with the thumbprint approach. In general, the common name approach is recommended for easier certificate management. For more information about certificates on a standalone cluster, refer to Secure a cluster on Windows by using certificates - Azure Service Fabric | Microsoft Docs.
Recover an Azure Service Fabric standalone cluster that is inaccessible or unreachable due to expired cluster certificates:
1. Create or get the new certificate.
2. Deploy the new certificate to all nodes manually by following https://docs.microsoft.com/en-us/powershell/module/pkiclient/import-pfxcertificate?view=win10-ps
3. RDP into each VM and make sure the certificate is present and its private key is ACL'd to 'Network Service':
a) Run certlm.msc
b) Find the new certificate
c) Right-click the certificate, select Manage Private Keys, and ensure NETWORK SERVICE has full permissions
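Part of steps 2 and 3 can be checked from PowerShell. The following is a minimal sketch, assuming the certificate is installed in the LocalMachine\My store; the function name Get-ClusterCertStatus and the file path in the usage comment are hypothetical. It only reports presence, private key availability, and expiry. It does not inspect or change the private key ACL, so the certlm.msc check above is still needed.

```powershell
# Hypothetical helper: report whether a certificate with the given thumbprint
# is present in LocalMachine\My, has a private key, and when it expires.
function Get-ClusterCertStatus {
    param([Parameter(Mandatory)][string]$Thumbprint)

    # Strip whitespace that can sneak in when a thumbprint is copied from the MMC.
    $tp = ($Thumbprint -replace '\s', '').ToUpperInvariant()

    $cert = Get-ChildItem -Path Cert:\LocalMachine\My |
        Where-Object { $_.Thumbprint -eq $tp } |
        Select-Object -First 1

    [pscustomobject]@{
        Thumbprint    = $tp
        Found         = [bool]$cert
        HasPrivateKey = [bool]($cert -and $cert.HasPrivateKey)
        NotAfter      = if ($cert) { $cert.NotAfter } else { $null }
    }
}

# Example usage (the path and thumbprint are placeholders):
# $pfxPassword = Read-Host -AsSecureString 'PFX password'
# Import-PfxCertificate -FilePath 'C:\Temp\newcert.pfx' -CertStoreLocation Cert:\LocalMachine\My -Password $pfxPassword
# Get-ClusterCertStatus -Thumbprint '<new thumbprint>'
```

Running the check on every node before stopping the host service helps catch a node where the import was missed.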
4. Stop and disable the "Microsoft Service Fabric Host Service" service from a PowerShell session with administrative rights.
Set-Service -ServiceName FabricHostSvc -StartupType disabled
net stop FabricHostSvc
5. Locate ClusterManifest.current.xml in the cluster root folder, for example "C:\ProgramData\SF\Fabric\ClusterManifest.current.xml" (the exact location depends on the data path you deployed with), and copy it to a working location such as C:\Temp\clusterManifest.xml
6. Remove the read-only attribute from C:\Temp\clusterManifest.xml, then edit the file to use the new thumbprint.
a) Replace all occurrences of the old certificate thumbprint with the new thumbprint.
7. Locate InfrastructureManifest.xml under .\Fabric\Fabric.Data\InfrastructureManifest.xml. In my case it is C:\ProgramData\SF\vm0\Fabric\Fabric.Data\InfrastructureManifest.xml, because the data root is at C:\ProgramData. Copy it to C:\Temp as well.
8. Edit C:\Temp\InfrastructureManifest.xml to use the new thumbprint.
a) Replace all occurrences of the old certificate thumbprint with the new thumbprint.
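The replacement in steps 6 and 8 can be scripted so that no occurrence is missed. The following is a minimal sketch; the function name Update-ThumbprintInFile is hypothetical. It assumes the manifest copies are plain text files that .NET can read with its default encoding detection, and it writes the result back as UTF-8.

```powershell
# Hypothetical helper: replace every occurrence of one thumbprint with another
# in a text file, clearing the read-only attribute first.
# Returns the number of occurrences that were replaced.
function Update-ThumbprintInFile {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$OldThumbprint,
        [Parameter(Mandatory)][string]$NewThumbprint
    )

    # Copies of the manifests are often read-only; clear the flag so the file can be saved.
    Set-ItemProperty -LiteralPath $Path -Name IsReadOnly -Value $false

    # Resolve to a full path because the .NET file APIs ignore the PowerShell location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $text     = [System.IO.File]::ReadAllText($fullPath)

    $pattern = [regex]::Escape($OldThumbprint)
    $options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    $count   = [regex]::Matches($text, $pattern, $options).Count

    if ($count -gt 0) {
        # -replace is case-insensitive, so thumbprints stored in either case are matched.
        $updated = $text -replace $pattern, $NewThumbprint
        [System.IO.File]::WriteAllText($fullPath, $updated)
    }
    return $count
}

# Example usage (thumbprints are placeholders):
# Update-ThumbprintInFile -Path 'C:\Temp\clusterManifest.xml' -OldThumbprint '<old>' -NewThumbprint '<new>'
# Update-ThumbprintInFile -Path 'C:\Temp\InfrastructureManifest.xml' -OldThumbprint '<old>' -NewThumbprint '<new>'
```

A return value of 0 is worth investigating before continuing, since it means the file did not contain the old thumbprint at all.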
9. Run the following cmdlet to apply the updated manifests, replacing the C:\ProgramData\SF paths with the actual paths of your deployment.
New-ServiceFabricNodeConfiguration "C:\ProgramData\SF" -FabricLogRoot "C:\ProgramData\SF\log" -ClusterManifestPath "C:\Temp\clusterManifest.xml" -InfrastructureManifestPath "C:\temp\InfrastructureManifest.xml"
10. Open "C:\ProgramData\SF\vm0\Fabric\Fabric.Package.current.xml" and note the "Configuration version".
a) Change to the corresponding Fabric.Config.<version> folder.
b) Edit Settings.xml in that folder, for example "C:\ProgramData\SF\vm0\Fabric\Fabric.Config.0.131572537807340469\Settings.xml", and replace all occurrences of the old certificate thumbprint with the new thumbprint.
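As a cross-check for step 10, it can help to list every Settings.xml that still contains the old thumbprint. The following is a minimal sketch; the function name Find-StaleSettingsFile is hypothetical. It assumes the configuration folders follow the Fabric.Config.<version> naming shown above and does not read Fabric.Package.current.xml, so it may also report folders for versions that are not current.

```powershell
# Hypothetical helper: list Settings.xml files under the node's Fabric.Config.* folders
# that still contain the old thumbprint.
function Find-StaleSettingsFile {
    param(
        [Parameter(Mandatory)][string]$NodeFabricPath,   # e.g. C:\ProgramData\SF\vm0\Fabric
        [Parameter(Mandatory)][string]$OldThumbprint
    )

    Get-ChildItem -LiteralPath $NodeFabricPath -Directory -Filter 'Fabric.Config.*' |
        ForEach-Object { Join-Path $_.FullName 'Settings.xml' } |
        Where-Object { Test-Path -LiteralPath $_ } |
        # -SimpleMatch does a literal, case-insensitive search; -Quiet returns true or false.
        Where-Object { Select-String -LiteralPath $_ -Pattern $OldThumbprint -SimpleMatch -Quiet }
}

# Example usage (the thumbprint is a placeholder):
# Find-StaleSettingsFile -NodeFabricPath 'C:\ProgramData\SF\vm0\Fabric' -OldThumbprint '<old>'
```

An empty result after editing suggests that no Settings.xml on the node still references the old thumbprint.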
11. Set the startup type of the "Microsoft Service Fabric Host Service" service back to automatic and start the service again.
Set-Service -ServiceName FabricHostSvc -StartupType automatic
net start FabricHostSvc
12. Repeat the above steps on every cluster node.
13. After step 12, you should be able to reconnect to the cluster through SFX and PowerShell.
14. At this point SFX works, and Connect-ServiceFabricCluster succeeds over a secure connection from a cluster node, but Get-ServiceFabricClusterConfiguration still returns the old, expired cluster certificate thumbprint in the deployment JSON. This is expected.
15. Use Set-ServiceFabricUpgradeOrchestrationServiceState to update the thumbprint stored in the cluster state:
- Connect-ServiceFabricCluster
- Get-ServiceFabricUpgradeOrchestrationServiceState | Out-File .\state.json
- Replace the old thumbprint in the state.json file with the new thumbprint.
- Set it back, passing the path where you saved state.json (c:\60CU2\state.json in my case): Set-ServiceFabricUpgradeOrchestrationServiceState -StateFilePath c:\60CU2\state.json
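The edit in step 15 can be guarded so that an unchanged state file is never pushed back by mistake. The following is a minimal sketch; the function name Convert-StateThumbprint and the path C:\Temp\state.json are hypothetical, and the Service Fabric cmdlets in the usage comment are the ones named in the steps above, which require a connected cluster.

```powershell
# Hypothetical helper: return the state text with the old thumbprint replaced,
# and fail loudly if the old thumbprint is not present at all.
function Convert-StateThumbprint {
    param(
        [Parameter(Mandatory)][string]$StateText,
        [Parameter(Mandatory)][string]$OldThumbprint,
        [Parameter(Mandatory)][string]$NewThumbprint
    )

    $pattern = [regex]::Escape($OldThumbprint)

    # -notmatch and -replace are both case-insensitive by default.
    if ($StateText -notmatch $pattern) {
        throw 'Old thumbprint not found in state; nothing to update.'
    }
    return ($StateText -replace $pattern, $NewThumbprint)
}

# Example usage on a cluster node (path and thumbprints are placeholders):
# Connect-ServiceFabricCluster
# $stateFile = 'C:\Temp\state.json'
# Get-ServiceFabricUpgradeOrchestrationServiceState | Out-File $stateFile
# $text = Get-Content -LiteralPath $stateFile -Raw
# Convert-StateThumbprint -StateText $text -OldThumbprint '<old>' -NewThumbprint '<new>' |
#     Set-Content -LiteralPath $stateFile
# Set-ServiceFabricUpgradeOrchestrationServiceState -StateFilePath $stateFile
```

Throwing when the old thumbprint is absent is a deliberate choice here: it stops the script before the Set cmdlet runs with a file that was never modified.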
16. Run the Get-ServiceFabricClusterConfiguration cmdlet again; you should now see the updated certificate information.
Updated Jan 06, 2021
Version 2.0
mohitkhanna
Microsoft
Azure PaaS Blog