Sometimes the primary or secondary cluster certificate expires before you can rotate it to a new certificate, which can leave the cluster inaccessible or unreachable. In that case, you can follow these steps to recover a standalone Service Fabric cluster. If you are looking to rotate a certificate that is nearing expiry, refer to the previous article: Certificate rotation Azure Service Fabric Standalone cluster - Microsoft Tech Community.
Recover an Azure Service Fabric standalone cluster that is inaccessible or unreachable due to expired cluster certificates:
3. RDP into each VM and make sure the new certificate is present and its private key is already ACL'd to 'Network Service'.
a) Run certlm.msc
b) Find the new certificate
c) Right-click the certificate, choose Manage Private Keys, and ensure NETWORK SERVICE has full permissions
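The presence check in step 3 can also be scripted. The following is a minimal sketch, not part of the original procedure: Test-ClusterCertUsable is a hypothetical helper name, and the commented usage assumes the certificate is installed in the local machine Personal store (Cert:\LocalMachine\My). It only checks that a private key exists and that the certificate is within its validity period; it does not verify the NETWORK SERVICE permissions, which still need to be confirmed in certlm.msc.

```powershell
# Hypothetical helper (not a Service Fabric cmdlet): report whether a
# certificate object looks usable as the new cluster certificate.
function Test-ClusterCertUsable {
    param(
        [Parameter(Mandatory)] $Certificate,
        [datetime] $Now = (Get-Date)
    )
    # The certificate must have a private key and be inside its validity window.
    return [bool]($Certificate.HasPrivateKey -and
                  $Certificate.NotBefore -le $Now -and
                  $Certificate.NotAfter -gt $Now)
}

# Usage on a node (the thumbprint is a placeholder; the store is an assumption):
# $thumb = '<new-thumbprint>'
# $cert  = Get-ChildItem Cert:\LocalMachine\My | Where-Object Thumbprint -eq $thumb
# Test-ClusterCertUsable -Certificate $cert
```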
4. Stop and disable the "Microsoft Service Fabric Host Service" service from a PowerShell prompt with administrative rights.
Set-Service -ServiceName FabricHostSvc -StartupType disabled
net stop FabricHostSvc
5. Locate ClusterManifest.current.xml in the cluster root folder, for example "C:\ProgramData\SF\Fabric\ClusterManifest.current.xml" (the exact location depends on the data path used in your deployment), and copy it to a working location such as C:\Temp\clusterManifest.xml.
6. Remove the read-only attribute from the copied clusterManifest.xml, then edit C:\Temp\clusterManifest.xml to use the new thumbprint.
a) Replace all occurrences of the old certificate thumbprint with the new thumbprint.
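If you prefer to script the replacement rather than edit the file by hand, something like the following may help. This is a sketch under stated assumptions: Update-ThumbprintInFile is a hypothetical helper, not a Service Fabric cmdlet, and it treats the file as plain text and replaces the old thumbprint wherever it appears, regardless of letter case. It also strips spaces and invisible characters that can sneak in when a thumbprint is copied from the certificate dialog.

```powershell
# Hypothetical helper: replace every occurrence of the old thumbprint in a
# copied manifest file and return how many occurrences were found.
function Update-ThumbprintInFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $OldThumbprint,
        [Parameter(Mandatory)] [string] $NewThumbprint
    )
    # Keep only hex digits; copied thumbprints may carry spaces or hidden characters.
    $old = $OldThumbprint -replace '[^0-9A-Fa-f]', ''
    $new = $NewThumbprint -replace '[^0-9A-Fa-f]', ''
    if (-not $old -or -not $new) { throw 'Thumbprints must contain hex digits.' }

    # Clear the read-only attribute on the copy before writing to it.
    Set-ItemProperty -LiteralPath $Path -Name IsReadOnly -Value $false

    $text    = Get-Content -LiteralPath $Path -Raw
    $pattern = [regex]::Escape($old)
    $count   = [regex]::Matches($text, $pattern, 'IgnoreCase').Count

    # The -replace operator is case-insensitive by default.
    $updated = $text -replace $pattern, $new
    Set-Content -LiteralPath $Path -Value $updated -NoNewline -Encoding UTF8
    return $count
}

# Usage (thumbprints are placeholders):
# Update-ThumbprintInFile -Path 'C:\Temp\clusterManifest.xml' -OldThumbprint '<old-thumbprint>' -NewThumbprint '<new-thumbprint>'
```

A returned count of 0 suggests the old thumbprint was mistyped or the wrong file was copied, so it is worth checking before moving on. The same helper could be pointed at the copied InfrastructureManifest.xml in step 8.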
7. Locate InfrastructureManifest.xml under .\Fabric\Fabric.Data\InfrastructureManifest.xml. In my case it is C:\ProgramData\SF\vm0\Fabric\Fabric.Data\InfrastructureManifest.xml, because the data root is at C:\ProgramData. Copy it to C:\Temp as well.
8. Edit C:\Temp\InfrastructureManifest.xml to use the new thumbprint.
a) Replace all occurrences of the old certificate thumbprint with the new thumbprint.
9. Run the following cmdlet to update the Service Fabric node configuration, adjusting the data root path (C:\ProgramData\SF in this example) to match your actual deployment.
New-ServiceFabricNodeConfiguration "C:\ProgramData\SF" -FabricLogRoot "C:\ProgramData\SF\log" -ClusterManifestPath "C:\Temp\clusterManifest.xml" -InfrastructureManifestPath "C:\temp\InfrastructureManifest.xml"
10. Open "C:\ProgramData\SF\vm0\Fabric\Fabric.Package.current.xml" and note the "Configuration version".
Change into the corresponding Fabric.Config.<version> folder.
Edit Settings.xml in that folder, for example "C:\ProgramData\SF\vm0\Fabric\Fabric.Config.0.131572537807340469\Settings.xml", and replace all occurrences of the old certificate thumbprint with the new thumbprint.
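Finding the right Settings.xml in step 10 can be sketched in PowerShell as well. Get-ActiveFabricSettingsPath below is a hypothetical helper, and it rests on two assumptions that you should verify against your own node: that the configuration version is stored as the Version attribute of a ConfigPackage element in Fabric.Package.current.xml, and that the folder is named Fabric.Config.<version> as in the example path above.

```powershell
# Hypothetical helper: return the path of Settings.xml for the active
# Fabric.Config package of a node.
function Get-ActiveFabricSettingsPath {
    param(
        # Node Fabric folder, e.g. C:\ProgramData\SF\vm0\Fabric
        [Parameter(Mandatory)] [string] $FabricFolder
    )
    $packageFile = Join-Path $FabricFolder 'Fabric.Package.current.xml'
    [xml]$package = Get-Content -LiteralPath $packageFile -Raw

    # Assumption: the configuration version is the Version attribute of a
    # ConfigPackage element. local-name() sidesteps the XML namespace.
    $node = $package.SelectSingleNode("//*[local-name()='ConfigPackage']")
    if (-not $node) { throw "No ConfigPackage element found in $packageFile" }
    $version = $node.GetAttribute('Version')

    # Assumption: folder naming follows the pattern Fabric.Config.<version>.
    $configFolder = Join-Path $FabricFolder "Fabric.Config.$version"
    return Join-Path $configFolder 'Settings.xml'
}

# Usage:
# Get-ActiveFabricSettingsPath -FabricFolder 'C:\ProgramData\SF\vm0\Fabric'
```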
11. Set the "Microsoft Service Fabric Host Service" startup type back to automatic and start the service again.
Set-Service -ServiceName FabricHostSvc -StartupType automatic
net start FabricHostSvc
12. Repeat the above steps on every cluster node.
13. After step 12 you should be able to reconnect to the cluster over SFX and PowerShell.
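A quick way to confirm the PowerShell connection is to connect with the new certificate. The sketch below builds the arguments for Connect-ServiceFabricCluster as a hashtable for splatting. New-ClusterConnectionArgs is a hypothetical helper, and the example assumes that the new cluster certificate is also accepted as the client certificate, that it lives in the LocalMachine\My store, and that the client connection endpoint is localhost:19000; adjust these to your cluster.

```powershell
# Hypothetical helper: build splatting arguments for Connect-ServiceFabricCluster
# that authenticate with the new certificate by thumbprint.
function New-ClusterConnectionArgs {
    param(
        [Parameter(Mandatory)] [string] $Endpoint,    # e.g. 'localhost:19000' (assumed)
        [Parameter(Mandatory)] [string] $Thumbprint
    )
    return @{
        ConnectionEndpoint   = $Endpoint
        X509Credential       = $true
        ServerCertThumbprint = $Thumbprint
        FindType             = 'FindByThumbprint'
        FindValue            = $Thumbprint
        StoreLocation        = 'LocalMachine'          # assumed store location
        StoreName            = 'My'                    # assumed store name
    }
}

# On a cluster node with the Service Fabric PowerShell module available:
# $connArgs = New-ClusterConnectionArgs -Endpoint 'localhost:19000' -Thumbprint '<new-thumbprint>'
# Connect-ServiceFabricCluster @connArgs
# Get-ServiceFabricClusterHealth
```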
14. At this point SFX works, and Connect-ServiceFabricCluster from one of the cluster nodes establishes a secure connection, but Get-ServiceFabricClusterConfiguration still returns the old, expired cluster certificate thumbprint in the deployment JSON. This is expected.
15. To correct this, use Set-ServiceFabricUpgradeOrchestrationServiceState to update the stored cluster state.
16. Run the Get-ServiceFabricClusterConfiguration cmdlet again; you should now see the updated certificate information.