Service Fabric Explorer (SFX) is an open-source tool for inspecting and managing Azure Service Fabric clusters. It is available as a desktop application for Windows, macOS and Linux, and the cluster also serves it over HTTP: to open SFX in a web browser, browse to the cluster's HTTP management endpoint from any browser - for example https://clusterFQDN:19080. SFX can fail to load for several reasons. The most common are an access-denied error, or the browser not offering (or the user not selecting) the correct certificate. The following steps describe how to investigate and mitigate these scenarios.
1. Check the status of the cluster and of the certificate being used to access it. If the cluster status is “Upgrade service unreachable”, the most likely cause is an expired cluster certificate. If the certificate shows a warning that it is not under a trusted root and it was issued by a third-party certificate issuer, add it to the trusted root certificates and exclude the certificate issuer from any security rules that would block access to the site. If the cluster is healthy and the certificate has not expired, verify that you are presenting the correct certificate for this cluster.
2. If the issue persists after you have confirmed that the correct, valid certificate is being used, clear the browser session and cache so that the browser prompts for a certificate again. You can also try accessing SFX from an incognito or private window.
3. SFX often fails to load because of certificate issues. As a first check for certificate-related problems, see whether a pop-up prompting you to choose a certificate appears before Service Fabric Explorer opens.
Note: Install the certificate, including its private key, on the machine used to access Service Fabric Explorer so that it appears in the pop-up when you open SFX.
4. When loading the admin Service Fabric Explorer, press F12 to open the browser developer tools (or use any other network traffic analyzer) and look for failed calls.
If calls fail with HTTP 403, the Fabric Upgrade Service is not able to talk to the gateway.
This points to a problem with either the certificate or the HTTP gateway. For certificate issues, check that the certificate's private key is ACL’d to 'Network Service' with full permissions.
5. If SFX still fails to load after the certificate checks above, investigate network connectivity.
6. Verify in the Azure portal that inbound connectivity is not blocked: check that ports 19000 and 19080 are open and reachable in the Azure NSG, and that the user's IP address is allowed when accessing the cluster from a local machine.
If inbound connectivity to port 19080 is blocked by a network, firewall, or proxy on the client side, the client must unblock it. To help identify network issues, capture traces with the Network Monitor tool for further analysis.
7. To isolate the issue further, RDP into any one of the VMs in the Service Fabric cluster and browse to localhost:19080 to see whether Service Fabric Explorer loads. If it does, check your load balancer rules and allow HTTPS connections to port 19080.
8. Try connecting to the cluster from PowerShell using Connect-ServiceFabricCluster. If this succeeds, FabricGateway is up and the TCP management endpoint (port 19000 by default) is working. As a next step, reach out to the Service Fabric support team to investigate the traces for HttpGateway issues.
If the connection fails, FabricGateway itself is having issues, not just the HTTP endpoint. In that case, share the traces located in D:\SvcFab\Log\Traces with the Service Fabric support team so they can investigate FabricGateway.
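Step 1 depends on knowing whether the cluster certificate has expired or is untrusted. The Python sketch below uses only the standard library to perform a TLS handshake with the HTTP gateway (port 19080 assumed) and report either the verification error or the number of days until the certificate's `notAfter` date. It is a sketch under stated assumptions: hostname matching is deliberately disabled because cluster certificates often do not match the public FQDN, and the optional client certificate files are hypothetical paths you would supply if the gateway requests one.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan  1 00:00:00 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400.0


def check_gateway_certificate(host: str, port: int = 19080,
                              certfile: Optional[str] = None,
                              keyfile: Optional[str] = None,
                              timeout: float = 10.0) -> str:
    """Handshake with the HTTP gateway and summarize its certificate status."""
    ctx = ssl.create_default_context()
    # Assumption: we only care about trust and validity, not the hostname.
    ctx.check_hostname = False
    if certfile:
        # Optional client certificate, in case the gateway asks for one.
        ctx.load_cert_chain(certfile, keyfile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as err:
        # An expired certificate fails verification with
        # "certificate has expired"; an untrusted issuer fails differently.
        return f"verification failed: {err.verify_message}"
    days = days_until_expiry(cert["notAfter"])
    return f"certificate valid, expires in {days:.0f} days"
```

For example, `check_gateway_certificate("mycluster.westus.cloudapp.azure.com")` (a hypothetical FQDN) returns a one-line summary. A verification failure other than expiry usually points at the trusted-root problem described in step 1.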
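For step 3, it can help to confirm whether the client certificate is accepted at all, independently of the browser's certificate prompt and cache. The sketch below calls the Service Fabric REST endpoint `GET /$/GetClusterHealth?api-version=6.0` on the HTTP gateway with a client certificate. It assumes the certificate and private key have been exported to PEM files (the file names are hypothetical) and that the machine trusts the gateway's certificate chain.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def cluster_health_url(fqdn: str, port: int = 19080) -> str:
    """REST endpoint used here as a simple authenticated probe."""
    return f"https://{fqdn}:{port}/$/GetClusterHealth?api-version=6.0"


def probe_with_client_cert(fqdn: str, certfile: str,
                           keyfile: Optional[str] = None,
                           port: int = 19080, timeout: float = 15.0) -> int:
    """Return the HTTP status code the gateway gives this client certificate."""
    ctx = ssl.create_default_context()
    # Assumption: the cluster certificate may not match the FQDN.
    ctx.check_hostname = False
    ctx.load_cert_chain(certfile, keyfile)
    try:
        with urllib.request.urlopen(cluster_health_url(fqdn, port),
                                    context=ctx, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; 403 means access denied.
        return err.code
```

A 200 from `probe_with_client_cert("mycluster.westus.cloudapp.azure.com", "client.pem", "client.key")` (hypothetical values) suggests the certificate itself is fine and the browser is the place to look; a 403 suggests the cluster does not accept this certificate.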
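In step 4, browser developer tools can usually export the captured network traffic as a HAR (HTTP Archive) file, which is JSON with one entry per request under `log.entries`. The sketch below lists the failed calls in such an export so that the 403 responses are easy to spot. The function names and the 400 threshold are illustrative choices, not part of any Service Fabric tooling.

```python
import json
from collections import Counter
from typing import Dict, List


def failed_calls(har: dict, min_status: int = 400) -> List[Dict[str, object]]:
    """Return url/status pairs for every response at or above min_status."""
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        if status >= min_status:
            failures.append({"url": entry["request"]["url"], "status": status})
    return failures


def summarize(path: str) -> Counter:
    """Count failures per status code in a HAR file exported from F12."""
    with open(path, encoding="utf-8") as f:
        har = json.load(f)
    return Counter(call["status"] for call in failed_calls(har))
```

If `summarize("sfx.har")` (a hypothetical file name) is dominated by 403 responses, follow the certificate and HTTP gateway checks in step 4.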
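The inbound-connectivity check (ports 19000 and 19080) can be approximated from the client machine by trying to open a plain TCP connection to each port. This minimal sketch only shows whether something on the path (NSG rule, load balancer rule, firewall or proxy) is refusing or dropping the connection, or whether nothing is listening; it says nothing about certificates.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int] = (19000, 19080),
                timeout: float = 5.0) -> Dict[int, bool]:
    """Report whether a TCP connection to each port can be opened."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name-resolution errors.
            results[port] = False
    return results
```

Running it from the user's machine and again from a machine inside the same virtual network (if one is available) helps separate client-side blocks from Azure-side rules.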
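Steps 7 and 8 form a small decision tree. The sketch below encodes one reading of it, assuming step 8 is only needed when SFX does not load on the cluster node itself; the function name and return strings are illustrative.

```python
def next_step(sfx_loads_on_node: bool, powershell_connects: bool) -> str:
    """Map the outcomes of steps 7 and 8 to the suggested follow-up."""
    if sfx_loads_on_node:
        # The cluster serves SFX locally, so the problem is in front of it.
        return "check load balancer rules: allow HTTPS to port 19080"
    if powershell_connects:
        # The TCP management endpoint works; only the HTTP side is failing.
        return "contact Service Fabric support: investigate HttpGateway traces"
    # Neither endpoint works, so FabricGateway itself is unhealthy.
    return r"contact Service Fabric support: share traces from D:\SvcFab\Log\Traces"
```

The inputs are simply the yes/no results you observed when browsing to localhost:19080 on a node and when running Connect-ServiceFabricCluster.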