Service Fabric clusters secured with DigiCert-issued certificates are at risk of an outage
Published Nov 24 2020 04:07 PM
Microsoft

What is the Certificate Validation Issue? 

DigiCert introduced a new CA that reuses the signing key of an existing, still-valid CA. As a result, two different CA certificates are in circulation, and either one can appear in the chain built for a certificate signed by the shared key. Service Fabric clusters that declare certificates by subject with issuer pinning are therefore at risk of spontaneously failing validation: if the chain is built through the CA certificate whose thumbprint is not pinned, the issuer check no longer matches.
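The failure mode can be illustrated with a small model. This is not Service Fabric's validation code; it assumes, purely for illustration, that issuer pinning reduces to checking whether the thumbprint of the CA certificate selected during chain building appears in the pinned list. The function name `passes_issuer_pinning` and the constant names are hypothetical; the two thumbprints are the ones listed in this post.

```python
# Thumbprints of the two DigiCert CA certificates that share one signing key
# (values taken from this post; the constant names are illustrative).
CA_VALID_UNTIL_2023 = "1fb86b1168ec743154062e8c9cc5b171a4b7ccb4"
CA_VALID_UNTIL_2030 = "626d44e704d1ceabe3bf0d53397464ac8080142c"


def passes_issuer_pinning(pinned_thumbprints, chain_issuer_thumbprint):
    """Simplified model: accept only if the issuer found in the chain is pinned."""
    pinned = {t.strip().lower() for t in pinned_thumbprints}
    return chain_issuer_thumbprint.strip().lower() in pinned


# The same leaf certificate can chain to either CA certificate, so a cluster
# that pins only one of them is accepted or rejected depending on which chain
# happens to be built.
for issuer in (CA_VALID_UNTIL_2023, CA_VALID_UNTIL_2030):
    print(issuer, passes_issuer_pinning([CA_VALID_UNTIL_2023], issuer))
```

In this model, a pinned list that contains both thumbprints accepts either chain, which is consistent with the susceptibility condition described below (one thumbprint pinned, but not both).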

 

How to identify if your cluster is susceptible to the Certificate Validation Issue? 

This issue affects any Service Fabric (SF) cluster whose cluster certificate is a DigiCert-issued X.509 certificate and that meets both of the following conditions:

 

a) The cluster certificate is declared by common name with issuer pinning, and the list of issuer thumbprints specifies one, but not both, of the following thumbprints: 1fb86b1168ec743154062e8c9cc5b171a4b7ccb4, 626d44e704d1ceabe3bf0d53397464ac8080142c.
 

b) The cluster certificate is signed by one of the two conflicting CAs. You can determine whether that is the case by examining either the certificate's extensions or its chain, as follows:

  • The certificate's Authority Key Identifier extension (AKI, OID 2.5.29.35) matches KeyID=0f80611c823161d52f28e78d4638b42ce1c6d9e2, or
  • The certificate's issuer is either of the following DigiCert SHA2 Secure Server CAs:
    • SHA1 thumbprint 1f:b8:6b:11:68:ec:74:31:54:06:2e:8c:9c:c5:b1:71:a4:b7:cc:b4
      valid until 08/Mar/2023
      serial #01:fd:a3:eb:6e:ca:75:c8:88:43:8b:72:4b:cf:bc:91
    • SHA1 thumbprint 62:6d:44:e7:04:d1:ce:ab:e3:bf:0d:53:39:74:64:ac:80:80:14:2c
      valid until 22/Sep/2030
      serial #02:74:2e:aa:17:ca:8e:21:c7:17:bb:1f:fc:fd:0c:a0
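
Condition (b) can be checked mechanically once you have read the AKI key identifier or the issuer's SHA1 thumbprint from the certificate with a tool of your choice. The sketch below only compares strings against the values listed above; it does not parse certificates. The function names `normalize_hex` and `signed_by_conflicting_ca` are hypothetical helpers introduced for this example.

```python
import re

# Values taken from this post.
SHARED_KEY_ID = "0f80611c823161d52f28e78d4638b42ce1c6d9e2"
CONFLICTING_CA_THUMBPRINTS = {
    "1fb86b1168ec743154062e8c9cc5b171a4b7ccb4",
    "626d44e704d1ceabe3bf0d53397464ac8080142c",
}


def normalize_hex(value):
    """Lowercase a hex string and drop ':' / '-' / whitespace separators."""
    return re.sub(r"[\s:\-]", "", value).lower()


def signed_by_conflicting_ca(aki_key_id=None, issuer_thumbprint=None):
    """Return True if either identifier matches the values listed in this post."""
    if aki_key_id is not None:
        # Accept both "0f80..." and "KeyID=0f80..." forms.
        key_id = normalize_hex(aki_key_id.split("=", 1)[-1])
        if key_id == SHARED_KEY_ID:
            return True
    if issuer_thumbprint is not None:
        if normalize_hex(issuer_thumbprint) in CONFLICTING_CA_THUMBPRINTS:
            return True
    return False


# Example: an AKI value copied in colon-separated, upper-case form.
print(signed_by_conflicting_ca(
    aki_key_id="KeyID=0F:80:61:1C:82:31:61:D5:2F:28:E7:8D:46:38:B4:2C:E1:C6:D9:E2"
))
```

Normalizing the input first means the colon-separated form shown in the list above and the plain hex form used in condition (a) compare as equal.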

The cluster certificate configuration can be found in the ARM resource of your Service Fabric cluster. If your cluster does not meet both of the conditions above, you may disregard the rest of this post.
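
As a rough aid for condition (a), the following sketch inspects an exported cluster resource and reports common names whose issuer pin lists exactly one of the two thumbprints. It rests on assumptions you should verify against your own template: that the cluster certificate is declared under `properties.certificateCommonNames.commonNames`, and that `certificateIssuerThumbprint` holds a comma-separated list of thumbprints. The function name `at_risk_common_names` and the sample resource (including `cluster.example.com`) are made up for illustration.

```python
import json

# Thumbprints of the two conflicting CA certificates (values from this post).
CONFLICTING_CA_THUMBPRINTS = {
    "1fb86b1168ec743154062e8c9cc5b171a4b7ccb4",
    "626d44e704d1ceabe3bf0d53397464ac8080142c",
}


def at_risk_common_names(cluster_resource):
    """Return common names whose issuer pin lists exactly one of the two CAs."""
    # Property names below are assumptions about the ARM resource layout.
    props = cluster_resource.get("properties") or {}
    declared = props.get("certificateCommonNames") or {}
    at_risk = []
    for entry in declared.get("commonNames") or []:
        raw = entry.get("certificateIssuerThumbprint") or ""
        pinned = {t.strip().lower() for t in raw.split(",") if t.strip()}
        if len(pinned & CONFLICTING_CA_THUMBPRINTS) == 1:
            at_risk.append(entry.get("certificateCommonName"))
    return at_risk


# Made-up example resource; real templates contain many more properties.
sample = json.loads("""
{
  "properties": {
    "certificateCommonNames": {
      "commonNames": [
        {
          "certificateCommonName": "cluster.example.com",
          "certificateIssuerThumbprint": "1fb86b1168ec743154062e8c9cc5b171a4b7ccb4"
        }
      ],
      "x509StoreName": "My"
    }
  }
}
""")
print(at_risk_common_names(sample))
```

A name reported here satisfies only condition (a); the cluster is susceptible only if the certificate also meets condition (b).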

 

Symptoms in impacted environments 

  • One or more cluster nodes appear down/unhealthy. 
  • Cluster is unreachable, whether from the Azure portal or directly (SFX/other clients). 
  • Event logs show errors like: “authorization failure: CertificateNotMatched”. 
  • Pending upgrades are not progressing/appear to be stuck.

Required Action 

  • Follow the troubleshooting guide (TSG), which includes the mitigation steps: Troubleshooting Guide
  • You must apply the mitigation specified in the TSG yourself.

 

If you have any questions or concerns, please contact us by opening a support request. In addition, here are your general support options for Service Fabric: Learn about Azure Service Fabric Support options - Azure Service Fabric | Microsoft Docs

Last update: Nov 24 2020 06:54 PM