DigiCert introduced a new CA which reuses the signing key of an existing and still-valid CA. This means there are 2 different CA certificates in circulation, and either can be included in the chain built for a certificate signed by this shared key. Existing certificates declared in Service Fabric clusters by subject with issuer pinning are at risk of spontaneously failing validation.
How can you tell whether your cluster is susceptible to the certificate validation issue?
This issue affects any Service Fabric (SF) cluster whose cluster certificate is a DigiCert-issued X.509 certificate and which meets both of the following conditions:
a) The cluster certificate is declared by common name with issuer pinning, and the list of issuer thumbprints specifies one (but not both) of the following thumbprints: 1fb86b1168ec743154062e8c9cc5b171a4b7ccb4, 626d44e704d1ceabe3bf0d53397464ac8080142c.
b) The cluster certificate is signed by one of the 2 conflicting CAs; you can determine if that is the case either by examining the certificate extensions, or its chain, as follows:
The certificate’s Authority Key Identifier extension (AKI, OID: 2.5.29.35) matches KeyID=0f80611c823161d52f28e78d4638b42ce1c6d9e2, or
The certificate’s chain includes a CA certificate matching either of the following:
SHA1 thumbprint 1f:b8:6b:11:68:ec:74:31:54:06:2e:8c:9c:c5:b1:71:a4:b7:cc:b4, valid until 08/Mar/2023, serial #01:fd:a3:eb:6e:ca:75:c8:88:43:8b:72:4b:cf:bc:91
SHA1 thumbprint 62:6d:44:e7:04:d1:ce:ab:e3:bf:0d:53:39:74:64:ac:80:80:14:2c, valid until 22/Sep/2030, serial #02:74:2e:aa:17:ca:8e:21:c7:17:bb:1f:fc:fd:0c:a0
The cluster certificate configuration can be found in the ARM resource of your Service Fabric cluster. If your cluster is not configured using the above properties, you may disregard the rest of this post.
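The two conditions above can be expressed as a small check. The following Python sketch is illustrative only: the function names and the shape of the inputs are assumptions made for this example, not part of any Service Fabric tooling. It assumes you have already confirmed that the cluster certificate is declared by common name with issuer pinning, that you have copied the pinned issuer thumbprints out of the cluster's ARM resource, and that you have read the hex KeyID value from the certificate's AKI extension. It tests condition (b) through the AKI only, not by walking the chain.

```python
# Thumbprints of the two conflicting DigiCert CA certificates named in this post.
CONFLICTING_CA_THUMBPRINTS = {
    "1fb86b1168ec743154062e8c9cc5b171a4b7ccb4",
    "626d44e704d1ceabe3bf0d53397464ac8080142c",
}

# Authority Key Identifier (KeyID) of the signing key shared by both CAs.
SHARED_AKI_KEY_ID = "0f80611c823161d52f28e78d4638b42ce1c6d9e2"


def normalize_hex(value):
    """Lowercase a hex string and drop ':' and space separators."""
    return value.replace(":", "").replace(" ", "").lower()


def is_susceptible(pinned_issuer_thumbprints, cert_aki_key_id):
    """Return True when both conditions (a) and (b) hold.

    pinned_issuer_thumbprints: issuer thumbprints pinned for the cluster
        certificate (hypothetical input, taken from the ARM resource).
    cert_aki_key_id: hex KeyID from the certificate's AKI extension.
    """
    pinned = {normalize_hex(t) for t in pinned_issuer_thumbprints}
    # Condition (a): exactly one of the two conflicting CAs is pinned.
    pins_exactly_one = len(pinned & CONFLICTING_CA_THUMBPRINTS) == 1
    # Condition (b): the certificate was signed by the shared key.
    signed_by_shared_key = normalize_hex(cert_aki_key_id) == SHARED_AKI_KEY_ID
    return pins_exactly_one and signed_by_shared_key
```

A cluster that pins both thumbprints, or neither, is reported as not susceptible, which matches the "one (but not both)" wording of condition (a).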
Symptoms in impacted environments
One or more cluster nodes appear down/unhealthy.
Cluster is unreachable, whether from the Azure portal or directly (SFX/other clients).
Event logs show errors like: “authorization failure: CertificateNotMatched”.
Pending upgrades are not progressing/appear to be stuck.