Microsoft Entra Blog

The False Identifier Anti-pattern

Microsoft

Jun 20, 2023

Today, we’d like to highlight a dangerous anti-pattern in the identity world: the false identifier anti-pattern. An anti-pattern is a common response to a recurring problem that’s usually ineffective and risks being highly counter-productive. You may have also heard of the password anti-pattern. Today's discussion covers a practice that is possibly even more dangerous.

The false identifier anti-pattern occurs when an application or service assumes that an attribute other than the subject of an assertion from a federated identity provider is a unique, durable, and trustworthy account identifier during single sign-on.

A structured security document like an OpenID Connect id_token or a SAML 2.0 assertion can contain personal user data like phone numbers, social media handles, and email addresses. These details serve as unique identifiers in some contexts (like sending email or making phone calls), and in some cases they are also used as login identifiers. But there’s a critical difference between two uses. One is treating these values as informational attributes for the purposes of communication or collaboration. The other is picking up a value in a federated context from a part of the token that comes with no guarantee of uniqueness, and then using it as a uniquely identifying index that backs authorization decisions and data access.

Use of claims other than subject identifier to uniquely identify an end user in OpenID Connect is non-compliant

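The authors of OpenID Connect 1.0 were aware of this anti-pattern ten years ago when the specification was designed. There is normative text directly in the spec that requires all relying parties (the applications and services consuming identity information from SSO transactions) to always use the sub attribute as the primary account key. Section 2 of the OpenID Connect core specification defines sub as a locally unique and never reassigned identifier within the issuer for the end user, intended to be consumed by the client.

Additionally, Section 5.7 of the OpenID Connect specification explicitly states that the sub (subject) and iss (issuer) claims, used together, are the only claims that a relying party can rely upon as a stable identifier for the end user, and that other claims such as email, phone_number, preferred_username, and name MUST NOT be used as unique identifiers for the end user.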

If an OpenID Connect relying party uses any other claims in a token besides a combination of the sub (subject) claim and the iss (issuer) claim as a primary account identifier in OpenID Connect, they’re breaking the contract of expectations between federated identity provider and relying party, operating in non-conformance to the specification, and potentially incurring risk to users. If you’re a relying party for whom the conditions of business include standards compliance, there’s only one option.
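As a minimal sketch of what that one option looks like in a relying party, the function below builds the primary account key from the iss and sub claims and nothing else. It assumes the ID token has already been fully validated (signature, audience, expiry) by whatever OpenID Connect library is in use; the function name, the issuer URL, and the claim values are hypothetical.

```python
from typing import Mapping, Tuple

AccountKey = Tuple[str, str]


def account_key(claims: Mapping[str, object]) -> AccountKey:
    """Build the primary account key from already-validated ID token claims.

    Only iss and sub are used. The sub value is case sensitive, so it is
    deliberately not trimmed or lower-cased.
    """
    iss = claims.get("iss")
    sub = claims.get("sub")
    if not isinstance(iss, str) or not iss:
        raise ValueError("token has no usable iss claim")
    if not isinstance(sub, str) or not sub:
        raise ValueError("token has no usable sub claim")
    return (iss, sub)


# Hypothetical claims, for illustration only.
claims = {
    "iss": "https://idp.example.com",
    "sub": "24400320",
    "email": "frida@company3.example",  # informational; never part of the key
}
key = account_key(claims)
```

The email claim can still be stored on the account record for communication; it just never takes part in the lookup.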

It’s worthwhile to understand the reasoning behind the explicit requirement in the OpenID Connect spec. Why is it so problematic to use a generic payload claim in the critical security capacity of a unique account identifier? The specification text in section 5.7 gives all the hints that we need.

Unique local identifier

When an identifier is locally unique, the identity provider is actively detecting and preventing value duplications across users in the domain. If local uniqueness is enforced, there can never be a case where two different identity provider accounts are represented at time of single sign-on (SSO) by the same string. In modern identity infrastructure, organizations manage multiple user communities, and it’s entirely possible those communities have overlapping or recurring communications channels.

An employee who has a test account and a production account might set their phone number to be the same for both accounts. A non-employee might have an invited guest account for corporate collaboration and a customer account because they use the identity provider’s retail services, both using the same email address. In these examples, the email address and phone number are not locally unique and are insufficient to distinguish accounts. They’re still acceptable claims to include in the token, because they sit in the token payload, which has no uniqueness requirement, and merely describe the token subject, which does.

Never reassigned within the issuer

The next statement in section 5.7 refers to the attribute’s lifecycle. Various events can occur to disassociate a user from their email address or phone number. The question is what happens next. If an identifier can be assigned to two different users at two different times, there’s a risk of the later user inheriting the permissions and data of the earlier user.

For example, imagine an organization hires Fred Smith and assigns an email address of fsmith@company.com. Fred eventually leaves, and Frida Smith is hired and assigned the same email address. The token subject should be a different value for Fred and Frida, even if the email address is the same. This prevents any chance that Frida ever sees Fred’s data.
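The Fred and Frida case can be sketched in a few lines. This is an illustration under stated assumptions, not code from any product: the issuer URL, the sub values, the file name, and the in-memory dictionaries standing in for a relying party's account store are all hypothetical.

```python
# Hypothetical claims: the issuer reuses the email address but never the sub.
ISSUER = "https://idp.company.example"
fred = {"iss": ISSUER, "sub": "a1f3-fred", "email": "fsmith@company.com"}
frida = {"iss": ISSUER, "sub": "9c7e-frida", "email": "fsmith@company.com"}

fred_record = {"owner": "Fred", "files": ["fred-payroll.xlsx"]}

# Anti-pattern: the account store is indexed by email address.
by_email = {fred["email"]: fred_record}

# Conformant: the account store is indexed by (iss, sub).
by_subject = {(fred["iss"], fred["sub"]): fred_record}

# Frida signs in after Fred has left and his address has been reassigned.
leaked = by_email.get(frida["email"])                # Fred's record comes back
safe = by_subject.get((frida["iss"], frida["sub"]))  # no match, so a new account is needed
```

With the email index, Frida lands in Fred's record. With the (iss, sub) index, the lookup finds nothing and the relying party provisions a fresh account for her.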

Stability over time

Guests and partners may also need to intentionally change their communications channels at some point, like during mergers and acquisitions. If company A takes over company B, the communications values need to change for the entire company, without affecting users’ ability to perform SSO. Token subjects must be durable enough to survive name changes, corporate transitions, office moves to new area codes, and other volatile events.
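One way a relying party might absorb such changes is to treat email as a mutable contact attribute that is refreshed at each sign-in, while the lookup itself stays on (iss, sub). The sketch below assumes the issuer and sub values stay the same through the transition; the Account class, the sign_in function, and all claim values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Account:
    iss: str
    sub: str
    email: str  # contact attribute only; refreshed on each sign-in


accounts: Dict[Tuple[str, str], Account] = {}


def sign_in(claims: dict) -> Account:
    """Find or create the account by (iss, sub), then refresh mutable attributes."""
    key = (claims["iss"], claims["sub"])
    account = accounts.get(key)
    if account is None:
        account = Account(claims["iss"], claims["sub"], claims.get("email", ""))
        accounts[key] = account
    else:
        # The address may have changed since the last sign-in; the key has not.
        account.email = claims.get("email", account.email)
    return account


# Same user before and after an email domain change (hypothetical values).
before = sign_in({"iss": "https://idp.company-b.example", "sub": "u-1001",
                  "email": "pat@company-b.example"})
after = sign_in({"iss": "https://idp.company-b.example", "sub": "u-1001",
                 "email": "pat@company-a.example"})
```

Both sign-ins resolve to the same account object, and only the stored contact address changes.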

Real-world examples

Time to talk about a real-world example. Frida is an accountant who works for Company3 and contracts services out to Company1 and Company2 (for Company1 she works for two separate departments). Her expertise is a SaaS Accounting application called RP.com, and she spends the whole day in that application working for various customers, but Frida only has one email address, the one through her employer. What will happen when she signs in to each of her four identity provider accounts and tries to SSO to RP.com?

Frida uses one email with four different accounts at three companies. What token subject mappings should occur?

Option 1: False identifier anti-pattern avoided at RP.com

If RP.com is conformant to the OpenID Connect specification, Frida will happily sign in to each of her accounts to see relevant data and interact with the appropriate users. RP.com will be putting the right user in the right security context.

Frida's four identity provider accounts are securely mapped to four unique relying party accounts. Yay!

Option 2: RP.com falls for the false identifier anti-pattern

If RP.com has fallen into the trap of assuming email addresses are acceptable as durable identifiers, a few things may happen. 

The first is local conflation within a single tenant. This is where the identity provider is sending RP.com tokens for two different accounts that happen to have the same email address attached. If RP.com is not indexing on the subject, Frida (and her employer) will experience “unnecessary challenges”. When RP.com treats tokens that represent two different accounts as representing a single account, the authentication process becomes unstable, possibly preventing Frida from doing her job. The resulting user session might only grant access to one of Frida’s two accounts, or the data on the backend might become conflated between her two RP.com accounts. What if Frida has two accounts at the same company because one of them is for a highly confidential investigation, and the confidential data commingles with normal data within that company’s RP tenant?

Frida's four identity provider accounts encounter a mis-mapping of token subject. This is bad.
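The difference between Option 1 and Option 2 can be made concrete by counting how many distinct relying party accounts each indexing strategy produces for Frida's four tokens. The sketch below is illustrative only: the issuer URLs, sub values, and email address are hypothetical, each company is assumed to have its own issuer, and the two same-company accounts are placed at Company1 as in the scenario introduction (which company holds them does not change the counts).

```python
# Hypothetical ID token claims for Frida's four identity provider accounts.
EMAIL = "frida@company3.example"
tokens = [
    {"iss": "https://idp.company1.example", "sub": "c1-dept-a-0042", "email": EMAIL},
    {"iss": "https://idp.company1.example", "sub": "c1-dept-b-0917", "email": EMAIL},
    {"iss": "https://idp.company2.example", "sub": "c2-3310", "email": EMAIL},
    {"iss": "https://idp.company3.example", "sub": "c3-7781", "email": EMAIL},
]

# Option 1: key on (iss, sub). Every identity provider account stays distinct.
subject_keys = {(t["iss"], t["sub"]) for t in tokens}

# Option 2: key on (iss, email), treating email as unique within a tenant.
# The two accounts at the same company collapse into one key.
email_keys = {(t["iss"], t["email"]) for t in tokens}

print(len(subject_keys))  # 4
print(len(email_keys))    # 3
```

Four tokens should yield four accounts; any smaller number means two of Frida's security contexts have been merged.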

Option 3: RP.com hits the false identifier anti-pattern worst-case scenario

If RP.com seriously overcommits to a false identifier, the worst-case scenario is that RP.com only looks at the email address when creating identifiers across multiple relying party tenants. In this case, four different tokens representing four different user accounts will end up identified at RP.com by the same string.

Which of the four accounts is accessed under what circumstances would then come down to how the relying party architects its service. It could be that Frida can only ever reach one of her RP accounts, no matter which IDP account she signs in with. It could be that there is a tenant-specific starting page that drops a cookie, or that each tenant has its own subdomain.

The worst-case scenario can escalate if the mechanism that dictates which tenant session is loaded can be manipulated. While Frida might not think of impersonating one account from another account, attackers will. If an attacker could register their own tenant, connect it via SSO, and specify Frida’s email address, they could potentially compromise each of the three companies Frida interacts with at RP.com.

Frida's four identity provider accounts are mis-mapped both within a tenant and across tenants. This is really bad.
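The cross-tenant attack described for Option 3 can be sketched as follows. Everything here is hypothetical (issuer URLs, sub values, account names, function names), and the sketch assumes that token validation has already confirmed each token was signed by the issuer named in its iss claim. That check is what stops an attacker-controlled identity provider from asserting another issuer's iss value, whereas it is free to put any email it likes in the payload.

```python
from typing import Dict, Optional, Tuple

# Anti-pattern store: one RP.com account per email address, across all tenants.
accounts_by_email: Dict[str, str] = {
    "frida@company3.example": "frida-company1-ledger",
}

# Conformant store: accounts keyed by (iss, sub).
accounts_by_subject: Dict[Tuple[str, str], str] = {
    ("https://idp.company1.example", "c1-0042"): "frida-company1-ledger",
}

# Token issued by a tenant the attacker registered and controls.
attacker_token = {
    "iss": "https://idp.attacker.example",
    "sub": "evil-1",
    "email": "frida@company3.example",  # chosen freely by the attacker
}


def resolve_by_email(claims: dict) -> Optional[str]:
    """Anti-pattern lookup: trusts the email claim as the account identifier."""
    return accounts_by_email.get(claims["email"])


def resolve_by_subject(claims: dict) -> Optional[str]:
    """Conformant lookup: uses only the issuer and subject."""
    return accounts_by_subject.get((claims["iss"], claims["sub"]))
```

Here resolve_by_email(attacker_token) returns Frida's account, while resolve_by_subject(attacker_token) returns None because the attacker's issuer has no account under that key.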

False identifiers in SAML

You may be wondering, with all this talk about OpenID Connect, what happens in the world of Security Assertion Markup Language (SAML)? The anti-pattern is the same for both protocols, but the SAML specification is where the industry learned those tough cross-domain identifier lessons. SAML implementations can still be an issue, but implementation details can change the nuances:

- The complexity of the SAML specification makes it difficult for relying parties to roll their own implementation. While relying parties can still make mistakes when they map identity attributes, the libraries and tools that validate SAML assertions are often mature enough to have mitigated risk where possible.
- In the SAML specification, an email address is a valid and commonly used token subject. However, SAML is commonly used in captive workforce cases where the organization "owns" the email and enforces uniqueness. This captive paradigm is breaking down due to the use of bring your own identity (BYOI) and B2B federated use cases, which may mean risk is growing for identity providers, not just relying parties.
- The SAML specification has no commonly implemented parallel to the end-user consent flow that OAuth-based OpenID Connect uses, where if an administrator allows it, end users can virally ‘opt-in’ to access an app (this is managed in Azure AD via the User and Admin Consent feature).

The higher education world is ahead in recognizing and taking action to counter this anti-pattern. Section 2.1 of the SAML V2.0 Subject Identifier Attributes Profile Version 1.0 stated in 2019 that: “Identification of subjects in security protocols and applications has a fraught history of inconsistent syntax, bugs, terrible but deeply cemented practices such as misuse of email addresses, vertical market-specific approaches, and failure to precisely communicate intended semantics and constraints.”

Both SAML and OpenID Connect identity providers can be at risk for the false identifier anti-pattern, but the work the OpenID Connect specification editors put in up front means that a certified identity provider is well protected.

Token subjects tend to be generated at the library or cloud infrastructure level and are not usually a claim that is individually mapped per connection from a data store by an admin. If a SAML identity provider admin chooses to map email address into the subject identifier of a SAML assertion, and the user population authorized for the target relying party includes accounts with email addresses that are not guaranteed to be locally unique, never reassigned, and stable over time, the organization has fallen prey to the false identifier anti-pattern and has incurred dangerous risk.
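Before mapping an attribute such as email into a SAML subject, an identity provider admin could at least check whether that attribute is currently unique across the authorized user population. The sketch below is one hypothetical way to do that against an exported user list; the function name, the record layout, and the choice to compare values case-insensitively are assumptions. It only tests uniqueness at a point in time and says nothing about whether values are reassigned or stable over time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def duplicated_values(users: Iterable[Mapping[str, str]],
                      attribute: str) -> Dict[str, List[str]]:
    """Return attribute values shared by more than one account.

    Values are compared after trimming and lower-casing (an assumption that
    suits email addresses). Each duplicated value maps to the account ids
    that carry it.
    """
    seen: Dict[str, List[str]] = defaultdict(list)
    for user in users:
        value = user.get(attribute, "").strip().lower()
        if value:
            seen[value].append(user["id"])
    return {value: ids for value, ids in seen.items() if len(ids) > 1}


# Hypothetical directory export: an employee and a guest share one address.
directory = [
    {"id": "emp-001", "email": "frida@company3.example"},
    {"id": "guest-417", "email": "Frida@company3.example"},
    {"id": "emp-002", "email": "fred@company3.example"},
]
conflicts = duplicated_values(directory, "email")
```

A non-empty result means the attribute already fails the local uniqueness test and is unsafe as a subject identifier for that population.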

The simple takeaway

At the end of the day, the message for the RP.coms of the world is simple: always and only use the token subject and issuer together to form the primary key when identifying a federated account. 

The message for anyone who’s an identity provider is simple: always ensure every token subject generated for SSO is locally unique, never reassigned, and stable over time.  
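On the identity provider side, one simple way to get all three properties is to mint an opaque, random subject value once per account, never derive it from mutable attributes, and never hand a retired value to anyone else. The class below is a hypothetical in-memory sketch of that idea, not a description of how any particular identity provider works; a real service would persist this state.

```python
import uuid
from typing import Dict, Set


class SubjectRegistry:
    """Issues opaque subject identifiers and remembers every value ever issued."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}   # internal account id -> sub
        self._ever_issued: Set[str] = set()  # includes subs of deleted accounts

    def subject_for(self, account_id: str) -> str:
        """Return the account's sub, minting it once on first use (stable)."""
        if account_id not in self._active:
            sub = uuid.uuid4().hex
            while sub in self._ever_issued:  # guard against any reuse (unique)
                sub = uuid.uuid4().hex
            self._ever_issued.add(sub)
            self._active[account_id] = sub
        return self._active[account_id]

    def delete_account(self, account_id: str) -> None:
        """Remove the account; its sub stays retired (never reassigned)."""
        self._active.pop(account_id, None)


registry = SubjectRegistry()
fred_sub = registry.subject_for("fsmith")
same_sub = registry.subject_for("fsmith")   # unchanged on later sign-ins
registry.delete_account("fsmith")           # Fred leaves
frida_sub = registry.subject_for("fsmith")  # Frida reuses the login name
```

Even though Frida inherits the login name, she receives a new subject value, so a conformant relying party will never confuse her with Fred.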

Follow these two rules to avoid the anti-pattern and securely map all your users in all the right places.

Updated Jun 19, 2023

Pamela Dingle
