Microsoft Security Community Blog

Decentralized Identity: Verifiable Credentials Deep Dive

Microsoft

Dec 09, 2022

Welcome to part four of our decentralized identity series! The goal for this segment of our larger story is to show you what a concrete VC (aka Verifiable Credential) looks like and to describe enough of the terms and concepts that you can easily research the W3C VC Data Model 1.1 specification further. There are a lot of links in this post; most of them will take you right to the corresponding part of the specification.

If you missed our earlier installments in the series, you can find them here:

Part I: The Five Guiding Principles

Part II: The Direct Presentation model

Part III: The Basics of Decentralized Identity

Part IV: Deep Dive: Verifiable Credentials <-- You are here

Part V: Deep Dive: Decentralized Identifiers

General Model – and what is a Holder?

To understand the world of verifiable credentials and verifiable presentations, you need to understand the ecosystem in which they are expected to be used. The VC Data Model v1.1 defines the roles different entities play in the data format, as well as two processes: credential issuance and credential presentation. You are probably familiar with a concept of issuers or verifiers in the physical world, but the role of holder could be new to you. The specification defines a holder as an entity that can “possess a verifiable credential and generate verifiable presentations”. Another way to think about this is the place you store your credentials until you are ready to take them out and use them for a purpose. You might want to jump to the conclusion that a holder and a digital wallet are the same thing - a wallet can play the role of holder, but it has many other characteristics we will discuss some other time. Additionally, a wallet is usually a "user-present" technology, meaning that there is a human performing a ceremony as part of the credential interaction. While wallet interactions are the most talked-about use of VCs there are also machine-to-machine interactions that do not have a user experience at all.

Credential Issuance

I have previously called a verifiable credential (VC) a letter of introduction – a statement of relationship between an issuer and a subject. A verifiable credential uses cryptographic proofs to bind an issuer statement about a subject to the subject's identifier. The resulting document can contain claims relative to the subject and in some cases proofs of different kinds can be bundled together. Once issued, a verifiable credential can be held for potentially long periods of time and presented for multiple purposes and in multiple ways.

VCs in a wallet wrapped by a VP and sent to verifiers

Credential Presentation

In the verifiable credential world, presentation involves the generation of a VP – a verifiable presentation. A VP is generated by a holder - it is a document that wraps a verifiable credential with a new credential that is both fresh and proves a relationship in the moment between the holder and the original VC subject. Most often the subject/holder relationship is direct - meaning that the entity presenting the credential can prove they are the same entity named in the credential. To check this relationship, the verifier checks that the VC subject is the same as the VP issuer, AND that the signature of the VP is valid.

All of this passing around of VCs and VPs is like a big game of hot potato, and different credentials may have different patterns of presentation frequency and breadth of presentation across many verifiers.

A visual representation of overall VC/VP travel - VCs travel between Issuers and Holders, then VCs wrapped in VPs travel from holders to verifiers.

What is in a Verifiable Credential?

Now that you know how verifiable credentials move, we can look at what they contain. While the VC Data Model doesn't specify the protocols for travel of credentials between actors in the flow, the actors must be part of the data model, because their associated public/private keypairs are used for cryptographic signing operations. At the highest and geekiest level, a verifiable credential bundles claims together into a container called a credential, and imposes standardized rules around how parties interacting with the VC can derive confidence in different characteristics of that bundle - for example the data model shows how to verify that the created credential was made by the entity claiming to be the credential issuer.

Canonical Data Format vs. Encoded Representations

If you take all proofs and interpretation out of a verifiable credential, you have a base data format (the bundle I referenced earlier) that is simply referred to in the spec as “a credential”. This credential data format is referred to as the “canonical form” and the assumption is that you can always get from a given verifiable credential representation to a canonical credential data format and back again, using the encoding & decoding rules laid out in section 6 of the VC Data Model 1.1. This is why the examples in the specification document have three tabs – one shows the canonical form, and the other two show each of the representations that can be used. While any number of representations could be additionally defined, there are two specified in the core data model:

JSON-encoded verifiable credentials (known as JWT-VCs) are encoded as signed JWTs (JSON Web Tokens).
JSON-LD encoded verifiable credentials (known as LDP-VCs) can use different proof formats, but the most common are Linked Data Proofs.

Three boxes showing a credential (without verifiability) on top and two credentials (with verifiability) descending - a JSON-encoded credential and a JSON-LD encoded credential

A simple way to think about the representations versus the canonical credentials is that the representations are what add the verifiability to a verifiable credential. A canonical credential is the bundle of claims that purports to be from an issuer and purports to be about a subject, and contains a set of additional claims, but anyone can tamper with the content of that bundle. When the canonical credential is encoded into a representation however, the representation dictates what kind of proof structure can be used to evaluate various properties of the credential, with tamper-proofness being a required check (this happens through checking a cryptographic signature). Other checks could potentially happen, for example validating a selective disclosure of information rather than an entire credential.

All of my examples in this blog article will be JSON-encoded, partly because this is what Microsoft has implemented (and therefore what I’m more experienced in) but also because many of you reading this blog are probably familiar with JSON Web Tokens already, especially if you have worked with JWT-based implementations like OpenID Connect IDtokens, Federated Identity Credentials, or JWT bearer client authentication tokens. If you aren’t familiar with what a JSON Web Token is, check out this tutorial.

A JWT-VC is (surprise surprise) a mashup of a JSON Web Token and a canonical VC. The JWT supplies verifiability through a cryptographic signature bound to a subject identifier that is labeled as iss (aka the issuer). JWTs have a very familiar structure.

A visual representation of a JSON-encoded Verifiable Credential - 3 sections - the header contains the alg, typ and kid claims. The payload contains iss, nbf, exp, jti, sub claims and an object called vc, which contains a @context, type, and the credential subject object. That object contains various claims. The last of the 3 major sections is the signature.

A JWT-VC has three parts, and the payload contains what I would call envelope information: the data needed to know who the credential is bound to, who made the credential, when it was made and how it can be identified. Additionally, there is a JSON object called “vc”. Claims information is embedded inside the vc object. A JWT-VC uses an external proof, meaning in this case that signature data is not embedded inline with the credential; the signature is detached from the credential.
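To make the three-part structure concrete, here is a minimal Python sketch that splits a compact JWT on its dots and decodes each part. The helper names (b64url_decode, b64url_encode, split_jwt), the ES256K algorithm value and the toy token are assumptions made for illustration. The sketch only parses the token and does not verify the signature.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(data: bytes) -> str:
    """Encode bytes as unpadded base64url, the form used in compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def split_jwt(token: str):
    """Split a compact JWT into (header, payload, signature_bytes).

    NOTE: this only parses the token. It does NOT verify the signature.
    """
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload, b64url_decode(signature_b64)


# Toy token with a placeholder signature (all values are hypothetical).
toy = ".".join([
    b64url_encode(json.dumps({"alg": "ES256K", "typ": "JWT"}).encode()),
    b64url_encode(json.dumps({"iss": "did:web:ucalgary.ca", "vc": {}}).encode()),
    b64url_encode(b"not-a-real-signature"),
])

header, payload, signature = split_jwt(toy)
print(header["alg"], payload["iss"], len(signature))
```

A real verifier would hand the token to a JWT library that checks the signature against the issuer's public key before trusting anything in the payload.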

Here’s an example of a JSON-encoded verifiable credential that my university might issue to communicate my relationship and some additional information. Note that this is NOT a real-world Microsoft-issued example, it is a construct meant to highlight characteristics of the specification. If you see ways in which my example is not conformant, please point it out in the comments, for extra street cred!

A valid JSON object formatted as a JWT-VC with a header, payload and Signature section
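Because the example is only described in words here, the following Python sketch reconstructs the general shape of such a JWT-VC from the claims named in this post. It is not the original figure. The kid, the jti value, the timestamps, the signing algorithm, the contents of the achieved object and the https:// prefix on the second context entry are all placeholders assumed for illustration.

```python
import json

# Header: identifies the signing algorithm and the key used to sign.
header = {
    "alg": "ES256K",                      # assumed algorithm
    "typ": "JWT",
    "kid": "did:web:ucalgary.ca#key-1",   # hypothetical key identifier
}

# Payload: RFC 7519 envelope claims plus the W3C "vc" object.
payload = {
    "iss": "did:web:ucalgary.ca",         # who made the credential
    "sub": "did:ion:pamelarosiedee",      # who the credential is bound to
    "jti": "urn:uuid:00000000-0000-0000-0000-000000000001",  # placeholder id
    "nbf": 1262304000,                    # placeholder: 2010-01-01T00:00:00Z
    "exp": 1893456000,                    # placeholder: 2030-01-01T00:00:00Z
    "vc": {
        "@context": [
            "https://www.w3.org/2018/credentials/v1",      # base context
            "https://global-diplomas.org/accreditation",   # scheme assumed
        ],
        "type": [
            "VerifiableCredential",
            "NorthAmericanUniversityGraduationRecord",
        ],
        "credentialSubject": {
            # Hypothetical contents of the custom "achieved" object.
            "achieved": {"type": "BachelorDegree", "name": "Example Degree"},
        },
    },
}

print(json.dumps(header, indent=2))
print(json.dumps(payload, indent=2))
# The third part, the signature, is computed over the encoded header and
# payload with the issuer's private key and is omitted from this sketch.
```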

The JWT-VC payload combines two different kinds of claims:

The three-letter claims (iss, nbf, exp, jti, sub in this example) come from RFC 7519 (the JWT specification). These claims are required to validate the JWT and would be expected by any conformant JWT library or product.
The vc claim at the first level of the JWT comes from the W3C VC Data Model 1.1. The vc claim contains the following standardized claims:
- credentialSubject is the container for authoritative attribute statements about a subject (for example, the “achieved” object in my example, which is custom to the NorthAmericanUniversityGraduationRecord credential type and shows the type of accreditation earned).
- @context contains a set of URIs that point to machine-readable definitions conforming to the W3C JSON-LD @context definition. In this version of the spec, even if you aren’t using JSON-LD, you need to respect this format and ensure that the values set in @context additionally contain a reference to the VC Data Model v1.1 base context so that JWT-VC credentials won’t break JSON-LD parsers. In our example, we included the base context but also a reference to “global-diplomas.org/accreditation”, which is where the credential schema for the credential in the type attribute would be found.
- type is another property that should be constructed according to JSON-LD conventions. According to Section B.2 of the VC Data Model 1.1, @type is used to “indicate which set of claims the verifiable credential contains”. Unlike the @context property, which is always a set of URIs, @type can be a single string or an unordered set. Our NorthAmericanUniversityGraduationRecord type helps verifiers to reliably ask for a credential with predictable contents.

In order to know how a canonical credential corresponds to a JWT-VC, you need to spend quality time in section 6.3.1 of the VC data model. Here’s an example of how our JWT-VC would look decoded into the canonical data format:

Diagram mapping our example JWT-VC object to a canonical credential

Note that in the JWT-VC, both the jti and the sub properties map to a property called “id” in the credential data model, one property applies to the parent credential object and the other applies to a credentialSubject object.
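The mapping can be sketched in a few lines of Python. This is a simplified reading of the decoding rules in section 6.3.1, not a conformant implementation: the function names are hypothetical, the issuer is assumed to be a plain string rather than an object, and the token is assumed to have been signature-checked already.

```python
from datetime import datetime, timezone


def to_iso(timestamp: int) -> str:
    """Convert a JWT NumericDate (seconds since epoch) to an ISO 8601 UTC string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def jwt_vc_to_credential(payload: dict) -> dict:
    """Fold the JWT envelope claims back into the canonical credential."""
    credential = dict(payload["vc"])  # shallow copy of the vc object
    subject = dict(credential.get("credentialSubject", {}))

    if "jti" in payload:
        credential["id"] = payload["jti"]          # id of the credential
    if "sub" in payload:
        subject["id"] = payload["sub"]             # id of the credentialSubject
    if "iss" in payload:
        credential["issuer"] = payload["iss"]
    if "nbf" in payload:
        credential["issuanceDate"] = to_iso(payload["nbf"])   # nbf, not iat
    if "exp" in payload:
        credential["expirationDate"] = to_iso(payload["exp"])

    credential["credentialSubject"] = subject
    return credential


example = {
    "iss": "did:web:ucalgary.ca",
    "sub": "did:ion:pamelarosiedee",
    "jti": "urn:uuid:00000000-0000-0000-0000-000000000001",  # placeholder
    "nbf": 0,
    "exp": 86400,
    "vc": {"type": ["VerifiableCredential"], "credentialSubject": {}},
}
credential = jwt_vc_to_credential(example)
print(credential)
```

Both id values end up in the result: one on the credential itself (from jti) and one inside credentialSubject (from sub).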

Distinctions, Differences and Gotchas

The JWT-VC representation imposes rules on how a VC can be constructed, and in some cases the choice of representation has important consequences.

Single Signature/Issuer

Signed JWTs are structured to have only one signature, and that single signature is affiliated to one issuer. This is an external proof (in VC terms) and it wraps the entire assertion.

Single Subject

RFC 7519 defines a JWT sub claim as containing a single value, which maps to the id value of a credentialSubject object. There can only be one JWT sub claim, which means that a JWT-VC can’t have more than one credentialSubject object within it. Other encodings do not have this limitation, allowing multiple credentialSubject objects within a single verifiable credential.

Nbf, not iat

In the VC data model, issuance date is defined as “the date and time when a credential becomes valid”. In JWT-land, there is a difference between the moment the token was issued (iat), and the earliest moment that any verifier should consider the token as valid (nbf). The JWT authors defined two separate values to help account for clock skew, because if the system clocks aren't perfectly aligned, the token could arrive before it was even issued. The VC data model JSON encoding maps the issuanceDate claim in the abstract credential to the JWT “not before” claim rather than the JWT “issued at” claim.

Remember, the Verifiable Credential is just half of the Story!

A subject might have a signed JWT-VC stored with a holder, but to present it to a verifier, the holder needs to also construct a JWT-VP to show that the subject & holder have a legitimate relationship to this particular verifiable credential, and that the subject wants this particular VC to be presented to this particular verifier, at this particular time. Our JWT-VC claims that a globally unique subject (did:ion:pamelarosiedee) has earned a degree, now we need to convince a verifier that did:ion:pamelarosiedee is right here, right now, present in the transaction and is actively passing their credential along. Cryptographically speaking, a JWT-VP represents a proof of possession: a verifiable way to know for sure that the entity presenting the diploma is the same entity that the diploma was issued to in the first place.

Going back to my last blog entry, a verifiable presentation is an endorsement. It wraps our letter of introduction (VC) and adds an extra proof to the mix – a time specific proof that the exact subject listed in the verifiable credential is present and part of a real-time identity transaction. If my employer does all their validation checks properly, they will know that an entity with a globally unique identifier of did:ion:pamelarosiedee (me) is claiming to be the subject of an authoritative educational achievement issued by did:web:ucalgary.ca (my university) in the moment in time when the employer needs that information. The validation checks enable cryptographic trust, meaning that the data hasn’t been tampered with, but don’t be fooled that cryptographic trust is the only trust decision that must be made! What if did:web:ucalgary.ca wasn’t a real university, for example? I could go register the domain “dingleuniversity.com” and create a degree for myself that could be generated and presented with perfect cryptographic trust! The verifier needs to be able not only to know the documents are not tampered with, but that the issuers of the documents are acceptable business partners in a given context. This is where trust federations, trust registries and trust frameworks come in, topics that are out of scope for this article but that we will return to.

Heading back to our example, what would a verifiable presentation look like if I wanted, on the 8th of August 2022, to present my University degree to my employer?

A JWT-VP issued by my decentralized identifier and audienced for Microsoft’s decentralized identifier that contains the verifiable credential we’ve already studied
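As a rough reconstruction of what such a JWT-VP payload could look like, here is a Python sketch built from the description in this post. It is not the original figure. The verifier DID, the jti value, the five-minute validity window and the placeholder string standing in for the enclosed JWT-VC are all assumptions.

```python
import calendar
import json

# 8 August 2022, 00:00:00 UTC, as seconds since the epoch.
presented_at = calendar.timegm((2022, 8, 8, 0, 0, 0))

vp_payload = {
    "iss": "did:ion:pamelarosiedee",      # the holder signs the presentation
    "aud": "did:web:microsoft.com",       # hypothetical verifier identifier
    "nbf": presented_at,
    "exp": presented_at + 300,            # assumed five-minute window
    "jti": "urn:uuid:00000000-0000-0000-0000-000000000002",  # placeholder
    "vp": {
        "@context": ["https://www.w3.org/2018/credentials/v1"],
        "type": ["VerifiablePresentation"],
        # Each entry is a complete compact JWT-VC, signature included.
        "verifiableCredential": ["<compact JWT-VC goes here>"],
    },
}

print(json.dumps(vp_payload, indent=2))
```

Note that there is no sub claim, and that the enclosed credential travels as an already-signed token inside the vp object.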

A Closer Look at JWT-VP

The example above is a JSON-encoded verifiable presentation, or JWT-VP. The “how-to” of JSON-encoding into a JWT-VP is blended into section 6 of the spec, so there is no easy section to refer to, but there are a few significant differences if you read closely:

No Subject

The subject property isn’t needed in a VP, because the issuer is asserting a thing about itself; in other words, a JWT-VP is self-issued (you will hear this term again).

Wraps a Related Verifiable Credential

Remember, the JWT-VP must prove a relationship between the presented credential and the entity doing the presentation. By far the most common relationship to prove is the “this is my credential” relationship, also called “Subject is the Holder”. If I stole my sister’s diploma and uploaded it to my employer to verify, the burden is on the verifier to detect the fraud and reject my claim to the uploaded credential. If the verifier does a bad job, the fraud could work. In verifiable credentials we use cryptographic proofs to be sure the presenter of the credential is the subject of the credential. If the issuer that signs the JWT-VP is the same entity as the subject listed in the encapsulated JWT-VC, you have a strong proof of legitimate possession. In our JWT-VP world this check is straightforward because both the JWT issuer and the JWT subject are always single-valued. Validation logic for this use case is currently non-normative, but do not interpret non-normative as not important!
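The “Subject is the Holder” comparison itself is small. The Python sketch below assumes the signatures on the JWT-VP and on every enclosed JWT-VC have already been verified elsewhere; it only compares identifiers. The function names and the toy tokens (including the did:ion:sister identifier) are hypothetical.

```python
import base64
import json


def unverified_payload(token: str) -> dict:
    """Read the payload of a compact JWT WITHOUT checking its signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def subject_is_holder(vp_payload: dict) -> bool:
    """True only if every enclosed VC names the VP issuer as its subject."""
    holder = vp_payload.get("iss")
    credentials = vp_payload.get("vp", {}).get("verifiableCredential", [])
    if not holder or not credentials:
        return False
    return all(unverified_payload(vc).get("sub") == holder for vc in credentials)


def toy_jwt(payload: dict) -> str:
    """Build an unsigned toy token for the demo (placeholder signature)."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return ".".join([enc({"alg": "ES256K"}), enc(payload), "placeholder"])


my_diploma = toy_jwt({"iss": "did:web:ucalgary.ca", "sub": "did:ion:pamelarosiedee"})
sisters_diploma = toy_jwt({"iss": "did:web:ucalgary.ca", "sub": "did:ion:sister"})

honest = {"iss": "did:ion:pamelarosiedee",
          "vp": {"verifiableCredential": [my_diploma]}}
stolen = {"iss": "did:ion:pamelarosiedee",
          "vp": {"verifiableCredential": [sisters_diploma]}}

print(subject_is_holder(honest), subject_is_holder(stolen))
```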

Constrained to an Audience

While verifiable credentials are an open-ended assertion, verifiable presentations are targeted, using a property called aud. This critical security component prevents an attacker from stealing a presentation made to one verifier and reusing it at another verifier. Creating a presentation without an audience is like broadcasting your endorsement – as long as nobody records your broadcast you might be ok, but if anyone does record that broadcast, they can replay it at will! Imagine if you uploaded your diploma to your social network and openly endorsed it as yours to get a pretty badge, but somebody recorded you endorsing your diploma and used that recording to apply for 3 different jobs in your name! Not only is it imperative to use an audience in all presentations, but it is imperative not to accept broadcasts; that is, verifiers should not accept an un-audienced presentation, because it could be a replay. This does not mean that audience needs to be a strict match in the way that a subject/holder match is performed – it could be that the presentation is legitimately audienced to a group of verifiers rather than a single verifier, but every degree of freedom allowed in the audience introduces replay risk.
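A verifier-side audience check might look like the following Python sketch. It relies on the RFC 7519 rule that aud can be a single string or an array of strings, and it rejects presentations with no audience at all. The function name and the DID values are hypothetical.

```python
def audience_matches(vp_payload: dict, verifier_id: str) -> bool:
    """Accept a presentation only if it is explicitly audienced to this verifier."""
    aud = vp_payload.get("aud")
    if aud is None:
        return False  # un-audienced "broadcast": could be a replay
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return verifier_id in audiences


me = "did:web:microsoft.com"  # hypothetical verifier identifier

print(audience_matches({"aud": me}, me))
print(audience_matches({"aud": ["did:web:other.example", me]}, me))
print(audience_matches({"aud": "did:web:other.example"}, me))
print(audience_matches({}, me))
```

Accepting a list is the "group of verifiers" case; a stricter verifier could insist that it is the only audience named.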

Time-Limited and Single-Use

The momentary nature of a verifiable presentation is what makes it safe to hold and store verifiable credentials. A JWT-VP represents the exact moment where a subject intends to share a credential. These VPs are only valuable if they can reliably represent the true intention and only the true intention of the subject in the moment they intended it and no other. Once a JWT-VP is signed and represents that intention, the token itself becomes extremely powerful. To keep abuse to a minimum, the JWT-VP exp property together with nbf creates a short window of validity. Verifiers need to track every incoming jti value (the unique token identifier) for as long as it is valid, so that they can detect any token that might fall through the cracks and get replayed. A short JWT-VP validity window reduces the number of active identifiers that have to be tracked by verifiers (helping performance), and also condenses the window of opportunity for an attacker to act. This also means a smaller horizon for risk/fraud tools to look for abuse trends.
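The validity window and jti tracking can be sketched together. The Python class below is an in-memory illustration only: the class name, the ten-minute maximum window and the 30-second clock-skew allowance are assumptions, and a production verifier would need shared, persistent storage.

```python
import time


class ReplayGuard:
    """Accept each JWT-VP at most once, and only inside a short validity window."""

    def __init__(self, max_window_seconds: int = 600, clock_skew_seconds: int = 30):
        self.max_window = max_window_seconds
        self.skew = clock_skew_seconds
        self._seen = {}  # jti -> exp, kept only while the token could still be valid

    def accept(self, vp_payload: dict, now: float = None) -> bool:
        now = time.time() if now is None else now
        nbf = vp_payload.get("nbf")
        exp = vp_payload.get("exp")
        jti = vp_payload.get("jti")
        if nbf is None or exp is None or jti is None:
            return False                       # cannot bound or track this token
        if exp - nbf > self.max_window:
            return False                       # window too long to be "momentary"
        if now < nbf - self.skew or now >= exp + self.skew:
            return False                       # not yet valid, or expired
        # Forget identifiers whose tokens can no longer be accepted anyway.
        self._seen = {j: e for j, e in self._seen.items() if e + self.skew > now}
        if jti in self._seen:
            return False                       # replay of a token already used
        self._seen[jti] = exp
        return True


guard = ReplayGuard()
vp = {"jti": "urn:uuid:a", "nbf": 1000, "exp": 1300}
first = guard.accept(vp, now=1100)    # fresh token inside its window
second = guard.accept(vp, now=1100)   # same jti presented again
print(first, second)
```

The shorter the window, the fewer jti values the dictionary has to hold at any one time.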

Note that many VCs will also have a validity window calculated by the exp property, however that window can be much longer, in part because a stolen JWT-VC has less value by itself without a fresh JWT-VP to wrap it.

How Does the VC Data Model Compare to Other Assertion Formats?

If you examine a SAML assertion, an OpenID Connect IDtoken, and a VC/VP pair, all three formats have a lot in common:

They are all secure envelopes for sending sets of claims around the internet.
They all rely on cryptographic digital signatures to protect the integrity of the issuer’s document.
They all rely on a trust framework: some external system of trust to know which set of participant identifiers to transact with in the first place.

There are also important differences.

Separation of Issuance from Presentation

SAML assertions and IDtokens are messages that carry claims bundled in the moment for one single audience & transaction. This differs from verifiable credentials, which are bundles of claims with longer lifetimes that can be held, stored, and presented to multiple audiences as a separate transaction. This separation between the bundling of claims and the use of the bundles is really what makes the VC data model unique. Advantages that may be unlocked because of this separation include:

Usage blinding – Issuers don’t necessarily know which verifiers their credential goes to.
User-centric analytics – trends of credential use can be evaluated by the user across issuers & verifiers.
Historical presentation - expired statements can still be evaluated.
Selective disclosure - the subject chooses how much of the credential is revealed to the verifier.

Subject Autonomy

The VC data model requires credential subjects to be cryptographic participants. That is huge! IDtokens and SAML assertions communicate their assertion subject using an identifier that is only unique within the identity provider’s namespace. Verifiable credentials require a globally unique identifier for all cryptographic participants, and while the spec does not prescribe a mechanism for resolving identifiers to signature validation keys, the current commonly-accepted format for a globally unique identifier that can “resolve” into a public key for the purpose of digital signature validation is a decentralized identifier, or DID. We are going to cover the DID specification in detail in the next blog, so stay tuned for why that specification is so important.

Thrilling Conclusion

As you can see, VC and VP data formats work hard to make it possible for credentials to be issued, held, and presented at the discretion of individuals. A VC gives structure to the data and binds the data to a given globally unique subject identifier. A VP adds a timely endorsement to the credential that acts as a proof of possession for the credential. The great thing about the VC data model, compared to other standards that do something similar, is that VCs are a general model and can describe many different types of credentials. Of course, for the purposes of interoperability, it is better if folks can agree on what credentials look like within a given community; for example, we probably don't want 100 different representations of what a university diploma looks like. The fun part of the next few years (in my opinion) will be figuring out what common instantiations of credentials can and should become household names. There are guardrails we will want to place, and it would be lovely if we can figure out where guardrails need to be before vehicles go over the cliff, rather than after. Early adopters of this technology will have a huge part to play in socializing the kinds of credentials that become de facto standards.

You may also have noticed that key management was a topic completely absent from this blog post, despite frequent use of the term "cryptographic signing". That's because the VC data model considers key management only obliquely. The spec states for example that "relevant metadata about the issuer property is expected to be available to the verifier". In a world where I might present my credential to verifiers with no pre-existing relationship to my issuer, how can that verifier discover the relevant metadata? This is where decentralized identifiers (or DIDs) come in. DIDs are not a required part of a verifiable credential, but they have properties that are very attractive. And you will never guess it, but the next blog in the series will be a deep dive on decentralized identifiers! They are a workhorse of a decentralized identity architecture, and an important reason why there is a chance that verifiable credential ecosystems can scale to represent individual people, not just organizations. DIDs are also the bridge between many different decentralized paradigms, between unanchored and static key encoding mechanisms, and between centralized services, through the use of methods like did:web. We will dig into how the mechanism works, and why different DID methods might be useful at different times.

While this blog is about protocols, not products, if you want to try issuing a verifiable credential you can do it for free; see below for the link to get started. Once you’ve tinkered with VCs for a while, let us know how it’s going and what tools you need.

Learn more about Decentralized Identity: