We talk to many customers who move structured data through queues, event streams, and topics, and we see a strong desire for more efficient, less brittle communication paths governed by rich data definitions that all parties understand well. Those definitions are usually shared as schema documents. While the need is great, the available schema options and their tool chains often are not.
JSON Schema is popular for its relative simplicity in trivial cases, but quickly becomes unmanageable as users employ more complex constructs. The industry has largely settled on "Draft 7," with subsequent releases seeing weak adoption. There's substantial frustration among developers who try to use JSON Schema for code generation or database mapping—scenarios it was never designed for. JSON Schema is a powerful document validation tool, but it is not a data definition language. We believe it's effectively un-toolable for anything beyond pure validation; practically all available code-generation tools agree by failing at various degrees of complexity.
Avro and Protobuf schemas are better for code generation, but tightly coupled to their respective serialization frameworks. For our own work in Microsoft Fabric, we're initially leaning on an Avro-compatible schema with a small set of modifications, but we ultimately need a richer type definition language that ideally builds on people's familiarity with JSON Schema.
This isn't just a Microsoft problem. It's an industry-wide gap. That's why we've submitted JSON Structure as a set of Internet Drafts to the IETF, aiming for formal standardization as an RFC. We want a vendor-neutral, standards-track schema language that the entire industry can adopt.
What Is JSON Structure?
JSON Structure is a modern, strictly typed data definition language that describes JSON-encoded data such that mapping to and from programming languages and databases becomes straightforward. It looks familiar—if you've written "type": "object", "properties": {...} before, you'll feel right at home. But there's a key difference: JSON Structure is designed for code generation and data interchange first, with validation as an optional layer rather than the core concern.
This means you get:
- Precise numeric types: int32, int64, decimal with precision and scale, float, double
- Rich date/time support: date, time, datetime, duration, all with clear semantics
- Extended compound types: Beyond objects and arrays, you get set, map, tuple, and choice (discriminated unions)
- Namespaces and modular imports: Organize your schemas like code
- Currency and unit annotations: Mark a decimal as USD or a double as kilograms
Here's a compact example that showcases these features. We start with the schema header and the object definition:
```json
{
  "$schema": "https://json-structure.org/meta/extended/v0/#",
  "$id": "https://example.com/schemas/OrderEvent.json",
  "name": "OrderEvent",
  "type": "object",
  "properties": {
```
Objects require a name for clean code generation. The $schema points to the JSON Structure meta-schema, and the $id provides a unique identifier for the schema itself.
Now let's define the first few properties, identifiers and a timestamp:

```json
    "orderId": { "type": "uuid" },
    "customerId": { "type": "uuid" },
    "timestamp": { "type": "datetime" },
```
The native uuid type maps directly to Guid in .NET, UUID in Java, and uuid.UUID in Python. The datetime type uses RFC 3339 encoding and becomes DateTimeOffset in .NET, datetime in Python, or Date in JavaScript. No format strings, no guessing.
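As a minimal Python sketch of that mapping, the two string encodings can be turned into native values with the standard library alone. The helper names parse_uuid and parse_datetime are illustrative assumptions and not part of any JSON Structure SDK, and datetime.fromisoformat is not a complete RFC 3339 parser, so treat this as an approximation.

```python
from datetime import datetime, timezone
from uuid import UUID


def parse_uuid(text: str) -> UUID:
    """Hypothetical helper: decode a JSON Structure uuid string."""
    return UUID(text)


def parse_datetime(text: str) -> datetime:
    """Hypothetical helper: decode an RFC 3339 timestamp string."""
    # Python versions before 3.11 reject a trailing "Z" in fromisoformat,
    # so normalize it to an explicit UTC offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        raise ValueError("timestamp must carry a UTC offset")
    return value


order_id = parse_uuid("f47ac10b-58cc-4372-a567-0e02b2c3d479")
timestamp = parse_datetime("2025-01-15T14:30:00Z")
print(order_id.version, timestamp.isoformat())
```

Rejecting timestamps without an offset is a design choice in this sketch: it keeps naive local times from slipping into an event stream unnoticed.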
Next comes the order status, modeled as a discriminated union:

```json
    "status": {
      "type": "choice",
      "choices": {
        "pending": { "type": "null" },
        "shipped": {
          "type": "object",
          "name": "ShippedInfo",
          "properties": {
            "carrier": { "type": "string" },
            "trackingId": { "type": "string" }
          }
        },
        "delivered": {
          "type": "object",
          "name": "DeliveredInfo",
          "properties": {
            "signedBy": { "type": "string" }
          }
        }
      }
    },
```
The choice type is a discriminated union with typed payloads per case. Each variant can carry its own structured data—shipped includes carrier and tracking information, delivered captures who signed for the package, and pending carries no payload at all. This maps to enums with associated values in Swift, sealed classes in Kotlin, or tagged unions in Rust.
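In Python, a comparable shape can be sketched with one dataclass per case and a small decoder. This is a hand-written illustration, not generated output: the Pending class, the OrderStatus alias, and the decode_status function are assumed names, and the decoder assumes the single-key object encoding used in the instance example later in this post. Fields are optional because the schema lists no required properties for these objects.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Pending:
    """The 'pending' case carries no payload."""


@dataclass(frozen=True)
class ShippedInfo:
    # Field names mirror the schema so the payload can be splatted in.
    carrier: Optional[str] = None
    trackingId: Optional[str] = None


@dataclass(frozen=True)
class DeliveredInfo:
    signedBy: Optional[str] = None


OrderStatus = Union[Pending, ShippedInfo, DeliveredInfo]


def decode_status(value: object) -> OrderStatus:
    """Hypothetical decoder for the single-key choice encoding."""
    if not isinstance(value, dict) or len(value) != 1:
        raise ValueError("a choice must be an object with exactly one key")
    tag, payload = next(iter(value.items()))
    if tag == "pending":
        if payload is not None:
            raise ValueError("'pending' carries no payload")
        return Pending()
    if tag == "shipped":
        return ShippedInfo(**payload)
    if tag == "delivered":
        return DeliveredInfo(**payload)
    raise ValueError(f"unknown choice case: {tag!r}")


status = decode_status(
    {"shipped": {"carrier": "Litware", "trackingId": "794644790323"}}
)
print(status)
```

A consumer can then branch with isinstance checks (or match statements on Python 3.10 and later), and an unknown case fails loudly instead of being silently ignored.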
For monetary values, we use precise decimals:

```json
    "total": { "type": "decimal", "precision": 12, "scale": 2 },
    "currency": { "type": "string", "maxLength": 3 },
```
The decimal type with explicit precision and scale ensures exact monetary math—no floating-point surprises. A precision of 12 with scale 2 gives you up to 10 digits before the decimal point and exactly 2 after.
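The digit arithmetic can be checked with Python's decimal module. The sketch below is an assumption about how a validator might interpret precision and scale, not the SDK's implementation: the parse_decimal helper is hypothetical, and it accepts up to scale fractional digits and up to precision minus scale integer digits.

```python
from decimal import Decimal


def parse_decimal(text: str, precision: int, scale: int) -> Decimal:
    """Hypothetical helper: parse a decimal string and check its digit budget."""
    value = Decimal(text)
    if not value.is_finite():
        raise ValueError(f"{text!r} is not a finite decimal")
    _sign, digits, exponent = value.as_tuple()
    fraction_digits = max(-exponent, 0)
    integer_digits = max(len(digits) + exponent, 0)
    if fraction_digits > scale:
        raise ValueError(f"{text!r} has more than {scale} fractional digits")
    if integer_digits > precision - scale:
        raise ValueError(
            f"{text!r} has more than {precision - scale} integer digits"
        )
    return value


total = parse_decimal("129.97", precision=12, scale=2)

# Binary floats cannot represent most decimal fractions exactly.
print(0.1 + 0.2 == 0.3)                                      # False
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```

With precision 12 and scale 2, the string "1234567890.12" fits the budget, while "12345678901.00" (11 integer digits) and "0.001" (3 fractional digits) are rejected.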
Line items use an array of tuples for compact, positional data:

```json
    "items": {
      "type": "array",
      "items": {
        "type": "tuple",
        "properties": {
          "sku": { "type": "string" },
          "quantity": { "type": "int32" },
          "unitPrice": { "type": "decimal", "precision": 10, "scale": 2 }
        },
        "tuple": ["sku", "quantity", "unitPrice"],
        "required": ["sku", "quantity", "unitPrice"]
      }
    },
```
Tuples are fixed-length typed sequences, ideal for time-series data or line items where position matters. The tuple array specifies the exact order: SKU at position 0, quantity at 1, unit price at 2. The int32 type maps to a 32-bit signed integer: int in C# and Java, i32 in Rust, int32 in Go.
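To make the positional mapping concrete, here is a small Python sketch that decodes one tuple row into a named structure. The LineItem class and decode_line_item function are illustrative names, not generated code. Because Python integers are unbounded, the int32 range is checked explicitly, and the unit price is assumed to arrive as a decimal string.

```python
from decimal import Decimal
from typing import NamedTuple

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


class LineItem(NamedTuple):
    # Field order mirrors the schema's "tuple" array.
    sku: str
    quantity: int
    unitPrice: Decimal


def decode_line_item(row: list) -> LineItem:
    """Hypothetical decoder: map a positional JSON array onto LineItem."""
    if len(row) != 3:
        raise ValueError("expected exactly 3 elements: sku, quantity, unitPrice")
    sku, quantity, unit_price = row
    if not isinstance(sku, str) or not isinstance(unit_price, str):
        raise ValueError("sku and unitPrice must be strings")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError("quantity must be an integer")
    if not INT32_MIN <= quantity <= INT32_MAX:
        raise ValueError("quantity does not fit in int32")
    return LineItem(sku, quantity, Decimal(unit_price))


rows = [["SKU-1234", 2, "49.99"], ["SKU-5678", 1, "29.99"]]
items = [decode_line_item(row) for row in rows]
order_total = sum((i.quantity * i.unitPrice for i in items), Decimal("0"))
print(order_total)  # 129.97
```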
Finally, we add extensible metadata using set and map types:

```json
    "tags": { "type": "set", "items": { "type": "string" } },
    "metadata": { "type": "map", "values": { "type": "string" } }
  },
  "required": ["orderId", "customerId", "timestamp", "status", "total", "currency", "items"]
}
```
The set type represents unordered, unique elements—perfect for tags. The map type provides string keys with typed values, ideal for extensible key-value metadata without polluting the main schema.
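In Python these two types line up naturally with frozenset and dict. The sketch below is a hand-written illustration with assumed helper names (decode_tags, decode_metadata); rejecting duplicate elements is this sketch's reading of "unique elements", not a statement about what the SDKs do.

```python
from typing import Dict, FrozenSet


def decode_tags(values: list) -> FrozenSet[str]:
    """Hypothetical decoder: a JSON array of strings becomes a frozenset."""
    if not all(isinstance(v, str) for v in values):
        raise ValueError("every set element must be a string")
    tags = frozenset(values)
    if len(tags) != len(values):
        raise ValueError("set contains duplicate elements")
    return tags


def decode_metadata(obj: dict) -> Dict[str, str]:
    """Hypothetical decoder: a JSON object becomes a str-to-str mapping."""
    for key, value in obj.items():
        if not isinstance(value, str):
            raise ValueError(f"value for {key!r} must be a string")
    return dict(obj)


tags = decode_tags(["priority", "gift-wrap"])
metadata = decode_metadata({"source": "web", "campaign": "summer-sale"})

# Element order on the wire does not affect equality of sets.
print(tags == decode_tags(["gift-wrap", "priority"]))  # True
```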
Here's what a valid instance of this schema looks like:

```json
{
  "orderId": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "customerId": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
  "timestamp": "2025-01-15T14:30:00Z",
  "status": { "shipped": { "carrier": "Litware", "trackingId": "794644790323" } },
  "total": "129.97",
  "currency": "USD",
  "items": [
    ["SKU-1234", 2, "49.99"],
    ["SKU-5678", 1, "29.99"]
  ],
  "tags": ["priority", "gift-wrap"],
  "metadata": { "source": "web", "campaign": "summer-sale" }
}
```
Notice how the choice is encoded as an object with a single key indicating the active case—{"shipped": {...}}—making it easy to parse and route. Tuples serialize as JSON arrays in the declared order. Decimals are encoded as strings to preserve precision across all platforms.
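On the producing side, the same three rules can be sketched in Python. Python's json module cannot serialize Decimal or set values directly, so the sketch converts them by hand. The helper names (encode_decimal, encode_choice, encode_line_item) are assumptions for illustration, and sorting the tags is only a convenience for stable output, not something the format requires.

```python
import json
from decimal import Decimal


def encode_decimal(value: Decimal, scale: int) -> str:
    """Render a Decimal as a string with exactly `scale` fractional digits."""
    return format(value, f".{scale}f")


def encode_choice(tag: str, payload) -> dict:
    """A choice becomes an object whose single key names the active case."""
    return {tag: payload}


def encode_line_item(sku: str, quantity: int, unit_price: Decimal) -> list:
    """A tuple becomes a JSON array in the declared order."""
    return [sku, quantity, encode_decimal(unit_price, 2)]


event = {
    "status": encode_choice(
        "shipped", {"carrier": "Litware", "trackingId": "794644790323"}
    ),
    "total": encode_decimal(Decimal("129.97"), 2),
    "items": [
        encode_line_item("SKU-1234", 2, Decimal("49.99")),
        encode_line_item("SKU-5678", 1, Decimal("29.99")),
    ],
    # Sets have no JSON counterpart, so emit an array.
    "tags": sorted({"priority", "gift-wrap"}),
}

wire = json.dumps(event)
print(wire)
```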
Why Does This Matter for Messaging?
When you're pushing events through Service Bus, Event Hubs, or Event Grid, schema clarity is everything. Your producers and consumers often live in different codebases, different languages, different teams. A schema that generates clean C# classes, clean Python dataclasses, and clean TypeScript interfaces—from the same source—is not a luxury. It's a requirement.
JSON Structure's type system was designed with this polyglot reality in mind. The extended primitive types map directly to what languages actually have. A datetime is a DateTimeOffset in .NET, a datetime in Python, a Date in JavaScript. No more guessing whether that "string with format date-time" will parse correctly on the other side.
SDKs Available Now
We've built SDKs for the languages you're using today: TypeScript, Python, .NET, Java, Go, Rust, Ruby, Perl, PHP, Swift, and C. All SDKs validate both schemas and instances against schemas. A VS Code extension provides IntelliSense and inline diagnostics.
Code and Schema Generation with Structurize
Beyond validation, you often need to generate code or database schemas from your type definitions. The Structurize tool converts JSON Structure schemas into SQL DDL for various database dialects, as well as self-serializing classes for multiple programming languages. It can also convert between JSON Structure and other schema formats like Avro, Protobuf, and JSON Schema.
Here's a simple example: a postal address schema on the left, and the SQL Server table definition generated by running structurize struct2sql postaladdress.json --dialect sqlserver on the right:
| JSON Structure Schema | Generated SQL Server DDL |
|---|---|
| |
The uuid type maps to UNIQUEIDENTIFIER, datetime becomes DATETIME2, and the schema's description fields are preserved as SQL Server extended properties. The tool supports PostgreSQL, MySQL, SQLite, and other dialects as well.
Note that all this code is provided "as-is" and is in a "draft" state, just like the specification set. We encourage you to share feedback and ideas in the GitHub repos for the specifications and SDKs at https://github.com/json-structure/
Learn More
JSON Structure is being developed as a set of Internet Drafts at the IETF, with formal standardization as an RFC as the goal, because an industry-wide problem calls for a vendor-neutral standard. You can track the drafts at the IETF Datatracker.
- Main site: json-structure.org
- Primer: JSON Structure Primer
- Core specification: JSON Structure Core
- Extensions: Import | Validation | Alternate Names | Units | Composition
- IETF Drafts: IETF Datatracker
- GitHub: github.com/json-structure