Announcing Azure Schema Registry in Azure Event Hubs – GA
Published Nov 02 2021 08:03 AM

Modern event streaming scenarios often use structured data in the event or message payload. Using schema-driven formats such as Avro to serialize and deserialize data is becoming increasingly important when exchanging such structured data between producer and consumer applications. Schema registries such as Azure Schema Registry for Event Hubs play a vital role in managing these schemas: they provide a central repository and allow producer applications to make schemas available to their consumers without directly managing or sharing them.

Today, we are announcing the general availability of the Azure Schema Registry in Azure Event Hubs.


Why Azure Schema Registry?

Azure Schema Registry is a feature of Event Hubs that provides a central repository for schema documents for event-driven and messaging-centric applications. It allows producer and consumer applications to exchange data without having to manage and share the schema themselves.




Here are some of the key reasons for using Azure Schema Registry in your event streaming applications.


Validation of event stream data

When you use Azure Schema Registry, your producer and consumer applications can validate the event data that they publish or consume. For structured data that you produce or consume through Event Hubs, you can use schema-driven formats such as Avro and manage those schemas using Azure Schema Registry, irrespective of the event streaming protocol (AMQP, Kafka, etc.) that you use to stream data. This provides better interoperability, prevents publishing or consuming invalid data, and prevents incompatibilities between producers and consumers.


Make schema available to consumers

The structured event data published to Event Hubs needs to be consumed by downstream consumer applications. By using Azure Schema Registry, you can make the schema available to existing and new consumers, so that they can build their consumer logic and validate the event data that they receive.


Reducing per-event data overhead
In schema-driven serialization and deserialization of event payloads, if you don't use a schema registry to share schemas, you need to attach the schema information to each event payload, which drastically increases the per-event data overhead.
Azure Schema Registry reduces this overhead by enabling producers to send only a reference to the schema (the schema ID) with each event payload. The consumer uses that ID to retrieve the corresponding schema from Azure Schema Registry and deserialize the event payload.
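To make the size difference concrete, here is a minimal sketch in plain Java that compares the bytes sent per event when the full schema travels with the payload against sending only a schema ID. The schema text, the 32-character ID, and the 40-byte payload are hypothetical values chosen for illustration; they do not reflect the actual ID format or wire encoding used by Azure Schema Registry.

```java
import java.nio.charset.StandardCharsets;

public class PerEventOverhead {
    // Hypothetical Avro schema for an order event, embedded as JSON text.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"amount\",\"type\":\"double\"},"
        + "{\"name\":\"description\",\"type\":\"string\"}]}";

    // Hypothetical 32-character schema ID (an assumption, not the real ID format).
    static final String SCHEMA_ID = "0123456789abcdef0123456789abcdef";

    /** Bytes per event when the full schema travels with every payload. */
    static int sizeWithEmbeddedSchema(byte[] payload) {
        return SCHEMA_JSON.getBytes(StandardCharsets.UTF_8).length + payload.length;
    }

    /** Bytes per event when only the schema ID travels with the payload. */
    static int sizeWithSchemaId(byte[] payload) {
        return SCHEMA_ID.getBytes(StandardCharsets.UTF_8).length + payload.length;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[40]; // stand-in for an Avro-encoded order
        System.out.println("embedded schema: " + sizeWithEmbeddedSchema(payload) + " bytes");
        System.out.println("schema ID only:  " + sizeWithSchemaId(payload) + " bytes");
    }
}
```

The ID has a fixed size, so the saving grows with the size of the schema document, while the cost of the registry lookup can be paid once per schema if the consumer caches it.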


Schema Management
Azure Schema Registry for Event Hubs provides a unified schema management experience. It offers a simple governance framework for reusable schemas and defines the relationship between schemas through a grouping construct called schema groups. Based on your business use cases, you can organize schemas into schema groups, define a compatibility mode for each group, and create multiple versions of a schema that adhere to the compatibility mode specified in the schema group.


Schema Evolution
Schemas need to evolve with the business requirements of producers and consumers. Azure Schema Registry supports schema evolution through compatibility modes, which are defined at the schema group level. Compatibility modes allow producers and consumers to evolve independently. Based on the compatibility mode defined in the schema group, only certain changes are allowed when you modify a schema or create a new schema version. Azure Schema Registry currently supports Backward, Forward, and No compatibility modes.
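As a rough illustration of what backward and forward compatibility mean, the following sketch models a schema as a map from field name to "has a default value" and applies a simplified version of Avro's reader/writer resolution rule. This is an assumption-laden toy model (it ignores field types, aliases, and nested records) and is not the check that Azure Schema Registry itself performs.

```java
import java.util.Map;

public class CompatibilitySketch {
    /**
     * Simplified Avro-style resolution: a reader can decode a writer's data if every
     * field the reader expects is either present in the writer's schema or has a default.
     * Each map goes from field name to "has a default value".
     */
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            boolean presentInWriter = writer.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (!presentInWriter && !hasDefault) {
                return false;
            }
        }
        return true;
    }

    /** Backward: consumers on the new schema can still read data written with the old one. */
    static boolean isBackwardCompatible(Map<String, Boolean> oldSchema, Map<String, Boolean> newSchema) {
        return canRead(newSchema, oldSchema);
    }

    /** Forward: consumers on the old schema can read data written with the new one. */
    static boolean isForwardCompatible(Map<String, Boolean> oldSchema, Map<String, Boolean> newSchema) {
        return canRead(oldSchema, newSchema);
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = Map.of("id", false, "amount", false);
        // v2 adds a field without a default value.
        Map<String, Boolean> v2 = Map.of("id", false, "amount", false, "address", false);
        System.out.println("backward: " + isBackwardCompatible(v1, v2));
        System.out.println("forward:  " + isForwardCompatible(v1, v2));
    }
}
```

In this model, adding a field without a default breaks backward compatibility but keeps forward compatibility; giving the new field a default would satisfy both.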


Open Standards and Interoperability
Even though schema registries are increasingly popular, there has so far been no open and vendor-neutral standard for a lightweight interface optimized for this use case. Microsoft submitted the interface of the Azure Schema Registry to the Cloud Native Computing Foundation's (CNCF) "CloudEvents" project in June 2020.


Event streaming with schema validation

The information flow when using Azure Schema Registry for event streaming in Azure Event Hubs is the same across the different protocols that you can use to publish or consume events. The following diagram shows the information flow of a Kafka event producer and consumer that use Azure Schema Registry.



The information flow starts at the Kafka producer, which serializes the event data using the schema document. The producer then prepends the schema ID to the serialized event payload. Once the consumer receives the event, it can resolve the corresponding schema from the schema registry and deserialize the event payload.
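The flow described above can be sketched in plain Java with an in-memory map standing in for the registry. The length-prefixed framing, the `order-v1` ID, and all class and method names here are hypothetical and chosen for illustration only; the actual serializers define their own wire format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class SchemaIdFlow {
    // In-memory stand-in for the schema registry: schema ID -> schema document.
    static final Map<String, String> REGISTRY = new HashMap<>();

    /** Producer side: prepend the (length-prefixed) schema ID to the serialized payload. */
    static byte[] frame(String schemaId, byte[] payload) {
        byte[] id = schemaId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + id.length + payload.length)
                .putInt(id.length)
                .put(id)
                .put(payload)
                .array();
    }

    /** Consumer side: read the schema ID from the front of the event. */
    static String readSchemaId(byte[] event) {
        ByteBuffer buf = ByteBuffer.wrap(event);
        byte[] id = new byte[buf.getInt()];
        buf.get(id);
        return new String(id, StandardCharsets.UTF_8);
    }

    /** Consumer side: return the payload bytes that follow the schema ID. */
    static byte[] readPayload(byte[] event) {
        ByteBuffer buf = ByteBuffer.wrap(event);
        int idLength = buf.getInt();
        buf.position(4 + idLength);
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        REGISTRY.put("order-v1", "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[]}");
        byte[] event = frame("order-v1", "serialized-order".getBytes(StandardCharsets.UTF_8));

        String schema = REGISTRY.get(readSchemaId(event)); // resolve the schema by ID
        String payload = new String(readPayload(event), StandardCharsets.UTF_8);
        System.out.println(schema + " -> " + payload);
    }
}
```

A real consumer would hand the resolved schema and the payload bytes to an Avro decoder; the sketch stops at the point where both are available.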


Schema validation and event streaming for Kafka APIs

Azure Schema Registry for Event Hubs integrates seamlessly with your Kafka applications. If you use Apache Kafka® with the Java client today and you are already using a schema-registry-backed serializer, you will find it easy to switch to the Azure Schema Registry's Avro serializer just by modifying the client configuration.


Kafka Producer
You can update your Kafka producer application's configuration to use Azure Schema Registry as shown in the following code snippet.



// Schema Registry configs
props.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
props.put(KafkaAvroSerializerConfig.AUTO_REGISTER_SCHEMAS_CONFIG, true);
props.put(KafkaAvroSerializerConfig.SCHEMA_REGISTRY_CREDENTIAL_CONFIG, credential);
props.put(KafkaAvroSerializerConfig.SCHEMA_GROUP_CONFIG, schemaGroup);




Then you can publish event data to Azure Event Hubs using its Kafka API with a strongly typed producer as shown below.



// 'i' (a loop counter), 'key', and 'topicName' are defined by the surrounding application code.
KafkaProducer<String, Order> producer = new KafkaProducer<>(props);
Order order = new Order("ID-" + i, 10.99 + i, "Sample order -" + i, "Address " + i);
ProducerRecord<String, Order> record = new ProducerRecord<>(topicName, key, order);
producer.send(record);




Kafka Consumer
Similarly, you need to update a few configuration settings in your Kafka consumer application to consume and deserialize event data using Azure Schema Registry.



// Schema Registry configs
props.put("schema.registry.url", registryUrl);
props.put(KafkaAvroDeserializerConfig.SCHEMA_REGISTRY_CREDENTIAL_CONFIG, credential);
props.put(KafkaAvroDeserializerConfig.AVRO_SPECIFIC_READER_CONFIG, true);
final Consumer<String, Order> consumer = new KafkaConsumer<>(props);
// Subscribe before polling; 'topicName' is assumed to be the topic that the producer writes to.
consumer.subscribe(Collections.singletonList(topicName));
ConsumerRecords<String, Order> records = consumer.poll(Duration.ofMillis(5000));



For a complete sample on using Azure Schema Registry with the Event Hubs Kafka API, please check this quickstart guide.


Schema validation and event streaming with Azure SDKs


You can use Azure SDKs to directly interact with Azure Schema Registry for Event Hubs. Therefore, you can build schema validations for applications that use the native protocol (AMQP) of Event Hubs, using Azure Schema Registry SDKs.

The Avro serializers for all Azure SDK languages implement the same base class or pattern that is used for serialization throughout the SDK and can therefore be used with all Azure SDK clients that already have or will offer object serialization extensibility.


Serializing data with Azure Schema Registry SDK - .NET
With the Azure Schema Registry SDK, the serializer interface turns structured data into a byte stream (or byte array) and back. It can therefore be used anywhere and with any framework or service. Here is a snippet that uses the serializer with automatic schema registration in C#:



// Create a schema registry client that you can use to serialize and validate data.  
var schemaRegistryClient = new SchemaRegistryClient(endpoint: schemaRegistryEndpoint, credential: new DefaultAzureCredential());
// Create a new order object using the generated type/class 'Order'. 
var sampleOrder = new Order { id = "12345", amount = 55.99, description = "This is a sample order." };

using var serializedMemoryStream = new MemoryStream();
// Create an Avro object serializer using the Schema Registry client object. 
var producerSerializer = new SchemaRegistryAvroObjectSerializer(schemaRegistryClient, schemaGroup, new SchemaRegistryAvroObjectSerializerOptions { AutoRegisterSchemas = true });           
// Serialize events data to the memory stream object. 
producerSerializer.Serialize(serializedMemoryStream, sampleOrder, typeof(Order), CancellationToken.None);




If you want to publish the serialized event payload to Event Hubs via the native (AMQP) protocol, you can do so as you would send any other event data to Event Hubs.



byte[] _serializedEventData = serializedMemoryStream.ToArray();
// Create event data with serialized data and add it to an event batch. 
eventBatch.TryAdd(new EventData(_serializedEventData));




Deserializing data with Azure Schema Registry SDK - .NET
You can use Azure Schema Registry SDKs to deserialize the event data as shown in the following C# code snippet.



// Retrieve serialized event data and convert it to a byte array. 
byte[] _serializedEventData = eventArgs.Data.Body.ToArray();
using var serializedMemoryStream = new MemoryStream(_serializedEventData);

var consumerSerializer = new SchemaRegistryAvroObjectSerializer(schemaRegistryClient, schemaGroup, new SchemaRegistryAvroObjectSerializerOptions { AutoRegisterSchemas = false });
serializedMemoryStream.Position = 0;

// Deserialize event data and create order object using schema registry. 
Order sampleOrder = (Order)consumerSerializer.Deserialize(serializedMemoryStream, typeof(Order), CancellationToken.None);




For a complete sample on using Azure Schema Registry SDKs and event streaming using the Event Hubs native protocol, please follow this quickstart guide.


Get Started Today

To try out Azure Schema Registry in Event Hubs and learn more about it, check out the links below.






Last update: Nov 02 2021 08:06 AM