Migrating data from Kafka clusters to Azure Event Hubs using MirrorMaker 2 (MM2)
Published Dec 14 2022 07:36 AM
Microsoft

Azure Event Hubs is a fully managed, real-time data streaming service that can stream millions of events per second from any source, using its native AMQP or Apache Kafka APIs, to build data streaming pipelines.

Azure Event Hubs for Apache Kafka® lets you connect your existing Kafka applications and clients to Event Hubs, at lower cost and with better performance than running your own Kafka clusters. You can migrate existing Kafka workloads to Event Hubs without code changes and use Event Hubs as the Kafka ingestion layer without having to manage Kafka clusters.

When you migrate from on-premises or managed Kafka services to Event Hubs, one of the main challenges is migrating your existing Kafka topics, partitions, and event data to Event Hubs.

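With the introduction of the Log Compaction (preview) feature in Event Hubs, you can use Apache Kafka MirrorMaker 2 to replicate metadata and data between an existing Kafka cluster and Event Hubs. Log compaction matters here because MirrorMaker 2 runs on Kafka Connect, which stores its configuration, offsets, and status in compacted internal topics on the target cluster.
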
Use MirrorMaker 2 (MM2) to migrate metadata and event data to Azure Event Hubs

Apache MirrorMaker 2 dynamically detects changes to topics and keeps source and target topic properties synchronized, including offsets and partitions. It can replicate data bidirectionally between a Kafka cluster and an Event Hubs namespace.
Therefore, MirrorMaker 2 can be used both to migrate data from an existing Kafka cluster to Event Hubs and to keep the Kafka cluster and Event Hubs in sync.
In this article, we focus on unidirectional data migration from existing Kafka clusters to Event Hubs. As illustrated below, you can use MirrorMaker 2 to replicate both metadata and event data from the Kafka cluster to Event Hubs.

[Figure: MirrorMaker 2 replicating metadata and event data from a Kafka cluster to Event Hubs]
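One detail worth checking before a migration is how the replicated topics will be named. By default, MM2's DefaultReplicationPolicy names each remote topic by prefixing the source cluster alias and a "." separator, so with the alias source used in this article's configuration, a topic called orders would appear in Event Hubs as source.orders. The Python sketch below models only that naming rule for planning purposes; it is not MirrorMaker 2 code, and the topic names are hypothetical.

```python
"""Approximate the remote-topic names produced by MM2's default replication policy.

A planning sketch only: it models the "<source alias><separator><topic>" naming
rule and is not MirrorMaker 2 code.
"""

DEFAULT_SEPARATOR = "."  # assumed default of MM2's replication.policy.separator


def remote_topic_name(source_alias: str, topic: str, separator: str = DEFAULT_SEPARATOR) -> str:
    """Return the name a replicated topic is expected to get on the destination."""
    return f"{source_alias}{separator}{topic}"


def plan_topic_names(source_alias: str, topics: list[str]) -> dict[str, str]:
    """Map each source topic to its expected name in the Event Hubs namespace."""
    return {topic: remote_topic_name(source_alias, topic) for topic in topics}


if __name__ == "__main__":
    # "source" is the cluster alias used in this article's MM2 properties file;
    # the topic names are hypothetical.
    for original, remote in plan_topic_names("source", ["orders", "payments"]).items():
        print(f"{original} -> {remote}")
```

If your applications should keep the original topic names after cutover, Apache Kafka also provides org.apache.kafka.connect.mirror.IdentityReplicationPolicy, selected with the replication.policy.class property; check that the Kafka version you run MM2 from includes it.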

When it comes to deploying and running MirrorMaker 2, there are two modes you can choose from:

  • A dedicated MirrorMaker 2 cluster.
  • MM2 connectors running inside a distributed Kafka Connect cluster.

The most straightforward way to run MM2 is as a dedicated MM2 cluster. For a complete tutorial on using MM2 in dedicated mode with Azure Event Hubs, see: Replicate data from Kafka cluster to Event Hubs using MirrorMaker 2.

Deploying a dedicated MM2 cluster

The Apache Kafka distribution ships with the connect-mirror-maker.sh script, which runs a dedicated, distributed MirrorMaker 2 cluster. The script manages the Kafka Connect workers internally based on a configuration file. For each enabled replication flow (a source->target pair of clusters), the MirrorMaker driver creates and manages three connectors: MirrorSourceConnector (replicates topics and data), MirrorCheckpointConnector (emits consumer group offset checkpoints), and MirrorHeartbeatConnector (emits heartbeats).
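In dedicated mode you start an MM2 node by passing your properties file to the script, typically bin/connect-mirror-maker.sh mm2.properties from the Kafka installation directory. For scripted deployments, that call could be wrapped as in the Python sketch below; the KAFKA_HOME location, the mm2.properties file name, and the helper names are assumptions for illustration.

```python
"""Hypothetical helper that launches a dedicated MirrorMaker 2 node."""

import os
import subprocess
from pathlib import Path


def mm2_command(kafka_home: str, properties_file: str) -> list[str]:
    """Build the argv for connect-mirror-maker.sh with the given MM2 properties file."""
    script = Path(kafka_home) / "bin" / "connect-mirror-maker.sh"
    return [str(script), str(properties_file)]


def start_mm2(kafka_home: str, properties_file: str) -> subprocess.Popen:
    """Start MM2 as a long-running child process; it replicates until stopped."""
    if not Path(properties_file).is_file():
        raise FileNotFoundError(f"MM2 properties file not found: {properties_file}")
    return subprocess.Popen(mm2_command(kafka_home, properties_file))


if __name__ == "__main__":
    # Assumption: KAFKA_HOME points at an extracted Apache Kafka distribution.
    process = start_mm2(os.environ.get("KAFKA_HOME", "/opt/kafka"), "mm2.properties")
    process.wait()
```

Running several nodes with the same properties file is how a dedicated MM2 cluster scales out; the wrapper only adds a file check before handing control to the script.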

 

The topology and configuration of MirrorMaker 2 replication are defined in an MM2 properties file that you create. It has the following structure:

# specify any number of cluster aliases
clusters = source, destination

# connection information for each cluster
# A comma-separated list of host:port pairs for each cluster
source.bootstrap.servers = your-kafka-cluster-hostname:9092
#source.security.protocol=SASL_SSL
#source.sasl.mechanism=PLAIN
#source.sasl.jaas.config=<replace with the SASL JAAS config of your Kafka cluster>;

destination.bootstrap.servers = <your-eventhubs-namespace>.servicebus.windows.net:9093
destination.security.protocol=SASL_SSL
destination.sasl.mechanism=PLAIN
destination.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='$ConnectionString' password='<Your Event Hubs namespace connection string>';

# enable and configure individual replication flows
source->destination.enabled = true

# Regex that defines which topics get replicated, e.g. "foo-.*"
source->destination.topics = .*

#destination->source.enabled = true
#destination->source.topics = .*

# Setting replication factor of newly created remote topics
replication.factor=3

############################# Internal Topic Settings  #############################
# The replication factor for mm2 internal topics "heartbeats", "destination.checkpoints.internal" and
# "mm2-offset-syncs.destination.internal"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3

# The replication factor for connect internal topics "mm2-configs.destination.internal", "mm2-offsets.destination.internal" and
# "mm2-status.destination.internal"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
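The source->destination.topics property takes regular expressions. The Python sketch below approximates how that selection behaves, assuming MM2's default topic filter: the comma-separated include patterns must match the whole topic name, and topics matching the topics.exclude patterns (by default, names starting with __ or ending in .internal, -internal, or .replica) are skipped. Python's re module stands in for Java regular expressions, which behave the same for simple patterns like these; the topic names are hypothetical.

```python
"""Approximation of MM2's default topic filter (include and exclude regex lists).

Uses Python's re module in place of Java regular expressions.
"""

import re

# Assumed to match the default value of MM2's topics.exclude property.
DEFAULT_EXCLUDES = r".*[\-\.]internal, .*\.replica, __.*"


def compile_patterns(pattern_list: str) -> re.Pattern:
    """Join a comma-separated list of regexes into a single alternation."""
    parts = [p.strip() for p in pattern_list.split(",") if p.strip()]
    return re.compile("|".join(f"(?:{p})" for p in parts))


def should_replicate(topic: str, includes: str = ".*", excludes: str = DEFAULT_EXCLUDES) -> bool:
    """True if the whole topic name matches an include pattern and no exclude pattern."""
    included = compile_patterns(includes).fullmatch(topic) is not None
    excluded = compile_patterns(excludes).fullmatch(topic) is not None
    return included and not excluded


if __name__ == "__main__":
    # Hypothetical topic names on the source cluster.
    topics = ["foo-orders", "foo-payments", "bar-events", "__consumer_offsets"]
    print([t for t in topics if should_replicate(t, includes="foo-.*")])
    # prints ['foo-orders', 'foo-payments']
```

With the article's setting of .*, every topic except these internal ones is replicated, so narrowing the pattern is a simple way to migrate topics in batches.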

 

In the MM2 configuration, you specify your Kafka cluster as the source and your Event Hubs namespace as the destination. You can then run MM2 in dedicated mode to start the replication process.
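The destination settings are the easiest to mistype. The Python sketch below assumes a namespace-level connection string of the usual form Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=... and derives the bootstrap server (port 9093) from its Endpoint, then renders the destination lines of the MM2 properties file. The function names and the sample values are hypothetical.

```python
"""Hypothetical helper: render MM2 destination.* settings from an Event Hubs connection string."""

from urllib.parse import urlparse


def parse_connection_string(conn_str: str) -> dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs; values (such as keys) may contain '='."""
    pairs = (part.split("=", 1) for part in conn_str.strip().split(";") if part)
    return {key.strip(): value.strip() for key, value in pairs}


def destination_properties(conn_str: str) -> str:
    """Return the destination.* lines for the MM2 properties file."""
    endpoint = parse_connection_string(conn_str)["Endpoint"]  # sb://<namespace>.servicebus.windows.net/
    host = urlparse(endpoint).hostname
    if not host:
        raise ValueError(f"Cannot read a host name from Endpoint: {endpoint!r}")
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f"username='$ConnectionString' password='{conn_str}';"
    )
    return "\n".join([
        f"destination.bootstrap.servers = {host}:9093",  # Event Hubs Kafka endpoint port
        "destination.security.protocol=SASL_SSL",
        "destination.sasl.mechanism=PLAIN",
        f"destination.sasl.jaas.config={jaas}",
    ])


if __name__ == "__main__":
    sample = ("Endpoint=sb://my-namespace.servicebus.windows.net/;"
              "SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<key>")
    print(destination_properties(sample))
```

Rendering these lines at deployment time, rather than committing them, also keeps the connection string out of version control.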

Migrating Kafka applications to Event Hubs

Once you have migrated or replicated the data from the Kafka cluster to Event Hubs, you can migrate your existing Kafka applications by changing only their configuration, without any code changes.
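In practice, the configuration change is usually limited to the client's connection settings. The Python sketch below shows those settings for the confluent-kafka Python client, whose keys follow librdkafka naming (an assumption; other Kafka clients use equivalent settings). The helper name is hypothetical, and the namespace and connection string are placeholders you supply.

```python
"""Hypothetical helper: Kafka client settings that point an application at Event Hubs.

Keys follow librdkafka naming, as used by the confluent-kafka Python client.
"""

from typing import Optional


def event_hubs_client_config(namespace: str, connection_string: str,
                             extra: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Return settings that replace the old cluster's connection settings."""
    config = {
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "$ConnectionString",  # literal string expected by Event Hubs
        "sasl.password": connection_string,    # namespace connection string
    }
    config.update(extra or {})  # application-specific settings, e.g. group.id
    return config
```

The resulting dictionary can be passed to confluent_kafka.Producer or confluent_kafka.Consumer in place of the previous cluster settings; application logic stays the same, though topic names may differ if MM2's replication policy renamed the topics.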

For more details on migrating existing Kafka client applications to Event Hubs, see: Azure Event Hubs - Apache Kafka Migration Guide.

Next Steps

For a complete tutorial on using Apache MirrorMaker 2 in dedicated mode to migrate data from a Kafka cluster to Azure Event Hubs, see: Replicate data from Kafka cluster to Event Hubs using MirrorMaker 2.

Last update: Dec 14 2022 07:37 AM