Azure PaaS Blog

REBALANCE_IN_PROGRESS error in Azure Event Hub with Kafka

yawhu
May 12, 2021

Overview

Azure Event Hubs provides an endpoint that is compatible with Apache Kafka. If you are familiar with Apache Kafka, you may have experienced consumer rebalancing in your Kafka cluster. Rebalancing is the process that decides which consumer in a consumer group is responsible for which partition. It can be triggered when a consumer joins or leaves the consumer group. When you use Azure Event Hubs with the Kafka protocol, consumer rebalancing may occur in your Event Hubs namespace too.

 

Consumer rebalancing is a normal process in Apache Kafka. However, if the REBALANCE_IN_PROGRESS error occurs continuously and frequently in your consumer group, you may need to change your configuration to reduce how often rebalancing happens. In this post, I would like to share the consumer configurations that commonly cause the REBALANCE_IN_PROGRESS error.

 

Azure Event Hub with Kafka

The concepts in Azure Event Hubs and Kafka differ in places; please refer to the document Use event hub from Apache Kafka app - Azure Event Hubs | Microsoft Docs for the conceptual mapping between Kafka and Event Hubs.
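As a concrete illustration of that mapping, an existing Kafka consumer can point at an Event Hubs namespace through configuration alone. The sketch below uses placeholder values for the namespace name and connection string; the endpoint on port 9093 and the literal "$ConnectionString" SASL username are the documented Event Hubs Kafka settings, while the "$Default" consumer group is just the namespace's built-in default.

```java
import java.util.Properties;

public class EventHubsKafkaConsumerConfig {
    // namespaceName and connectionString are placeholders for your own values.
    public static Properties build(String namespaceName, String connectionString) {
        Properties props = new Properties();
        // The Kafka endpoint of an Event Hubs namespace is the namespace FQDN on port 9093.
        props.put("bootstrap.servers", namespaceName + ".servicebus.windows.net:9093");
        // Event Hubs authenticates Kafka clients over SASL_SSL with the PLAIN mechanism;
        // the username is the literal "$ConnectionString" and the password is the
        // namespace connection string.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" password=\"" + connectionString + "\";");
        // A Kafka consumer group maps to an Event Hubs consumer group.
        props.put("group.id", "$Default");
        return props;
    }
}
```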

 

REBALANCE_IN_PROGRESS error

Reference : Apache Kafka

 

REBALANCE_IN_PROGRESS is a consumer-group-related error, and it is affected by the consumer's configuration and behavior. Typically, the error occurs when the session timeout is set too small or the application takes too long to process the records returned by a poll. During rebalancing, the consumer cannot read any records, so you will see empty reads together with constant REBALANCE_IN_PROGRESS errors. Below are common examples of when rebalancing happens:

  • Scenario 1
    poll() returns no records, and a heartbeat expired error appears at the same time.
  • Scenario 2
    The consumer is processing large batches of records with a long processing time, and a timeout error appears.

Recommendations for REBALANCE_IN_PROGRESS error

  • Recommendation 1 : Increase session.timeout.ms in the consumer configuration

    Reference : azure-event-hubs-for-kafka/CONFIGURATION.md at master · Azure/azure-event-hubs-for-kafka · GitHub

    A heartbeat expired error is usually caused by an application crash, long record processing time, or a temporary network issue. If session.timeout.ms is too small and a heartbeat is not received before the timeout, you will see heartbeat expired errors, which then trigger rebalancing.

  • Recommendation 2 : Decrease the batch size for each poll()

    If processing the returned records takes too long, try polling fewer records at a time, for example by lowering max.poll.records.

  • Recommendation 3 : Increase max.poll.interval.ms in the consumer configuration and decrease the time spent processing the records

    Reference : azure-event-hubs-for-kafka/CONFIGURATION.md at master · Azure/azure-event-hubs-for-kafka · GitHub

    You may need to add logs to track how long the code takes to process records; see the example below. If processing the records takes longer than max.poll.interval.ms, the consumer is considered failed and rebalancing is triggered.
    Properties props = new Properties();
    // ... consumer configuration (bootstrap.servers, group.id, deserializers, etc.)
    Consumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Arrays.asList(topic));
    
    try {
        while (true) {
            log.info("Message begin processing");
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    
            for (ConsumerRecord<String, String> record : records) {
                // do something
            }
    
            log.info("Message finish processing");
        }
    } finally {
        consumer.close();
    }
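Putting the three recommendations together, the helper below sketches the relevant consumer properties in one place. The values shown (30000, 100, 600000) are illustrative assumptions, not prescriptions; tune them against your actual processing time, and keep session.timeout.ms within the range the broker allows.

```java
import java.util.Properties;

public class RebalanceTuning {
    // Applies illustrative rebalance-related settings on top of an existing configuration.
    public static Properties apply(Properties props) {
        // Recommendation 1: give heartbeats more headroom before the group
        // coordinator declares the consumer dead.
        props.put("session.timeout.ms", "30000");
        // Recommendation 2: poll fewer records so each batch is processed
        // well within the poll interval.
        props.put("max.poll.records", "100");
        // Recommendation 3: allow more time between poll() calls before the
        // consumer is evicted from the group and a rebalance starts.
        props.put("max.poll.interval.ms", "600000");
        return props;
    }
}
```

With these settings, the goal is simply that each poll-process loop iteration finishes comfortably inside max.poll.interval.ms, and that heartbeats keep arriving inside session.timeout.ms.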
     
Published May 12, 2021
Version 1.0