
Internet of Things Blog

Azure IoT Operations MQTT Broker: Performance Benchmarking on Throughput and Latency

davidemakenemi
Jul 02, 2025

1. Introduction

When deploying an MQTT broker in a production environment, understanding its performance characteristics is crucial. Whether you're handling IoT sensor data, real-time event streams, or enterprise messaging, knowing how the broker performs under load helps in optimizing deployments.

In this post, we evaluate the performance of the Azure IoT Operations MQTT Broker (subsequently referred to as Broker for brevity), focusing on:

  • Throughput – How many messages per second the broker can handle.
  • Latency – The time taken for messages to travel from publishers to subscribers.

All tests were conducted using MQTT QoS 1 to strike a consistent balance between reliability and throughput.

By following a structured performance testing approach, we aim to provide insights into how the Broker scales and where potential bottlenecks may arise.

👉 If you're looking for a quick summary, jump to the Key Takeaways section below.

2. Test Setup

For accurate benchmarking, we set up Standard_D4s_v5 virtual machines (VMs) to ensure consistent and efficient message handling. To replicate our performance results, use the same VM SKU and test configuration.

2.1 Infrastructure configuration

Hardware configuration

  • VM Architecture: x64
  • VM Image: Ubuntu Server 22.04 LTS - x64 Gen2
  • VM SKU: Standard_D4s_v5
  • vCPUs: 4
  • Memory: 16 GiB RAM
  • Networking: All VMs are within the same virtual network (VNet) to minimize latency and reduce external network delays

Software configuration

  • OS Flavor: Ubuntu Server 22.04 LTS
  • Version: 22.04 LTS
  • Kubernetes distribution: K3s
  • Kubernetes version: v1.28.5
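To reproduce this environment, the VMs and K3s can be stood up roughly as follows. This is a minimal sketch: the resource group, VM names, and admin user are placeholders, and the exact K3s patch build (the "+k3s1" suffix) is an assumption to verify against the published K3s releases.

    # Sketch: create one benchmark VM; repeat for the five broker nodes and the
    # client node, adjusting --name and --size as needed.
    az vm create \
      --resource-group aio-bench-rg \
      --name aio-bench-node-1 \
      --image Canonical:0001-com-ubuntu-server-jammy:22_04-lts-gen2:latest \
      --size Standard_D4s_v5 \
      --admin-username azureuser \
      --generate-ssh-keys

    # Sketch: on each broker node, install K3s pinned to the Kubernetes version
    # used in these tests (verify the exact build exists before pinning).
    curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.28.5+k3s1" sh -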

2.2 Azure IoT Operations configuration

The Azure IoT Operations configuration defined below is optimized for performance testing and MUST NOT be used in production as TLS encryption, authentication, and diagnostics pods are disabled to reduce variability.

The Broker consists of frontend replicas and backend partitions for optimized message handling. This setup is sized for a 5-node cluster, ensuring scalability and redundancy.

Broker Configuration:

  • Frontend:
    • 5 replicas
  • Backend:
    • 5 partitions
    • Redundancy factor of 2
    • 2 workers

Note: Increased redundancy doubles CPU usage and therefore reduces the CPU available for the same workload, potentially impacting overall efficiency.

  • Broker Listener: Configured with a LoadBalancer service on port 1883
  • Broker Nodes: 5 x Azure D4s_v5 VMs (4 vCPUs, 16 GiB memory, Ubuntu 22.04)
  • Client Node: 1 x Azure Standard_D16s_v5 VM (16 vCPUs, 64 GiB memory) for load testing
    • Note: A VM with at least 8 cores is recommended to prevent the client from becoming a bottleneck, as emqtt-bench (by EMQX) has high CPU consumption.

The broker configuration is available in the Azure IoT MQTT Optimization JSON file; a sketch of the equivalent resource definitions follows.
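For illustration, the cardinality described above corresponds roughly to the Broker and BrokerListener resources below. This is a sketch only: the API version, resource names, and listener layout are assumptions, so verify the field names against the CRDs installed with your Azure IoT Operations version before applying anything.

    # broker-cardinality.yaml - sketch of the test cardinality as Azure IoT
    # Operations custom resources (field names to be verified against your CRDs).
    apiVersion: mqttbroker.iotoperations.azure.com/v1
    kind: Broker
    metadata:
      name: default
    spec:
      cardinality:
        frontend:
          replicas: 5          # one frontend replica per node for connection spread
        backendChain:
          partitions: 5        # one partition per node for message throughput
          redundancyFactor: 2  # two copies of each partition; doubles CPU usage
          workers: 2           # workers per backend replica
    ---
    apiVersion: mqttbroker.iotoperations.azure.com/v1
    kind: BrokerListener
    metadata:
      name: loadbalancer-listener
    spec:
      brokerRef: default
      serviceType: LoadBalancer
      ports:
        - port: 1883           # plain MQTT; TLS is disabled here for benchmarking only

    # Apply with (namespace assumed to be azure-iot-operations):
    #   kubectl apply -n azure-iot-operations -f broker-cardinality.yaml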

3. Methodology

To evaluate the performance of the Azure IoT Operations MQTT broker, we used emqtt-bench, an open-source MQTT v5.0 benchmark tool from EMQX. For optimal performance during testing, the inflight queue should be set to at least 100.
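For example, a publisher run with the inflight window raised to 100 might look like the sketch below; the broker address, topic, client count, and publish interval are placeholders, and emqtt_bench pub --help lists the full set of options.

    # Sketch: 100 publishers sending 8 KB QoS 1 messages, each with an inflight
    # window of 100 (-F/--inflight). <broker-ip> is the listener's LoadBalancer IP.
    emqtt_bench pub -h <broker-ip> -p 1883 \
      -c 100 -i 10 \
      -t bench/%i -s 8192 -q 1 \
      -I 10 -F 100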

3.1 Client Configuration

For 5-node cluster testing, a dedicated high-performance VM is required to act as the client. This VM must be separate from the cluster to prevent resource contention, ensuring that benchmarking reflects the broker's actual optimal performance.

3.2 Understanding the Performance Metrics

  1. Maximum Throughput – Measures the highest number of messages per second the broker can process.  

Note: Optimal performance requires finding a balance—publishers should send messages fast enough to fully utilize subscribers without overwhelming them.

  2. Average Latency – The time, in milliseconds, it takes for a message to travel from a publisher to a subscriber.
  3. Message Size – Payloads of 16 B, 8 KB, and 255 KB were tested to evaluate how payload size affects throughput and latency.
  4. Data Throughput – Measures the total volume of data transmitted per second, expressed in megabytes per second (MB/sec). For example, 34,193 msg/sec with 8 KB payloads corresponds to roughly 267 MB/sec.

3.3 Test Scenarios

We tested the broker under different conditions to observe how it handles increasing workloads:

  1. Varying Publisher Rates – Analyzing throughput changes with increasing message rates.
  2. Different Payload Sizes – Measuring the impact of small (16 B), medium (8 KB), and large (255 KB) payloads.
  3. Fan-In / Balanced / Fan-Out – Comparing multiple publishers to one subscriber (fan-in) vs. one publisher to many subscribers (fan-out) vs. an equal number of publishers and subscribers (balanced).
  4. Publisher / Subscriber Configuration – Varying the number of publishers and subscribers across the three scenarios.
  5. QoS - All tests were performed using MQTT QoS 1, which ensures at least once message delivery. This strikes a balance between reliability and performance, making it more representative of real-world production scenarios where message loss is unacceptable, but the overhead of QoS 2 is not justified.

 

We measured broker efficiency using different payload sizes across different publisher-to-subscriber ratios. The Fan-In test evaluated performance with a high number of publishers sending messages to a single subscriber. The Fan-Out stress test analyzed message distribution from a limited number of publishers to many subscribers under high throughput conditions. The Balanced test simulated a mixed workload with equal publishers and subscribers.
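For reference, the three traffic shapes can be driven with emqtt-bench along the lines sketched below; topics, intervals, and the broker address are illustrative, and in each case the subscriber command is started in its own shell before the publishers.

    # Fan-In sketch: 1000 publishers -> 1 subscriber on a shared topic.
    emqtt_bench sub -h <broker-ip> -p 1883 -c 1    -t bench/fanin -q 1
    emqtt_bench pub -h <broker-ip> -p 1883 -c 1000 -t bench/fanin -q 1 -s 8192 -I 100 -F 100

    # Fan-Out sketch: 1 publisher -> 1000 subscribers on a shared topic.
    emqtt_bench sub -h <broker-ip> -p 1883 -c 1000 -t bench/fanout -q 1
    emqtt_bench pub -h <broker-ip> -p 1883 -c 1    -t bench/fanout -q 1 -s 8192 -I 1 -F 100

    # Balanced sketch: 100 publishers paired with 100 subscribers by topic
    # (%i expands to the client's sequence number).
    emqtt_bench sub -h <broker-ip> -p 1883 -c 100 -t bench/%i -q 1
    emqtt_bench pub -h <broker-ip> -p 1883 -c 100 -t bench/%i -q 1 -s 8192 -I 10 -F 100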

4. Results

Detailed Performance Metrics: Fan-In, Fan-Out, and Balanced Scenarios

 
Scenario | Configuration    | Payload Size | Max Throughput (msg/sec) | Data Throughput (MB/sec) | Average Latency (ms) | Workload Description
Fan-In   | 1000 pub 1 sub   | 16 B         | 41,352                   | 0.63                     | 124                  | High-Load Fan-In
Fan-In   | 1000 pub 1 sub   | 8 KB         | 14,439                   | 112.67                   | 26                   | High-Load Fan-In
Fan-In   | 1000 pub 1 sub   | 255 KB       | 992                      | 246.85                   | 20                   | High-Load Fan-In
Balanced | 1 pub 1 sub      | 16 B         | 50,739                   | 6.49                     | 2                    | Balanced Mixed-Load
Balanced | 1 pub 1 sub      | 8 KB         | 9,500                    | 77.8                     | 10                   | Balanced Mixed-Load
Balanced | 1 pub 1 sub      | 255 KB       | 1,314                    | 327.08                   | 540                  | Balanced Mixed-Load
Balanced | 100 pub 100 sub  | 16 B         | 279,949                  | 4.27                     | 350                  | Balanced Mixed-Load
Balanced | 100 pub 100 sub  | 8 KB         | 34,193                   | 266.95                   | 139                  | Balanced Mixed-Load
Balanced | 100 pub 100 sub  | 255 KB       | 2,871                    | 715.42                   | 2,800                | Balanced Mixed-Load
Fan-Out  | 1 pub 1000 sub   | 16 B         | 42,000                   | 0.64                     | 4                    | Large-scale Broadcast Fan-Out
Fan-Out  | 1 pub 1000 sub   | 8 KB         | 15,003                   | 117.25                   | 6                    | Large-scale Broadcast Fan-Out
Fan-Out  | 1 pub 1000 sub   | 255 KB       | 1,000                    | 249.86                   | 130                  | Large-scale Broadcast Fan-Out

5. Key Takeaways

  • Takeaway 1:  Data Throughput Scales with Payload Size. Even though the number of messages per second drops with larger payloads, data throughput (MB/sec) increases significantly. For example:
    • Fan-In at 255 KB: 246.8 MB/sec
    • Balanced at 255 KB: 715.4 MB/sec

  • Takeaway 2: The Broker Performs Best in Low-Latency Use Cases.

    When message sizes are small (e.g. 16 B, 8 KB) and the topology is lightweight (e.g. 1 pub to 1 sub), the broker achieves:

    • Avg latency as low as 1-2 ms
    • Throughput over 270,000 msg/sec (Balanced scenario at 16 B)

     Ideal low-latency use cases:

    • Real-time control systems (e.g. robotic arm commands, PLC feedback loops)
    • Smart home device synchronization
    • Autonomous vehicle telemetry coordination
    • Industrial automation events (e.g. triggers from sensors to actuators)


    For time-sensitive operations, our broker provides sub-10 ms latencies and massive message fanout capability, even under constrained payload sizes.

     

  • Takeaway 3: Fan-In Saturates Faster Than Fan-Out

    In QoS 1 tests, we observed Fan-In topologies (1000 devices → 1 endpoint) hit latency walls earlier than Fan-Out topologies (1 device → 1000 endpoints), even with similar message throughput.

    • Fan-In (8 KB): 14,439 msg/sec @ 26 ms latency
    • Fan-Out (8 KB): 15,003 msg/sec @ only 6 ms latency

     What this shows:

    • In Fan-In, the broker handles thousands of simultaneous inbound QoS 1 acknowledgments — creating coordination pressure.
    • In Fan-Out, a single publisher sends at a controlled rate, making it easier for the broker to fan out efficiently.


    We designed our broker to sustain intelligent traffic shaping and are continuing to enhance its performance under Fan-In workloads where coordination pressure is highest.

     

6. Optimization Strategies

The Azure IoT Operations MQTT broker is built to support scalable, high-throughput, and low-latency messaging. To harness its full potential across diverse workload patterns, optimization should focus on balanced resource utilization and minimizing message delivery bottlenecks.

 

  1. Maximize Throughput Without Overload

Fan-Out scenarios achieved strong throughput with consistently low latency, even under high subscriber counts. While they didn’t reach the raw message rate of Balanced workloads, their efficiency under broadcast pressure makes them ideal for scenarios requiring timely delivery to many endpoints.

Recommended Actions:

  • Batch and Compress Messages: Reduces overhead, improving payload transmission rates (see the sketch after this list).
  • Balance Publish Load: Distribute publishers evenly across broker nodes to avoid overloading a single point of ingestion.
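As a simple illustration of the first point, several readings can be combined into one payload, compressed, and published as a single QoS 1 message; the file names and topic here are hypothetical, and mosquitto_pub's -f option sends a file's contents as the message body.

    # Sketch: batch several readings, compress them, and publish the result
    # as one QoS 1 message.
    cat reading-*.json | gzip -c > batch.json.gz
    mosquitto_pub -h <broker-ip> -p 1883 -q 1 -t telemetry/batched -f batch.json.gz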

 

  2. Maintain Low Latency for Real-Time Use Cases

The broker excels in low-latency performance for small payloads and tightly coupled pub-sub pairs — as seen in 1:1 scenarios with 16 B payloads achieving 2–6 ms latency. These characteristics are crucial for real-time, control-plane workloads.

Recommended Actions:

  • Use Smaller Payloads for Time-Sensitive Ops: Critical for scenarios like robotics, actuator control, or telemetry alerting.
  • Load Balance Across Nodes: Adjust broker cardinality, including frontend replicas (for client connection distribution) and backend partitions (for message throughput scaling) to ensure even load distribution across nodes and optimal performance.
  • Enable MQTT Persistent Sessions: Minimizes reconnection overhead for frequently offline clients. With the Mosquitto CLI, persistent sessions can be enabled by setting -c and --session-expiry-interval, as sketched below.
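A minimal sketch of a persistent QoS 1 subscriber using the options named above; the spelling of the session-expiry flag varies between client versions, so verify it against your mosquitto_sub documentation.

    # Sketch: QoS 1 subscriber with a persistent session. -c disables clean
    # session; the session-expiry option below is as named in this post and
    # should be checked against your mosquitto_sub version.
    mosquitto_sub -h <broker-ip> -p 1883 -V mqttv5 -q 1 \
      -i telemetry-logger -c --session-expiry-interval 3600 \
      -t "sensors/#"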

 

  3. Optimize Deployment Scale for Workload Demands

Scalability depends on configuring the right cardinality — that is, the number of frontend replicas, backend partitions, and compute resources — to match your connection and throughput requirements.

Recommended Actions:

  • High Connection Load: Scale frontend replicas to match node count (e.g., 3 replicas for 3 nodes) to distribute client connections evenly.
  • High Message Throughput: Increase backend partitions to parallelize message processing (e.g., start with 1 partition per node and scale as needed).
  • Heavy Payload Scenarios: Allocate more memory and CPU to worker pods to avoid slowdowns from large payload serialization and transmission.
  • Backend Resiliency: Ensure redundancyFactor remains at the default of 2 (or more) so that each partition has at least two replicas, enabling failover protection without additional configuration.

For detailed guidance on these optimizations, visit our documentation: Learn more →

7. Conclusion

The Azure IoT Operations MQTT broker is engineered for high performance, scalability, and efficiency, as demonstrated through rigorous benchmarking. In high-throughput balanced configurations, it sustained up to 279,949 messages/sec with 16 B payloads, showcasing best-in-class throughput for high-volume, symmetric pub-sub workloads. For bandwidth-heavy use cases, the broker handled up to 715 MB/sec (255 KB payloads), proving its scalability for large data transfers.

The Balanced 1:1 scenario also delivered predictable low-latency performance, with average latencies as low as 2 ms, making it ideal for real-time messaging. Meanwhile, Fan-In configurations remain optimal for centralized data aggregation tasks like telemetry logging, handling tens of thousands of messages/sec with acceptable latency.

To maximize performance, we recommend key optimization strategies including load balancing, latency reduction, and workload-specific tuning. These approaches ensure efficiency at scale—whether you're managing high connection loads, scaling throughput, or handling large payloads in real-world deployments. For in-depth configuration guidance, visit our documentation.

Updated Jun 27, 2025
Version 1.0