Apache Kafka on Azure HDInsight was added last year as a preview service to help enterprises create real-time big data pipelines. Since then, large companies such as Toyota, Adobe, Bing Ads, and GE have been using this service in production to process over a million events per second, powering scenarios for connected cars, fraud detection, clickstream analysis, and log analytics. HDInsight has worked very closely with these customers to understand the challenges of running a robust, real-time production pipeline at enterprise scale. Using what we learned, we have implemented key features in the managed Kafka service on HDInsight, which is now generally available.
Running big data streaming pipelines is hard. Doing so with open source technologies for the enterprise is even harder. Apache Kafka, a key open source technology, has emerged as the de facto technology for ingesting large volumes of streaming events in a scalable, low-latency, and low-cost fashion. Enterprises want to leverage this technology; however, there are many challenges with installing, managing, and maintaining a streaming pipeline. Open source bits lack support, and in-house talent needs to be well versed in these technologies to ensure the highest levels of uptime. Every second an ingestion pipeline is down, data is lost.