Blog Post

Messaging on Azure Blog
2 MIN READ

Azure Event Hubs - Capture event streams in Parquet format to data lakes and warehouses

Kasun_Indrasiri's avatar
Kasun_Indrasiri
Former Employee
May 24, 2022

Azure Event Hubs enables you to stream millions of events per second from any source using Kafka, AMQP or HTTPS protocols. Using Event Hubs capture feature, you can load real-time streaming data to data lakes, warehouses, and other storage services, so that they can be processed or analyzed by analytics services.

 

Today we are excited to announce the preview of Apache Parquet capturing support in Azure Event Hubs.

 

Why Apache Parquet for big data analytics?

Apache Parquet is column-oriented storage format that is designed for efficient data storage and retrieval. It’s open source and is not tied to any processing framework, data model or programming language. Parquet is ideal for storing any kind of big data and is built to support efficient compression and encoding schemes.  

 

Capture streaming data in Parquet format using Event Hubs

Using Azure Event Hubs, no code editor for event processing, you can automatically capture streaming data in an Azure Data Lake Storage Gen2 account in Parquet format. The no code editor allows you to easily develop an Azure Stream Analytics job without writing a single line of code.

 

Once data is captured, any analytics service of your choice can process or analyze Parquet data.

 

Get Started Today

You can try out Apache Parquet capturing feature in Azure Event Hubs using the following links.

 

 

Updated May 24, 2022
Version 4.0

1 Comment

  • Hi Kasun_Indrasiri ,

    I'm trying to understand the difference between the built-in AVRO capture and this method.  It looks like the built-in capture is tight with the event hub and works whenever the event hub is enabled.   Using  Stream Analytics to create Parquet, is less tightly coupled and easy to disable (inadvertently)  creating gaps in the capture.

    Is this  a correct view of things?   Is the avaliabiltiy of the whole   (EH+AVRO Capture)   vs (EH + StreamAnalytics Parquet Capture)  similiar or lowers the introduction of StreamAnaytics the overall availability?

    Thanks

    Pálmi