What’s New with Azure HDInsight at //BUILD 2019
Published May 08 2019 03:45 AM



//BUILD 2019, SEATTLE, Washington, May 08, 2019 – Welcome to the //Build Conference 2019. Just a few short weeks ago, we announced the general availability of Apache Hadoop 3.0 on Azure HDInsight (details here). Today, we are excited to bring several new features that will enable you to run your open-source analytics applications faster and at lower cost, and make them easier to visualize and troubleshoot.



Run Faster with HBase Write Acceleration (Preview)

The Write Ahead Log (WAL) in HBase plays a critical role in avoiding data consistency issues when a Region Server crashes or becomes unavailable. To ensure that no data is lost, every Put/Delete request in HBase writes its edits to the WAL before the actual data is written to the MemStore. In the standard implementation of HDInsight, WALs are stored as page blobs in standard Azure Storage. This works very well for most applications, but may not meet the needs of mission-critical applications that require consistently low latency and high-throughput I/O. Inconsistent latencies for WAL writes can become a bottleneck for overall write performance in HBase clusters.

To improve WAL write latencies in high-performance clusters, we are introducing a new feature, “Accelerated Writes”. This feature attaches a Premium SSD managed disk to every Region Server (worker node) and configures the WAL to be written to HDFS mounted on those premium managed disks instead of standard Azure page blobs. Premium managed disks are SSD-based and offer excellent I/O performance with fault tolerance. As a result, customers will see lower write latencies and better resiliency for HBase-based workloads. To learn more, please see HBase Accelerated Writes.
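To make the "log first, then MemStore" ordering concrete, here is a minimal Python sketch of write-ahead logging in general. It is a toy illustration, not HBase or HDInsight code: the `TinyStore` class, its JSON log format, and the `demo` helper are all assumptions made up for this example.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store (hypothetical): logs every edit before applying it."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.memstore = {}  # in-memory state, lost if the process dies
        self._replay()      # rebuild in-memory state from the log on startup
        self._wal = open(wal_path, "a", encoding="utf-8")

    def _replay(self):
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, edit):
        if edit["op"] == "put":
            self.memstore[edit["key"]] = edit["value"]
        else:  # "delete"
            self.memstore.pop(edit["key"], None)

    def _log_then_apply(self, edit):
        # 1. Append the edit to the log and force it to durable storage.
        self._wal.write(json.dumps(edit) + "\n")
        self._wal.flush()
        os.fsync(self._wal.fileno())
        # 2. Only then update the in-memory store.
        self._apply(edit)

    def put(self, key, value):
        self._log_then_apply({"op": "put", "key": key, "value": value})

    def delete(self, key):
        self._log_then_apply({"op": "delete", "key": key})

    def close(self):
        self._wal.close()


def demo():
    """Write some edits, drop the in-memory state, and recover it from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyStore(path)
        store.put("row1", "a")
        store.put("row2", "b")
        store.delete("row1")
        store.close()  # stand-in for a crash: the memstore is discarded

        recovered = TinyStore(path)  # replay restores the state
        state = dict(recovered.memstore)
        recovered.close()
        return state
```

In this sketch every `put` or `delete` waits for the log write to reach storage before it returns, so the latency of the log device sits directly on the write path. That is the general reason the storage backing the WAL matters for write performance.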


[Figure: Yahoo! Cloud Serving Benchmark (YCSB) results, 4 D4v2 region nodes with 100-byte row size]


Use Autoscale to drive higher utilization at a lower cost (Preview)

HDInsight Autoscale enables enterprises to become more productive and cost-efficient by automatically scaling clusters up or down based on load or a customized schedule, so you can tailor the scaling strategy to your own scenario. You can either define the minimum and maximum number of worker nodes based on your cost requirements, or define a schedule for each weekday to meet your business objectives. After that, you can rest assured that the right scaling decision will be made and that you pay only for what you need. To learn more, check out the documentation here.
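The two strategies described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the HDInsight Autoscale API or its actual decision logic: `LoadBasedPolicy`, `ScheduleEntry`, and `scheduled_target` are hypothetical names, and the node counts are made-up values.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LoadBasedPolicy:
    """Hypothetical load-based policy: keep the node count within cost bounds."""
    min_nodes: int
    max_nodes: int

    def target(self, nodes_needed: int) -> int:
        # Clamp the demand-driven estimate into [min_nodes, max_nodes].
        return max(self.min_nodes, min(self.max_nodes, nodes_needed))


@dataclass(frozen=True)
class ScheduleEntry:
    """Hypothetical schedule rule: from `hour` onward on `weekday`, run `nodes`."""
    weekday: int  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    hour: int     # 0-23
    nodes: int


def scheduled_target(schedule, when: datetime, default_nodes: int) -> int:
    # Consider only rules for the same weekday that have already started.
    active = [
        entry for entry in schedule
        if entry.weekday == when.weekday() and entry.hour <= when.hour
    ]
    if not active:
        return default_nodes
    # The most recently started rule wins.
    return max(active, key=lambda entry: entry.hour).nodes


# Example values (assumptions): scale between 3 and 10 nodes under load,
# or run 10 nodes during Monday business hours and 3 in the evening.
policy = LoadBasedPolicy(min_nodes=3, max_nodes=10)
monday_schedule = [ScheduleEntry(0, 9, 10), ScheduleEntry(0, 18, 3)]
```

For example, `policy.target(14)` is capped at the configured maximum, while `scheduled_target(monday_schedule, datetime(2019, 5, 6, 20), default_nodes=4)` picks the evening rule because May 6, 2019 is a Monday.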


Visualize Spark jobs and data in Jupyter

Data scientists use Jupyter Notebooks on HDInsight Spark to quickly explore data sets, perform trend analysis, or try different machine learning models. Without the ability to track the status of Spark jobs and intermediate data, data scientists find it difficult to monitor and optimize their queries.

Jupyter Notebooks on Azure HDInsight now enable cutting-edge job execution and visualization experiences, including Spark job progress indicators, cell execution status indicators, and native matplotlib support for PySpark DataFrames. To learn more, please see our blog here.


HDInsight at //Build 2019

The HDInsight team will be available at //BUILD 2019 during all three days of the conference. Please take the time to come by and meet the team at the HDInsight booth on the Expo floor. We would be happy to show you more of what we have to offer and answer any questions you might have. And, while you are at //BUILD, do not miss the following sessions and demos:

  • Data Architect’s guide for successful Open Source patterns in Azure (link) on May 8th, 2019 @ 5pm PT.
  • Demo: Monitoring hundreds of HDInsight clusters with Azure Log Analytics
  • Demo: Data Accelerator on Azure HDInsight


Get Started Now

We are excited to see what you will build next with Azure HDInsight. Read this developer guide and follow the quick start guide to learn more about implementing open source analytics pipelines on Azure HDInsight. Stay up to date on the latest Azure HDInsight news and upcoming features by following #HDInsight and @AzureHDInsight on Twitter. For questions and feedback, please reach out to AskHDInsight@microsoft.com.


About Azure HDInsight

Azure HDInsight is an enterprise-ready platform for open source analytics that enables customers to easily run popular Apache open source frameworks, including Apache Hadoop, Spark, Kafka, and others. Brought to you in close partnership with Cloudera, the service is available in 30 public regions and in Azure Government Clouds in the US and Germany. Azure HDInsight powers mission-critical applications for a wide range of sectors and use cases, including ETL, streaming, and interactive querying.

Last update: Mar 26 2020 01:03 PM