Today we are excited to announce the preview of Photon powered Delta Engine on Azure Databricks, a fast, easy, and collaborative analytics and AI service. Built from scratch in C++ and fully compatible with Spark APIs, Photon is a vectorized query engine that leverages modern CPU architecture along with Delta Lake to enhance Apache Spark 3.0’s performance by up to 20x.
As organizations worldwide embrace data-driven decision-making, it has become imperative for them to invest in a platform that can quickly analyze massive volumes and many types of data. However, this has been a challenge. While storage and network performance have increased 10x, CPU processing speeds have only increased marginally.
Image: Hardware Trends, 2010-2020
This leads to the question: if CPUs have become the bottleneck, how can we achieve the next level of performance? With Photon, the answer lies in greater parallelism of CPU processing at both the data level and the instruction level.
Photon powered Delta Engine is a 100% Apache Spark-compatible vectorized query engine designed for extremely fast parallel processing of data. Written from the ground up in C++ to take advantage of modern hardware and capitalize on data-level and CPU instruction-level parallelism, this engine uses optimization techniques described in the paper MonetDB/X100: Hyper-Pipelining Query Execution.
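To make the idea of vectorized execution more concrete, here is a minimal sketch in Python of the batch-at-a-time model described in the MonetDB/X100 paper. It is not Photon's implementation. The function names, the sample data, and the tiny batch size are all hypothetical and chosen only for illustration. The sketch evaluates the equivalent of a query that sums price * quantity over rows where quantity exceeds a threshold, first one row at a time and then one column batch at a time.

```python
from typing import Iterator, List, Tuple

# Hypothetical batch size for illustration; real vectorized engines
# typically use much larger batches sized to fit in the CPU cache.
BATCH_SIZE = 4


def row_at_a_time(prices: List[float], quantities: List[int], min_qty: int) -> float:
    """Classic model: filter, multiply, and add for each row in turn."""
    total = 0.0
    for price, qty in zip(prices, quantities):
        if qty > min_qty:
            total += price * qty
    return total


def batches(prices: List[float], quantities: List[int],
            size: int) -> Iterator[Tuple[List[float], List[int]]]:
    """Split the two columns into aligned batches of at most `size` values."""
    for start in range(0, len(prices), size):
        yield prices[start:start + size], quantities[start:start + size]


def select_gt(column: List[int], threshold: int) -> List[int]:
    """Filter primitive: return the positions in the batch that pass."""
    return [i for i, value in enumerate(column) if value > threshold]


def multiply(left: List[float], right: List[int], selection: List[int]) -> List[float]:
    """Arithmetic primitive: multiply only the selected positions."""
    return [left[i] * right[i] for i in selection]


def vectorized(prices: List[float], quantities: List[int], min_qty: int,
               size: int = BATCH_SIZE) -> float:
    """Vectorized model: each primitive runs a tight loop over a whole batch."""
    total = 0.0
    for price_vec, qty_vec in batches(prices, quantities, size):
        selection = select_gt(qty_vec, min_qty)
        total += sum(multiply(price_vec, qty_vec, selection))
    return total


# Made-up sample columns, not benchmark data.
prices = [2.0, 3.5, 1.0, 4.0, 10.0, 0.5]
quantities = [5, 20, 11, 3, 12, 40]

print(row_at_a_time(prices, quantities, 10))
print(vectorized(prices, quantities, 10))
```

Both functions return the same result. The difference is in the shape of the work: the vectorized version runs each primitive as a simple loop over a contiguous batch of one column, which in a compiled language such as C++ is the kind of loop that can be mapped to SIMD instructions and kept in the CPU cache. Python itself does not expose that hardware parallelism, so this sketch shows only the execution model and not the speedup.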
Photon is one of the three key components of Delta Engine, alongside an improved query optimizer and a caching layer. Together, these three components accelerate performance for big data use cases such as data engineering, data science, machine learning, and data analytics.
Image: Delta Engine’s 3 components: 1) Query optimizer, 2) Photon native execution engine, and 3) Caching
Azure Databricks was already blazing fast compared to Apache Spark, and now the Photon powered Delta Engine enables even faster performance for modern analytics and AI workloads on Azure. We ran a 30TB test derived from the industry-standard TPC-DS* benchmark to measure processing speed and found the Photon powered Delta Engine to be 20x faster than Spark 2.4.
Image: 30TB Elapsed Times, Performance Comparison
With Azure Databricks, customers can set up an optimized Apache Spark environment in minutes. Native integration with Azure Active Directory and other Azure services such as Azure Synapse Analytics and Azure Machine Learning enables customers to build end-to-end modern data warehouse, machine learning, and real-time analytics solutions.
Now with the preview of Photon powered Delta Engine, customers can benefit from the added performance boost to gain faster insights.
Start today by requesting access to the Photon Preview here. Learn more about modern data engineering with Azure Databricks by attending a live event or viewing this webinar, and ask your questions at our next Azure Databricks Office Hours.
*Since these are results of a test derived from TPC-DS, they may not be compared to published TPC-DS results.