Today we are excited to announce the preview of Photon powered Delta Engine on Azure Databricks, a fast, easy, and collaborative analytics and AI service. Built from scratch in C++ and fully compatible with Spark APIs, Photon is a vectorized query engine that leverages modern CPU architecture along with Delta Lake to enhance Apache Spark 3.0’s performance by up to 20x.
The need for faster insight
As organizations worldwide embrace data-driven decision-making, it has become imperative for them to invest in a platform that can quickly analyze massive volumes of data of many types. However, this has been a challenge. While storage and network performance have increased 10x, CPU processing speeds have only increased marginally.
Image: Hardware Trends, 2010-2020
This leads to the question: if CPUs have become the bottleneck, how can we achieve the next level of performance? With Photon, the answer lies in greater parallelism of CPU processing at both the data level and the instruction level.
Introducing Photon powered Delta Engine
Photon powered Delta Engine is a 100% Apache Spark-compatible vectorized query engine designed for extremely fast parallel processing of data. Written from the ground up in C++ to take advantage of modern hardware and capitalize on data-level and CPU instruction-level parallelism, this engine uses optimization techniques described in the paper MonetDB/X100: Hyper-Pipelining Query Execution.
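To make the idea of vectorized execution more concrete, here is a minimal C++ sketch that contrasts row-at-a-time processing with column-at-a-time processing over a batch, using a selection vector in the spirit of the MonetDB/X100 paper. It is an illustration only and not Photon's actual code: the names `Row`, `Batch`, `select_ge`, `sum_product`, and the revenue query itself are hypothetical and chosen just for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row layout: one struct per record (row-at-a-time engines).
struct Row {
    std::int32_t quantity;
    double price;
};

// Hypothetical columnar batch: each column is its own contiguous array.
struct Batch {
    std::vector<std::int32_t> quantity;
    std::vector<double> price;
};

// Row-at-a-time: filter and aggregate are interleaved for every record.
double revenue_row_at_a_time(const std::vector<Row>& rows,
                             std::int32_t min_quantity) {
    double total = 0.0;
    for (const Row& r : rows) {
        if (r.quantity >= min_quantity) {
            total += r.quantity * r.price;
        }
    }
    return total;
}

// Vectorized step 1: scan a single column in a tight loop and record the
// positions that pass the filter (a "selection vector").
std::vector<std::uint32_t> select_ge(const std::vector<std::int32_t>& column,
                                     std::int32_t threshold) {
    std::vector<std::uint32_t> selected;
    selected.reserve(column.size());
    for (std::size_t i = 0; i < column.size(); ++i) {
        if (column[i] >= threshold) {
            selected.push_back(static_cast<std::uint32_t>(i));
        }
    }
    return selected;
}

// Vectorized step 2: aggregate only the selected positions, again in a
// simple loop over contiguous column data.
double sum_product(const Batch& batch,
                   const std::vector<std::uint32_t>& selected) {
    double total = 0.0;
    for (std::uint32_t i : selected) {
        total += batch.quantity[i] * batch.price[i];
    }
    return total;
}

// The same query expressed as a pipeline of column-at-a-time operators.
double revenue_vectorized(const Batch& batch, std::int32_t min_quantity) {
    assert(batch.quantity.size() == batch.price.size());
    return sum_product(batch, select_ge(batch.quantity, min_quantity));
}
```

Both functions compute the same result. The difference is that the vectorized version runs each operator as a short loop over one column at a time, a shape that compilers and CPUs can typically pipeline and apply SIMD instructions to more readily. A production engine would also process data in fixed-size batches and avoid per-call allocations, which this sketch omits for brevity.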
Photon is one of the three key components of Delta Engine, alongside an improved query optimizer and a caching layer. Together, these three components accelerate performance for big data use cases such as data engineering, data science, machine learning, and data analytics.
Azure Databricks was already blazing fast compared to Apache Spark, and now, the Photon powered Delta Engine enables even faster performance for modern analytics and AI workloads on Azure. We ran a 30TB test derived from a TPC-DS* industry-standard benchmark to measure the processing speed and found the Photon powered Delta Engine to be 20x faster than Spark 2.4.
Image: 30TB Elapsed Times, Performance Comparison
Industry-leading Spark-based analytics & AI platform on Azure
With Azure Databricks, customers can set up an optimized Apache Spark environment in minutes. Native integration with Azure Active Directory and other Azure services such as Azure Synapse Analytics and Azure Machine Learning enables customers to build end-to-end modern data warehouse, machine learning, and real-time analytics solutions.
Now with the preview of Photon powered Delta Engine, customers can benefit from the added performance boost to gain faster insights.