With the release of Cloudera Enterprise Data Hub 5.11, you can now run Spark, Hive, and MapReduce workloads in a Cloudera cluster on Azure Data Lake Store (ADLS). Running on ADLS has the following benefits:
Grow or shrink a cluster independent of the size of the data.
Data persists independently as you spin up or tear down a cluster. Other clusters and compute engines, such as Azure Data Lake Analytics or Azure SQL Data Warehouse, can execute workload on the same data.
Enable role-based access controls integrated with Azure Active Directory and authorize users and groups with fine-grained POSIX-based ACLs.
Cloud HDFS with performance optimized for analytics workload, supporting reading and writing hundreds of terabytes of data concurrently.
No limits on account size or individual file size.
Data is encrypted at rest by default using service-managed or customer-managed keys in Azure Key Vault, and is encrypted with SSL while in transit.
High data durability at lower cost as data replication is managed by Data Lake Store and exposed from HDFS compatible interface rather than having to replicate data both in HDFS and at the cloud storage infrastructure level.