ADLS Gen2 uses ACLs to control access to storage. In a multi-user HDInsight ESP cluster, access control to the underlying ADLS Gen2 storage accounts is enforced via storage ACL passthrough. This example explains how access control works with ACLs for AD users in an HDInsight Spark ESP cluster when accessing Hive external tables stored on an ADLS Gen2 storage layer. We are using Spark 2.4 (HDI 4.0) with the Enterprise Security Package and a restricted user, sparktest1.
Understanding the Use Case
In HDInsight 4.0, Spark tables and Hive tables are kept in separate metastores. To give a Spark user direct access to a Hive external table, set the configuration spark.hadoop.metastore.catalog.default=hive when starting the Spark shell or when submitting Spark jobs.
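A minimal sketch of how this configuration can be passed at launch time (the same flag works for spark-submit; only the configuration key comes from this article, the rest is a generic invocation):

```shell
# Start the Spark shell so that Spark SQL resolves tables from the Hive metastore
spark-shell --conf spark.hadoop.metastore.catalog.default=hive

# Equivalent for batch jobs (job jar/class names are placeholders)
spark-submit \
  --conf spark.hadoop.metastore.catalog.default=hive \
  --class com.example.MyJob myjob.jar
```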
When a Spark job accesses a Hive external table natively (i.e., through the Hive metastore rather than through HiveServer2), the Spark user must have privileges to read the data files underlying the Hive table: the Ranger policy plugin is not available at the HMS level and applies only when Spark interacts with HiveServer2 via the Hive Warehouse Connector (HWC). If the Spark user does not have the required privileges on the underlying data files in ADLS Gen2, a Spark SQL query against the table returns an unauthorized exception. The test scenarios below demonstrate this use case.
For reading, the permissions required on the path of the external table are Read and Execute.
For writing, the permissions required on the path of the external table are Read, Write, and Execute.
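These requirements map directly onto ADLS Gen2 POSIX-style ACL entries, which can be inspected and set with the standard Hadoop filesystem shell. A sketch, assuming a hypothetical storage account, container, and table path:

```shell
# Inspect the current ACLs on the external table path
hdfs dfs -getfacl \
  abfs://data@mystorageacct.dfs.core.windows.net/hive/external/sales_ext

# Read-only access: Read + Execute (r-x)
hdfs dfs -setfacl -R -m user:sparktest1:r-x \
  abfs://data@mystorageacct.dfs.core.windows.net/hive/external/sales_ext

# Read-write access: Read + Write + Execute (rwx)
hdfs dfs -setfacl -R -m user:sparktest1:rwx \
  abfs://data@mystorageacct.dfs.core.windows.net/hive/external/sales_ext
```

Note that in the ADLS Gen2 hierarchical namespace, the user also needs Execute on every parent directory of the path in order to traverse down to the table's files.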
Test cases:
Test Case 1: Create a Hive external table from Beeline, then log in as user sparktest1 in Spark to access the Hive external table.
Observation: User sparktest1 was not able to access the external table due to the access restriction.
Below are the steps:
Step 1: Create the Hive external table.
Step 2: Log in as user sparktest1 to run Spark SQL.
Step 3: The Spark SQL query fails with an access-denied error, because the user has no permissions at the storage level.
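The steps above can be sketched as follows (table name, storage path, and cluster endpoint are hypothetical placeholders, not values from the original test):

```shell
# Step 1 (admin): create the Hive external table from Beeline
beeline -u "jdbc:hive2://headnode:10001/default;transportMode=http" -e "
  CREATE EXTERNAL TABLE sales_ext (id INT, amount DOUBLE)
  LOCATION 'abfs://data@mystorageacct.dfs.core.windows.net/hive/external/sales_ext';"

# Steps 2-3 (as sparktest1): query the table natively from Spark
spark-shell --conf spark.hadoop.metastore.catalog.default=hive
# scala> spark.sql("SELECT * FROM sales_ext").show()
# The query fails with an authorization error, since sparktest1 has
# no ACL entries on the table's path in ADLS Gen2.
```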
Test Case 2: Create a Hive external table and grant the required permissions to user sparktest1 at the storage level, then log in as sparktest1 to run Spark SQL against the Hive external table.
Observation: User sparktest1 was able to access the table without any issue.
Below are the steps:
Step 1: Create the Hive external table.
Step 2: Grant the required ACL permissions at the storage level to user sparktest1 on the complete path of the external table.
Step 3: Log in as user sparktest1 to run Spark SQL.
Step 4: The Hive external table is now accessible.
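A sketch of the grant in Step 2 and the retried query, reusing the same hypothetical path and table name as above:

```shell
# Step 2 (admin): grant Read + Execute recursively on the table path,
# and a default ACL so files written later inherit the same entry
hdfs dfs -setfacl -R -m user:sparktest1:r-x \
  abfs://data@mystorageacct.dfs.core.windows.net/hive/external/sales_ext
hdfs dfs -setfacl -R -m default:user:sparktest1:r-x \
  abfs://data@mystorageacct.dfs.core.windows.net/hive/external/sales_ext
# Execute (--x) is also needed on each parent directory (/hive, /hive/external, ...)
# so that sparktest1 can traverse down to the table's files.

# Steps 3-4 (as sparktest1): the same Spark SQL query now succeeds
spark-shell --conf spark.hadoop.metastore.catalog.default=hive
# scala> spark.sql("SELECT * FROM sales_ext").show()
```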
Updated Apr 16, 2021 | Version 2.0 | somnathghosh | Analytics on Azure Blog