How to use PolyBase by authenticating via AAD pass-through
Published Sep 18 2019 04:41 PM
Microsoft

This post shows how to load and query data with PolyBase by authenticating through Azure Active Directory (AAD) pass-through to Azure Data Lake Storage Gen2. AAD pass-through authentication with PolyBase is more secure and compliant because you no longer need CONTROL permission on the data warehouse to initiate a load. You can now achieve secure, high-throughput data ingestion in only a few steps:

  1. Navigate to your Azure Data Lake Storage (ADLS) Gen2 account in the portal and grant load access to the AAD user or group by assigning it the Storage Blob Data Reader, Storage Blob Data Contributor, or Storage Blob Data Owner role on the ADLS Gen2 account.
  2. Connect to your data warehouse as the same AAD user or group that has load access to the ADLS Gen2 account, and create the following objects:

Create an external file format:

CREATE EXTERNAL FILE FORMAT CustomerFileFormat
WITH (
    FORMAT_TYPE = DelimitedText,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

Note: Requires the ALTER ANY EXTERNAL FILE FORMAT permission.

Create an external data source:

CREATE EXTERNAL DATA SOURCE AADPassthrough_storage
WITH (
 TYPE=hadoop,
 LOCATION='abfss://aadpassthrough@sample.dfs.core.windows.net'
);

Note: Requires the ALTER ANY EXTERNAL DATA SOURCE permission.

Create an external table for the load:

CREATE EXTERNAL TABLE [dbo].[customer_ext]
(
       NAME   varchar(20) not null,
       AGE  int
)
WITH (
       LOCATION='/customer/',
       DATA_SOURCE = AADPassthrough_storage,
       FILE_FORMAT = CustomerFileFormat
);

Note: Requires the CREATE TABLE, ALTER ANY SCHEMA, ALTER ANY EXTERNAL DATA SOURCE, and ALTER ANY EXTERNAL FILE FORMAT permissions.
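
The permissions listed in the notes above can be granted individually instead of giving the loading principal CONTROL on the database. The following is a minimal sketch, not a definitive setup: the principal name loader@contoso.com is a hypothetical placeholder, and it assumes an administrator runs the statements in the data warehouse database.

```sql
-- Hypothetical AAD user or group; replace with your own principal.
CREATE USER [loader@contoso.com] FROM EXTERNAL PROVIDER;

-- Grant only the permissions named in the notes above.
GRANT ALTER ANY EXTERNAL FILE FORMAT TO [loader@contoso.com];
GRANT ALTER ANY EXTERNAL DATA SOURCE TO [loader@contoso.com];
GRANT ALTER ANY SCHEMA TO [loader@contoso.com];
GRANT CREATE TABLE TO [loader@contoso.com];
```

Your environment may need further permissions for the load itself, so treat this as a starting point to check against your own security model.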

 

No database scoped credential was required to set up the customer_ext external table, which you can now use to load and query data from your ADLS Gen2 storage account.
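
As an illustration of what querying and loading through customer_ext might look like, here is a short sketch. The target table name dbo.customer and the distribution and index options are assumptions chosen for the example; they do not come from the original post.

```sql
-- Query the files under /customer/ in place through the external table.
SELECT TOP 10 NAME, AGE
FROM [dbo].[customer_ext];

-- Load the data into an internal table with CREATE TABLE AS SELECT (CTAS).
-- dbo.customer and the WITH options below are illustrative assumptions.
CREATE TABLE [dbo].[customer]
WITH (
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT NAME, AGE
FROM [dbo].[customer_ext];
```

Both statements run under the connected AAD identity, so they depend on that identity holding one of the Storage Blob Data roles from step 1.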

 

2 Comments
Copper Contributor

Hi Kevin, I tried using this process, but unfortunately I cannot get it to work.

The AAD user is owner of the datalake and sysadmin on the DW, so it should not be a rights issue - but I get this error when attempting to create the external table:

HdfsBridge::isDirExist - Unexpected error encountered checking whether directory exists or not: AbfsRestOperationException: HEAD https://<redacted>dfs.core.windows.net/transformed/<redacted>/2019?timeout=90
StatusCode=403
StatusDescription=This request is not authorized to perform this operation using this permission.
ErrorCode=
ErrorMessage='

Microsoft

Hi Ola, can you confirm which kind of "owner" role the user has? It should be Storage Blob Data Owner. If you are still running into issues, please submit a support request and we will take a look. Thanks!

Last update: Apr 20 2020 12:50 PM