Synapse Studio error while trying to read data from Storage Account using SQL On Demand

Published Jul 09 2020 07:47 AM 5,368 Views
Microsoft

ErrorFile 'https://storageaccount.dfs.core.windows.net/parquettest/filename.snappy.parquet' cannot be opened because it does not exist or it is used by another process.

 

Scenario: Storage is with the firewall set to use Selected Network\Allow Trusted Microsoft services. Be aware If that is the case you will need to create the MSI. In order to create MSI on SQL On Demand follow the script example steps. 

The error happens when you try to run the Select on the file that exist inside of the storage account from Synapse Studio using SQL On Demand.

 

Before I proceed with the step by step. Another link that may help you: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand

 

Here it goes - how to do it:

 

  1. First check if your account is Data Lake Storage Gen2. This is defined during the creation of the storage account. Fig. 1:

datalake.png

Fig. 1 Data Lake

 

  1. I am using for this test Selected Network\Allow Trusted Microsoft services. But It should work with All Networks as well. Fig. 2.firewall.pngFig. 2  Firewall
  2. Add the RBAC permission to your workspace. Fig. 3.workspace_permission.pngFig. 3  RBAC
  3. Open Synapse Studio
    1. Open Data option for SQL on Demand
    2. Create a database
    3. Point to the database created.
    4. Create the MSI which requires the master key
    5. Create the datasource.
    6. Run OpenRowset using that credential as data source.

 

Follow the script:

 

CREATE DATABASE [YourDatabase]

GO

-------Change to YourDatabase

CREATE MASTER KEY ENCRYPTION BY PASSWORD'XXXXX!0000';

-- Create a database scoped credential.

CREATE DATABASE SCOPED CREDENTIAL AppCred WITH IDENTITY = 'Managed Identity'

 

script.png

 

 

Note my Storage Account is called Criativity, you should use for the MSI .dfs instead of .blob.

 

 

CREATE EXTERNAL DATA SOURCE YourDataSourceName_criativity

WITH (

LOCATION = 'https://criativity.dfs.core.windows.net',

cREDENTIAL = [AppCred]

)

SELECT  TOP 100 *

FROM OPENROWSET(

        BULK 'parquet/*.parquet',

        DATA_SOURCE = 'YourDataSourceName_criativity',

        FORMAT = 'PARQUET'

as X

 

That is it!

 

Thanks to the collegues Charl Roux and Veljko Vasic. :)

 

Liliam Leme

UK Engineer

 

 

 

 

 

2 Comments
Version history
Last update:
‎Jul 10 2020 05:51 AM
Updated by: