Blog Post

Azure Synapse Analytics Blog
2 MIN READ

Synapse Studio error while trying to read data from Storage Account using SQL On Demand

Liliam_C_Leme's avatar
Liliam_C_Leme
Icon for Microsoft rankMicrosoft
Jul 09, 2020

ErrorFile 'https://storageaccount.dfs.core.windows.net/parquettest/filename.snappy.parquet' cannot be opened because it does not exist or it is used by another process.

 

Scenario: Storage is with the firewall set to use Selected Network\Allow Trusted Microsoft services. Be aware If that is the case you will need to create the MSI. In order to create MSI on SQL On Demand follow the script example steps. 

The error happens when you try to run the Select on the file that exist inside of the storage account from Synapse Studio using SQL On Demand.

 

Before I proceed with the step by step. Another link that may help you: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand

 

Here it goes - how to do it:

 

  1. First check if your account is Data Lake Storage Gen2. This is defined during the creation of the storage account. Fig. 1:

Fig. 1 Data Lake

 

  1. I am using for this test Selected Network\Allow Trusted Microsoft services. But It should work with All Networks as well. Fig. 2.Fig. 2  Firewall
  2. Add the RBAC permission to your workspace. Fig. 3.Fig. 3  RBAC
  3. Open Synapse Studio
    1. Open Data option for SQL on Demand
    2. Create a database
    3. Point to the database created.
    4. Create the MSI which requires the master key
    5. Create the datasource.
    6. Run OpenRowset using that credential as data source.

 

Follow the script:

 

CREATE DATABASE [YourDatabase]

GO

-------Change to YourDatabase

CREATE MASTER KEY ENCRYPTION BY PASSWORD'XXXXX!0000';

-- Create a database scoped credential.

CREATE DATABASE SCOPED CREDENTIAL AppCred WITH IDENTITY = 'Managed Identity'

 

 

 

Note my Storage Account is called Criativity, you should use for the MSI .dfs instead of .blob.

 

 

CREATE EXTERNAL DATA SOURCE YourDataSourceName_criativity

WITH (

LOCATION = 'https://criativity.dfs.core.windows.net',

cREDENTIAL = [AppCred]

)

SELECT  TOP 100 *

FROM OPENROWSET(

        BULK 'parquet/*.parquet',

        DATA_SOURCE = 'YourDataSourceName_criativity',

        FORMAT = 'PARQUET'

as X

 

That is it!

 

Thanks to the collegues Charl Roux and Veljko Vasic. 🙂

 

Liliam Leme

UK Engineer

 

 

 

 

 

Updated Jul 10, 2020
Version 8.0

2 Comments

  • siddharthamohapatra  those permissions should be enough "

    • Role based access control (RBAC) enables you to assign a role to some Azure AD user in the tenant where your storage is placed. RBAC roles can be assigned to Azure AD users. A reader must have Storage Blob Data Reader, Storage Blob Data Contributor, or Storage Blob Data Owner role. A user who writes data in the Azure storage must have Storage Blob Data Writer or Storage Blob Data Owner role. Note that Storage Owner role does not imply that a user is also Storage Data Owner." Control storage account access for serverless SQL pool - Azure Synapse Analytics | Microsoft Docs . Were you using those permissions or something different?
  • Hi Liliam,

    Thanks for the article.

    The strange part in this set up is the RBAC access on the data lake. I was looking at a reader access to read the files.

    If you know, can you confirm as why the contributor access needed to read files through SQL?

     

    thanks

    Sid