How to read Data Lake with Synapse Serverless – Part 1
Published Jan 19 2023 10:00 AM 5,383 Views
Microsoft

ryanjadams_0-1670518238006.png

 

Author: Ryan Adams is a Program Manager in Azure Synapse Customer Success Engineering (CSE) team. 

 

There are two ways to read data inside Data Lake using the Synapse Serverless engine.  In this article, we’ll look at the first method which uses OPENROWSET to query a path within the lake. To learn how to use an external table to query a path within the lake, check out the part 2 of the article series. 

 

Synapse is a collection of tools with four different analytical engines (Dedicated Pool, Spark Pool, Serverless Pool, Data Explorer Pool).  This gives you a lot of options for ingesting, transforming, storing, and querying your data.  The article will focus on how you can use the Synapse Serverless Pool to query the data in your ADLS account.   

 

Let's start by going to the Develop Hub and click the ellipsis that appears when you hover your cursor over the SQL Scripts heading.  That will open a drop-down menu where you can select “New SQL Script”.  Enter the code below into your script, make sure the “Connect To” box shows “Built-in” to use the serverless pool, and execute the script to read your data. 

 

ryanjadams_1-1670518278817.png

 

In my example, I have some IoT data representing temperatures from two different probes.  The data is stored in my lake using Parquet format.  Here are the options I have defined: 

 

  • BULK – This is the URI to my data lake and you can see I used a wildcard to grab all files ending in, .parquet (*.parquet).  If you only point to a folder or container, then it will read all files in that location. 
  • FORMAT – The OPENROWSET function supports three file formats (Parquet, Delta, and CSV) and I am telling it to expect Parquet. 

 

The last thing I do is order my results, so the newest events are at the top.  You can easily use the following code and change it to fit your environment and start querying data in your lake using the SQL language! 

 

 

 

 

 

SELECT 
    Temp1, Temp2, ArrivalTime 
FROM 
    OPENROWSET( 
        BULK 'https://YourStorageAccountName.dfs.core.windows.net/iot/bronze/temperature_raw/*.parquet', 
        FORMAT = 'PARQUET' 
    ) AS [PARQUET FORMAT] 
ORDER BY ArrivalTime DESC

 

 

 

 

 

 

Demo

Learn how to read Data Lake using OPENROWSET by checking out this video

 

 

Resources 

You can find more details and full documentation on the OPENROWSET command with this link.  For increased performance, it’s worth noting that we have two different parsers you can use, and you define that with the PARSER_VERSION option.  Check out an excellent video demonstration we have on CSV files and the two different parser versions. 

 

Our team publishes blog(s) regularly and you can find all these blogs here: https://aka.ms/synapsecseblog  

For deeper level understanding of Synapse implementation best practices, please refer our Success By Design (SBD) site: https://aka.ms/Synapse-Success-By-Design  

Co-Authors
Version history
Last update:
‎Jan 30 2023 10:49 AM
Updated by: