Author: Ryan Adams is a Program Manager in Azure Synapse Customer Success Engineering (CSE) team.
There are two ways to read data inside Data Lake using the Synapse Serverless engine. In this article, we’ll look at the first method which uses OPENROWSET to query a path within the lake. To learn how to use an external table to query a path within the lake, check out the part 2 of the article series.
Synapse is a collection of tools with four different analytical engines (Dedicated Pool, Spark Pool, Serverless Pool, Data Explorer Pool). This gives you a lot of options for ingesting, transforming, storing, and querying your data. The article will focus on how you can use the Synapse Serverless Pool to query the data in your ADLS account.
Let's start by going to the Develop Hub and click the ellipsis that appears when you hover your cursor over the SQL Scripts heading. That will open a drop-down menu where you can select “New SQL Script”. Enter the code below into your script, make sure the “Connect To” box shows “Built-in” to use the serverless pool, and execute the script to read your data.
In my example, I have some IoT data representing temperatures from two different probes. The data is stored in my lake using Parquet format. Here are the options I have defined:
BULK – This is the URI to my data lake and you can see I used a wildcard to grab all files ending in, .parquet (*.parquet). If you only point to a folder or container, then it will read all files in that location.
FORMAT – The OPENROWSET function supports three file formats (Parquet, Delta, and CSV) and I am telling it to expect Parquet.
The last thing I do is order my results, so the newest events are at the top. You can easily use the following code and change it to fit your environment and start querying data in your lake using the SQL language!
Temp1, Temp2, ArrivalTime
FORMAT = 'PARQUET'
) AS [PARQUET FORMAT]
ORDER BY ArrivalTime DESC
Learn how to read Data Lake using OPENROWSET by checking out this video.
You can find more details and full documentation on the OPENROWSET command with this link. For increased performance, it’s worth noting that we have two different parsers you can use, and you define that with the PARSER_VERSION option. Check out an excellent video demonstration we have on CSV files and the two different parser versions.