Azure Synapse Analytics February Update 2022

Former Employee

Feb 28, 2022

Welcome to the February 2022 Azure Synapse update! This month, we have SQL and data integration features to share with you. Watch our monthly update video!

You can see the rest of our videos on the Azure Synapse Analytics YouTube channel.

Put your questions and feedback in the comments below!

SQL
- More consistent query execution times for Serverless SQL Pools
- The OPENJSON function makes it easy to get array element indexes
Data Integration

SQL

More consistent query execution times for Serverless SQL Pools

Occasionally, spikes in read latency were increasing query duration. Serverless SQL pools now automatically detect such spikes and handle them in a way that query duration is not significantly impacted, resulting in more consistent query execution times.

The diagram below shows how serverless SQL pools have been sending requests to storage before this improvement. Each request reads the same amount of data, yet in this case Request #2 took significantly more time, which resulted in longer query duration.

The following diagram shows how serverless SQL pools send requests to storage today. The request is repeated as soon as storage request latency is larger than expected and the outlawed request is abandoned. This results in more consistent query durations across executions.

The OPENJSON function makes it easy to get array element indexes

In many scenarios, if you are analyzing data with nested arrays, you would need to get the index(position) of the element from an array. In Spark pools, you can use posexplode and posexplode_outer to get the array elements with their positions in the arrays, but in T-SQL you might need to use complex window aggregate functions such as ROW_NUMBER, which might complicate the query. In serverless SQL pools, you don’t need to use the ROW_NUMBER function to identify the index of each array element. The OPENJSON function in a serverless SQL pool allows you to parse nested arrays and return one row for each JSON array element with the index of each element.

In the example above, the OPENJSON function transforms the JSON array with “month” and “temp” properties into the set of rows with the “month” and “temp” columns. It also returns the position of the array element if you specify a numeric column (month_id in this example), and specify that this column should be element’s identity by setting the path '$.sql:identity()' after the column type definition. This column will represent the unique index of each array element.

The array above is represented as a plain JSON text. However, you can also find this kind of array as the properties of CosmosDB documents or nested list types in Parquet.

Let’s imagine that we have a Parquet file with the information about countries, and where the monthly temperatures for each country are placed as a nested arrays complex type. In the serverless SQL pool, we can easily parse this Parquet file, get the JSON array representing the temperatures, and pass this array to the OPENSJON function to transform it into the rows.

The column with the '$.sql:identity()' path will automatically generate the array element indexes and enable you to filter-out only the first three months for each country. Automatically generated array indexes in the OPENJSON function might help you simplify and improve performance of the queries that most use element indexes.

Learn more by reading OPENJSON

Data Integration

Upsert supported by Copy Activity

The copy activity now supports natively loading data into a temporary table and then merging that data into a sink table. Updates and inserts are supported. This works for scenarios where the sink is SQL Server, Azure SQL Database, Azure SQL Managed Instance, or Azure Synapse Analytics.

This makes it super easy to update rows in your SQL databases without needing to write stored procedures.

Learn more by reading Upsert data in Copy activity.

Transform Dynamics Data Visually in Synapse Data Flows

When transforming data in Synapse data flow, you can now read and write to tables from Dynamics using the new Dynamics connector. You can choose to use a Dynamics dataset or an inline dataset as source and sink types to make it super easy to transform data at scale in Synapse from Dynamics.

Learn more by reading Mapping data flow properties.

Connect to your SQL sources in data flows using Always Encrypted

Synapse has added support for Always Encrypted to source transformations in SQL Server, Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse Analytics when using data flows. Now you can securely connect to your SQL databases from Synapse data flows using Always Encrypted.

Learn more by reading Using Always Encrypted

Capture descriptions from Asserts in Data Flows

Debugging data flows can be difficult when your assert expectations fail at the row or column level and you don’t have a detailed error message to debug. With this new feature, you can now define your own dynamic descriptive messages in the assert data flow transformation at the row or column level. Synapse has added a new function called assertErrorMessage() that you can use to retrieve the row-by-row error message and store it in your destination data.

Learn more by reading assertErrorMessages

Easily define schemas for complex type fields

A new feature has been added to the Parse transformation to make it super-easy to automatically detect the schema of an embedded complex field inside a string column. Just click on the “Detect type” button in the expression builder to set your target schema automatically. Now, you won’t have to manually define a complex schema and instead allow the Synapse type inference engine in data flows to do the work for you.