Azure Synapse Analytics January Update 2022
Published Jan 26 2022 02:39 PM 14.2K Views
Microsoft


 

Welcome to the Azure Synapse January 2022 update! Our first blog of the year includes newly added database templates, a security whitepaper, and data integration updates. For the first time, we will also feature a companion video that you can watch to get the quick key updates.

 

 

Help us improve this monthly blog: we would love to hear from you on how best to engage and inform you! Leave a comment below.

 


 

 

Apache Spark for Synapse

4 New database templates [Public Preview]

We’ve seen so much enthusiasm for and adoption of the 11 Synapse database templates during public preview that 4 additional templates have recently been added. You can now access Automotive, Genomics, Manufacturing, and Pharmaceuticals templates in Azure Synapse. See them either in the gallery or by creating a new lake database, selecting + Table, and then From template.

 


 

Learn more by reading Four Additional Azure Synapse Database Templates Now Available in Public Preview

 

 

Machine Learning

Improvements to the SynapseML library

 


 

The v0.9.5 release of the SynapseML library (previously called MMLSpark) simplifies the creation of massively scalable machine learning pipelines with Apache Spark. It unifies several existing ML frameworks and new Microsoft algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java. This update includes support for the following new capabilities:

  • Geospatial Intelligence: Quickly apply the Azure Maps API to solve problems that require geospatial intelligence. Use distributed tools such as geocoding to make sense of informal location data at scale. Learn more in the Azure Maps on Spark overview.
  • Custom Multivariate Anomaly Detection: Train custom multivariate anomaly detection systems within your databases with only a few lines of Python. Learn more from the Multivariate Anomaly Detection Python code.
  • Healthcare Analytics: Parse and reason about medical text using the parallelism of your Spark cluster. Extract medications, doses, medical relationships, and more. See example usage in the Cognitive Services on Spark overview.
  • Responsible AI at Scale: Understand the predictions of opaque-box models, measure dataset bias, and probe models with Individual Conditional Expectation plots. Explore model biases with the new Individual Conditional Expectation transformer; learn more in the Adult Census Dataset example.
  • Text to Speech: Use neural voice synthesis to generate thousands of hours of lifelike speech in minutes. See our Cognitive Services overview for example usage.
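The multivariate anomaly detector above runs distributed on Spark with only a few lines of Python. As a rough, offline analogue of what such a detector flags, here is a minimal pure-Python sketch; the function name and the z-score approach are illustrative, not SynapseML's API, which also learns correlations between variables rather than scoring each feature independently.

```python
import math

def detect_anomalies(rows, threshold=3.0):
    """Flag rows whose combined per-feature z-score exceeds a threshold.

    A toy stand-in for multivariate anomaly detection: it measures how
    far each row deviates from the per-feature mean, in units of the
    per-feature standard deviation, combined across all features.
    """
    n_features = len(rows[0])
    means = [sum(r[i] for r in rows) / len(rows) for i in range(n_features)]
    # Population standard deviation per feature; fall back to 1.0 if flat.
    stds = [
        math.sqrt(sum((r[i] - means[i]) ** 2 for r in rows) / len(rows)) or 1.0
        for i in range(n_features)
    ]
    flags = []
    for r in rows:
        score = math.sqrt(sum(((r[i] - means[i]) / stds[i]) ** 2
                              for i in range(n_features)))
        flags.append(score > threshold)
    return flags

# Nine ordinary readings and one outlier across two features.
rows = [(1.0, 1.0)] * 9 + [(100.0, 100.0)]
print(detect_anomalies(rows))  # only the last row is flagged
```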

 

Learn more by reading the full release notes or visit the SynapseML homepage to get started.

 

 

Security

Azure Synapse Analytics security overview

We just published a white paper that provides a comprehensive overview of Azure Synapse Analytics security, explaining the enterprise-grade capabilities and industry-leading features that address common security concerns. The whitepaper covers five layers of security: Authentication, Access Control, Data Protection, Network Security, and Threat Protection. Use this reference document to understand each security feature and to implement an industry-standard security baseline to protect your data in the cloud.

 

Learn more by reading Azure Synapse Analytics security white paper: Introduction

 

TLS 1.2 required for new workspaces

Starting in December 2021, TLS 1.2 is required for newly created Synapse workspaces. TLS 1.2 provides enhanced security to safeguard against exploits. Login attempts to a newly created Synapse workspace over connections using a TLS version lower than 1.2 will fail.
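You can make the same minimum explicit on the client side as well. A minimal Python sketch, using only the standard library, that builds an SSL context refusing anything below TLS 1.2, mirroring what new workspaces now enforce on the server side:

```python
import ssl

# Build a client-side TLS context that refuses TLS 1.0/1.1 connections,
# matching the minimum that newly created Synapse workspaces enforce.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# Any library that accepts an SSLContext (http.client, urllib, many
# database drivers) can use this context when connecting.
print(context.minimum_version.name)  # TLSv1_2
```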

 

Learn more by reading Azure Synapse Analytics connectivity settings

 

 

Data Integration

Data quality validation rules using Assert transformation

You can now easily add data quality, data validation, and schema validation to your Synapse ETL jobs by leveraging the Assert transformation in Synapse data flows. Add expectations to your data streams that will execute from the pipeline data flow activity to evaluate whether each row or column in your data meets your assertion. Tag the rows as pass or fail and add row-level details about how a constraint was breached. This is a critical addition to an already effective ETL framework, ensuring that you load and process quality data for your analytical solutions.
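Assert expectations are authored in the data flow UI, but the row-level pass/fail tagging they produce can be sketched in plain Python. The rule names, field names, and function below are illustrative, not the data flow expression syntax:

```python
def apply_asserts(rows, asserts):
    """Tag each row pass/fail against a list of (name, predicate) asserts,
    recording which constraints were breached -- a toy analogue of the
    Assert transformation's row-level tagging."""
    tagged = []
    for row in rows:
        failed = [name for name, pred in asserts if not pred(row)]
        tagged.append({**row, "isError": bool(failed), "failedAsserts": failed})
    return tagged

# Illustrative expectations: a non-null ID and a positive amount.
rules = [
    ("assertNonNullId", lambda r: r.get("id") is not None),
    ("assertPositiveAmount", lambda r: r.get("amount", 0) > 0),
]

rows = [{"id": 1, "amount": 10.0}, {"id": None, "amount": -5.0}]
result = apply_asserts(rows, rules)
# The second row fails both assertions and is tagged as an error.
```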

 


 

Learn more by reading Assert transformation in mapping data flow

 

Native data flow connector for Dynamics

Synapse data flows can now read and write data directly to Dynamics through the new data flow Dynamics connector. Create data sets in data flows to read, transform, aggregate, join, etc., and then write the data back into Dynamics using the built-in Synapse Spark compute.

 

Learn more by reading Native data flow connector for Dynamics

 

IntelliSense and auto-complete added to pipeline expressions

It’s here! This much-anticipated update adds IntelliSense to expression editing, making it super easy for you to create new expressions, check your expression syntax, find functions, and add code to your pipelines.

 


 

Learn more by reading IntelliSense support in Expression Builder for more productive pipeline authoring experiences

 

 

Synapse SQL

COPY schema discovery for complex data ingestion

Automatic schema discovery, along with the auto-table creation process, makes it easy for customers to automatically map and load complex data types from Parquet files, such as arrays and maps, into dedicated SQL pools in Synapse. Rowgroup compression is automatically enabled when customers enable the auto-create table option within the COPY command. Start taking advantage of all these features today to simplify data ingestion with Azure Synapse Analytics!


Learn more by reading how GitHub leveraged this functionality in Introducing Automatic Schema Discovery with auto table creation for complex datatypes

 

HASHBYTES easily generates hashes in Serverless SQL

Serverless SQL pools now support the HASHBYTES function! HASHBYTES is a T-SQL function that hashes values. This means that you can use the HASHBYTES function in queries that read data using external tables and the OPENROWSET function.

 

SELECT
    TOP 100
    HASHBYTES('SHA2_256', vendorID) AS hashedVendorID,
    vendorID
FROM
    OPENROWSET(
        BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=2019/puMonth=1/*.parquet',
        FORMAT = 'parquet'
    ) AS [result];

 


 

 

The HASHBYTES function returns the hash of its input values and now supports the following algorithms: MD2, MD4, MD5, SHA, SHA1, and SHA2 (SHA2_256 and SHA2_512).
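The SHA2_256 algorithm produces the same digest as any standard SHA-256 implementation, so results can be verified offline. A quick Python check of the digest HASHBYTES('SHA2_256', ...) would return for a varchar value '1' (assuming the input hashes as single-byte text; an nvarchar input would hash UTF-16 bytes and give a different digest):

```python
import hashlib

# SHA-256 of the single-byte string '1', the same digest HASHBYTES
# computes for a varchar value '1'.
digest = hashlib.sha256(b"1").hexdigest()
print(digest)
# 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
```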

 

Learn more by reading about HASHBYTES

Last updated: Feb 01 2022 11:22 AM