Azure Synapse Analytics February Update 2023
Welcome to the Azure Synapse Analytics February 2023 update! This month, you’ll find sections on UTF-8 and Japanese Collation support, the General Availability of Spark 3.3, and other features in SQL, Spark, and Data Integration. Read on for more details and check out our video!
Table of contents
- SQL
- Apache Spark for Synapse
- Azure Synapse Runtime for Apache Spark 3.3 (Generally Available)
- Azure Synapse Runtime for Apache Spark 2.4 will be retired as of September 29, 2023
- Azure Synapse runtime for Apache Spark 3.1 will be retired as of January 26, 2024
- Increased Azure Synapse Analytics Spark performance in Korea Central, Central India, and Australia Southeast (General Availability)
- Data Integration
SQL
UTF-8 and Japanese Collation support for Dedicated SQL Pools (General Availability)
We are thrilled to announce that support for UTF-8 and Japanese Collations for Azure Synapse Dedicated SQL pools is now Generally Available!
With a UTF-8 collation, the CHAR and VARCHAR data types can store multilingual characters. When your data mixes Latin alphabet characters with other multilingual characters, UTF-8 can save space and improve performance. You can set it at the database level or the column level as the default for Unicode string data.
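The space claim above can be checked with a small calculation. The following is a minimal sketch in Python, not Synapse code: it compares the encoded size of a string in UTF-8 with its size in UTF-16, the encoding used by NCHAR and NVARCHAR. The function name and sample strings are illustrative assumptions.

```python
from typing import Dict


def encoded_sizes(text: str) -> Dict[str, int]:
    """Return the number of bytes needed to store text in each encoding."""
    return {
        # UTF-8: 1 byte for ASCII, 2 to 4 bytes for other characters.
        "utf8": len(text.encode("utf-8")),
        # UTF-16 (little-endian, no byte-order mark): 2 or 4 bytes per character.
        "utf16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for sample in ("Hello, Synapse", "こんにちは"):
        print(sample, encoded_sizes(sample))
```

Latin-only text takes half the bytes in UTF-8, while Japanese-only text takes more bytes in UTF-8 than in UTF-16. The saving therefore depends on how much of your data falls in the ASCII range.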
To learn more about what UTF-8 is and how it works, read Introducing UTF-8 support for SQL Server.
To learn more about this announcement, read UTF-8 and Japanese collations support for Dedicated SQL pools is now GA!
SQL Package support for Serverless SQL pools
Support for SQL Package is now available for serverless SQL pools! SQL Package is a command-line utility that automates database development and deployment tasks. Starting with release 161.8089.0 of SQL Package, you can perform Extract and Publish operations on serverless SQL pools.
Use SQL Package to simplify your CI/CD pipelines. You can Extract a database schema and metadata objects from a serverless SQL pool in one environment (for example, development) and Publish them to a serverless SQL pool in another environment (for example, testing).
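The Extract and Publish flow described above might be scripted roughly as follows. This is a minimal sketch that only builds the command lines and does not run them. It assumes the `SqlPackage` executable is on the PATH. The connection strings and the `.dacpac` path are hypothetical placeholders, and the helper function names are our own.

```python
from typing import List


def extract_args(source_connection_string: str, dacpac_path: str) -> List[str]:
    """Build a SqlPackage command that extracts a schema into a .dacpac file."""
    return [
        "SqlPackage",
        "/Action:Extract",
        f"/SourceConnectionString:{source_connection_string}",
        f"/TargetFile:{dacpac_path}",
    ]


def publish_args(dacpac_path: str, target_connection_string: str) -> List[str]:
    """Build a SqlPackage command that publishes a .dacpac file to a target pool."""
    return [
        "SqlPackage",
        "/Action:Publish",
        f"/SourceFile:{dacpac_path}",
        f"/TargetConnectionString:{target_connection_string}",
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own serverless SQL pool connection strings.
    print(extract_args("<dev-connection-string>", "serverless_db.dacpac"))
    print(publish_args("serverless_db.dacpac", "<test-connection-string>"))
```

In a CI/CD job, each list could be passed to `subprocess.run(args, check=True)` so that a failed Extract stops the pipeline before the Publish step.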
To learn more about SQL Package support for Serverless SQL pools, read Release notes for SqlPackage.
Apache Spark for Synapse
Azure Synapse Runtime for Apache Spark 3.3 (Generally Available)
Azure Synapse Runtime for Apache Spark 3.3 has been in Public Preview since November 2022. We are excited to announce that, after notable improvements in performance and stability, Azure Synapse Runtime for Apache Spark 3.3 is now Generally Available and ready for production workloads.
The essential changes include features that come from upgrading Apache Spark to version 3.3.1, Delta Lake to version 2.2.0, and Python to 3.10.
For additional details, review the Azure Synapse Runtime for Apache Spark 3.3 (GA) official release notes.
For a complete list of improvements, review the Apache Spark 3.3 release notes.
For more details on migration, review the migration guide.
Azure Synapse Runtime for Apache Spark 2.4 will be retired as of September 29, 2023
On 29 September 2023, Azure Synapse runtime for Apache Spark 2.4 will be retired in accordance with the Synapse runtime for Apache Spark lifecycle policy, and any workloads still using it will stop running. Before that date, you'll need to transition your workloads to version 3.2 or 3.3.
We recommend the most recent version, 3.3, because it offers significant enhancements such as:
- Improved reliability, performance, and management with an update of Delta to version 2.2.
- An upgrade to the Apache Log4j 2 library to improve security with better support for encryption and secure socket layers.
- Improved type annotations and new syntax for easier coding with an update of Python to version 3.10.
Before 29 September 2023, move workloads that currently use version 2.4 to Azure Synapse runtime for Apache Spark version 3.2 or version 3.3 so that they continue to run as usual. If you have code that's incompatible with the version you transition to, follow the Spark Migration guide to troubleshoot it.
Azure Synapse runtime for Apache Spark 3.1 will be retired as of January 26, 2024
On 26 January 2024, Azure Synapse runtime for Apache Spark 3.1 will be retired in accordance with the Synapse runtime for Apache Spark lifecycle policy, and any workloads still using it will stop running. Before that date, you'll need to transition your workloads to version 3.2 or 3.3.
We recommend the most recent version, 3.3, because it offers significant enhancements such as:
- Improved reliability, performance, and management with an update of Delta to version 2.2.
- An upgrade to the Apache Log4j 2 library to improve security with better support for encryption and secure socket layers.
- Improved type annotations and new syntax for easier coding with an update of Python to version 3.10.
Before 26 January 2024, move workloads that currently use version 3.1 to Azure Synapse runtime for Apache Spark version 3.2 or version 3.3 so that they continue to run as usual. If you have code that's incompatible with the version you transition to, follow the Spark Migration guide to troubleshoot it.
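To keep track of the two retirement dates announced above, a small helper such as the following could be added to a migration checklist script. This is a minimal sketch: the dates come from this post, while the dictionary and function names are illustrative assumptions and not part of any Synapse API.

```python
from datetime import date
from typing import Optional

# Retirement dates announced in this post, keyed by Spark runtime version.
RETIREMENT_DATES = {
    "2.4": date(2023, 9, 29),
    "3.1": date(2024, 1, 26),
}


def days_until_retirement(runtime_version: str, today: date) -> Optional[int]:
    """Return the days left before the runtime is retired.

    Returns None when no retirement date is listed for that version,
    and a negative number when the retirement date has already passed.
    """
    retirement = RETIREMENT_DATES.get(runtime_version)
    if retirement is None:
        return None
    return (retirement - today).days


if __name__ == "__main__":
    for version in ("2.4", "3.1", "3.3"):
        print(version, days_until_retirement(version, date.today()))
```

In a Synapse notebook, the running Spark version is available as `spark.version`. Its major and minor parts could be passed to this helper.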
Increased Azure Synapse Analytics Spark performance in Korea Central, Central India, and Australia Southeast (General Availability)
We are always working to improve Azure Synapse Analytics Spark performance. Significant changes are being made that will increase Spark performance by up to 77%.
In November, we announced that we are moving your Spark pools to Azure v5 VMs. We have over 40 regions worldwide and have completed the change in the first three: Korea Central, Central India, and Australia Southeast. The performance gains come from the v5 VMs' improved CPU performance, increased temporary SSD throughput, and higher remote storage IOPS.
In most cases, no action is required. After a region is upgraded, newly created Spark pools run on v5 VMs and their jobs complete in less time. Existing Spark pools that were created before this service update continue to run on v3 VMs. Once they reach the idle timeout and are re-created, they are created with v5 VMs. If cost savings are more important to you than job completion time, you can choose to reduce the node size or the number of nodes.
To learn more about increased performance, read Optimize Apache Spark jobs in Azure Synapse Analytics and Apache Spark pool configurations in Azure Synapse Analytics.
Data Integration
Set pipeline output value (Public Preview)
When building complex workflows in the cloud with Azure Data Factory and Azure Synapse Pipelines, a common pattern is to separate different workflow branches into child pipelines. We have now expanded the Set Variable activity so that you can set a new system variable called Pipeline Return Value. This lets you customize the value a child pipeline returns to its parent pipeline.
To learn more about this new system variable, read Setting a pipeline return value with UI.
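On the parent side, the returned value is read from the output of the Execute Pipeline activity that called the child. The following minimal sketch builds the expression string a parent pipeline might use. The expression shape reflects our understanding of the pipeline return value documentation and should be verified there. The activity name and key are hypothetical, and the helper function is our own.

```python
def pipeline_return_value_expression(execute_activity_name: str, key: str) -> str:
    """Build the expression that reads one key of a child pipeline's return value.

    execute_activity_name is the name of the Execute Pipeline activity in the
    parent pipeline; key is a key set by the child's Set Variable activity.
    """
    if "'" in execute_activity_name:
        # Quoting rules are out of scope for this sketch, so reject such names.
        raise ValueError("activity name must not contain a single quote")
    return f"@activity('{execute_activity_name}').output.pipelineReturnValue.{key}"


if __name__ == "__main__":
    # Hypothetical activity name and key, for illustration only.
    print(pipeline_return_value_expression("Execute Pipeline1", "rowCount"))
```

The resulting string can be pasted into a dynamic content field in the parent pipeline, for example as the value of a later Set Variable or If Condition activity.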