What’s New in Azure Synapse Data Explorer – Ignite 2022!
Published Oct 12 2022 11:50 PM
Microsoft


 

The Kusto team has been busy working with our customers over the last few months to bring exciting new GA and Preview features to Ignite 2022, with a raft of improvements and innovations.

 

Today, we are announcing a set of exciting new features that further improve the service’s performance, security, management, and integration experiences:

 

Integration 

 

Azure Cosmos DB Synapse Link to Azure Data Explorer (Preview) 

Azure Cosmos DB is a fully managed distributed database for web, mobile, gaming, and IoT applications that need to handle massive amounts of data, reads, and writes at a global scale with near-real-time response times.

Native ADX ingestion from Cosmos DB brings high-throughput, low-latency transactional Cosmos DB data into the analytical world of Kusto, delivering the best of both worlds. Data can be ingested in near real time (streaming ingestion) to run analytics on the most current data or to audit changes.

 

The feature is in private preview right now and should be available in public preview before the end of the year. 

 

Kusto Emulator (GA) 

The Kusto Emulator is a Docker image exposing a Kusto Query Engine endpoint. You can use it to create databases, ingest data, and query it. The emulator understands Kusto Query Language (KQL) the same way the Azure service does. You can therefore use it for local development and be assured that your code will run the same way in an Azure Data Explorer cluster. You can also deploy it in a CI/CD pipeline to run automated test suites and ensure your code behaves as expected.
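For example, once the emulator container is running locally, you can create a database, ingest a record, and query it. This is a minimal sketch: the endpoint, database name, table schema, and persist paths are illustrative, following the emulator's documented container layout.

```kusto
// Run against the emulator endpoint (e.g. http://localhost:8080).
// Create a database persisted inside the container (paths are illustrative).
.create database TestDb persist (
    @"/kustodata/dbs/TestDb/md",
    @"/kustodata/dbs/TestDb/data"
)

// Then, in the TestDb database context:
.create table Logs (Timestamp: datetime, Level: string, Message: string)

.ingest inline into table Logs <|
2022-10-12T00:00:00Z,Info,Service started

Logs
| where Level == "Info"
| count
```

Because the emulator speaks the same KQL and management-command surface, the same script can run unchanged against a real cluster in CI/CD.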

You can find the overview documentation on the Kusto Emulator here and a video here: Kusto Emulator video 

 

Ingesting files from AWS S3 (GA) 

Amazon S3 is one of the most popular object storage services. AWS customers use Amazon S3 to store data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, log analytics, and big data analytics.

With native S3 ingestion support in ADX, customers can bring data from S3 directly without relying on complex ETL pipelines. Customers can also create a continuous data ingestion pipeline to bring data from S3. For more details, please refer to the documentation.
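As a hedged sketch, a one-time ingestion from S3 looks like the following; the table name, bucket, region, object path, and credential placeholders are all illustrative.

```kusto
// Ingest a CSV object from S3 into an existing table.
// Bucket, region, object path, and credentials are placeholders.
.ingest into table RawEvents (
    'https://my-bucket.s3.us-east-1.amazonaws.com/events/2022/10/data.csv;AwsCredentials=<AWS_ACCESS_KEY_ID>,<AWS_SECRET_ACCESS_KEY>'
  )
  with (format='csv', ignoreFirstRecord=true)
```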

 

Azure Stream Analytics ADX output (GA) 

Azure Data Explorer output for Azure Stream Analytics is now Generally Available. The ASA-ADX output has been available in Preview since last year. Customers can build powerful real-time analytics architectures by leveraging ASA and ADX together. With this new integration, an Azure Stream Analytics job can natively ingest data into Azure Data Explorer and Synapse Data Explorer tables. Read more about the output plugin setup and common ADX-ASA use cases.
 

Open Telemetry exporter (GA) 

OpenTelemetry (OTel) is a vendor-neutral, open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs.

We are releasing the ADX OpenTelemetry exporter, which supports ingestion of data from many receivers into Azure Data Explorer, allowing customers to instrument, generate, collect, and store data using a vendor-neutral, open-source framework. Read more.

 

Streaming support in Telegraf connector (GA) 

Telegraf is an open-source, lightweight agent with a minimal memory footprint for collecting, processing, and writing telemetry data, including logs, metrics, and IoT data. The Azure Data Explorer output plugin serves as the connector for Telegraf and supports ingestion of data from many types of input plugins into Azure Data Explorer.

We have added support for "managed" streaming ingestion in Telegraf, which defaults to streaming ingestion, providing latency as low as one second when the target table has streaming enabled, with a fallback to batched (queued) ingestion. Read more.
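For managed streaming to take the low-latency path, the target table (or its database) needs a streaming ingestion policy; the table name below is illustrative, and streaming ingestion must also be enabled on the cluster itself.

```kusto
// Enable the streaming ingestion policy on the Telegraf target table
.alter table TelegrafMetrics policy streamingingestion enable
```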

 

Protobuf support in Kafka sink (GA) 

Protocol Buffers (Protobuf) are a language- and platform-neutral, extensible mechanism for serializing and deserializing structured data for use in communications protocols and data storage. The Azure Data Explorer Kafka sink, a gold-certified Confluent connector, helps ingest data from Kafka into Azure Data Explorer. We have added Protobuf support in the connector to help customers bring Protobuf data into ADX. Read more.

 

Management 

 

No / minimal downtime SKU change 

Azure Data Explorer provides full flexibility to choose the most optimal SKU based on your desired CPU/cache ratio or query/storage patterns to reach optimal performance. Until now, a SKU change took time because the operation was sequential and the cluster waited for the new VMs to be created. Now it is done with zero downtime, so you can reach optimal performance faster with no disruption to users.

We achieve this in two steps: first, we prepare the new VMs in parallel while the old cluster continues to provide service; only once the new VMs are ready do we perform the switchover. This new capability makes it seamless to transition to the new Lasv3 SKU, which delivers extreme price performance.

 

Table Level Sharing support via Azure Data Share (GA) 

With Azure Data Share, users can establish in-place sharing with Azure Data Explorer databases, allowing you to easily and securely share your data with people in your company or with external partners. Sharing occurs in near real time, with no need to build or maintain a data pipeline.

We have now added table-level sharing support via the Azure Data Share UX, where users can share specific tables in a database by including or excluding certain tables or by using wildcards. This allows you to provide a subset of the data under different permission sets, and lets a multitenant ISV solution keep proprietary tables hidden while sharing specific tenant data in place with its customers.

Please read more and give it a try.

 


 

 

Aliasing follower databases (GA) 

The follower database feature allows you to attach a database located in a different cluster to your Azure Data Explorer cluster. Prior to the aliasing capability, a database named DB created on the follower cluster took precedence over a database with the same name created on the leader cluster, so databases with the same name could not co-exist. Now you can override the database name while establishing a follower relationship. This allows you to follow multiple databases with the same name from multiple leader clusters, or simply make a database available to users under a more user-friendly name.

You can either use the databaseNameOverride property to provide a new follower database name, or use databaseNamePrefix when following an entire cluster to add a prefix to all of the databases' original names from the leader cluster. Read more about the API and usage code samples.

 

Leader follower discoverability 

We have enhanced the discoverability of leader and follower databases in your ADX clusters. You can visit the database blade in the Azure portal to easily identify all the follower databases following a leader, and the leader for a given follower. The details pane also provides granularity around which specific tables, external tables, and materialized views have been included or excluded. Read more.


 

Performance 

 

Lsv3/Lasv3 SKU availability (GA) 

 

The Lsv3 (Intel-based) and Lasv3 (AMD-based) families are the recommended Storage Optimized SKU families for Azure Data Explorer. Both SKU families are supported in two configurations:

  • L8sv3/L8asv3: 8 vCPUs with 1.75TB   
  • L16sv3/L16asv3: 16 vCPUs with 3.5TB.  

The two SKU families are optimal for Storage bound workloads from both a cost and performance perspective.  The SKUs are in the process of being rolled out worldwide, and are already available in 17 leading regions.  

 

Improved performance of export to parquet blobs (Preview) 

This new feature allows a more efficient export into Parquet. In most cases it is faster and creates smaller output blobs that are more efficient to query.

To make use of this feature, set the useNativeParquetWriter property to true (default is false) in the one-time export command or when creating a continuous data export.

Example:

```kusto
.export to table externalTableParquet with (useNativeParquetWriter = true) <| Usage
```

 

Improved performance for SQL requests (Preview) 

Requests to SQL will become more optimized, both when using external tables and when using the sql_request plugin (if OutputSchema is provided). This is achieved by pushing down predicates and operators from the KQL query into the SQL query sent to SQL Server. For example, the query `external_table("my_sql_table") | where Name == "John"` used to fetch all the records from SQL and then apply the filter on the Name column. After the change, the filter on the Name column is pushed down into the query sent to SQL Server, which results in more efficient resource utilization on both SQL Server and ADX, and the overall query duration should decrease significantly.

 

This will be available in Preview by the end of this year; we will share the Preview process closer to the date.

 

Improved performance of ingestion of parquet blobs (Preview) 

This new feature allows a more efficient ingestion of Parquet blobs – the CPU utilization and the ingestion duration will decrease by tens of percent. In the preview phase, there will be an option to enable it per ingestion command, using a dedicated flag. Once the feature is GA, this will become the default mode. 

Syntax: `with (nativeParquetIngestion=true)` 

 

Available in Preview by the end of this year, with GA planned for early next year.

 

Power to the query

 

Parse-kv operator (GA) 

A new operator that extracts structured information from a string expression and represents the information in key/value form.

The following extraction modes are supported:

  • Specified delimiter: Extraction based on specified delimiters that dictate how keys, values, and pairs are separated from each other.
  • Non-specified delimiter: Extraction with no need to specify delimiters; any non-alphanumeric character is considered a delimiter.
  • Regex: Extraction based on an RE2 regular expression.

Read more. 
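For instance, in the specified-delimiter mode, a sketch like the following (the column and key names are illustrative) splits a log line into typed keys:

```kusto
print line = "duration=123 name=foo level=warn"
| parse-kv line as (duration: int, name: string, level: string)
    with (pair_delimiter=' ', kv_delimiter='=')
```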

 

Scan operator (GA) 

This powerful operator enables efficient and scalable process mining, sequence analytics, and user analytics in ADX. You define a linear sequence of events, and `scan` quickly extracts all matching sequences of those events. Common scenarios for using `scan` include preventive maintenance for IoT devices, customer funnel analysis, recursive calculations, security scenarios looking for known attack steps, and more. Read more.
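As a hedged sketch (the table and column names are hypothetical), a simple two-step funnel that finds "view, then purchase" sequences could look like:

```kusto
// Hypothetical Events table; sorting keeps each user's events contiguous
Events
| sort by UserId asc, Timestamp asc
| scan with
(
    step viewed: EventName == "ProductViewed";
    step purchased: EventName == "Purchased";
)
```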

 

New Python image for the inline python() plugin (Preview) 

We have updated the Python image to Python 3.9 with the latest versions of the packages. This will be the default image when enabling the plugin, while existing users can continue to use the older image (based on Python 3.6.5) for compatibility. Read more.
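The plugin is invoked the same way regardless of image version; as a minimal sketch, this query runs a short Python script over the rows (the output column name is illustrative):

```kusto
// Doubles the x column into a new y column via the inline python() plugin
range x from 1 to 4 step 1
| evaluate python(typeof(*, y: long),
    'result = df\nresult["y"] = df["x"] * 2\n')
```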

 

Available in Preview by the end of this year. 

 

Reliability 

 

Continuous export: write-once guarantee (GA) 

Until now, transient errors during continuous export could generate blobs with duplicate data, and sometimes corrupted blobs, in addition to the blobs containing the correct data. Following this change, data will be exported exactly once.
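The guarantee applies to continuous export definitions such as the following sketch, where the export name, source table, and external table are placeholders:

```kusto
// Continuously export RawEvents to an existing external table, once per hour
.create-or-alter continuous-export HourlyExport
    over (RawEvents)
    to table ExternalBlobTable
    with (intervalBetweenRuns=1h)
<| RawEvents
```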

 

Hyper-V support for sandboxes

We are replacing the sandbox technology used by the Python plugin with Hyper-V. Hyper-V offers enhanced isolation, improving security. It also allows the Python and R plugins to run on SKUs with hyper-threading, improving overall performance.

 

Visualization 

 

ADX Dashboards (GA) 

Azure Data Explorer Dashboards is a web component that enables you to run queries and build dashboards in the stand-alone web application, the Azure Data Explorer web UI. Azure Data Explorer Dashboards provide two main advantages: 

  • Native integration with Azure Data Explorer web UI extended functionalities  
  • Optimized dashboard rendering performance 

Read more about ADX Dashboards here 


 

Generally available (GA) by the end of this year.  

 
 

Plotly support in ADX Dashboards (Preview) 

Plotly is a Python graphics package that allows the creation of advanced visualizations, including 3D, heatmaps, animations, and many more. The visualization is defined by a short Python script supplied by the user, so it is very flexible and can be tailored to the specific scenario by the user's program. We are launching support for Plotly visualizations in ADX Dashboards.

 


 

Available in Preview by the end of this year. 

 

Dashboards base query (GA) 

Base query is a new feature in ADX Dashboards that allows you to reuse queries across a dashboard's tiles (i.e., visualizations). This feature not only makes it easier to manage the underlying queries of a dashboard but also improves its performance.

 

Generally Available by the end of this year. 

 

Kusto Trender (GA) 

Azure Time Series Insights will be retired on March 31, 2025. Meanwhile, most customers are transitioning to Azure Data Explorer (ADX) for their (Industrial) Internet of Things use cases. Azure Data Explorer provides the best data analytics platform for streaming telemetry data. To further accelerate the transition, we upgraded the Time Series Insights client visualization component to work with ADX.

 


The code is available on GitHub under the MIT license. Samples can be accessed via the Kusto Trender Samples Gallery. 

 

Kusto.Explorer - Automation (Preview) 

Query Automation allows you to define a workflow that contains a series of queries, with rules and logic that govern the order in which they are executed. Automations can be reused, and users can re-run the workflow to get updated results. Upon completion, the saved Automation produces an analysis report summarizing all query results with additional insights.

This powerful feature is now available in Kusto.Explorer, a rich desktop app that enables you to explore your data using KQL.

 


 

Guidance

 

POC Playbook 

We are releasing prescriptive guidance to help our customers plan their Azure Data Explorer proof of concept (POC).

The playbook provides a high-level methodology for preparing and running an effective Azure Data Explorer POC project.

Please refer to the ADX POC Playbook for details.

 

 

Lastly, if you missed Satya's Ignite 2022 keynote, do watch him talk about real-time analytics with ADX.

 


 

We would love to hear your feedback and overall experience with these new capabilities. Please let us know your thoughts in the comments. 

 

Last update: Oct 13 2022 08:47 PM