What’s New in Azure Synapse Data Explorer – Ignite 2022!
Published Oct 12 2022 11:50 PM
Microsoft


 

The Kusto team has been busy working with our customers over the last few months to bring exciting new GA and Preview features to Ignite 2022, with a raft of improvements and innovations.

 

Today, we are announcing a set of exciting new features that further improve the service’s performance, security, management, and integration experiences:

 

Integration 

 

Azure Cosmos DB Synapse Link to Azure Data Explorer (Preview) 

Azure Cosmos DB is a fully managed distributed database for web, mobile, gaming, and IoT applications that need to handle massive amounts of data, reads, and writes at a global scale with near-real-time response times.

Native ADX ingestion from Cosmos DB brings high-throughput, low-latency transactional Cosmos DB data into the analytical world of Kusto, delivering the best of both worlds. Data can be ingested in near real time (streaming ingestion) to run analytics on the most current data or to audit changes.

 

The feature is in private preview right now and should be available in public preview before the end of the year. 

 

Kusto Emulator (GA) 

The Kusto Emulator is a Docker image exposing a Kusto Query Engine endpoint. You can use it to create databases, ingest data, and query it. The emulator understands Kusto Query Language (KQL) the same way the Azure service does. You can therefore use it for local development and be assured that your code will run the same way in an Azure Data Explorer cluster. You can also deploy it in a CI/CD pipeline to run automated test suites and ensure your code behaves as expected.
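For example, once the emulator container is running locally, you can create a database, ingest a record, and query it. This is a minimal sketch: the endpoint, database name, table schema, and persist paths are illustrative, following the emulator's documented container layout.

```kusto
// Run against the emulator endpoint (e.g. http://localhost:8080).
// Create a database persisted inside the container (paths are illustrative).
.create database TestDb persist (
    @"/kustodata/dbs/TestDb/md",
    @"/kustodata/dbs/TestDb/data"
)

// Then, in the TestDb database context:
.create table Logs (Timestamp: datetime, Level: string, Message: string)

.ingest inline into table Logs <|
2022-10-12T00:00:00Z,Info,Service started

Logs
| where Level == "Info"
| count
```

Because the emulator speaks the same KQL and management-command surface, the same script can run unchanged against a real cluster in CI/CD.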

You can find the overview documentation on the Kusto Emulator here and a video here: Kusto Emulator video 

 

Ingesting files from AWS S3 (GA) 

Amazon S3 is one of the most popular object storage services. AWS customers use Amazon S3 to store data for a range of use cases, such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, log analytics, and big data analytics.

With native S3 ingestion support in ADX, customers can bring data from S3 directly without relying on complex ETL pipelines. Customers can also create a continuous data ingestion pipeline to bring data from S3. For more details, please refer to the documentation.
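As a hedged sketch, a one-time ingestion from S3 looks like the following; the table name, bucket, region, object path, and credential placeholders are all illustrative.

```kusto
// Ingest a CSV object from S3 into an existing table.
// Bucket, region, object path, and credentials are placeholders.
.ingest into table RawEvents (
    'https://my-bucket.s3.us-east-1.amazonaws.com/events/2022/10/data.csv;AwsCredentials=<AWS_ACCESS_KEY_ID>,<AWS_SECRET_ACCESS_KEY>'
  )
  with (format='csv', ignoreFirstRecord=true)
```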

 

Azure Stream Analytics ADX output (GA) 

Azure Data Explorer output for Azure Stream Analytics is now Generally Available. The ASA-ADX output has been available in Preview since last year. Customers can build powerful real-time analytics architectures by leveraging ASA and ADX together. With this new integration, an Azure Stream Analytics job can natively ingest data into Azure Data Explorer and Synapse Data Explorer tables. Read more about the output plugin setup and common ADX-ASA use cases.
 

Open Telemetry exporter (GA) 

OpenTelemetry (OTel) is a vendor-neutral, open-source observability framework for instrumenting, generating, collecting, and exporting telemetry data such as traces, metrics, and logs.

We are releasing the ADX OpenTelemetry exporter, which supports ingestion of data from many receivers into Azure Data Explorer, allowing customers to instrument, generate, collect, and store data using a vendor-neutral, open-source framework. Read more.

 

Streaming support in Telegraf connector (GA) 

Telegraf is an open-source, lightweight agent with a minimal memory footprint for collecting, processing, and writing telemetry data, including logs, metrics, and IoT data. The Azure Data Explorer output plugin serves as the connector for Telegraf and supports ingestion of data from many types of input plugins into Azure Data Explorer.

We have added support for "managed" streaming ingestion in Telegraf, which defaults to streaming ingestion, providing latency as low as one second when the target table has streaming enabled, with a fallback to batched (queued) ingestion. Read more.
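For managed streaming to take the low-latency path, the target table (or its database) needs a streaming ingestion policy; the table name below is illustrative, and streaming ingestion must also be enabled on the cluster itself.

```kusto
// Enable the streaming ingestion policy on the Telegraf target table
.alter table TelegrafMetrics policy streamingingestion enable
```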

 

Protobuf support in Kafka sink (GA) 

Protocol Buffers (Protobuf) are a language- and platform-neutral, extensible mechanism for serializing and deserializing structured data for use in communications protocols and data storage. The Azure Data Explorer Kafka sink, a gold-certified Confluent connector, helps ingest data from Kafka into Azure Data Explorer. We have added Protobuf support in the connector to help customers bring Protobuf data into ADX. Read more.

 

Management 

 

No / minimal downtime SKU change 

Azure Data Explorer provides full flexibility to choose the most optimal SKU based on your desired CPU/cache ratio or query/storage patterns to reach optimal performance. Until now, a SKU change took time because the operation was sequential and the cluster waited for the new VMs to be created. Now it is done with zero downtime, so you can reach optimal performance faster with no disruption to users.

We achieve this in two steps: first, we prepare the new VMs in parallel while the old cluster continues to provide service; only once the new VMs are ready do we perform the switchover. This new capability makes it seamless to transition to the new Lasv3 SKU, which delivers extreme price performance.

 

Table Level Sharing support via Azure Data Share (GA) 

With Azure Data Share, users can establish in-place sharing with Azure Data Explorer databases, allowing you to easily and securely share your data with people in your company or with external partners. Sharing occurs in near real time, with no need to build or maintain a data pipeline.

We have now added table-level sharing support via the Azure Data Share UX, where users can share specific tables in a database by including or excluding certain tables or by using wildcards. This allows you to provide a subset of the data under different permission sets, and lets a multitenant ISV solution keep proprietary tables hidden while sharing specific tenant data in place with its customers.

Please read more and give it a try.

 


 

 

Aliasing follower databases (GA) 

The follower database feature allows you to attach a database located in a different cluster to your Azure Data Explorer cluster. Prior to the aliasing capability, a database named DB created on the follower cluster took precedence over a database with the same name created on the leader cluster, so databases with the same name could not co-exist. Now you can override the database name while establishing a follower relationship. This allows you to follow multiple databases with the same name from multiple leader clusters, or simply make a database available to users under a more user-friendly name.

You can either use the databaseNameOverride property to provide a new follower database name, or use databaseNamePrefix when following an entire cluster to add a prefix to all of the databases' original names from the leader cluster. Read more about the API and usage code samples.

 

Leader follower discoverability 

We have enhanced the discoverability of leader and follower databases in your ADX clusters. You can visit the database blade in the Azure portal to easily identify all the follower databases following a leader, and the leader for a given follower. The details pane also provides granularity around which specific tables, external tables, and materialized views have been included or excluded. Read more.


 

Performance 

 

Lsv3/Lasv3 SKU availability (GA) 

 

The Lsv3 (Intel-based) and Lasv3 (AMD-based) families are the recommended Storage Optimized SKU families for Azure Data Explorer. Both SKU families are supported in two configurations:

  • L8sv3/L8asv3: 8 vCPUs with 1.75TB   
  • L16sv3/L16asv3: 16 vCPUs with 3.5TB.  

The two SKU families are optimal for Storage bound workloads from both a cost and performance perspective.  The SKUs are in the process of being rolled out worldwide, and are already available in 17 leading regions.  

 

Improved performance of export to parquet blobs (Preview) 

This new feature allows a more efficient export into Parquet. In most cases it is faster and creates smaller output blobs that are more efficient to query.

To make use of this feature, set the useNativeParquetWriter property to true (default is false) in the one-time export command or when creating a continuous data export.

Example:

```kusto
.export to table externalTableParquet with (useNativeParquetWriter = true) <| Usage
```

 

Improved performance for SQL requests (Preview) 

Requests to SQL will become more optimized, both when using external tables and when using the sql_request plugin (if OutputSchema is provided). This is achieved by pushing down predicates and operators from the KQL query into the SQL query sent to SQL Server. For example, the query `external_table("my_sql_table") | where Name == "John"` used to fetch all the records from SQL and then apply the filter on the Name column. After the change, the filter on the Name column is pushed down into the query sent to SQL Server, which results in more efficient resource utilization on both SQL Server and ADX, and the overall query duration should decrease significantly.

 

This will be available in Preview by the end of this year; we will share the Preview process closer to the date.

 

Improved performance of ingestion of parquet blobs (Preview) 

This new feature allows a more efficient ingestion of Parquet blobs – the CPU utilization and the ingestion duration will decrease by tens of percent. In the preview phase, there will be an option to enable it per ingestion command, using a dedicated flag. Once the feature is GA, this will become the default mode. 

Syntax: `with (nativeParquetIngestion=true)` 

 

Available in Preview by the end of this year, with GA planned for early next year.

 

Power to the query

 

Parse-kv operator (GA) 

A new operator that extracts structured information from a string expression and represents the information in key/value form.

The following extraction modes are supported:

  • Specified delimiter: Extraction based on specified delimiters that dictate how keys, values, and pairs are separated from each other.
  • Non-specified delimiter: Extraction with no need to specify delimiters; any non-alphanumeric character is considered a delimiter.
  • Regex: Extraction based on an RE2 regular expression.

Read more. 
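For instance, in the specified-delimiter mode, a sketch like the following (the column and key names are illustrative) splits a log line into typed keys:

```kusto
print line = "duration=123 name=foo level=warn"
| parse-kv line as (duration: int, name: string, level: string)
    with (pair_delimiter=' ', kv_delimiter='=')
```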

 

Scan operator (GA) 

This powerful operator enables efficient and scalable process mining, sequence analytics, and user analytics in ADX. You define a linear sequence of events, and `scan` quickly extracts all matching sequences of those events. Common scenarios for using `scan` include preventive maintenance for IoT devices, customer funnel analysis, recursive calculations, security scenarios looking for known attack steps, and more. Read more.
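As a hedged sketch (the table and column names are hypothetical), a simple two-step funnel that finds "view, then purchase" sequences could look like:

```kusto
// Hypothetical Events table; sorting keeps each user's events contiguous
Events
| sort by UserId asc, Timestamp asc
| scan with
(
    step viewed: EventName == "ProductViewed";
    step purchased: EventName == "Purchased";
)
```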

 

New Python image for the inline python() plugin (Preview) 

We have updated the Python image to Python 3.9 with the latest versions of the packages. This will be the default image when enabling the plugin, while existing users can continue to use the older image (based on Python 3.6.5) for compatibility. Read more.
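The plugin is invoked the same way regardless of image version; as a minimal sketch, this query runs a short Python script over the rows (the output column name is illustrative):

```kusto
// Doubles the x column into a new y column via the inline python() plugin
range x from 1 to 4 step 1
| evaluate python(typeof(*, y: long),
    'result = df\nresult["y"] = df["x"] * 2\n')
```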

 

Available in Preview by the end of this year. 

 

Reliability 

 

Continuous export: write-once guarantee (GA) 

Until now, transient errors during continuous export could generate blobs with duplicate data, and sometimes corrupted blobs, in addition to the blobs containing the correct data. Following this change, data will be exported exactly once.
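The guarantee applies to continuous export definitions such as the following sketch, where the export name, source table, and external table are placeholders:

```kusto
// Continuously export RawEvents to an existing external table, once per hour
.create-or-alter continuous-export HourlyExport
    over (RawEvents)
    to table ExternalBlobTable
    with (intervalBetweenRuns=1h)
<| RawEvents
```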

 

Hyper-V support for sandboxes

We are replacing the sandbox technology used by the Python plugin with Hyper-V. Hyper-V offers enhanced isolation, improving security. It also allows the Python and R plugins to run on SKUs with hyper-threading, improving overall performance.

 

Visualization 

 

ADX Dashboards (GA) 

Azure Data Explorer Dashboards is a web component that enables you to run queries and build dashboards in the stand-alone web application, the Azure Data Explorer web UI. Azure Data Explorer Dashboards provide two main advantages: 

  • Native integration with Azure Data Explorer web UI extended functionalities  
  • Optimized dashboard rendering performance 

Read more about ADX Dashboards here 


 

Generally available (GA) by the end of this year.  

 
 

Plotly support in ADX Dashboards (Preview) 

Plotly is a Python graphics package that allows the creation of advanced visualizations, including 3D, heatmaps, animations, and many more. The visualization is defined by a short Python script supplied by the user, so it is very flexible and can be tailored to the specific scenario by the user's program. We are launching support for Plotly visualizations in ADX Dashboards.

 


 

Available in Preview by the end of this year. 

 

Dashboards base query (GA) 

Base query is a new feature in ADX Dashboards that allows you to reuse queries across a dashboard's tiles (i.e., visualizations). This feature not only makes it easier to manage the underlying queries of a dashboard but also improves its performance.

 

Generally Available by the end of this year. 

 

Kusto Trender (GA) 

Azure Time Series Insights will be retired on March 31, 2025. Meanwhile, most customers are transitioning to Azure Data Explorer (ADX) for their (Industrial) Internet of Things use cases. Azure Data Explorer provides the best data analytics platform for streaming telemetry data. To further accelerate the transition, we upgraded the Time Series Insights client visualization component to work with ADX.

 


The code is available on GitHub under the MIT license. Samples can be accessed via the Kusto Trender Samples Gallery. 

 

Kusto.Explorer - Automation (Preview) 

Query Automation allows you to define a workflow that contains a series of queries, with rules and logic that govern the order in which they are executed. Automations can be reused, and users can re-run the workflow to get updated results. Upon completion, the saved Automation produces an analysis report summarizing all query results with additional insights.

This powerful feature is now available in Kusto.Explorer, a rich desktop app that enables you to explore your data using KQL.

 


 

Guidance

 

POC Playbook 

We are releasing prescriptive guidance to help our customers plan their Azure Data Explorer proof of concept (POC).

The playbook provides a high-level methodology for preparing and running an effective Azure Data Explorer POC project.

Please refer to the ADX POC Playbook for details.

 

 

Lastly, if you missed Satya's Ignite 2022 keynote, do watch him talk about real-time analytics with ADX.

 


 

We would love to hear your feedback and overall experience with these new capabilities. Please let us know your thoughts in the comments. 

 

Last update: Oct 13 2022 08:47 PM