Breaking changes notification
Published May 21 2023 02:02 AM
Microsoft

Azure CosmosDB data connection - detailed description

This breaking change is relevant to anyone using an Azure Cosmos DB data connection. 

The managed identity of the Azure Data Explorer cluster that is associated with the connection now requires an additional role, “Cosmos DB Account Reader Role” (a control plane permission). See Azure built-in roles - Azure RBAC | Microsoft Learn.

 

This role allows Azure Data Explorer to map the Cosmos DB account resource ID (passed through the ARM provider) to its endpoint URL. It is also used to validate the Cosmos DB database and container input parameters. During the public preview, this mapping was inferred. Now, at General Availability (GA) of the feature, the mapping is obtained by reading the Cosmos DB account properties.


Your active data connections will continue to work without any changes, but without this role you will not be able to create new data connections or update existing ones.

 

When data connections are provisioned or updated through the Azure portal, the portal makes this role assignment on behalf of the logged-in user, provided that user has sufficient privileges over the Cosmos DB account. Otherwise, the role assignment must be made by a principal (user or service principal) that has sufficient privileges.

 

The Cosmos DB Built-in Data Reader role (a data plane permission) is still required for reading data from the Cosmos DB account.

 

Required change

Add “Cosmos DB Account Reader Role” to the managed identity of the Cosmos DB data connections. 
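
For automations that assign the role outside the Azure portal, the assignment can be scripted. The following is a minimal sketch rather than an official procedure: it only builds the argument list for the Azure CLI `az role assignment create` command. It assumes the role is assigned at the scope of the Cosmos DB account, and the helper names and all identifiers (subscription ID, resource group, account name, principal ID) are hypothetical placeholders.

```python
import shlex

# Role name as given in this notification.
ROLE_NAME = "Cosmos DB Account Reader Role"


def cosmos_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the ARM resource ID of a Cosmos DB account, used here as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DocumentDB/databaseAccounts/{account_name}"
    )


def role_assignment_command(principal_id: str, scope: str) -> list[str]:
    """Return Azure CLI arguments that assign the reader role to a principal."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", principal_id,
        "--role", ROLE_NAME,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Hypothetical placeholders; replace with your own values.
    scope = cosmos_account_scope("<subscription-id>", "my-rg", "my-cosmos-account")
    command = role_assignment_command("<cluster-managed-identity-principal-id>", scope)
    # Print a shell-quoted command line for review before running it.
    print(shlex.join(command))
```

The principal that runs the resulting command needs sufficient privileges over the Cosmos DB account, as described above.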

Schedule & plan

Update your automations by the end of May 2023, which is the GA date of the Cosmos DB data connection.

 

Export to Azure storage in Parquet format - detailed description 

This breaking change is relevant to anyone exporting data from Kusto to Azure Storage in Parquet format, whether by one-time export or continuous export.

The generated Parquet files will start using new encodings that are not supported by Spark versions below 3.3.0. If you use a Spark version earlier than 3.3.0 to read Parquet files exported from Azure Data Explorer, you will be affected. In many cases the error message will include “Unsupported encoding: DELTA_BYTE_ARRAY”.

The purpose of this change is to increase performance and security.

Required change

Update the Spark version that you use to read Parquet files exported from your ADX cluster to 3.3.0 or newer.
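
To find affected jobs before the change is deployed, a reader job can check its Spark version up front. The sketch below is illustrative only: the function names are hypothetical, and it assumes the version arrives as a dotted string such as the one PySpark exposes through `spark.version`.

```python
import re

# Minimum Spark version that supports the new encodings, per this notification.
MIN_SPARK_VERSION = (3, 3, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Extract the leading numeric components, e.g. '3.3.0-SNAPSHOT' -> (3, 3, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized Spark version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def can_read_new_parquet_export(spark_version: str) -> bool:
    """Return True if this Spark version is 3.3.0 or newer."""
    return parse_version(spark_version) >= MIN_SPARK_VERSION


if __name__ == "__main__":
    for version in ("3.2.4", "3.3.0", "3.4.1"):
        print(version, can_read_new_parquet_export(version))
```

Comparing integer tuples rather than raw strings avoids misordering versions such as 3.10.x and 3.3.x.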

Schedule & plan  

Update your Spark version by July 31st.

The change will be deployed and applicable starting on August 1st.

 

Extent-level commands - detailed description

Today it is possible to run certain extent-level commands without specifying the name of the table that contains the source extents, and/or without specifying a creation time range that scopes the lookup of the source extents to operate on.

Specifically, this refers to the following commands: 

.alter[-merge] extent tags 

.drop extent tags 

.move extents 

.replace extents 

For example, the following commands still work today:

.drop extent tags from table <tableName> <tagsSpecificationString>

.drop extent tags <| .show table T extents

.alter extent tags ('t1', 't2') <| .show table T extents 

.move extents from table T1 to table T2 (extentId1, extentId2, ...) 

.move extents to table T2 <| .show table T1 extents 

.replace extents in table T1 <| {.show table T1 extents},{.show table T2 extents}  

 

To improve the efficiency of these operations, we’re planning to block these command forms on all existing clusters that are not using them today.

With the new behaviour, such commands fail with one of the following error messages:

Admin command cannot be executed due to an invalid argument; argument: TableName, reason: The name of the table must be specified 

Admin command cannot be executed due to an invalid argument; argument: ExtentCreatedOnRange, reason: Both 'ExtentCreatedOnFrom' and 'ExtentCreatedOnTo' must be specified 

The current behaviour will continue to be supported on clusters that currently use it. 

We ask you to modify the tools you own, and the extent-level commands they run, according to the specification below, so that we can block this behaviour on clusters that currently use this inefficient form of the commands.

 

Required change

Change extent-level commands according to the specification below, so that they include:

The name of the table that contains the source extents. 

The shortest-possible creation time range that scopes the lookup of the source extents to operate on. 

 

Command type: Drop extent tags

Existing syntax:

.drop [async] extent tags from table <tableName> <tagsSpecificationString>

.drop [async] extent tags <| <innerQuery>

New syntax:

.drop [async] table <tableName> extent tags <tagsSpecificationString> with(extentCreatedOnFrom='...', extentCreatedOnTo='...')

.drop [async] table <tableName> extent tags with(extentCreatedOnFrom='...', extentCreatedOnTo='...') <| <innerQuery>

Purpose of changes:

- scoping the command to a specific table

- scoping the command to a specific, narrowed time range

Command type: Alter(-merge) extent tags

Existing syntax:

.alter[-merge] [async] extent tags <tagsSpecificationString> <| <innerQuery>

New syntax:

.alter[-merge] [async] table <tableName> extent tags <tagsSpecificationString> with(extentCreatedOnFrom='...', extentCreatedOnTo='...') <| <innerQuery>

Purpose of changes:

- scoping the command to a specific table

- scoping the command to a specific, narrowed time range

Command type: Move extents

Existing syntax:

.move [async] extents from table <tableName> to table <tableName> <extentIdsSpecification>

.move [async] extents to table <tableName> <| <innerQuery>

New syntax:

.move [async] extents from table <tableName> to table <tableName> <extentIdsSpecification> with(extentCreatedOnFrom='...', extentCreatedOnTo='...')

.move [async] extents to table <tableName> with(extentCreatedOnFrom='...', extentCreatedOnTo='...') <| <innerQuery>

Purpose of changes:

- scoping the command to a specific, narrowed time range

Command type: Replace extents

Existing syntax:

.replace [async] extents in table <tableName> <| {query for extents to be dropped from table},{query for extents to be moved to table}

New syntax:

.replace [async] extents in table <tableName> with(extentCreatedOnFrom='...', extentCreatedOnTo='...') <| {query for extents to be dropped from table},{query for extents to be moved to table}

Purpose of changes:

- scoping the command to a specific, narrowed time range
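
Tools that generate these commands can add the time-range clause in one place. The following is a rough sketch under stated assumptions, not an official client: the helper names are hypothetical, and it assumes the '...' values accept ISO 8601 datetime strings, which this notification does not spell out. Check the accepted value format against the command documentation before relying on it.

```python
from datetime import datetime


def created_on_clause(created_from: datetime, created_to: datetime) -> str:
    """Build the with(...) clause that scopes the lookup of source extents."""
    if created_from > created_to:
        raise ValueError("created_from must not be later than created_to")
    # Assumption: the values are passed as ISO 8601 strings.
    return (
        f"with(extentCreatedOnFrom='{created_from.isoformat()}', "
        f"extentCreatedOnTo='{created_to.isoformat()}')"
    )


def move_extents_command(source_table: str, target_table: str,
                         created_from: datetime, created_to: datetime) -> str:
    """Compose a '.move extents to table' command in the new syntax."""
    clause = created_on_clause(created_from, created_to)
    inner_query = f".show table {source_table} extents"
    return f".move extents to table {target_table} {clause} <| {inner_query}"


def drop_extent_tags_command(table: str, created_from: datetime,
                             created_to: datetime, inner_query: str) -> str:
    """Compose a '.drop table ... extent tags' command in the new syntax."""
    clause = created_on_clause(created_from, created_to)
    return f".drop table {table} extent tags {clause} <| {inner_query}"


if __name__ == "__main__":
    # Hypothetical table names and time range.
    print(move_extents_command("T1", "T2", datetime(2023, 7, 1), datetime(2023, 7, 2)))
```

Keeping the range as narrow as the source extents allow matches the stated goal of the shortest-possible creation time range.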

Schedule & plan  

The existing experience is blocked on all new clusters and all existing clusters that do not currently use it. 

For all clusters that are using the existing, to-be-deprecated pattern, we will block it by July 31st.

The change will be deployed and applicable starting on August 1st.

 

 

For more details or help, please contact us,

Kusto team 

Last update: May 21 2023 07:53 AM