Azure Data Explorer is a big data analytics platform that takes care of everything required to ensure real time decision making can take place, or at least, near real time. This includes data ingestion, data querying, data visualization and data management.
In this blog post you’ll learn how to delete individual records from a table, and how it works behind the scenes.
There are several scenarios in which you might want to delete individual records:
This blog post will explain how to delete individual records for the last scenario in the list above.
Scenario
A common scenario for deleting individual records is when you have a table with data coming from IOT devices, and one day you find out that some device has recently started reporting wrong data, and then completely broke and stopped reporting data at all. You then wish to delete the records from the table that were produced by the faulty device in the specific timeframe.
Suppose your company manages a large number of office buildings, and in every room of every building there’s a device that measures the temperature and sends it to a table named DeviceTemperature. Here’s a small data sample from the table:
DeviceId |
Timestamp |
Temperature |
A |
2022-01-17 19:58:03 |
19.53 |
B |
2022-01-20 02:08:03 |
18.69 |
C |
2022-01-20 05:41:03 |
20.56 |
A |
2022-01-20 06:59:03 |
19.76 |
D |
2022-01-20 23:37:03 |
19.79 |
You know that these thermometers sometimes break, and when they do, they start reporting extremely high temperatures for a while, and then they stop sending data at all. So, you created a script that runs a query once a day to detect such devices, and one day you see that the query returned a result indicating that one device broke:
DeviceId |
Timestamp |
avg_Temperature |
E |
2022-01-24 10:00:00 |
885.86 |
Your next step is to write a query that returns all the records that were produced by the faulty device, based on these results:
Deleting records by a predicate
To delete records, you’ll need to provide a predicate in the form of a KQL query, that returns all the records that you want to be deleted.
The recommendation is to first run the deletion command in `whatif` mode, in order to see how many records will be deleted if you use the provided predicate, but without deleting anything just yet.
OldExtentId |
NewExtentId |
RecordsMatchPredicate |
405da552-04bf-4138-b764-5fbef6513f04 |
00000000-0000-0000-0000-000000000000 |
548 |
392d9b4d-0b21-40b1-a6af-10f9132c612c |
00000000-0000-0000-0000-000000000000 |
19 |
14779dfc-e261-469e-826f-bbfde3836555 |
00000000-0000-0000-0000-000000000000 |
1290 |
In the result above, you can see that 1857 records were returned by the predicate, and they are spread across three extents.
A quick note on what an “extent” is: Azure Data Explorer is built to support tables with a huge number of records (rows) and large amounts of data. To handle such large tables, each table's data is divided into smaller "chunks" called data shards or extents (the two terms are synonymous). The union of all the table's extents holds the table's data. An extent is a like a type of mini-table.
If the result makes sense to you, then it’s time to actually delete these records, by removing the `with (whatif=true)` part:
OldExtentId |
NewExtentId |
RecordsMatchPredicate |
405da552-04bf-4138-b764-5fbef6513f04 |
3d93e17d-06d5-4efc-b2f0-209547bf687b |
548 |
392d9b4d-0b21-40b1-a6af-10f9132c612c |
7cb7e4fc-a7bf-4b83-babd-f92bb78568cf |
19 |
14779dfc-e261-469e-826f-bbfde3836555 |
ad3e6a5e-dab7-406a-a5c7-cf5854aff4db |
1290 |
How do you read this result? Let’s look at the first record. It says that extent 405da552-04bf-4138-b764-5fbef6513f04 contained 548 records that were returned by the predicate. The command replaced this extent with a new one, 3d93e17d-06d5-4efc-b2f0-209547bf687b, containing all the records from the original extent, except the 548 records that were filtered out by the predicate.
Now when you query the DeviceTemperature table, you won’t get the records that you deleted.
How does this work behind the scenes?
The deletion process is performed using the following steps:
Summary
In this blog post you learnt how to delete individual records by providing a predicate in the form of a KQL query, and how the deletion process works behind the scenes.
Where can I learn more?
You can learn more about this feature here.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.