Many customers ask: what's the right way to ingest XML data into Azure Data Explorer? There's no native support for the XML format yet, but there is a built-in function, parse_xml(), that accepts an XML string and returns a corresponding structure of dynamic type, and we can build on that. So, here's the recipe.
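Before diving in, here is a quick sketch of what parse_xml() does with a small XML string. The sample string is an assumption: it is assembled from the order values that appear later in this post. Repeated elements such as Order come back as an array inside the resulting dynamic value.

```kusto
// Parse a small XML string and drill into the resulting dynamic value.
// The sample XML is illustrative only.
print Parsed = parse_xml("<Orders><Order><Quantity>222</Quantity><Price>40.18</Price></Order><Order><Quantity>42</Quantity><Price>45</Price></Order></Orders>")
| project Order = Parsed["Orders"]["Order"]
```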
To make it even more interesting, let's assume that the data is coming as JSON that includes a string field containing an XML chunk of data (don't ask me why, it's not uncommon to see such interesting blends all over many legacy systems):
[
  {
    "ItemID": "d9a205b6-9423-450e-bafb-cd643623328d",
    "ItemName": "Headphones C",
    "Orders": "<Orders><Order><Quantity>222</Quantity><Price>40.18</Price></Order>..."
  },
  {
    "ItemID": "59100d7b-66a6-4812-9cde-4f95ab56f506",
    "ItemName": "Book A",
    "Orders": "<Orders><Order><Quantity>5577</Quantity><Price>75.25</Price></Order>..."
  },
  ...
]
The idea is to use a temporary table with zero retention together with the update policy mechanism, which lets us transform ingested data and send the result to another table. We ingest the JSON into the temporary table, and the update policy extracts the XML and sends it to our target table.
One important caveat though: at the time of writing, an XML chunk cannot exceed 128KB. If you need to ingest bigger XML chunks, you should look into a custom pre-processing method. For instance, one option is an Azure Function that transforms the data once it arrives in Blob storage and then sends the transformed event to ADX through Azure Event Hubs.
Let's say our target table looks like this:
.create table Orders(ItemID: guid, ItemName: string, Quanity: real, Price: real)
First, let's define the temporary table with zero retention. The table's schema mirrors the schema of the JSON documents we're going to ingest, which eliminates the need to define a data mapping.
.create table TempTable(ItemID: guid, ItemName: string, Orders: string)

.alter-merge table TempTable policy retention softdelete = 0d recoverability = disabled
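To double-check that the retention settings were applied, the policy can be read back with a management command. This is just a sanity check, not a required step of the recipe.

```kusto
// Show the retention policy currently set on the temporary table
.show table TempTable policy retention
```

With a soft-delete period of zero, rows ingested into TempTable are not kept for querying; they only serve as input to the update policy.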
Next, we create a function that will run as part of the update policy. The function runs on newly arrived data, parses the XML, and expands the resulting entries into multiple rows:
.create-or-alter function ExtractOrders() {
    TempTable
    | extend Orders=parse_xml(Orders)["Orders"]["Order"]
    | mv-expand bagexpansion=array Orders
    | project ItemID, ItemName, Quanity=toreal(Orders["Quantity"]), Price=toreal(Orders["Price"])
}
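Since the temporary table retains no data, one way to try the transformation before wiring it into a policy is to run the same pipeline over an inline datatable. The sketch below assumes a hypothetical SampleInput table with a single row; its XML is assembled from the order values shown in this post.

```kusto
// Hypothetical inline stand-in for TempTable (same schema, one sample row)
let SampleInput = datatable(ItemID: guid, ItemName: string, Orders: string)
[
    guid(d9a205b6-9423-450e-bafb-cd643623328d), "Headphones C",
    "<Orders><Order><Quantity>222</Quantity><Price>40.18</Price></Order><Order><Quantity>42</Quantity><Price>45</Price></Order></Orders>"
];
SampleInput
// Same steps as the body of ExtractOrders(), applied to the sample row
| extend Orders = parse_xml(Orders)["Orders"]["Order"]
| mv-expand bagexpansion=array Orders
// Column name spelled as in the target table definition
| project ItemID, ItemName, Quanity = toreal(Orders["Quantity"]), Price = toreal(Orders["Price"])
```

Each Order element in the XML should come out as its own row, carrying the ItemID and ItemName of the parent record.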
Here comes the most magical part: defining the update policy.
.alter table Orders policy update
'[{'
'  "IsEnabled": true,'
'  "Source": "TempTable",'
'  "Query": "ExtractOrders()",'
'  "IsTransactional": true,'
'  "PropagateIngestionProperties": false'
'}]'
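To confirm that the policy is attached to the target table, it can be read back. Again, this is an optional check rather than part of the recipe.

```kusto
// Show the update policy defined on the target table
.show table Orders policy update
```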
We're ready to try and ingest some data now:
.ingest into table TempTable ("c:/tmp/orders.json") with(format="multijson")
Let's query the Orders table now, and observe the results:
ItemID | ItemName | Quanity | Price |
59100d7b-66a6-4812-9cde-4f95ab56f506 | Book A | 5577 | 75.25 |
59100d7b-66a6-4812-9cde-4f95ab56f506 | Book A | 50 | 12.05 |
d9a205b6-9423-450e-bafb-cd643623328d | Headphones C | 222 | 40.18 |
d9a205b6-9423-450e-bafb-cd643623328d | Headphones C | 42 | 45 |
As you can see, prices and quantities were extracted from the XML data and inserted into our target table.
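For reference, a query along these lines lists the extracted rows. The sort clause is an assumption, added only to make the output order stable.

```kusto
// List the extracted orders; the column name is spelled as in the table definition
Orders
| sort by ItemName asc, Quanity desc
```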