SOLVED

split and regex in Kusco

Occasional Contributor

Hi all,

 

I have a query in Kusto to return Details from Table which returns multiple rows of sentence text:

Table
| project Details

Output:

Starting cycle 20349
Starting scheduling for cycle 20350

 

But I want to split the sentences by spaces and remove the numbers (so I can do aggregation on keywords)

The split example in the help is on string literals so I can do this:

 

Table 
| take 10
| project split(Details, ' ')

but I then get an array of values in each row as output:

 

Row 1
[
"Starting",
"cycle",
"20349"
]

Row n...
[
"Starting",
"scheduling",
"for",
"cycle",
"20350"
]


How can I split multiple lines and get a row for each word in Kusto syntax?

 

Thanks!

6 Replies

here's an example for using "split()" (depending on your actual use case, operators/functions linked-to below may be relevant as well):

 

split() function: https://docs.microsoft.com/en-us/azure/kusto/query/splitfunction 

parse operator: https://docs.microsoft.com/en-us/azure/kusto/query/parseoperator

reduce operator: https://docs.microsoft.com/en-us/azure/kusto/query/reduceoperator

replace() function: https://docs.microsoft.com/en-us/azure/kusto/query/replacefunction (for removing numeric values you mentioned was part of your goal)

 

MyTable // which has a column named "Details"
| extend Details = split(MyColumn, " ") // This assumes the "words" are separated by a single space

Hi @Yoni 

 

Thanks for your reply, I was adding more text to the post when I got your reply.

 

I used a similar format but that gives rows containing an array of the split words. I need a row for each word or the syntax to unpack the arrays so that each member is a row.

 

Thanks

you should look into using the mv-expand operator: https://docs.microsoft.com/en-us/azure/kusto/query/mvexpandoperator

@Yoni 

 

Cool, mvexpand does the trick. 

 

How can I perform further operations in the query on the expanded Keywords:

 

Table // which has a column named "Details"
| project KeyWords = split(Details, " ")
| mv-expand KeyWords

 

.... some way of referencing the Keywords output here

 

I tried some permutations but now the output from mv-expand is dynamic which fails to accept additional operations.

 

i'm not sure i understand the question - so i usually find an example to be helpful - feel free to "manipulate" this to the form which demonstrates your intention:

 

let Table = datatable(Details:string)
[
    "123 456 789 abc def",
    "The quick brown fox jumps over the lazy dog"
]
;
Table
| project KeyWords = split(Details, " ")
| mv-expand KeyWords to typeof(string)
| where KeyWords matches regex @"^\d+$"

-> this will return:

KeyWords
--------
123
456
789
best response confirmed by marked-data (Occasional Contributor)
Solution

@Yoni 

 

Hey Yoni,

 

Thanks for the example. I did get what I needed by building off your example: This query takes the log lines, splits into words, screens out numbers. and summarizes the frequency of the word occurrence.

 

Table // which has a column named "Details"| project KeyWords = split(Details, " ")
| mv-expand KeyWords 
| where KeyWords matches regex @"^[a-zA-Z]+$" 
| summarize count() by tostring(KeyWords)