SOLVED

Kusto evaluate and transform URL to subdomain.domain.topleveldomain format

Dear Team,

I have a question about Threat Intelligence searches: I want to standardize free-form URLs into a specific format, subdomain.domain.topleveldomain.

Sample URLs:

I would like KQL to evaluate those URLs and trim them to the following output:

  • uni.simple.me
  • something.gov.eu
  • sample.me

I have tried the parse_url function, which is useful, but I haven't been able to trim the URL to the desired format.
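A minimal sketch (not from the thread) of one way to combine the two steps: parse_url() isolates the host, and a single extract() regex then keeps at most the last three dot-separated labels. The sample rows are hypothetical, modeled on the desired outputs above, and the `contains "://"` check assumes that inputs without a scheme are already bare host names.

```kql
// Hypothetical sample rows modeled on the desired outputs above
datatable (url:string) [
    "https://login.ezproxy.uni.simple.me/path",
    "https://submit.owa.something.gov.eu",
    "sample.me"
]
// Assumption: inputs without "://" are already bare host names
| extend host = iff(url contains "://", tostring(parse_url(url).Host), url)
// Keep at most the last three dot-separated labels of the host
| extend trimmed = extract(@"([A-Za-z0-9-]+\.){0,2}[A-Za-z0-9-]+$", 0, host)
```

Changing `{0,2}` to `{0,n-1}` keeps n labels. The pattern counts labels rather than consulting a public-suffix list, so a two-part suffix such as gov.eu uses up two of the three labels.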

Thank you!

best response confirmed by rajamanir (Microsoft)
Solution

@jasonchrist 

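// Assumes an input table with a DNS_domain column of bare host names (no scheme or path)
// Extract the last two labels (domain and top-level domain), e.g. simple.me
| extend TLD = extract(@'[a-zA-Z]{1,}\.[a-zA-Z]{2,}$', 0, DNS_domain)
// Convert the entire domain into an array of labels
| extend f = split(DNS_domain, '.')
// Keep every label except the last two (the subdomain part)
| extend z = array_slice(f, 0, -3)
// Join the remaining subdomain labels back into a dotted string
| extend subdomain = strcat_array(z, ".")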
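The fragments above need an input table and return the subdomain and the last two labels as separate columns. A minimal sketch, assuming a hypothetical datatable of bare host names, that runs the same steps and then adds one extra step (not part of the reply) that prepends the third-from-last label to TLD to get the three-label form:

```kql
// Hypothetical input: bare host names in a DNS_domain column
datatable (DNS_domain:string) ["login.ezproxy.uni.simple.me", "submit.owa.something.gov.eu", "sample.me"]
// Steps from the accepted reply
| extend TLD = extract(@'[a-zA-Z]{1,}\.[a-zA-Z]{2,}$', 0, DNS_domain)
| extend f = split(DNS_domain, '.')
| extend z = array_slice(f, 0, -3)
| extend subdomain = strcat_array(z, ".")
// Extra step (assumption): prepend the third-from-last label when the host has more than two labels
| extend trimmed = iff(array_length(f) > 2, strcat(tostring(f[array_length(f) - 3]), ".", TLD), TLD)
| project DNS_domain, subdomain, TLD, trimmed
```

Like the reply's regex, this assumes the last two labels are purely alphabetic; a label containing digits or hyphens leaves TLD empty.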

Hello,

To be more generic and drill deeper into the hierarchy (for example, to extract sub-subdomains), you can use the following function:

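// Returns the last lowest_level dot-separated labels of a URL or domain
let extract_domain = (url_or_domain:string, lowest_level:int = 3) {
    // Take the first dotted run of letters, digits and hyphens (skips the scheme, stops before a path or trailing characters) and split it into labels
    let url_parts = split(extract(@":?([A-Za-z-0-9]+\.)+([A-Za-z-0-9]+)", 0, url_or_domain), ".");
    let parts_count = array_length(url_parts);
    // If fewer labels exist than requested, keep them all; otherwise keep only the last lowest_level labels
    let relevant_parts = iff(lowest_level > parts_count,
                             array_slice(url_parts, -parts_count, -1),
                             array_slice(url_parts, -lowest_level, -1));
    strcat_array(relevant_parts, ".")
};
datatable (url:string)["login.ezproxy.uni.simple.me","https://submit.owa.something.gov.eu","sample.me","sample.me_","sample.me/"]
| extend parsed = extract_domain(url, lowest_level=4)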
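To see the "drill deeper" behavior side by side, a sketch (the sweep is illustrative, not from the reply) that calls the same function once per level from 2 to 5. At level 3, the function's default, it returns the three-label form asked for in the question. The function is repeated so the query runs on its own; range() builds the list of levels and mv-expand turns it into one row per level.

```kql
// Same function as in the reply above, repeated so this query is self-contained
let extract_domain = (url_or_domain:string, lowest_level:int = 3) {
    let url_parts = split(extract(@":?([A-Za-z-0-9]+\.)+([A-Za-z-0-9]+)", 0, url_or_domain), ".");
    let parts_count = array_length(url_parts);
    let relevant_parts = iff(lowest_level > parts_count,
                             array_slice(url_parts, -parts_count, -1),
                             array_slice(url_parts, -lowest_level, -1));
    strcat_array(relevant_parts, ".")
};
datatable (url:string) ["login.ezproxy.uni.simple.me", "https://submit.owa.something.gov.eu", "sample.me"]
// Illustrative sweep: one row per level from 2 to 5 labels
| extend level = range(2, 5)
| mv-expand level to typeof(int)
| extend parsed = extract_domain(url, level)
```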