
jasonchrist
Copper Contributor
Aug 04, 2022
Solved

Kusto evaluate and transform URL to subdomain.domain.topleveldomain format

Dear Team,

 

I have a question in the context of Threat Intelligence search: I want to standardize free-form URLs into the specific format subdomain.domain.topleveldomain.

Sample URLs:

  • login.ezproxy.uni.simple.me
  • https://submit.owa.something.gov.eu
  • sample.me

 

I would like KQL to evaluate those URLs and trim them to the following output:

  • uni.simple.me
  • something.gov.eu
  • sample.me

I have used the parse_url function, which is useful, but I am not able to trim the URL to the desired format.

 

Thank you!


2 Replies

  • rajamanir
    Former Employee

    jasonchrist 

    // TLD extraction: the last two alphabetic labels, e.g. "simple.me"
    | extend TLD = extract(@'[a-zA-Z]{1,}\.[a-zA-Z]{2,}$', 0, DNS_domain)
    // Convert the entire domain into an array of labels
    | extend f = split(DNS_domain, '.')
    // Format the subdomain part: every label before the last two
    | extend z = array_slice(f, 0, -3)
    // Join the manipulated subdomain labels back into a string
    | extend subdomain = strcat_array(z, ".")
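The snippet above separates the subdomain labels from the last two labels. A minimal sketch of how the same split/array_slice idea could produce the three-label output asked for in the question, assuming the inputs are bare host names or URLs without a path or port; the column names labels, label_count and last_three are illustrative, and DNS_domain follows the reply:

```kusto
// Sample rows taken from the question.
datatable (DNS_domain:string) [
    "login.ezproxy.uni.simple.me",
    "https://submit.owa.something.gov.eu",
    "sample.me"
]
// Split on dots; a leading "https://" stays attached to the first label,
// which is harmless here because only the trailing labels are kept.
| extend labels = split(DNS_domain, ".")
| extend label_count = array_length(labels)
// Keep the last three labels; names with three or fewer labels pass through unchanged.
| extend last_three = iff(label_count > 3,
                          strcat_array(array_slice(labels, -3, -1), "."),
                          DNS_domain)
| project DNS_domain, last_three
```

For the three sample rows, last_three comes out as uni.simple.me, something.gov.eu and sample.me. A fixed label count knows nothing about multi-part public suffixes, so how many labels to keep remains a judgment call per data source.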

    • User9864
      Copper Contributor

      Hello,
      To make this more generic, so that you can extract sub-subdomains and drill deeper into the hierarchy, you can use the following:

      let extract_domain = (url_or_domain:string, lowest_level:int = 3) {
          // Pull the dotted host name out of the input and split it into labels
          let url_parts = split(extract(@":?([A-Za-z0-9-]+\.)+([A-Za-z0-9-]+)", 0, url_or_domain), ".");
          let parts_count = array_length(url_parts);
          // Keep the last lowest_level labels, or every label if there are fewer
          let relevant_parts = iff(lowest_level > parts_count,
                                   array_slice(url_parts, -parts_count, -1),
                                   array_slice(url_parts, -lowest_level, -1));
          strcat_array(relevant_parts, ".")
      };
      datatable (url:string) ["login.ezproxy.uni.simple.me", "https://submit.owa.something.gov.eu", "sample.me", "sample.me_", "sample.me/"]
      | extend parsed = extract_domain(url, lowest_level=4)
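      The call above asks for four levels. As a usage sketch, the function can also be called without the second argument, so the default of three levels applies, which matches the format requested in the question. The function is restated here (with the hyphen moved to the end of each character class) only to keep the sketch self-contained; the column names three_levels and four_levels are illustrative.

```kusto
// Restated from the reply above so that this query runs on its own.
let extract_domain = (url_or_domain:string, lowest_level:int = 3) {
    let url_parts = split(extract(@":?([A-Za-z0-9-]+\.)+([A-Za-z0-9-]+)", 0, url_or_domain), ".");
    let parts_count = array_length(url_parts);
    let relevant_parts = iff(lowest_level > parts_count,
                             array_slice(url_parts, -parts_count, -1),
                             array_slice(url_parts, -lowest_level, -1));
    strcat_array(relevant_parts, ".")
};
datatable (url:string) [
    "login.ezproxy.uni.simple.me",
    "https://submit.owa.something.gov.eu",
    "sample.me"
]
// Omitting the second argument uses the default lowest_level = 3.
| extend three_levels = extract_domain(url)
// Passing 4 keeps one more label where the host name has one.
| extend four_levels = extract_domain(url, 4)
```

      With these inputs, three_levels should come out as uni.simple.me, something.gov.eu and sample.me, while four_levels keeps one extra label for the first two rows (ezproxy.uni.simple.me and owa.something.gov.eu).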
