SOLVED

External file support in KQL - Azure Sentinel

Microsoft
Does KQL supports external file (Like .csv or .txt etc.) as an input to process a query ?
 
12 Replies
best response confirmed by Sanket26 (Microsoft)

I've just created a file for you to try that you can access on-the-fly using the KQL query:

 

externaldata (UserID:string, DomainName:string) [@"https://raw.githubusercontent.com/jjsantanna/test_csv/master/ioc.csv"] with (format="csv",ignoreFirstRecord=true)

 

This is the easiest way to access external data. You can also create a blob within Azure and call from it. You can also read external text file, json, and many others.

 

Does this answer your question?

@jjsantanna 

The issue which I am facing right now is : The html page where the csv is hosted isn't in desired format (There are multiple lines of header before the actual data). Also downloading the file, modifying the format and then uploading to a blob isn't the best option for me.

 

I am getting this error :

Partial query failure: Wrong number of fields (E_WRONG_NUMBER_OF_FIELDS). (message: 'Kusto::Csv::Parser<>.PrepareFields: CSV has an inconsistent number of fields per line: ', details: 'Offending record: 10 (start position in stream: 531), fieldsCount: 4, currentRecordFieldCount: 4, record: # ja3_md5,Firstseen,Lastseen,Listingreason
[end record]')

 

 

If you send me the link to the HTML containing the csv I can try to help you.
Does it need to be in Log Analytics/Azure Sentinel using KQL?

@jjsantanna

Please find the link details : https://sslbl.abuse.ch/blacklist/ja3_fingerprints.csv

Also yes I was running this data in Azure sentinel.

There you go @Sanket26 

externaldata (Everything:string) [@"https://sslbl.abuse.ch/blacklist/ja3_fingerprints.csv"] with (format="txt",ignoreFirstRecord=true) // reading each line as a string
| where Everything !startswith "#" //removing the lines that started with '#'
| project Everything=parse_csv(Everything) // parsing the string as csv
| project ja3_md5=Everything[0],Firstseen=Everything[1],Lastseen=Everything[2], Listingreason=Everything[3] //splitting the csv into columns

I've added some comments for you to know what I was doing.

Let me know if this was helpful! 

@jjsantanna

Thank a lot. It really helped. The issue is resolved. I am now able to fetch data directly from the http page. The part I was missing was I didn't perform the parsing on the csv as a result I wasn't getting the schema as expected.

@Sanket26 Maybe its just me...but we are talking about security, right? While you *can* access data over https/http/remote_locations, is that really a best practice? The link I provided to the information earlier was to ensure that your blacklist/whitelist information was being stored within your own tenant/network. I suspect, if you have Analytics Rules enabled, that URL you shared may show up as an entity for an investiation. 🙂

@Rod_Trent 

I totally agree to your point. The https link I provided is just a sample to identify if there is any option where these feeds can be directly utilized in kusto. Accessing those data over third party https/http/remote_locations is definitely not a best security practice. We will be uploading these to our internal websites and from there we will be accessing those.

Let me know if this clarifies your concern?

@Sanket26 It does. I also appreciate it. It gives me some things to think about from a security perspective that we may need to look at from a product view.

Great discussion. My point on answering the question was to show that the product can do more than people have in mind.

Great content and examples!  Thank you for taking the time to share.   Do you know if you can have a dynamic StorageConnectionString rather than a static one (@"https://storageaccount.blob.core.windows.net/storagecontainer/users.txt")?  Would like to query something like "https://source/json/.../173.0.x.x" @jjsantanna 

1 best response

Accepted Solutions
best response confirmed by Sanket26 (Microsoft)