Best practice for tracking data source

I have 50 computers that publish 10 different flat files, which I ingest into Data Explorer and then combine by timestamp.

As well as aggregating across all the computers, I need to drill down to a particular source and a particular file, so I need some way to keep track of the filename and source each record came from.

I could add columns for the filename and source name/IP to each log line during ingestion (inefficient), or create a database for each source (hard to manage as computers come and go), or ...
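The first option, attaching the filename and source to every record before ingestion, can be sketched in Python. This is a minimal illustration, not a Data Explorer API: it assumes each computer can turn its flat-file lines into JSON records before sending them on, and the function `enrich_lines` and the field names `Source`, `FileName`, `LineNumber`, and `Raw` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def enrich_lines(lines: Iterable[str], source: str, file_name: str) -> Iterator[str]:
    """Yield one JSON record per non-empty line, tagged with its origin.

    Hypothetical field names: rename them to match your table's columns.
    """
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text:
            continue  # skip blank lines but keep the original line numbering
        yield json.dumps(
            {
                "Source": source,          # e.g. computer name or IP
                "FileName": file_name,     # which of the flat files this came from
                "LineNumber": line_number, # position within that file
                "Raw": text,               # the original log line, unparsed
            }
        )


if __name__ == "__main__":
    sample = ["2020-01-01T00:00:00Z,42\n", "\n", "2020-01-01T00:01:00Z,43\n"]
    for record in enrich_lines(sample, source="pc-017", file_name="metrics.csv"):
        print(record)
```

Data Explorer stores data by column and compresses it, so a column that repeats the same few values on many rows is often cheaper than it first looks; it is still worth measuring on your own data before ruling this option out.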

What's the correct way to maintain traceability to data sources in Data Explorer? 

4 Replies

Do the options described in this blog post let you achieve what you're after? (Adding another column at ingestion time, but in a simple and efficient manner.)

https://yonileibowitz.github.io/kusto.blog/blog-posts/ingestion-time-metadata.html 

Hey Yoni, 

Indeed, adding columns was my first thought. However, with that method I'll have one column of data and many, many columns of tracing info. I thought there might be a more efficient way, perhaps using a tag per file.

Regarding defining additional columns in the ingestion command: my data is ingested through an Event Hub subscription using a column mapping (following this tutorial: https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-event-grid)

Where can I find the column mapping I created, so I can add the additional commands to it?

Where do I get the source and filename values to add? Is there a list of parameters that can be used in the ingest string, such as $source, $blob_name, etc.?

Thanks.

If your source data is formatted as JSON, a JSON mapping lets you specify two (and only these two) special transformations, SourceLocation and SourceLineNumber, which enrich each record with the name of the file that contained it and the record's line number in that file: https://docs.microsoft.com/en-us/azure/kusto/management/mappings#json-mapping
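To make this concrete, here is a small Python sketch that builds such a mapping and renders the control command that creates it. It is an illustration under assumptions, not an official client: the table name `Telemetry`, the mapping name `TelemetryJsonMapping`, and the columns `SourceFile` and `SourceLine` are hypothetical, the data fields are assumed to sit at simple JSON paths, and the property names (`column`, `Properties`, `Path`, `Transform`) should be checked against the mapping documentation linked above before running the command.

```python
from __future__ import annotations

import json

# Hypothetical names; adjust to your own table and schema.
TABLE = "Telemetry"
MAPPING_NAME = "TelemetryJsonMapping"


def build_json_mapping(data_columns: dict[str, str]) -> list[dict]:
    """Map each data column to a JSON path, then append the two special
    transformations that record where each row came from."""
    mapping = [
        {"column": column, "Properties": {"Path": path}}
        for column, path in data_columns.items()
    ]
    # Name of the file/blob that contained the record.
    mapping.append({"column": "SourceFile", "Properties": {"Transform": "SourceLocation"}})
    # Line number of the record within that file.
    mapping.append({"column": "SourceLine", "Properties": {"Transform": "SourceLineNumber"}})
    return mapping


def create_mapping_command(table: str, name: str, mapping: list[dict]) -> str:
    """Render a '.create table ... ingestion json mapping' control command.

    Assumes no JSON path contains a single quote, since the mapping is
    embedded in a single-quoted string literal.
    """
    return (
        f'.create table {table} ingestion json mapping "{name}" '
        f"'{json.dumps(mapping)}'"
    )


if __name__ == "__main__":
    mapping = build_json_mapping({"Timestamp": "$.Timestamp", "Value": "$.Value"})
    print(create_mapping_command(TABLE, MAPPING_NAME, mapping))
```

The printed command would then be run against the database (for example from the Data Explorer web UI), and the mapping name referenced in the data connection's settings; `SourceFile` and `SourceLine` must also exist as columns in the target table.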

As for viewing mappings that have already been created, you can use this command: https://docs.microsoft.com/en-us/azure/kusto/management/tables#show-ingestion-mappings

If the source data is TXT, is it possible to specify the SourceLocation and SourceLineNumber? I haven't been able to find a solution for this scenario.