Forum Discussion

marked-data
Copper Contributor
Mar 21, 2019

Best practice for tracking data source

I have 50 computers, each publishing 10 different flat files, which I ingest into Data Explorer, where I combine the information by timestamp.
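
For illustration, the combination step looks roughly like this in KQL (CpuLog and MemLog stand in for two of the ten file types; table and column names are simplified placeholders):

```kusto
// Placeholder tables standing in for two of the ten flat-file feeds.
CpuLog
| join kind=inner (MemLog) on Timestamp
| summarize avg(CpuPct), avg(MemMb) by bin(Timestamp, 5m)
```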

In addition to aggregations across all the computers, I need to drill down to a particular source and a particular file, so I somehow need to keep track of the filename and the source each record came from.

I could add columns for the filename and source name/IP to each log line during ingestion (inefficient), or create a database per source (tough to manage as computers come and go), or ....
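
For example, the first option would mean every table carries the tracing columns next to the data, roughly like this (table and column names are placeholders):

```kusto
// Option 1 sketch: every record carries its own tracing columns.
// SensorLog, SourceHost, and FileName are placeholder names.
.create table SensorLog (Timestamp: datetime, Value: real, SourceHost: string, FileName: string)

// Drilling down to one computer and one file is then a plain filter:
SensorLog
| where SourceHost == "computer-17" and FileName == "log03.csv"
| summarize avg(Value) by bin(Timestamp, 1h)
```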

What's the correct way to maintain traceability to data sources in Data Explorer? 

4 Replies

    • marked-data
      Copper Contributor

      Hey Yoni, 

      Indeed, adding columns was my first thought. However, with that method I'll have one column of data and many, many columns of tracing info. I thought there might be a more efficient way, perhaps using a tag per file.
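
      By a tag per file I was thinking of extent tags, which as far as I understand can be attached at ingestion time and read back with extent_tags(). A rough sketch with a made-up blob URL and table name (I haven't checked how tags would be set from my ingestion pipeline):

      ```kusto
      // Tag-per-file sketch, assuming a direct .ingest from blob storage;
      // the "source:" / "file:" prefixes are my own convention, nothing built in.
      .ingest into table SensorLog ('https://mystore.blob.core.windows.net/logs/computer-17/log03.csv')
        with (format='csv', tags='["source:computer-17", "file:log03.csv"]')

      // extent_tags() returns the tags of the extent (data shard) each record came from.
      SensorLog
      | extend Tags = extent_tags()
      | where tostring(Tags) contains "source:computer-17"
      | summarize count() by bin(Timestamp, 1h)
      ```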

      Regarding the method of defining additional columns in the ingestion string: my data is ingested by an Event Hub subscription using a column mapping (I followed this tutorial: https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-event-grid).

      Where can I find the column mapping I created, so I can add the additional columns?

      Where can I get the source and filename from to add them? Is there a list of parameters that can be used in the ingest string, e.g. $source, $blob_name, etc.?
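
      So far I found the .show mappings command, and the docs mention mapping transformations; my current guess is something like the following (table, mapping name, and column names are mine, and I'm not sure the SourceLocation transformation applies to my setup):

      ```kusto
      // List the column mappings already defined on the table (CSV in my case).
      .show table SensorLog ingestion csv mappings

      // Sketch of a mapping that also records the originating blob path,
      // assuming the SourceLocation mapping transformation is available here.
      .create-or-alter table SensorLog ingestion csv mapping "SensorLogMapping"
        '['
        '  {"column": "Timestamp",  "Properties": {"Ordinal": "0"}},'
        '  {"column": "Value",      "Properties": {"Ordinal": "1"}},'
        '  {"column": "SourcePath", "Properties": {"Transform": "SourceLocation"}}'
        ']'
      ```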

      Thanks.
