Forum Discussion
Data Transformation Question
When you say 'new columns based on the columns that are already created in logs', do you mean that those specific columns already exist in the logs you're ingesting? If so, they may already be handled by a Syslog box, in which case no further transformation is necessary. The Bridewell post assumes that you're ingesting JSON and want to parse it accordingly.
Overall, the idea behind data transformation is to make the logs more useful and legible, and potentially to save space.
Let's assume that your ingested logs contain a field called X that holds some useful data in JSON format. Before any transformation is applied, that data is difficult for a human reader to decipher, and querying it is both more complex and less efficient (think of indexing!).
Scenario 1: You extend ALL the attribute:value pairs from the JSON and project-away X.
You're essentially surfacing all the data and making it easier to manipulate, sort, etc. This operation WILL have some impact on the actual size of the log and, in effect, on cost. My guess is that it depends heavily on the use case, but is negligible in the grand scheme of things. In return, your data is now formatted in a way that makes it easier to work with.
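To make Scenario 1 concrete, here is a rough sketch in plain Python rather than in the query language you'd actually write the transformation in. It only illustrates the idea of promoting every attribute in X to its own column and then dropping X. The record layout, the field names (user, action, src_ip) and the helper expand_and_drop are all made up for the example.

```python
import json


def expand_and_drop(record, field="X"):
    """Promote every key of the JSON string in `field` to a top-level
    column, then leave `field` itself out of the result.

    Hypothetical helper: if a key inside the JSON has the same name as
    an existing column, the JSON value overwrites it.
    """
    # Copy every column except the raw JSON one (the "project-away" part).
    expanded = {k: v for k, v in record.items() if k != field}
    # Parse the JSON payload and surface each attribute (the "extend" part).
    payload = json.loads(record[field])
    for key, value in payload.items():
        expanded[key] = value
    return expanded


# Made-up log record with a JSON string stored in column X.
raw = {
    "TimeGenerated": "2024-01-01T00:00:00Z",
    "X": '{"user": "alice", "action": "login", "src_ip": "10.0.0.5"}',
}

flat = expand_and_drop(raw)
print(flat)
```

After this, user, action and src_ip are ordinary columns you can filter and sort on directly, and X is gone.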
Scenario 2: Maybe you don't need ALL of what's in the original JSON?
After all, you can pick and choose which bits of data you need and omit the ones of no value to you. This would actually be more cost-efficient, as you're stripping the raw logs of useless data and only ingesting (and paying for) what's actually useful.
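Scenario 2 can be sketched the same way, again in plain Python purely as an illustration and not as the actual transformation syntax. The field names, the debug_trace attribute and the helper keep_selected are assumptions for the example. Serialized length is used only as a crude stand-in for ingested size; real billing will not match it exactly.

```python
import json


def keep_selected(record, field="X", wanted=("user", "action")):
    """Surface only the `wanted` keys from the JSON string in `field`
    and drop everything else, including `field` itself.

    Hypothetical helper; keys missing from the payload are skipped.
    """
    payload = json.loads(record[field])
    slim = {k: v for k, v in record.items() if k != field}
    for key in wanted:
        if key in payload:
            slim[key] = payload[key]
    return slim


# Made-up record: X carries two useful attributes and one bulky one.
raw = {
    "TimeGenerated": "2024-01-01T00:00:00Z",
    "X": json.dumps({
        "user": "alice",
        "action": "login",
        "debug_trace": "lots of verbose detail nobody queries " * 5,
    }),
}

slim = keep_selected(raw)

# Rough size comparison using serialized length as a proxy.
raw_size = len(json.dumps(raw))
slim_size = len(json.dumps(slim))
print(slim, raw_size, slim_size)
```

The trade-off is that whatever you leave out at this stage is not stored, so it's worth being sure you won't need it later.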