Data Transformation Question


Hey team,


I'm working through this Bridewell transformation for the CommonSecurityLog table.


In it, there are a ton of "extend" operators creating new Columns based on the Columns that are already created in logs. For example: | extend Category_CF = tostring(f["Category"])


Does creating a custom field to replace a preexisting field add to or reduce the cost of the transformation?


This is also part of Microsoft's own tutorial, but it doesn't include the reasoning behind creating a new column.


Essentially we are creating all of these fields, and then project-away'ing it in the last line of code.


Let me know your thoughts on this. It seems like an extra step, since you can just "project-away" everything you don't want to see, but I'm trying to figure out if there is a specific reason for using the "extend" operator in this way.


Thanks guys,


1 Reply



When you say 'new Columns based on the Columns that are already created in logs', do you mean that the specific Columns already exist in the logs that you're ingesting? If so, maybe the parsing is already being handled by a Syslog box and therefore no further transformation is necessary. The assumption in that Bridewell post is that you're ingesting JSON-formatted logs and want to parse them accordingly.


Overall, the idea behind data transformation is to make the logs more useful and legible, but also potentially to save space.


Let's assume that your ingested logs contain a field called X, in JSON format, where some useful data is being held. Before applying a transformation, the data is difficult for a human reader to decipher, and querying it is both more complex and less efficient (think of indexing!).

Scenario 1: You extend ALL the attribute:value pairs from the JSON and project-away X.

You're essentially surfacing all the data and making it easier to manipulate, sort etc. This operation WILL have some impact on the actual size of the log, and in effect on cost. My guess is that it would depend heavily on the use case, but be negligible overall in the grand scheme of things. And in return your data is now formatted in a way that makes it easier to work with.
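
As a rough sketch of Scenario 1, assuming the raw JSON sits in a string column literally named X and contains the keys Category, Severity and SourceHost (all hypothetical names, only Category_CF comes from the question), the transformation could look something like this. `source` is the name a data collection rule transformation uses for the incoming stream.

```kql
// Scenario 1 (sketch): surface every attribute as its own column, then drop the raw JSON.
// X, Severity and SourceHost are made-up names for illustration only.
source
| extend f = parse_json(X)                         // f is a temporary dynamic column
| extend Category_CF   = tostring(f["Category"]),
         Severity_CF   = tostring(f["Severity"]),
         SourceHost_CF = tostring(f["SourceHost"])
| project-away X, f                                // remove the raw field and the helper
```

The "extend" lines are what create the readable columns. The final "project-away" only removes the original blob and the helper so the same data isn't stored twice.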

Scenario 2: Maybe you don't need ALL of what's in the original JSON?

After all, you can pick and choose which bits of data are needed and omit the ones of no value to you. This would actually make it more cost-efficient, as you're stripping the raw logs of any useless data and only ingesting (and paying for) what's actually useful.
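
A minimal sketch of Scenario 2, under the same assumptions as before (a hypothetical string column X holding JSON, with Category being the only key you care about):

```kql
// Scenario 2 (sketch): keep only the attribute you need and discard the rest.
// X is a made-up column name; Category_CF mirrors the example in the question.
source
| extend f = parse_json(X)
| extend Category_CF = tostring(f["Category"])     // the one value worth keeping
| project-away X, f                                // everything else in the JSON is dropped
```

Here "extend" and "project-away" work as a pair: the first copies out the useful value, the second throws away the bulky original, so the stored record ends up smaller than what was received.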