Jan 28 2021 02:30 PM
Hi, hope someone can help (I also hope I can explain this issue).
I created a pipeline to bring in a CSV, stick it in blob storage, then modify it and stick it in a SQL database.
But while using a data flow to help tidy the contents up, I've come unstuck. I created a derived column to split rdfsLabel, which contains names of things in different languages, each separated with a |. The issue is that there's no consistency in the order of the languages, and the order can change at source each time I run the pipeline.
Can someone give me a pointer on how to populate a column with the text from the string that has @en at the end? Once I have this, I can duplicate it for each of the languages and then create another derived column to trim out the language identifiers.
I'm hoping it's something really silly that I've missed.
Thanks in advance
Feb 01 2021 03:35 AM
It's an open data set and the link I'm using is https://data.food.gov.uk/codes/reference-number/authority?_format=csv&_view=with_metadata
To note, Data Factory doesn't like the "@id" title, so to get round this I created a SQL table and then deleted the first row.
I was going to create two more fields, Name and NameCY, to hold the content of the arrays, but this is where I'm having issues.
Thanks for offering to look
Feb 01 2021 04:31 AM
@John Dorrian, I can see various values in the specified field, as follows:
'Asiantaeth Safonau Bwyd'@cy|'Food Standards Agency'@en , 'Adur District Council'@en, ...
Please confirm that you just need to extract the substring tagged with the language @en, without the tag itself, i.e.,
'Food Standards Agency' 'Adur District Council' ...
On your note that Data Factory doesn't like headers starting with '@': rather than creating a SQL table, you can just set 'skip n rows' to 1 in the blob dataset settings.
Feb 01 2021 04:39 AM
Feb 01 2021 04:43 AM
@John Dorrian, there's no need to duplicate the column. Since I assume you need the @en values, you can create a new derived column that splits on '|', and then in the next step use another derived column to select, from the array produced by the split, the value that precedes '@en'.
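For anyone who wants to check the logic outside the data flow, here is a minimal sketch of the same two steps in Python (not data flow expression syntax). It assumes the values look like the samples quoted above, that is, single-quoted labels followed by a language tag and joined with '|'. The function name label_for_language is made up for this illustration.

```python
from typing import Optional


def label_for_language(rdfs_label: str, lang: str) -> Optional[str]:
    """Return the label tagged with @<lang>, without quotes or tag.

    Hypothetical helper: assumes entries are joined with '|' and each
    entry ends with '@' plus a language code, as in the quoted samples.
    Returns None when no entry carries the requested tag.
    """
    suffix = "@" + lang
    for part in rdfs_label.split("|"):
        part = part.strip()
        if part.endswith(suffix):
            # Drop the language tag, then the surrounding single quotes.
            return part[: -len(suffix)].strip("'")
    return None


row = "'Asiantaeth Safonau Bwyd'@cy|'Food Standards Agency'@en"
name = label_for_language(row, "en")     # English label
name_cy = label_for_language(row, "cy")  # Welsh label
print(name, "/", name_cy)
```

Because the lookup matches on the tag and not on position, the order of the languages in the source does not matter. A label that itself contains a '|' would break this sketch.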
Feb 01 2021 08:27 AM
Thanks, I did manage the split column part on the |. Apologies, I am a noob and couldn't find an index value in the list of expression elements.
I've looked at the "byitem" and "byname" functions but can't see how to get these to select the entry with @en in the string.
Feb 02 2021 06:51 AM Solution
Hey @John Dorrian, I tried the expression builder and here you go.
Hope this is what you were looking for and that it resolves your issue.
If so, kindly mark this reply as an answer or upvote here!
Thanks and regards,
Feb 02 2021 02:41 PM
Feb 02 2021 09:37 PM
@John Dorrian, there's nothing special that I do for this. Whatever I need to do, I just work out the likely functions and logic and then try them out by trial and error in the mapping data flow.
You can use the expression language reference at https://docs.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions as your guide.
Thanks and Regards,