SOLVED

Select text from split function


Hi, I hope someone can help (and I hope I can explain the issue).

 

I created a pipeline to bring in a CSV, put it in blob storage, then modify it and load it into a SQL database.

 

But while using a data flow to tidy up the contents, I've come unstuck. I created a derived column to split rdfsLabel, which contains names in different languages, each separated by a |. The issue is that the order of the languages isn't consistent, and it can change at the source each time I run the pipeline.

outline issue.png

 

Can someone give me a pointer on how to populate a column with the text from the string that has @en at the end? Once I have this, I can repeat it for each of the languages, then create another derived column to trim out the language identifiers.

 

I'm hoping it's something really silly that I've missed.

 

Thanks in advance

 

John

9 Replies

@John Dorrian, can you share some sample records for this field from the source, along with the target fields, to show how you want the data inserted at the destination?

@SLalwani 

 

It's an open data set, and the link I'm using is https://data.food.gov.uk/codes/reference-number/authority?_format=csv&_view=with_metadata

 

Note that Data Factory doesn't like the "@id" column title, so to get round this I created a SQL table and then deleted the first row.

 

I was going to create two more fields, Name and NameCY, to hold the contents of the arrays, but this is where I'm having issues.

 

Thanks for offering to look

 

John.

@John Dorrian, I can see values like the following in the specified field:

'Asiantaeth Safonau Bwyd'@cy|'Food Standards Agency'@en ,
'Adur District Council'@en,
...

Please confirm that you just need to extract the substring tagged with the language @en, i.e.:

'Food Standards Agency'
'Adur District Council'
...

 
Regarding your note that Data Factory doesn't like headers starting with '@': rather than creating a SQL table, you can set 'skip n rows' to 1 in the blob dataset settings.
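To illustrate what skipping the header row achieves, here is a minimal Python sketch (not Data Factory itself). It drops the first line of the CSV and applies replacement column names. The function name, the column names `id` and `rdfsLabel`, and the sample row are assumptions made up for the example.

```python
import csv
import io


def read_without_header(csv_text, column_names, skip_rows=1):
    """Skip the first `skip_rows` lines, then map each row to `column_names`."""
    reader = csv.reader(io.StringIO(csv_text))
    for _ in range(skip_rows):
        next(reader, None)  # discard the problematic header line(s)
    return [dict(zip(column_names, row)) for row in reader]


# Hypothetical sample: the header starts with '@', which is the part being skipped.
sample = "@id,rdfsLabel\nhttp://example.org/1,'Adur District Council'@en\n"
rows = read_without_header(sample, ["id", "rdfsLabel"])
```

Because the header is discarded, the column names have to be supplied separately, which is why the sketch takes them as a parameter.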

Regards,
Sunaina

If that's an easier workaround to get started, then yes, it's a case of filtering by @en. I have to say I'm just dipping in and out of Data Factory as the need arises, and I really need to commit a lot more time to it.

I could then duplicate the original column and create another filter for @cy when required.

@John Dorrian No need to duplicate the column. Since I assume you need the @en values, create a new derived column that splits on '|', then in the next step use another derived column to pick, from the split array produced by the previous step, the text that precedes '@en'.
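A minimal sketch of that split-then-pick logic, written in Python rather than the data flow expression language, purely to illustrate the idea. The function name `pick_label` and the stripping of surrounding single quotes are assumptions based on the sample values quoted earlier in the thread.

```python
from typing import Optional


def pick_label(rdfs_label: str, lang: str = "en") -> Optional[str]:
    """Return the label tagged with @lang, without quotes or tag, else None."""
    suffix = "@" + lang
    for part in rdfs_label.split("|"):  # step 1: split on the pipe
        part = part.strip()
        if part.endswith(suffix):  # step 2: find the entry for this language
            # Drop the language tag, then the surrounding single quotes.
            return part[: -len(suffix)].strip("'")
    return None


value = "'Asiantaeth Safonau Bwyd'@cy|'Food Standards Agency'@en"
name = pick_label(value)           # English label
name_cy = pick_label(value, "cy")  # Welsh label
```

Because the match is on the language tag rather than on a position in the array, the result does not depend on the order in which the languages arrive.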

@SLalwani 

 

Thanks. I did manage the split on the |. Apologies, I'm a noob and couldn't find a way to select an index value in the list of expression elements.

 

I've looked at the "byitem" and "byname" functions but can't see how to get these to select the entry with @en in the string.

best response confirmed by John Dorrian (Brass Contributor)
Solution

Hey @John Dorrian, I tried this in the expression builder, and here you go.

regexp_adf.PNG

output_regexp_adf.PNG
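The expression itself appears only in the screenshots, so it is not reproduced here. As a rough illustration of a regex-based approach to the same problem, here is a Python sketch. It is not the expression from the screenshot, and the function name `extract_label` and the pattern are assumptions.

```python
import re
from typing import Optional


def extract_label(rdfs_label: str, lang: str = "en") -> Optional[str]:
    """Return the text tagged with @lang from a pipe-separated label string."""
    pattern = (
        r"(?:^|\|)\s*'?"            # start of string or a pipe, optional opening quote
        r"([^|]*?)"                 # the label itself, never crossing a pipe
        r"'?@" + re.escape(lang) +  # optional closing quote, then the language tag
        r"\s*(?:\||$)"              # tag must end at a pipe or end of string
    )
    match = re.search(pattern, rdfs_label)
    return match.group(1) if match else None


value = "'Asiantaeth Safonau Bwyd'@cy|'Food Standards Agency'@en"
english = extract_label(value)
welsh = extract_label(value, "cy")
```

Anchoring the match to the pipe boundaries keeps it within a single entry, so a label that contains an apostrophe is still returned whole.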

 

I hope this is what you were looking for and that it resolves your issue.

If so, kindly mark this reply as an answer or upvote here!

Thanks and regards,
Sunaina Lalwani

Thanks for this. I was miles off and making derived columns of derived columns.

Are there any resources you'd recommend for getting a better idea of which functions to use and when, or is it more a case of practice, practice and more practice, with a lot of trial and error?

Kind Regards

John

@John Dorrian, I don't follow any particular resource for this. For whatever I need to do, I work out the likely functions and logic, then try them out by trial and error in the mapping data flow.
You can use the data flow expression functions documentation as your reference guide: https://docs.microsoft.com/en-us/azure/data-factory/data-flow-expression-functions

Thanks and Regards,
Sunaina 
