Forum Discussion
Powerquery keep just substring
Hey all,
I Have records with a string that can have 1 ore more substring in surrounded by brackets.
I just want those substrings, the rest i want to toss.
Any handy tips or functions i can use?
In vba it's simple:
[Quick and dirty looping through the whole string]
Function Return_PartialString(ByVal ls_string As String) As String
'simple loop through string and get what we want from it.
'decl vars
Dim lb_wanted As Boolean
Dim ls_newstring As String
'init vars
lb_wanted = False
ls_newstring = ""
For i = 1 To Len(ls_string)
If lb_wanted = True Then
ls_newstring = ls_newstring & Mid(ls_string, i, 1)
End If
If Mid(ls_string, i, 1) = "{" Then
lb_wanted = True
End If
If Mid(ls_string, i, 1) = "}" Then
lb_wanted = False
End If
Next i
ls_newstring = Replace(ls_newstring, "}", " ")
Return_PartialString = ls_newstring
End Function
[Slightly more neat using a split function]
Function Return_PartialString_Strict(ByVal ls_string As String) As String
'simple loop through string and get what we want from it.
'decl vars
Dim ls_newstring As String
Dim las_string() As String
'init vars
ls_newstring = ""
las_string = Split(ls_string, "}")
For i = LBound(las_string) To UBound(las_string)
If InStr(1, las_string(i), "{", vbTextCompare) > 0 Then
ls_newstring = Replace(ls_newstring & Mid(las_string(i), InStr(1, las_string(i), "{", vbTextCompare) + 1), " ", "_") & "#"
End If
Next i
ls_newstring = Replace(ls_newstring, "#", " ")
Return_PartialString_Strict = ls_newstring
End Function
I am hoping there is something like this i can do in power query? Other then dozens of nested if you escape when there are none left or something.
Edit: Put both into a code block for more readability.
Perhaps I misunderstood your requirements, initial code returns only very internal string.
If all substrings are to be returned, you may
1) Use almost UI only solutions - split using variant of splitter with Any, list delimiters {"{","}"} and with some gap on number of columns; unpivot all columns of the result
2) Use formula which splits all texts into the lists and expand them
let Source = Table1, splitTexts = Table.AddColumn( Source, "Custom", each List.RemoveMatchingItems(Text.SplitAny([A],"{}"),{""}) ), expandSubstrings = Table.ExpandListColumn( splitTexts, "Custom") in expandSubstrings
3) Use Imke's function which do practically the same, but is more universal solution
All 3 variants are in attached file.
6 Replies
- SergeiBaklanDiamond Contributor
It could be done from user interface without coding:
- split column by right most opening bracket
- split second column by left most opening bracket
- remove other but middle column
Generate script will be like
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Split Column by Delimiter" = Table.SplitColumn(Source, "A", Splitter.SplitTextByEachDelimiter({"{"}, QuoteStyle.Csv, true), {"A.1", "A.2"}), #"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter", "A.2", Splitter.SplitTextByEachDelimiter({"}"}, QuoteStyle.Csv, false), {"A.2.1", "A.2.2"}), #"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter1",{"A.2.1"}) in #"Removed Other Columns"
- CareLessCopper Contributor
I responded, but i don't see it appear.. so another response than... 🙂
I found a function in M to do it, if your solution does the same it's probably easier and maybe performs better. I will check it out when i have the time and i'll let you know. Thanks!let func = (InputText as text, StartDelimiter as text, EndDelimiter as text, optional Alternative_Dummy as text) => let // If the end delimiter contains the same characters than the start delimiter then that portion has to be replaced in the end parameter by a dummy. Otherwise the query fails. AlternativeDummy = Alternative_Dummy, Dummy = if AlternativeDummy is null then "*+*" else AlternativeDummy, String = InputText, ParaStart = StartDelimiter, ParaEnd = EndDelimiter, // If the delimiters are identical, a different logic has to be applied. IdenticalDelimiters = ParaEnd = ParaStart, // Start of logic for identical delimiters SplitText = Text.Split(String, ParaStart), ResultIdenticalDelimiters = List.Alternate(SplitText,1,1), // Start of logic for different start- and end-delimiters: Create dummy end-delimiter. DummyParaEnd = Text.Replace(ParaEnd, ParaStart, Dummy), DummyString = Text.Replace(String, ParaEnd, DummyParaEnd), SplitByParaStart = List.Skip(Text.Split(DummyString, ParaStart),1), ConvertToTable = Table.FromList(SplitByParaStart, Splitter.SplitByNothing(), null, null, ExtraValues.Error), ExtractBeforeParaEnd = Table.AddColumn(ConvertToTable, "Result", each Text.BeforeDelimiter([Column1], DummyParaEnd)), Cleanup = Table.RemoveColumns(ExtractBeforeParaEnd,{"Column1"})[Result], Result = if Text.Contains(String, ParaStart) and Text.Contains(String, ParaEnd) then (if IdenticalDelimiters then ResultIdenticalDelimiters else Cleanup) else error Error.Record( "At least one of the delimiters is not found in the string" ) in Result , documentation = [ Documentation.Name = " Text.BetweenDelimitersOccAll ", Documentation.Description = " Returns all occurrances of text strings between a delimiter pair. ", Documentation.LongDescription = " Returns all occurrances of text strings between a delimiter pair. Optional parameter Alternative_Dummy can be used if Dummy shall not be *+*. ", Documentation.Category = " Text ", Documentation.Source = " https://wp.me/p6lgsG-Y5 ", Documentation.Version = " 1.0 ", Documentation.Author = " Imke Feldmann: www.TheBIccountant.com: https://wp.me/p6lgsG-Y5 . ", Documentation.Examples = {[Description = " See this blogpost: https://wp.me/p6lgsG-Y5 ", Code = " TextBetweenDelimitersOccAll(""Meanwhile I prefer *Python!* to *R!*"", ""*"", ""!*"") ", Result = " {""Python"", ""R""} "]}] in Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))
- SergeiBaklanDiamond Contributor
Perhaps I misunderstood your requirements, initial code returns only very internal string.
If all substrings are to be returned, you may
1) Use almost UI only solutions - split using variant of splitter with Any, list delimiters {"{","}"} and with some gap on number of columns; unpivot all columns of the result
2) Use formula which splits all texts into the lists and expand them
let Source = Table1, splitTexts = Table.AddColumn( Source, "Custom", each List.RemoveMatchingItems(Text.SplitAny([A],"{}"),{""}) ), expandSubstrings = Table.ExpandListColumn( splitTexts, "Custom") in expandSubstrings
3) Use Imke's function which do practically the same, but is more universal solution
All 3 variants are in attached file.
- CareLessCopper Contributor
Thanks for your answer. I'm going to try that as well.
I found a function to do this, but if your method works it's probably easier and better.
I'll implement it when I have the time. Thank you!
let func = (InputText as text, StartDelimiter as text, EndDelimiter as text, optional Alternative_Dummy as text) => let // If the end delimiter contains the same characters than the start delimiter then that portion has to be replaced in the end parameter by a dummy. Otherwise the query fails. AlternativeDummy = Alternative_Dummy, Dummy = if AlternativeDummy is null then "*+*" else AlternativeDummy, String = InputText, ParaStart = StartDelimiter, ParaEnd = EndDelimiter, // If the delimiters are identical, a different logic has to be applied. IdenticalDelimiters = ParaEnd = ParaStart, // Start of logic for identical delimiters SplitText = Text.Split(String, ParaStart), ResultIdenticalDelimiters = List.Alternate(SplitText,1,1), // Start of logic for different start- and end-delimiters: Create dummy end-delimiter. DummyParaEnd = Text.Replace(ParaEnd, ParaStart, Dummy), DummyString = Text.Replace(String, ParaEnd, DummyParaEnd), SplitByParaStart = List.Skip(Text.Split(DummyString, ParaStart),1), ConvertToTable = Table.FromList(SplitByParaStart, Splitter.SplitByNothing(), null, null, ExtraValues.Error), ExtractBeforeParaEnd = Table.AddColumn(ConvertToTable, "Result", each Text.BeforeDelimiter([Column1], DummyParaEnd)), Cleanup = Table.RemoveColumns(ExtractBeforeParaEnd,{"Column1"})[Result], Result = if Text.Contains(String, ParaStart) and Text.Contains(String, ParaEnd) then (if IdenticalDelimiters then ResultIdenticalDelimiters else Cleanup) else error Error.Record( "At least one of the delimiters is not found in the string" ) in Result , documentation = [ Documentation.Name = " Text.BetweenDelimitersOccAll ", Documentation.Description = " Returns all occurrances of text strings between a delimiter pair. ", Documentation.LongDescription = " Returns all occurrances of text strings between a delimiter pair. Optional parameter Alternative_Dummy can be used if Dummy shall not be *+*. ", Documentation.Category = " Text ", Documentation.Source = " https://wp.me/p6lgsG-Y5 ", Documentation.Version = " 1.0 ", Documentation.Author = " Imke Feldmann: www.TheBIccountant.com: https://wp.me/p6lgsG-Y5 . ", Documentation.Examples = {[Description = " See this blogpost: https://wp.me/p6lgsG-Y5 ", Code = " TextBetweenDelimitersOccAll(""Meanwhile I prefer *Python!* to *R!*"", ""*"", ""!*"") ", Result = " {""Python"", ""R""} "]}] in Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))