Powerquery keep just substring

CareLess · ‎Jul 09 2020

Hey all,

I Have records with a string that can have 1 ore more substring in surrounded by brackets.

I just want those substrings, the rest i want to toss.

Any handy tips or functions i can use?
In vba it's simple:
[Quick and dirty looping through the whole string]

Function Return_PartialString(ByVal ls_string As String) As String
'simple loop through string and get what we want from it.

'decl vars
Dim lb_wanted As Boolean
Dim ls_newstring As String

'init vars
lb_wanted = False
ls_newstring = ""

For i = 1 To Len(ls_string)

If lb_wanted = True Then
ls_newstring = ls_newstring & Mid(ls_string, i, 1)
End If

If Mid(ls_string, i, 1) = "{" Then
lb_wanted = True
End If

If Mid(ls_string, i, 1) = "}" Then
lb_wanted = False
End If

Next i
ls_newstring = Replace(ls_newstring, "}", " ")

Return_PartialString = ls_newstring
End Function

[Slightly more neat using a split function]

Function Return_PartialString_Strict(ByVal ls_string As String) As String
'simple loop through string and get what we want from it.

'decl vars
Dim ls_newstring As String
Dim las_string() As String

'init vars
ls_newstring = ""
las_string = Split(ls_string, "}")

For i = LBound(las_string) To UBound(las_string)
If InStr(1, las_string(i), "{", vbTextCompare) > 0 Then
ls_newstring = Replace(ls_newstring & Mid(las_string(i), InStr(1, las_string(i), "{", vbTextCompare) + 1), " ", "_") & "#"
End If
Next i

ls_newstring = Replace(ls_newstring, "#", " ")



Return_PartialString_Strict = ls_newstring
End Function

I am hoping there is something like this i can do in power query? Other then dozens of nested if you escape when there are none left or something.

Edit: Put both into a code block for more readability.

Sergei Baklan · ‎Jul 09 2020

@CareLess

It could be done from user interface without coding:

- split column by right most opening bracket

- split second column by left most opening bracket

- remove other but middle column

Generate script will be like

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Split Column by Delimiter" = Table.SplitColumn(Source, "A", Splitter.SplitTextByEachDelimiter({"{"}, QuoteStyle.Csv, true), {"A.1", "A.2"}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter", "A.2", Splitter.SplitTextByEachDelimiter({"}"}, QuoteStyle.Csv, false), {"A.2.1", "A.2.2"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter1",{"A.2.1"})
in
    #"Removed Other Columns"

CareLess · ‎Jul 10 2020

@Sergei Baklan

Thanks for your answer. I'm going to try that as well.

I found a function to do this, but if your method works it's probably easier and better.

I'll implement it when I have the time. Thank you!

let func =   
 (InputText as text, StartDelimiter as text, EndDelimiter as text, optional Alternative_Dummy as text) =>

let
    // If the end delimiter contains the same characters than the start delimiter then that portion has to be replaced in the end parameter by a dummy. Otherwise the query fails.
    AlternativeDummy = Alternative_Dummy,
    Dummy = if AlternativeDummy is null then "*+*" else AlternativeDummy,
    String = InputText,
    ParaStart = StartDelimiter,
    ParaEnd = EndDelimiter,
    
    // If the delimiters are identical, a different logic has to be applied.
    IdenticalDelimiters = ParaEnd = ParaStart,
    // Start of logic for identical delimiters
    SplitText = Text.Split(String, ParaStart),
    ResultIdenticalDelimiters = List.Alternate(SplitText,1,1),
    
    // Start of logic for different start- and end-delimiters: Create dummy end-delimiter.
    DummyParaEnd = Text.Replace(ParaEnd, ParaStart, Dummy),
    DummyString = Text.Replace(String, ParaEnd, DummyParaEnd),
    SplitByParaStart = List.Skip(Text.Split(DummyString, ParaStart),1),
    ConvertToTable = Table.FromList(SplitByParaStart, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExtractBeforeParaEnd = Table.AddColumn(ConvertToTable, "Result", each Text.BeforeDelimiter([Column1], DummyParaEnd)),
    Cleanup = Table.RemoveColumns(ExtractBeforeParaEnd,{"Column1"})[Result],
    
    Result = if Text.Contains(String, ParaStart) and Text.Contains(String, ParaEnd) 
                then (if IdenticalDelimiters 
                        then ResultIdenticalDelimiters 
                        else Cleanup) 
                else error Error.Record( "At least one of the delimiters is not found in the string" )
in
    Result ,
documentation = [
Documentation.Name =  " Text.BetweenDelimitersOccAll ",
Documentation.Description = " Returns  all occurrances of text strings between a delimiter pair.  ",
Documentation.LongDescription = " Returns  all occurrances of text strings between a delimiter pair. Optional parameter Alternative_Dummy can be used if Dummy shall not be *+*. ",
Documentation.Category = " Text ",
Documentation.Source = " https://wp.me/p6lgsG-Y5 ",
Documentation.Version = " 1.0 ",
Documentation.Author = " Imke Feldmann: www.TheBIccountant.com: https://wp.me/p6lgsG-Y5  . ",
Documentation.Examples = {[Description =  " See this blogpost: https://wp.me/p6lgsG-Y5 ",
Code = " TextBetweenDelimitersOccAll(""Meanwhile I prefer *Python!* to *R!*"", ""*"", ""!*"") ",
Result = " {""Python"", ""R""} "]}]
  
 in  
  Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))

CareLess · ‎Jul 10 2020

@Sergei Baklan

I responded, but i don't see it appear.. so another response than... :)
I found a function in M to do it, if your solution does the same it's probably easier and maybe performs better. I will check it out when i have the time and i'll let you know. Thanks!

let func =   
 (InputText as text, StartDelimiter as text, EndDelimiter as text, optional Alternative_Dummy as text) =>

let
    // If the end delimiter contains the same characters than the start delimiter then that portion has to be replaced in the end parameter by a dummy. Otherwise the query fails.
    AlternativeDummy = Alternative_Dummy,
    Dummy = if AlternativeDummy is null then "*+*" else AlternativeDummy,
    String = InputText,
    ParaStart = StartDelimiter,
    ParaEnd = EndDelimiter,
    
    // If the delimiters are identical, a different logic has to be applied.
    IdenticalDelimiters = ParaEnd = ParaStart,
    // Start of logic for identical delimiters
    SplitText = Text.Split(String, ParaStart),
    ResultIdenticalDelimiters = List.Alternate(SplitText,1,1),
    
    // Start of logic for different start- and end-delimiters: Create dummy end-delimiter.
    DummyParaEnd = Text.Replace(ParaEnd, ParaStart, Dummy),
    DummyString = Text.Replace(String, ParaEnd, DummyParaEnd),
    SplitByParaStart = List.Skip(Text.Split(DummyString, ParaStart),1),
    ConvertToTable = Table.FromList(SplitByParaStart, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExtractBeforeParaEnd = Table.AddColumn(ConvertToTable, "Result", each Text.BeforeDelimiter([Column1], DummyParaEnd)),
    Cleanup = Table.RemoveColumns(ExtractBeforeParaEnd,{"Column1"})[Result],
    
    Result = if Text.Contains(String, ParaStart) and Text.Contains(String, ParaEnd) 
                then (if IdenticalDelimiters 
                        then ResultIdenticalDelimiters 
                        else Cleanup) 
                else error Error.Record( "At least one of the delimiters is not found in the string" )
in
    Result ,
documentation = [
Documentation.Name =  " Text.BetweenDelimitersOccAll ",
Documentation.Description = " Returns  all occurrances of text strings between a delimiter pair.  ",
Documentation.LongDescription = " Returns  all occurrances of text strings between a delimiter pair. Optional parameter Alternative_Dummy can be used if Dummy shall not be *+*. ",
Documentation.Category = " Text ",
Documentation.Source = " https://wp.me/p6lgsG-Y5 ",
Documentation.Version = " 1.0 ",
Documentation.Author = " Imke Feldmann: www.TheBIccountant.com: https://wp.me/p6lgsG-Y5  . ",
Documentation.Examples = {[Description =  " See this blogpost: https://wp.me/p6lgsG-Y5 ",
Code = " TextBetweenDelimitersOccAll(""Meanwhile I prefer *Python!* to *R!*"", ""*"", ""!*"") ",
Result = " {""Python"", ""R""} "]}]
  
 in  
  Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))

CareLess · ‎Jul 11 2020

@CareLess

Perhaps I misunderstood your requirements, initial code returns only very internal string.

If all substrings are to be returned, you may

1) Use almost UI only solutions - split using variant of splitter with Any, list delimiters {"{","}"} and with some gap on number of columns; unpivot all columns of the result

2) Use formula which splits all texts into the lists and expand them

let
    Source = Table1,
    splitTexts = Table.AddColumn(
        Source,
        "Custom",
        each List.RemoveMatchingItems(Text.SplitAny([A],"{}"),{""})
    ),
    expandSubstrings = Table.ExpandListColumn(
        splitTexts,
        "Custom")
in
    expandSubstrings

3) Use Imke's function which do practically the same, but is more universal solution

All 3 variants are in attached file.

CareLess · ‎Jul 11 2020

Thanks. I should have formulated it better. Thanks for all the effort.

Kind reg.

Sergei Baklan · ‎Jul 11 2020

@CareLess , you are welcome

CareLess · ‎Jul 11 2020

@CareLess

Perhaps I misunderstood your requirements, initial code returns only very internal string.

If all substrings are to be returned, you may

1) Use almost UI only solutions - split using variant of splitter with Any, list delimiters {"{","}"} and with some gap on number of columns; unpivot all columns of the result

2) Use formula which splits all texts into the lists and expand them

let
    Source = Table1,
    splitTexts = Table.AddColumn(
        Source,
        "Custom",
        each List.RemoveMatchingItems(Text.SplitAny([A],"{}"),{""})
    ),
    expandSubstrings = Table.ExpandListColumn(
        splitTexts,
        "Custom")
in
    expandSubstrings

3) Use Imke's function which do practically the same, but is more universal solution

All 3 variants are in attached file.

View solution in original post

Powerquery keep just substring

Powerquery keep just substring

Re: Powerquery keep just substring

Re: Powerquery keep just substring

Re: Powerquery keep just substring

Re: Powerquery keep just substring

Re: Powerquery keep just substring

Re: Powerquery keep just substring

Re: Powerquery keep just substring

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Powerquery keep just substring