Forum Discussion

CareLess's avatar
CareLess
Copper Contributor
Jul 09, 2020
Solved

Powerquery keep just substring

Hey all,

 

I Have records with a string that can have 1 ore more substring in surrounded by brackets.

I just want those substrings, the rest i want to toss.

Any handy tips or functions i can use?
 In vba it's simple:
[Quick and dirty looping through the whole string]

Function Return_PartialString(ByVal ls_string As String) As String
'simple loop through string and get what we want from it.

'decl vars
Dim lb_wanted As Boolean
Dim ls_newstring As String

'init vars
lb_wanted = False
ls_newstring = ""

For i = 1 To Len(ls_string)

If lb_wanted = True Then
ls_newstring = ls_newstring & Mid(ls_string, i, 1)
End If

If Mid(ls_string, i, 1) = "{" Then
lb_wanted = True
End If

If Mid(ls_string, i, 1) = "}" Then
lb_wanted = False
End If

Next i
ls_newstring = Replace(ls_newstring, "}", " ")

Return_PartialString = ls_newstring
End Function

 

[Slightly more neat using a split function]

Function Return_PartialString_Strict(ByVal ls_string As String) As String
'simple loop through string and get what we want from it.

'decl vars
Dim ls_newstring As String
Dim las_string() As String

'init vars
ls_newstring = ""
las_string = Split(ls_string, "}")

For i = LBound(las_string) To UBound(las_string)
If InStr(1, las_string(i), "{", vbTextCompare) > 0 Then
ls_newstring = Replace(ls_newstring & Mid(las_string(i), InStr(1, las_string(i), "{", vbTextCompare) + 1), " ", "_") & "#"
End If
Next i

ls_newstring = Replace(ls_newstring, "#", " ")



Return_PartialString_Strict = ls_newstring
End Function

 

I am hoping there is something like this i can do in power query? Other then dozens of nested if you escape when there are none left or something.

 

Edit: Put both into a code block for more readability.

  • CareLess 

    Perhaps I misunderstood your requirements, initial code returns only very internal string. 

    If all substrings are to be returned, you may

    1) Use almost UI only solutions - split using variant of splitter with Any, list delimiters {"{","}"} and with some gap on number of columns; unpivot all columns of the result

     

    2) Use formula which splits all texts into the lists and expand them

    let
        Source = Table1,
        splitTexts = Table.AddColumn(
            Source,
            "Custom",
            each List.RemoveMatchingItems(Text.SplitAny([A],"{}"),{""})
        ),
        expandSubstrings = Table.ExpandListColumn(
            splitTexts,
            "Custom")
    in
        expandSubstrings

     

    3) Use Imke's function which do practically the same, but is more universal solution

     

    All 3 variants are in attached file.

6 Replies

  • SergeiBaklan's avatar
    SergeiBaklan
    Diamond Contributor

    CareLess 

    It could be done from user interface without coding:

    - split column by right most opening bracket

    - split second column by left most opening bracket

    - remove other but middle column

     

    Generate script will be like

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Split Column by Delimiter" = Table.SplitColumn(Source, "A", Splitter.SplitTextByEachDelimiter({"{"}, QuoteStyle.Csv, true), {"A.1", "A.2"}),
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter", "A.2", Splitter.SplitTextByEachDelimiter({"}"}, QuoteStyle.Csv, false), {"A.2.1", "A.2.2"}),
        #"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter1",{"A.2.1"})
    in
        #"Removed Other Columns"
    • CareLess's avatar
      CareLess
      Copper Contributor

      SergeiBaklan 

       

      I responded, but i don't see it appear.. so another response than... 🙂
      I found a function in M to do it, if your  solution does the same it's probably easier and maybe performs better. I will check it out when i have the time and i'll let you know. Thanks!

       

      let func =   
       (InputText as text, StartDelimiter as text, EndDelimiter as text, optional Alternative_Dummy as text) =>
      
      let
          // If the end delimiter contains the same characters than the start delimiter then that portion has to be replaced in the end parameter by a dummy. Otherwise the query fails.
          AlternativeDummy = Alternative_Dummy,
          Dummy = if AlternativeDummy is null then "*+*" else AlternativeDummy,
          String = InputText,
          ParaStart = StartDelimiter,
          ParaEnd = EndDelimiter,
          
          // If the delimiters are identical, a different logic has to be applied.
          IdenticalDelimiters = ParaEnd = ParaStart,
          // Start of logic for identical delimiters
          SplitText = Text.Split(String, ParaStart),
          ResultIdenticalDelimiters = List.Alternate(SplitText,1,1),
          
          // Start of logic for different start- and end-delimiters: Create dummy end-delimiter.
          DummyParaEnd = Text.Replace(ParaEnd, ParaStart, Dummy),
          DummyString = Text.Replace(String, ParaEnd, DummyParaEnd),
          SplitByParaStart = List.Skip(Text.Split(DummyString, ParaStart),1),
          ConvertToTable = Table.FromList(SplitByParaStart, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
          ExtractBeforeParaEnd = Table.AddColumn(ConvertToTable, "Result", each Text.BeforeDelimiter([Column1], DummyParaEnd)),
          Cleanup = Table.RemoveColumns(ExtractBeforeParaEnd,{"Column1"})[Result],
          
          Result = if Text.Contains(String, ParaStart) and Text.Contains(String, ParaEnd) 
                      then (if IdenticalDelimiters 
                              then ResultIdenticalDelimiters 
                              else Cleanup) 
                      else error Error.Record( "At least one of the delimiters is not found in the string" )
      in
          Result ,
      documentation = [
      Documentation.Name =  " Text.BetweenDelimitersOccAll ",
      Documentation.Description = " Returns  all occurrances of text strings between a delimiter pair.  ",
      Documentation.LongDescription = " Returns  all occurrances of text strings between a delimiter pair. Optional parameter Alternative_Dummy can be used if Dummy shall not be *+*. ",
      Documentation.Category = " Text ",
      Documentation.Source = " https://wp.me/p6lgsG-Y5 ",
      Documentation.Version = " 1.0 ",
      Documentation.Author = " Imke Feldmann: www.TheBIccountant.com: https://wp.me/p6lgsG-Y5  . ",
      Documentation.Examples = {[Description =  " See this blogpost: https://wp.me/p6lgsG-Y5 ",
      Code = " TextBetweenDelimitersOccAll(""Meanwhile I prefer *Python!* to *R!*"", ""*"", ""!*"") ",
      Result = " {""Python"", ""R""} "]}]
        
       in  
        Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))

       

      • SergeiBaklan's avatar
        SergeiBaklan
        Diamond Contributor

        CareLess 

        Perhaps I misunderstood your requirements, initial code returns only very internal string. 

        If all substrings are to be returned, you may

        1) Use almost UI only solutions - split using variant of splitter with Any, list delimiters {"{","}"} and with some gap on number of columns; unpivot all columns of the result

         

        2) Use formula which splits all texts into the lists and expand them

        let
            Source = Table1,
            splitTexts = Table.AddColumn(
                Source,
                "Custom",
                each List.RemoveMatchingItems(Text.SplitAny([A],"{}"),{""})
            ),
            expandSubstrings = Table.ExpandListColumn(
                splitTexts,
                "Custom")
        in
            expandSubstrings

         

        3) Use Imke's function which do practically the same, but is more universal solution

         

        All 3 variants are in attached file.

    • CareLess's avatar
      CareLess
      Copper Contributor

      SergeiBaklan 

      Thanks for your answer. I'm going to try that as well.

      I found a function to do this, but if your method works it's probably easier and better.

      I'll implement it when I have the time. Thank you!

       

      let func =   
       (InputText as text, StartDelimiter as text, EndDelimiter as text, optional Alternative_Dummy as text) =>
      
      let
          // If the end delimiter contains the same characters than the start delimiter then that portion has to be replaced in the end parameter by a dummy. Otherwise the query fails.
          AlternativeDummy = Alternative_Dummy,
          Dummy = if AlternativeDummy is null then "*+*" else AlternativeDummy,
          String = InputText,
          ParaStart = StartDelimiter,
          ParaEnd = EndDelimiter,
          
          // If the delimiters are identical, a different logic has to be applied.
          IdenticalDelimiters = ParaEnd = ParaStart,
          // Start of logic for identical delimiters
          SplitText = Text.Split(String, ParaStart),
          ResultIdenticalDelimiters = List.Alternate(SplitText,1,1),
          
          // Start of logic for different start- and end-delimiters: Create dummy end-delimiter.
          DummyParaEnd = Text.Replace(ParaEnd, ParaStart, Dummy),
          DummyString = Text.Replace(String, ParaEnd, DummyParaEnd),
          SplitByParaStart = List.Skip(Text.Split(DummyString, ParaStart),1),
          ConvertToTable = Table.FromList(SplitByParaStart, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
          ExtractBeforeParaEnd = Table.AddColumn(ConvertToTable, "Result", each Text.BeforeDelimiter([Column1], DummyParaEnd)),
          Cleanup = Table.RemoveColumns(ExtractBeforeParaEnd,{"Column1"})[Result],
          
          Result = if Text.Contains(String, ParaStart) and Text.Contains(String, ParaEnd) 
                      then (if IdenticalDelimiters 
                              then ResultIdenticalDelimiters 
                              else Cleanup) 
                      else error Error.Record( "At least one of the delimiters is not found in the string" )
      in
          Result ,
      documentation = [
      Documentation.Name =  " Text.BetweenDelimitersOccAll ",
      Documentation.Description = " Returns  all occurrances of text strings between a delimiter pair.  ",
      Documentation.LongDescription = " Returns  all occurrances of text strings between a delimiter pair. Optional parameter Alternative_Dummy can be used if Dummy shall not be *+*. ",
      Documentation.Category = " Text ",
      Documentation.Source = " https://wp.me/p6lgsG-Y5 ",
      Documentation.Version = " 1.0 ",
      Documentation.Author = " Imke Feldmann: www.TheBIccountant.com: https://wp.me/p6lgsG-Y5  . ",
      Documentation.Examples = {[Description =  " See this blogpost: https://wp.me/p6lgsG-Y5 ",
      Code = " TextBetweenDelimitersOccAll(""Meanwhile I prefer *Python!* to *R!*"", ""*"", ""!*"") ",
      Result = " {""Python"", ""R""} "]}]
        
       in  
        Value.ReplaceType(func, Value.ReplaceMetadata(Value.Type(func), documentation))

       

Resources