Forum Discussion

Insert_Key's avatar
Insert_Key
Copper Contributor
Nov 04, 2024
Solved

Power Query - removing identical generic text from the start and end of multiple columns

Hi all. I am after a hand in removing some default system values from a Jira export that I am using to build a report around.

 

When multiple values are recorded in the system, a new column is created in the export and number is appended to any duplicate columns example "Label" and "Label2". I previously received assistance in understanding the code to merge the values in these columns and rename them, thank you Lorenzo. For any columns that have required merging, I have been able to address their naming as part of the process, but I have multiple columns whose headers still require attention and can be handled as part of a grouped transformation process due to each containing identical redundant values. 

 

When a custom field is created, Jira wraps the value with "Custom field (" and ")", for example "Custom field (Country of Origin)". Some of the headers have valid values contained within brackets, so I can't simply replace all brackets - I have tried using the "Custom field (" as a string to identify relevant headers to use Text.Start(_,Text.Length(_)-1) to get rid of the trailing bracket and then Text.End(_,Text.Length(_)-14) to remove the leading text... but ended up with a list of correct headers but no data. Clearly out of my depth but felt frustratingly close. I also attempted to use List.Transform(Table.ColumnNames( ) and removed the leading text and bracket... but then couldn't get rid of the trailing bracket.

 

I had a fair crack at it today but haven't managed to get it to do what want and now my brain hurts, so have created a very simplified version of what I am trying to do and am seeking your help. In the Output tab of the sample file, merging of columns has been completed and I would like the columns Custom field (Weight), Custom field (Organic (Y/N)), and Custom field (Multipack (Y/N)) transformed into Weight, Organic (Y/N), and Multipack (Y/N). I am after a solution that searches for and replaces text in the headers rather than explicitly referencing column names as the export's contents are variable and dynamic in nature, and there are far more of them. Thanks in advance ๐Ÿ™‚๐Ÿค—

  • Lorenzo's avatar
    Lorenzo
    Nov 08, 2024

    Hi Insert_Key 

    Thanks for sharing all what you shared and TBH I wouldn't have bet on GenAI regarding data types. What they came up with is very similar to what I've seen so far and what I referred to in my previous reply. Though, AI got it wrong when it comes to numbers vs. whole numbers... In the workbook you shared - just for testing purpose - enter a decimal number in column [Parent id] then refresh the query... ๐Ÿ˜’

    In attached v1.2 I fixed it as follow:

    DetectDataType = (column as list) as type =>
    if List.IsEmpty(column) then type nullable text // Default to text if empty
    else 
        let
            sampleValue = List.First(List.RemoveNulls(column)), // Get the first non-null value
            valueType = Value.Type(sampleValue)
        in
            if valueType = type text then type nullable text
            else if valueType = type number and (sampleValue = Number.Round(sampleValue)) then type nullable Int64.Type // Check for whole numbers
            else if valueType = type number then type nullable number   // Decimal numbers
            else if valueType = type logical then type nullable logical
            else if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Time
            else type nullable text, // Fallback to text

    and IMHO the last else should be type nullable any and not text but that doesn't matter in your context as any will end as text once the query loads to the Data Model

    A comment, just in case... I see you added a Sort step at the end of your query. I guess that's for checks on the query output within Excel. Otherwise, as far as I know, the Data Model doesn't mind about ordered data

    Last thing. Now that I understand your solution has no external connection, you can slightly speed up the time to refresh your query by disabling the PQ Firewall. Excel data (tab) > Get Data > Query Options... > CURRENT WORKBOOK > Privacy > Choose Ignore the Privacy Levels...

    Hope this helps and makes sense

24 Replies

  • Lorenzo's avatar
    Lorenzo
    Silver Contributor

    Insert_Key 

    Attached Query3 merges the Label & Custom fields in 1 step with List.Accumulate:

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        SourceLabelColumnNames = Table.ColumnNames( Source ),
        TransformedList = List.Transform(SourceLabelColumnNames, each
            if Text.Contains(_, "(" ) then Text.BetweenDelimiters(_, "(", ")" )
            else try Text.Range(_, 0, Text.PositionOfAny(_, {"0".."9"} ) ) otherwise _
        ),
        DistinctList = List.Distinct( TransformedList ),
        ColumnsToMerge = List.Distinct( List.Difference( TransformedList, DistinctList ) ),
        MergedColumns = List.Accumulate( ColumnsToMerge, Source,
            (accum,name) as table =>
                Table.CombineColumns(
                    accum,
                    List.Select( SourceLabelColumnNames, each Text.Contains(_, name ) ),
                    each Combiner.CombineTextByDelimiter(" ; ", QuoteStyle.None)(List.RemoveNulls(_) ),
                    name
                )
        ),
        Result = if List.IsEmpty( ColumnsToMerge ) then Source else MergedColumns
    in
        Result
    • Insert_Key's avatar
      Insert_Key
      Copper Contributor

      Thanks for your help again, Lorenzo 

       

      Unfortunately, when I open the files, the renaming does not appear to have been successful: 

      If I didn't do a good job of explaining myself in my post, please see this image that Riny_van_Eekelen posted above as the end result of his solution:

      The column names above are exactly what I am after in this sample table ๐Ÿ™‚

      • Lorenzo's avatar
        Lorenzo
        Silver Contributor

        Insert_Key 

        A better option (in Query5):

            TransformedHeaders = Table.TransformColumnNames( MergedColumns,
                (name) => 
                    if Text.Contains( name, "(" )
                    then Text.BeforeDelimiter(
                            Text.AfterDelimiter( name , "(" ), ")",
                            {0, RelativePosition.FromEnd}
                         )
                    else name
            ),
            Result = if List.IsEmpty( ColumnsToMerge ) then Source else TransformedHeaders
        in
            Result

         

  • Lorenzo's avatar
    Lorenzo
    Silver Contributor

    Insert_Key 

     

    Looked at identifying the custom fields to merge in a more direct way, logic is:

        ListCustomFields = List.Skip( SourceLabelColumnNames,
            List.PositionOf( SourceLabelColumnNames, List.Last( ListLabelColumns ) ) +1
        ),
        TransformedList = List.Transform( ListCustomFields, each
            Text.BetweenDelimiters( _ , "(", ")" )
        ),
        RemovedDuplicates = List.Distinct( TransformedList ),
        ListsDifference = List.Difference( TransformedList, RemovedDuplicates ),
        ListOfCustomFieldsToMerge = List.Select( ListCustomFields, each
            Text.Contains(_, List.First( ListsDifference ) )
        ),

    in attached Query2

  • Riny_van_Eekelen's avatar
    Riny_van_Eekelen
    Platinum Contributor

    Insert_Key 

    You wrote that all column merges are completed and you are now left with removing "Custom field (" and the right-most ")", without explicitly referencing column names.

     

    Would that not work by inserting the following steps to your existing query?

     

        Unpivot = Table.UnpivotOtherColumns(#"Created merged ""Country of Origin"" Column", {"ID", "Description", "Labels"}, "Attribute", "Value"),
        Split = Table.SplitColumn(Unpivot, "Attribute", Splitter.SplitTextByEachDelimiter({")"}, QuoteStyle.Csv, true), {"Attribute", "Attribute.1"}),
        Replace = Table.ReplaceValue(Split,"Custom field (","",Replacer.ReplaceText,{"Attribute"}),
        Remove = Table.RemoveColumns(Replace,{"Attribute.1"}),
        Pivot = Table.Pivot(Remove, List.Distinct(Remove[Attribute]), "Attribute", "Value")
    in
        Pivot

     

    The end result will look like this:

    Please ignore is I've misunderstood.

    • Insert_Key's avatar
      Insert_Key
      Copper Contributor

      Riny_van_Eekelen - thank you for your help!

       

      You definitely haven't misunderstood; the outcome is precisely what I am looking for. Unfortunately, the sample data I provided is massively simplified compared to the real file, which, I believe contains something in the order of 35 to 40 initial columns, 10 to 15 of which would need to be addressed by this renaming process.

       

      Your solution works flawlessly, but I feel using a "search and replace" within the header text would be a better fit for me. I work with Jira exports quite a lot and they are often peppered with these "Custom field" header names. With this simple example, only three columns ("ID", "Description", "Labels") needed to be excluded from the unpivot, but for me to implement it in other exports I'd have to specify many more than that - maybe 15 to 20 - and to customise the majority of those for each export. None of this additional context was explained in my post or sample, apologies; I was trying to present the bare requirements with minimal information.

      I might be wrong in my thinking, but the "search and replace" approach feels more flexible without needing to significantly adjust the code each time. Thanks again ๐Ÿ™‚

  • Lorenzo's avatar
    Lorenzo
    Silver Contributor

    Hi Insert_Key 

     

    See if attached Query1 is fine (very quick work) for what you need to do as I made the following assumption:

        ListCustomFields = List.Skip( SourceLabelColumnNames,
            List.PositionOf( SourceLabelColumnNames, List.Last( ListLabelColumns ) ) +1
        ),

    (if you need explanation feel free)

Resources