Forum Discussion

Brass Contributor

Nov 03, 2024

Solved

Power Query - removing identical generic text from the start and end of multiple columns

Hi all. I am after a hand in removing some default system values from a Jira export that I am using to build a report around. When multiple values are recorded in the system, a new column is crea...

Sample Data.xlsx22 KB

BI & Data Analysis

excel

office 365

Lorenzo
Nov 07, 2024
Hi Insert_Key
Thanks for sharing all what you shared and TBH I wouldn't have bet on GenAI regarding data types. What they came up with is very similar to what I've seen so far and what I referred to in my previous reply. Though, AI got it wrong when it comes to numbers vs. whole numbers... In the workbook you shared - just for testing purpose - enter a decimal number in column [Parent id] then refresh the query... 😒
In attached v1.2 I fixed it as follow:
DetectDataType = (column as list) as type => if List.IsEmpty(column) then type nullable text // Default to text if empty else let sampleValue = List.First(List.RemoveNulls(column)), // Get the first non-null value valueType = Value.Type(sampleValue) in if valueType = type text then type nullable text else if valueType = type number and (sampleValue = Number.Round(sampleValue)) then type nullable Int64.Type // Check for whole numbers else if valueType = type number then type nullable number // Decimal numbers else if valueType = type logical then type nullable logical else if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Time else type nullable text, // Fallback to text
and IMHO the last else should be type nullable any and not text but that doesn't matter in your context as any will end as text once the query loads to the Data Model
A comment, just in case... I see you added a Sort step at the end of your query. I guess that's for checks on the query output within Excel. Otherwise, as far as I know, the Data Model doesn't mind about ordered data
Last thing. Now that I understand your solution has no external connection, you can slightly speed up the time to refresh your query by disabling the PQ Firewall. Excel data (tab) > Get Data > Query Options... > CURRENT WORKBOOK > Privacy > Choose Ignore the Privacy Levels...
Hope this helps and makes sense

Insert_Key

Brass Contributor

Nov 07, 2024

Thanks Lorenzo 😊

I guess the most helpful thing for me to do is describe a common structure for my workbooks. I aim to build templates, so that I am not always reinventing the wheel each time that statistics are required.

First tab is a quick ReadMe/UserGuide for customers
Second tab is the raw data, formatted as an Excel table
The contents of next tab varies depending on the audience, and purpose of the file. If I load the transformed data for the customers to play with directly, this is where it resides.
The next tab would usually contain dynamic, interactive statistics and charts - a dashboard if you will
There may be a final hidden sheet, with helper tables etc, if I've included/required them

For my work, I have found that using a table to store my raw data has been more flexible or maybe more robust than using a range - and definitely more robust than linking to an external data source, as the files get sent everywhere. I've learned through experience that it's best for me to contain the raw data with the transformations and the output :)

The template contains a table with on the raw data tab with the header row and an empty first data row. The complete range of raw data is copied from the CSV and pasted as Values Only into the table - meaning that formatting is retained and that the table expands in size to fit. I hope that this answers your question 🙂

Trying to keep my skills contemporary, I hopped into some GenAI (tried Copilot, then Gemini Advanced, and then back to Copilot) for assistance with those last two things that I wanted to solve (convert blank to null and check Data Type) and am satisfied with the outcome. The amended QueryResult code is below and I've attached the file in case you are curious about what AI chose to do:

let
    Source = JiraExport,

    SourceHeaders = Table.FromColumns( { Table.ColumnNames( Source ) }, {"JiraHeader"} ),
    SelectedAssumedHeaders = Table.SelectRows( SourceHeaders, each
        Text.StartsWith( [JiraHeader], Labels_StartsWith ) or
        Text.StartsWith( [JiraHeader], CustomFields_StartsWith )
    ),
    HeaderStartNames = Table.AddColumn(SelectedAssumedHeaders, "JiraHeaderStartName", each
        if Text.StartsWith( [JiraHeader], Labels_StartsWith ) then Labels_StartsWith
        else Text.BeforeDelimiter( [JiraHeader], ")", {0, RelativePosition.FromEnd} )),
    GroupedStartNames = Table.Group( HeaderStartNames,
        {"JiraHeaderStartName"},
            {
                {"DATA", each _, type table},
                {"Count", each Table.RowCount(_), Int64.Type}
            }
    ),
    FilteredCount = Table.SelectRows( GroupedStartNames, each ( [Count] > 1 ) ),
    List_HeadersToMerge = Table.ExpandTableColumn(
        Table.SelectColumns( FilteredCount, {"DATA"} ),
        "DATA", {"JiraHeader"}, {"JiraHeaders"}
    )[JiraHeaders],
    List_HeadersStartsWith = FilteredCount[JiraHeaderStartName],
    MergedColumns = List.Accumulate( List_HeadersStartsWith, Source,
        (accum,name) as table =>
            Table.CombineColumns(
                accum,
                List.Select( List_HeadersToMerge, each Text.StartsWith(_, name ) ),
                each Combiner.CombineTextByDelimiter(" ; ", QuoteStyle.None)(List.RemoveNulls(_) ),
                name
            )
    ),
    ColumnsToRename = List.Select(
        Table.ColumnNames( MergedColumns ), each
        Text.StartsWith(_, CustomFields_StartsWith )
    ),
    NewColumnNames = List.Transform( ColumnsToRename, each
        let
            BetweenDelimiters = Text.BetweenDelimiters(_, CustomFields_StartsWith, ")" )
        in
            if Text.Contains( BetweenDelimiters, "(" ) then BetweenDelimiters & ")" else BetweenDelimiters
    ),
    RenamedMergedColumns = Table.RenameColumns( MergedColumns,
        List.Zip( { ColumnsToRename, NewColumnNames} )
    ),

    Result = if Table.IsEmpty( FilteredCount ) then Source else RenamedMergedColumns,
    EmptyToNull = Table.TransformColumns(
        Result,
        List.Transform(
            Table.ColumnNames(Result),
            (colName) => {colName, each if _ = "" then null else _}
        )
    ),

    // Function to detect data type
    DetectDataType = (column as list) as type =>
    if List.IsEmpty(column) then type nullable text // Default to text if empty
    else 
        let
            sampleValue = List.First(List.RemoveNulls(column)), // Get the first non-null value
            valueType = Value.Type(sampleValue)
        in
            if valueType = type text then type nullable text
            else if valueType = type number and (sampleValue = Number.Round(sampleValue)) then type nullable number // Check for whole numbers
            else if valueType = type logical then type nullable logical
            else if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Time
            else type nullable text, // Fallback to text

    // Apply the function to each column
    ColumnTypes = List.Transform(
        Table.ColumnNames(EmptyToNull),
        (colName) => 
            let
                columnValues = Table.Column(EmptyToNull, colName) // Get the entire column values
            in
                {colName, DetectDataType(columnValues)} // Pass the column values to the function
    ),
    
    // Transform the table with detected types
    DetectDataTypes = Table.TransformColumnTypes(EmptyToNull, ColumnTypes),
    #"Sorted Rows" = Table.Sort(DetectDataTypes,{{"Issue key", Order.Ascending}})
in
    #"Sorted Rows"

PQ_JiraMergeRename_InsertKey_v1.1.xlsx36 KB

Lorenzo

Silver Contributor

Nov 07, 2024

Hi Insert_Key

Thanks for sharing all what you shared and TBH I wouldn't have bet on GenAI regarding data types. What they came up with is very similar to what I've seen so far and what I referred to in my previous reply. Though, AI got it wrong when it comes to numbers vs. whole numbers... In the workbook you shared - just for testing purpose - enter a decimal number in column [Parent id] then refresh the query... 😒

In attached v1.2 I fixed it as follow:

DetectDataType = (column as list) as type =>
if List.IsEmpty(column) then type nullable text // Default to text if empty
else 
    let
        sampleValue = List.First(List.RemoveNulls(column)), // Get the first non-null value
        valueType = Value.Type(sampleValue)
    in
        if valueType = type text then type nullable text
        else if valueType = type number and (sampleValue = Number.Round(sampleValue)) then type nullable Int64.Type // Check for whole numbers
        else if valueType = type number then type nullable number   // Decimal numbers
        else if valueType = type logical then type nullable logical
        else if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Time
        else type nullable text, // Fallback to text

and IMHO the last else should be type nullable any and not text but that doesn't matter in your context as any will end as text once the query loads to the Data Model

A comment, just in case... I see you added a Sort step at the end of your query. I guess that's for checks on the query output within Excel. Otherwise, as far as I know, the Data Model doesn't mind about ordered data

Last thing. Now that I understand your solution has no external connection, you can slightly speed up the time to refresh your query by disabling the PQ Firewall. Excel data (tab) > Get Data > Query Options... > CURRENT WORKBOOK > Privacy > Choose Ignore the Privacy Levels...

Hope this helps and makes sense

Lorenzo
Silver Contributor
Nov 08, 2024
Insert_Key
You're very welcome. It's been a real pleasure as it's not common to have such exchange on forums & as PQ +/- challenging cases are not that frequent that's been an opportunity to do a bit of M coding
Re. Date vs Datetime: if your sampleValue is of type any and in your scenario it is, I don't see another option than checking if the Time value of the DateTime value = 0. Implemented in attached file where I used your [Parent id] column with a DateTime value:
else if valueType = type datetime and ( Number.From( Time.From( sampleValue ) ) = 0 ) then type date // Handle Date else if valueType = type datetime then type nullable datetime // Handle Date/Time

All the best... Sincerely
PQ_JiraMergeRename_InsertKey_v1.3.xlsx36 KB
Insert_Key
Brass Contributor
Nov 08, 2024
Thank you, again, so much Lorenzo !

Yes, Copilot couldn't solve the whole number scenario - I kept on pointing out the error of its ways but gave up after, I think, the third attempt because as I mentioned above "...I did modify JiraExport in its own query (i.e. so the source is updated before it's handled by your solution) by removing the Issue id and Parent id columns...". Outcome: the affected columns won't actually make it to the Data Model.

That said, your solution for it will no doubt be useful for other exports/reports and I will definitely use it 😊 I wish that this platform used post numbers, or time & date so that I could better reference quotes/comments!

I think I will also need to tweak the
else if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Time
line, because I assume it will assign any date as datetime, which is *not* what I wanted 😆 but easily enough changed. I'll perhaps expand the entire ValueType argument to specify all Types that I anticipate; was just in too much of a rush today. Was doing quite a bit in Excel's front-end for other bits of work today.

I'll have a look to see if I can actually adjust the Privacy Levels in Excel/Power Query. Being that I am employed within s Government setting, everything is pretty locked down but I'll definitely investigate and very much appreciate the advice/insight. I've been waiting on a replacement work device for three months now! Apparently a hardware shortage, but it is allegedly in my city now. Hopefully worth the wait!

By the end of the day, I had taken my standalone development file and repacked its contents into my "master" file that I had used for all other Q3 reporting. As it now contains the Change Management data, I can start creating statistics around that next week, and need to develop something similar for another Service Notifications and then consolidate that back into the master, re-run the stats, and re-author an expanded, enhanced Q3 report.
Again, thank you so much for your help, curiosity and desire to find an effectively, sustainable, performant solution for me. It's genuinely appreciated! 👏👏👏