Forum Discussion
Power Query - removing identical generic text from the start and end of multiple columns
- Nov 08, 2024
Hi Insert_Key
Thanks for sharing all what you shared and TBH I wouldn't have bet on GenAI regarding data types. What they came up with is very similar to what I've seen so far and what I referred to in my previous reply. Though, AI got it wrong when it comes to numbers vs. whole numbers... In the workbook you shared - just for testing purpose - enter a decimal number in column [Parent id] then refresh the query... 😒
In attached v1.2 I fixed it as follow:
DetectDataType = (column as list) as type => if List.IsEmpty(column) then type nullable text // Default to text if empty else let sampleValue = List.First(List.RemoveNulls(column)), // Get the first non-null value valueType = Value.Type(sampleValue) in if valueType = type text then type nullable text else if valueType = type number and (sampleValue = Number.Round(sampleValue)) then type nullable Int64.Type // Check for whole numbers else if valueType = type number then type nullable number // Decimal numbers else if valueType = type logical then type nullable logical else if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Time else type nullable text, // Fallback to textand IMHO the last else should be type nullable any and not text but that doesn't matter in your context as any will end as text once the query loads to the Data Model
A comment, just in case... I see you added a Sort step at the end of your query. I guess that's for checks on the query output within Excel. Otherwise, as far as I know, the Data Model doesn't mind about ordered data
Last thing. Now that I understand your solution has no external connection, you can slightly speed up the time to refresh your query by disabling the PQ Firewall. Excel data (tab) > Get Data > Query Options... > CURRENT WORKBOOK > Privacy > Choose Ignore the Privacy Levels...
Hope this helps and makes sense
Thanks Lorenzo 😊
I guess the most helpful thing for me to do is describe a common structure for my workbooks. I aim to build templates, so that I am not always reinventing the wheel each time that statistics are required.
- First tab is a quick ReadMe/UserGuide for customers
- Second tab is the raw data, formatted as an Excel table
- The contents of next tab varies depending on the audience, and purpose of the file. If I load the transformed data for the customers to play with directly, this is where it resides.
- The next tab would usually contain dynamic, interactive statistics and charts - a dashboard if you will
- There may be a final hidden sheet, with helper tables etc, if I've included/required them
For my work, I have found that using a table to store my raw data has been more flexible or maybe more robust than using a range - and definitely more robust than linking to an external data source, as the files get sent everywhere. I've learned through experience that it's best for me to contain the raw data with the transformations and the output :)
The template contains a table with on the raw data tab with the header row and an empty first data row. The complete range of raw data is copied from the CSV and pasted as Values Only into the table - meaning that formatting is retained and that the table expands in size to fit. I hope that this answers your question 🙂
Trying to keep my skills contemporary, I hopped into some GenAI (tried Copilot, then Gemini Advanced, and then back to Copilot) for assistance with those last two things that I wanted to solve (convert blank to null and check Data Type) and am satisfied with the outcome. The amended QueryResult code is below and I've attached the file in case you are curious about what AI chose to do:
let
Source = JiraExport,
SourceHeaders = Table.FromColumns( { Table.ColumnNames( Source ) }, {"JiraHeader"} ),
SelectedAssumedHeaders = Table.SelectRows( SourceHeaders, each
Text.StartsWith( [JiraHeader], Labels_StartsWith ) or
Text.StartsWith( [JiraHeader], CustomFields_StartsWith )
),
HeaderStartNames = Table.AddColumn(SelectedAssumedHeaders, "JiraHeaderStartName", each
if Text.StartsWith( [JiraHeader], Labels_StartsWith ) then Labels_StartsWith
else Text.BeforeDelimiter( [JiraHeader], ")", {0, RelativePosition.FromEnd} )),
GroupedStartNames = Table.Group( HeaderStartNames,
{"JiraHeaderStartName"},
{
{"DATA", each _, type table},
{"Count", each Table.RowCount(_), Int64.Type}
}
),
FilteredCount = Table.SelectRows( GroupedStartNames, each ( [Count] > 1 ) ),
List_HeadersToMerge = Table.ExpandTableColumn(
Table.SelectColumns( FilteredCount, {"DATA"} ),
"DATA", {"JiraHeader"}, {"JiraHeaders"}
)[JiraHeaders],
List_HeadersStartsWith = FilteredCount[JiraHeaderStartName],
MergedColumns = List.Accumulate( List_HeadersStartsWith, Source,
(accum,name) as table =>
Table.CombineColumns(
accum,
List.Select( List_HeadersToMerge, each Text.StartsWith(_, name ) ),
each Combiner.CombineTextByDelimiter(" ; ", QuoteStyle.None)(List.RemoveNulls(_) ),
name
)
),
ColumnsToRename = List.Select(
Table.ColumnNames( MergedColumns ), each
Text.StartsWith(_, CustomFields_StartsWith )
),
NewColumnNames = List.Transform( ColumnsToRename, each
let
BetweenDelimiters = Text.BetweenDelimiters(_, CustomFields_StartsWith, ")" )
in
if Text.Contains( BetweenDelimiters, "(" ) then BetweenDelimiters & ")" else BetweenDelimiters
),
RenamedMergedColumns = Table.RenameColumns( MergedColumns,
List.Zip( { ColumnsToRename, NewColumnNames} )
),
Result = if Table.IsEmpty( FilteredCount ) then Source else RenamedMergedColumns,
EmptyToNull = Table.TransformColumns(
Result,
List.Transform(
Table.ColumnNames(Result),
(colName) => {colName, each if _ = "" then null else _}
)
),
// Function to detect data type
DetectDataType = (column as list) as type =>
if List.IsEmpty(column) then type nullable text // Default to text if empty
else
let
sampleValue = List.First(List.RemoveNulls(column)), // Get the first non-null value
valueType = Value.Type(sampleValue)
in
if valueType = type text then type nullable text
else if valueType = type number and (sampleValue = Number.Round(sampleValue)) then type nullable number // Check for whole numbers
else if valueType = type logical then type nullable logical
else if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Time
else type nullable text, // Fallback to text
// Apply the function to each column
ColumnTypes = List.Transform(
Table.ColumnNames(EmptyToNull),
(colName) =>
let
columnValues = Table.Column(EmptyToNull, colName) // Get the entire column values
in
{colName, DetectDataType(columnValues)} // Pass the column values to the function
),
// Transform the table with detected types
DetectDataTypes = Table.TransformColumnTypes(EmptyToNull, ColumnTypes),
#"Sorted Rows" = Table.Sort(DetectDataTypes,{{"Issue key", Order.Ascending}})
in
#"Sorted Rows"
Hi Insert_Key
Thanks for sharing all what you shared and TBH I wouldn't have bet on GenAI regarding data types. What they came up with is very similar to what I've seen so far and what I referred to in my previous reply. Though, AI got it wrong when it comes to numbers vs. whole numbers... In the workbook you shared - just for testing purpose - enter a decimal number in column [Parent id] then refresh the query... 😒
In attached v1.2 I fixed it as follow:
DetectDataType = (column as list) as type =>
if List.IsEmpty(column) then type nullable text // Default to text if empty
else
let
sampleValue = List.First(List.RemoveNulls(column)), // Get the first non-null value
valueType = Value.Type(sampleValue)
in
if valueType = type text then type nullable text
else if valueType = type number and (sampleValue = Number.Round(sampleValue)) then type nullable Int64.Type // Check for whole numbers
else if valueType = type number then type nullable number // Decimal numbers
else if valueType = type logical then type nullable logical
else if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Time
else type nullable text, // Fallback to textand IMHO the last else should be type nullable any and not text but that doesn't matter in your context as any will end as text once the query loads to the Data Model
A comment, just in case... I see you added a Sort step at the end of your query. I guess that's for checks on the query output within Excel. Otherwise, as far as I know, the Data Model doesn't mind about ordered data
Last thing. Now that I understand your solution has no external connection, you can slightly speed up the time to refresh your query by disabling the PQ Firewall. Excel data (tab) > Get Data > Query Options... > CURRENT WORKBOOK > Privacy > Choose Ignore the Privacy Levels...
Hope this helps and makes sense
- LorenzoNov 08, 2024Silver Contributor
You're very welcome. It's been a real pleasure as it's not common to have such exchange on forums & as PQ +/- challenging cases are not that frequent that's been an opportunity to do a bit of M coding
Re. Date vs Datetime: if your sampleValue is of type any and in your scenario it is, I don't see another option than checking if the Time value of the DateTime value = 0. Implemented in attached file where I used your [Parent id] column with a DateTime value:
else if valueType = type datetime and ( Number.From( Time.From( sampleValue ) ) = 0 ) then type date // Handle Date else if valueType = type datetime then type nullable datetime // Handle Date/TimeAll the best... Sincerely
- Insert_KeyNov 08, 2024Brass Contributor
Thank you, again, so much Lorenzo !
Yes, Copilot couldn't solve the whole number scenario - I kept on pointing out the error of its ways but gave up after, I think, the third attempt because as I mentioned above "...I did modify JiraExport in its own query (i.e. so the source is updated before it's handled by your solution) by removing the Issue id and Parent id columns...". Outcome: the affected columns won't actually make it to the Data Model.
That said, your solution for it will no doubt be useful for other exports/reports and I will definitely use it 😊 I wish that this platform used post numbers, or time & date so that I could better reference quotes/comments!
I think I will also need to tweak theelse if valueType = type date or valueType = type datetime then type nullable datetime // Handle Date/Timeline, because I assume it will assign any date as datetime, which is *not* what I wanted 😆 but easily enough changed. I'll perhaps expand the entire ValueType argument to specify all Types that I anticipate; was just in too much of a rush today. Was doing quite a bit in Excel's front-end for other bits of work today.
I'll have a look to see if I can actually adjust the Privacy Levels in Excel/Power Query. Being that I am employed within s Government setting, everything is pretty locked down but I'll definitely investigate and very much appreciate the advice/insight. I've been waiting on a replacement work device for three months now! Apparently a hardware shortage, but it is allegedly in my city now. Hopefully worth the wait!
By the end of the day, I had taken my standalone development file and repacked its contents into my "master" file that I had used for all other Q3 reporting. As it now contains the Change Management data, I can start creating statistics around that next week, and need to develop something similar for another Service Notifications and then consolidate that back into the master, re-run the stats, and re-author an expanded, enhanced Q3 report.Again, thank you so much for your help, curiosity and desire to find an effectively, sustainable, performant solution for me. It's genuinely appreciated! 👏👏👏