Forum Discussion
Removing duplicates and sorting data in columns instead of rows
Hello, one of our software exports data in the following format. I don't want to touch this file since it will be exported frequently.
I want to create a different Excel file with formulas which is linked to this file and configure it in a way to obtain the results in the following way:
Basically removing the duplicates and sorting the data in columns and leaving blanks when nothing is found.
Thank you!
As variant bit more close to UI.
First query, SourceRaw, loads the file and declares data types (not sure which exactly locale shall be used)
let pFilepath = "C:\Test\My_Projects_&_Quotations_2024_06_05_22_36.csv", Source = Csv.Document( File.Contents(pFilepath), [Delimiter=",", Columns=15, Encoding=65001, QuoteStyle=QuoteStyle.None] ), PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]), DeclareType = Table.TransformColumnTypes( PromoteHeaders, { {"Gross Value", Currency.Type} , {"Date Created", type datetime} , {"Date Modified", type datetime} , {"Completion date", type date} }, "fr-LU") in DeclareType
Referenced SourceRaw query, Projects, keeps columns with repeated values for each project and removes dupicates
let Source = SourceRaw, SelectColumns = Table.RemoveColumns(Source,{"Industry Name", "Industry", "Quoted to"}), RemoveDuplicates = Table.Distinct(SelectColumns) in RemoveDuplicates
Query Industry pivots projects on them
let Source = SourceRaw, SelectColumns = Table.SelectColumns(Source,{"Project Name", "Industry", "Industry Name"}), ProjectIndustry = Table.Pivot( SelectColumns, List.Distinct(SelectColumns[Industry]), "Industry", "Industry Name", List.Min) in ProjectIndustry
Query QutedTo expands quotes to columns
let Source = SourceRaw, SelectColumns = Table.SelectColumns(Source,{"Project Name", "Quoted to"}), NoBlanks = Table.SelectRows(SelectColumns, each ([Quoted to] <> "")), RemoveDuplicates = Table.Distinct(NoBlanks), CountRecords = Table.Group( RemoveDuplicates, {"Project Name"}, {{"Count", each Table.RowCount(_), Int64.Type}}), QuoteCount = List.Max(CountRecords[Count]), names = List.Transform( {1..QuoteCount}, (q) => "Quote to #" & Text.From(q)), ListQutedTo = Table.Group( RemoveDuplicates, {"Project Name"}, {{"Quoted to", each [#"Quoted to"] }}), CombineQuotedTo = Table.TransformColumns( ListQutedTo, {"Quoted to", each Text.Combine(List.Transform(_, Text.From), "="), type text}), ExpandQuotedTo = Table.SplitColumn( CombineQuotedTo, "Quoted to", Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv), names) in ExpandQuotedTo
Finally merge all togeteher
let MergeWithIndustry = Table.NestedJoin( Projects, {"Project Name"}, Industry, {"Project Name"}, "Industry", JoinKind.LeftOuter), IndustryNames = List.Skip( Table.ColumnNames(MergeWithIndustry{0}[Industry]) ), ExpandIndustry = Table.ExpandTableColumn( MergeWithIndustry, "Industry", IndustryNames), MergeQuotedTo = Table.NestedJoin( ExpandIndustry, {"Project Name"}, QuotedTo, {"Project Name"}, "QuotedTo", JoinKind.LeftOuter), QuotedToNames = List.Skip( Table.ColumnNames(MergeQuotedTo{0}[QuotedTo]) ), ExpandQuotedTo = Table.ExpandTableColumn( MergeQuotedTo, "QuotedTo", QuotedToNames) in ExpandQuotedTo
- Riny_van_EekelenPlatinum Contributor
Jeffino I would use Power Query. See if you can follow the example in the attached file. Come back here if you get stuck.
- JeffinoCopper Contributor
Riny_van_Eekelen you opened a new world to me with Power Query! Thank you
I partially solved my problem
Could you please have a look at the attached and help me solving the below issue?
CSV file is the database
EXCEL File has the Query and my Target on a separate worksheet
You might need to link them again
Basically I need to transpose COLUMN: "Quoted to" and not succeeding
- Riny_van_EekelenPlatinum Contributor
Jeffino That seems pretty straight-forward, though I don't have time right now. If not someone else jumps in, I can have a look it tomorrow morning (my time CET).
- PeterBartholomew1Silver Contributor
Using aggressively modern formulae
= LET( crosstab, PIVOTBY(project, category, item, CONCAT,,0,,0), totalCost, DROP(SUMIFS(cost, project, TAKE(crosstab,,1)),1), HSTACK(crosstab, VSTACK("Total cost", totalCost)) )