Forum Discussion

Jeffino's avatar
Jeffino
Copper Contributor
May 25, 2024

Removing duplicates and sorting data in columns instead of rows

Hello, one of our software exports data in the following format. I don't want to touch this file since it will be exported frequently.

I want to create a different Excel file with formulas which is linked to this file and configure it in a way to obtain the results in the following way:

Basically removing the duplicates and sorting the data in columns and leaving blanks when nothing is found.

Thank you!

 

  • Jeffino 

    As variant bit more close to UI.

    First query, SourceRaw, loads the file and declares data types (not sure which exactly locale shall be used)

    let
        pFilepath = "C:\Test\My_Projects_&_Quotations_2024_06_05_22_36.csv",
    
        Source = Csv.Document(
            File.Contents(pFilepath),
            [Delimiter=",", Columns=15, Encoding=65001, QuoteStyle=QuoteStyle.None]
        ),
        PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        DeclareType = Table.TransformColumnTypes(
            PromoteHeaders,
            {
                  {"Gross Value", Currency.Type}
                , {"Date Created", type datetime}
                , {"Date Modified", type datetime}
                , {"Completion date", type date}
            }, "fr-LU")
    in
        DeclareType

    Referenced SourceRaw query, Projects, keeps columns with repeated values for each project and removes dupicates

    let
        Source = SourceRaw,
        SelectColumns = Table.RemoveColumns(Source,{"Industry Name", "Industry", "Quoted to"}),
        RemoveDuplicates = Table.Distinct(SelectColumns)
    in
        RemoveDuplicates

    Query Industry pivots projects on them

    let
        Source = SourceRaw,
        SelectColumns = Table.SelectColumns(Source,{"Project Name", "Industry", "Industry Name"}),
        ProjectIndustry = Table.Pivot(
            SelectColumns,
            List.Distinct(SelectColumns[Industry]),
            "Industry", "Industry Name", List.Min)
    in
        ProjectIndustry

    Query QutedTo expands quotes to columns

    let
        Source = SourceRaw,
        SelectColumns = Table.SelectColumns(Source,{"Project Name", "Quoted to"}),
        NoBlanks = Table.SelectRows(SelectColumns, each ([Quoted to] <> "")),
        RemoveDuplicates = Table.Distinct(NoBlanks),
        CountRecords = Table.Group(
            RemoveDuplicates,
            {"Project Name"},
            {{"Count", each Table.RowCount(_), Int64.Type}}),
    
        QuoteCount = List.Max(CountRecords[Count]),
        names = List.Transform( {1..QuoteCount}, (q) => "Quote to #" & Text.From(q)),
    
        ListQutedTo = Table.Group(
            RemoveDuplicates,
            {"Project Name"},
            {{"Quoted to", each [#"Quoted to"] }}),
        CombineQuotedTo = Table.TransformColumns(
            ListQutedTo,
            {"Quoted to", each Text.Combine(List.Transform(_, Text.From), "="), type text}),
        ExpandQuotedTo = Table.SplitColumn(
            CombineQuotedTo,
            "Quoted to",
            Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv),
            names)
    in
        ExpandQuotedTo

    Finally merge all togeteher

    let
        MergeWithIndustry = Table.NestedJoin(
            Projects, {"Project Name"},
            Industry, {"Project Name"},
            "Industry",
            JoinKind.LeftOuter),
        IndustryNames = List.Skip( Table.ColumnNames(MergeWithIndustry{0}[Industry]) ),
        ExpandIndustry = Table.ExpandTableColumn(
            MergeWithIndustry,
            "Industry",
            IndustryNames),
    
        MergeQuotedTo = Table.NestedJoin(
            ExpandIndustry, {"Project Name"},
            QuotedTo, {"Project Name"},
            "QuotedTo",
            JoinKind.LeftOuter),
        QuotedToNames = List.Skip( Table.ColumnNames(MergeQuotedTo{0}[QuotedTo]) ),
        ExpandQuotedTo = Table.ExpandTableColumn(
            MergeQuotedTo,
            "QuotedTo",
            QuotedToNames)
    in
        ExpandQuotedTo
  • Jeffino 

    Using aggressively modern formulae

    = LET(
        crosstab,  PIVOTBY(project, category, item, CONCAT,,0,,0),
        totalCost, DROP(SUMIFS(cost, project, TAKE(crosstab,,1)),1),
        HSTACK(crosstab, VSTACK("Total cost", totalCost))
      )

     

Resources