SOLVED

Removing duplicates and sorting data in columns instead of rows

Copper Contributor

Hello, one of our software exports data in the following format. I don't want to touch this file since it will be exported frequently.

Jeffino_2-1716614208403.png

I want to create a different Excel file with formulas which is linked to this file and configure it in a way to obtain the results in the following way:

Jeffino_3-1716614232577.png

Basically removing the duplicates and sorting the data in columns and leaving blanks when nothing is found.

Thank you!

 

15 Replies

@Jeffino I would use Power Query. See if you can follow the example in the attached file. Come back here if you get stuck.

Riny_van_Eekelen_0-1716616129760.png

 

@Riny_van_Eekelen you opened a new world to me with Power Query! Thank you

I partially solved my problem

 

Could you please have a look at the attached and help me solving the below issue?

CSV file is the database

EXCEL File has the Query and my Target on a separate worksheet

You might need to link them again

 

Basically I need to transpose COLUMN: "Quoted to" and not succeeding

@Jeffino That seems pretty straight-forward, though I don't have time right now. If not someone else jumps in, I can have a look it tomorrow morning (my time CET).

@Jeffino 

online sql:

select colExclude[Quotedto],replace(group_concat(distinct(Quotedto)),',','</td><td>') q from import_csv where Quotedto<>'' group by ProjectName;

 

Screenshot_2024-06-08-09-25-33-243_com.mmbox.xbrowser.pro.jpg

@Jeffino 

 

let
  Source = Csv.Document(File.Contents("XXXX/My_Projects_&_Quotations_2024_06_05_22_36.csv"), [Delimiter = ",", Columns = 15, Encoding = 65001, QuoteStyle = QuoteStyle.None]),
  PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
  groupingKey = List.Difference(Table.ColumnNames(PromotedHeaders), {"Industry Name", "Industry", "Quoted to"}),
  grouped = Table.Group(PromotedHeaders, groupingKey, 
                        {"Records", each 
                          let 
                            quotedToList = List.Select(List.Distinct([Quoted to]), each _ <> ""),
                            quotedToHeaders = List.Transform({0..List.Count(quotedToList)-1}, each "Quoted to" & " " & Number.ToText(_+1)),
                            quotedToRecord = Record.FromList({[Industry Name]{0}, [Industry]{0}} & quotedToList, {"Industry Name", "Industry"} & quotedToHeaders)
                          in 
                            quotedToRecord
                        }),
  headers = List.Union(List.Transform(grouped[Records], each Record.FieldNames(_))),
  ExpandedRecords = Table.ExpandRecordColumn(grouped, "Records", headers, headers)
in
  ExpandedRecords

 

rachel_0-1717823760364.png

 

@rachel Thank you so much for this but i dont really know how to apply these steps...could you please elaborate it more or share with me the file? Much appreciate it

Thank you @peiyezhu but i dont really know how to apply these steps on the excel file. could you please elaborate or share with me the file? Appreciate it
best response confirmed by Jeffino (Copper Contributor)
Solution

@Jeffino 

As variant bit more close to UI.

First query, SourceRaw, loads the file and declares data types (not sure which exactly locale shall be used)

let
    pFilepath = "C:\Test\My_Projects_&_Quotations_2024_06_05_22_36.csv",

    Source = Csv.Document(
        File.Contents(pFilepath),
        [Delimiter=",", Columns=15, Encoding=65001, QuoteStyle=QuoteStyle.None]
    ),
    PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    DeclareType = Table.TransformColumnTypes(
        PromoteHeaders,
        {
              {"Gross Value", Currency.Type}
            , {"Date Created", type datetime}
            , {"Date Modified", type datetime}
            , {"Completion date", type date}
        }, "fr-LU")
in
    DeclareType

Referenced SourceRaw query, Projects, keeps columns with repeated values for each project and removes dupicates

let
    Source = SourceRaw,
    SelectColumns = Table.RemoveColumns(Source,{"Industry Name", "Industry", "Quoted to"}),
    RemoveDuplicates = Table.Distinct(SelectColumns)
in
    RemoveDuplicates

Query Industry pivots projects on them

let
    Source = SourceRaw,
    SelectColumns = Table.SelectColumns(Source,{"Project Name", "Industry", "Industry Name"}),
    ProjectIndustry = Table.Pivot(
        SelectColumns,
        List.Distinct(SelectColumns[Industry]),
        "Industry", "Industry Name", List.Min)
in
    ProjectIndustry

Query QutedTo expands quotes to columns

let
    Source = SourceRaw,
    SelectColumns = Table.SelectColumns(Source,{"Project Name", "Quoted to"}),
    NoBlanks = Table.SelectRows(SelectColumns, each ([Quoted to] <> "")),
    RemoveDuplicates = Table.Distinct(NoBlanks),
    CountRecords = Table.Group(
        RemoveDuplicates,
        {"Project Name"},
        {{"Count", each Table.RowCount(_), Int64.Type}}),

    QuoteCount = List.Max(CountRecords[Count]),
    names = List.Transform( {1..QuoteCount}, (q) => "Quote to #" & Text.From(q)),

    ListQutedTo = Table.Group(
        RemoveDuplicates,
        {"Project Name"},
        {{"Quoted to", each [#"Quoted to"] }}),
    CombineQuotedTo = Table.TransformColumns(
        ListQutedTo,
        {"Quoted to", each Text.Combine(List.Transform(_, Text.From), "="), type text}),
    ExpandQuotedTo = Table.SplitColumn(
        CombineQuotedTo,
        "Quoted to",
        Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv),
        names)
in
    ExpandQuotedTo

Finally merge all togeteher

let
    MergeWithIndustry = Table.NestedJoin(
        Projects, {"Project Name"},
        Industry, {"Project Name"},
        "Industry",
        JoinKind.LeftOuter),
    IndustryNames = List.Skip( Table.ColumnNames(MergeWithIndustry{0}[Industry]) ),
    ExpandIndustry = Table.ExpandTableColumn(
        MergeWithIndustry,
        "Industry",
        IndustryNames),

    MergeQuotedTo = Table.NestedJoin(
        ExpandIndustry, {"Project Name"},
        QuotedTo, {"Project Name"},
        "QuotedTo",
        JoinKind.LeftOuter),
    QuotedToNames = List.Skip( Table.ColumnNames(MergeQuotedTo{0}[QuotedTo]) ),
    ExpandQuotedTo = Table.ExpandTableColumn(
        MergeQuotedTo,
        "QuotedTo",
        QuotedToNames)
in
    ExpandQuotedTo

@Jeffino 

Using aggressively modern formulae

= LET(
    crosstab,  PIVOTBY(project, category, item, CONCAT,,0,,0),
    totalCost, DROP(SUMIFS(cost, project, TAKE(crosstab,,1)),1),
    HSTACK(crosstab, VSTACK("Total cost", totalCost))
  )

image.png 

Youtube video:
https://www.youtube.com/watch?v=EZXOQylIjVo

Start watching from 1:30, it shows where to paste the M code.
You will need to change the file path of the csv in "Source" too...
It is "close to UI" but still has lots of funky syntax....

@rachel 

Only 4 manually added lines in all queries, to generate and get intermediate tables column names.

@Jeffino 

You run sql with this online sql or other sql tools.

https://e.anyoupin.cn/EData/?s=1282

 

Thank you all very much! Really appreciate it!

 

I have used @SergeiBaklan response since it is more in line with my capabilities and knowledge

 

Great community!

@Jeffino , thank you.

Independently of your choice. Everything could be done in several ways. That's a good approach to select one which is easier in maintenance if only performance is not a showstopper.

1 best response

Accepted Solutions
best response confirmed by Jeffino (Copper Contributor)
Solution

@Jeffino 

As variant bit more close to UI.

First query, SourceRaw, loads the file and declares data types (not sure which exactly locale shall be used)

let
    pFilepath = "C:\Test\My_Projects_&_Quotations_2024_06_05_22_36.csv",

    Source = Csv.Document(
        File.Contents(pFilepath),
        [Delimiter=",", Columns=15, Encoding=65001, QuoteStyle=QuoteStyle.None]
    ),
    PromoteHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    DeclareType = Table.TransformColumnTypes(
        PromoteHeaders,
        {
              {"Gross Value", Currency.Type}
            , {"Date Created", type datetime}
            , {"Date Modified", type datetime}
            , {"Completion date", type date}
        }, "fr-LU")
in
    DeclareType

Referenced SourceRaw query, Projects, keeps columns with repeated values for each project and removes dupicates

let
    Source = SourceRaw,
    SelectColumns = Table.RemoveColumns(Source,{"Industry Name", "Industry", "Quoted to"}),
    RemoveDuplicates = Table.Distinct(SelectColumns)
in
    RemoveDuplicates

Query Industry pivots projects on them

let
    Source = SourceRaw,
    SelectColumns = Table.SelectColumns(Source,{"Project Name", "Industry", "Industry Name"}),
    ProjectIndustry = Table.Pivot(
        SelectColumns,
        List.Distinct(SelectColumns[Industry]),
        "Industry", "Industry Name", List.Min)
in
    ProjectIndustry

Query QutedTo expands quotes to columns

let
    Source = SourceRaw,
    SelectColumns = Table.SelectColumns(Source,{"Project Name", "Quoted to"}),
    NoBlanks = Table.SelectRows(SelectColumns, each ([Quoted to] <> "")),
    RemoveDuplicates = Table.Distinct(NoBlanks),
    CountRecords = Table.Group(
        RemoveDuplicates,
        {"Project Name"},
        {{"Count", each Table.RowCount(_), Int64.Type}}),

    QuoteCount = List.Max(CountRecords[Count]),
    names = List.Transform( {1..QuoteCount}, (q) => "Quote to #" & Text.From(q)),

    ListQutedTo = Table.Group(
        RemoveDuplicates,
        {"Project Name"},
        {{"Quoted to", each [#"Quoted to"] }}),
    CombineQuotedTo = Table.TransformColumns(
        ListQutedTo,
        {"Quoted to", each Text.Combine(List.Transform(_, Text.From), "="), type text}),
    ExpandQuotedTo = Table.SplitColumn(
        CombineQuotedTo,
        "Quoted to",
        Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv),
        names)
in
    ExpandQuotedTo

Finally merge all togeteher

let
    MergeWithIndustry = Table.NestedJoin(
        Projects, {"Project Name"},
        Industry, {"Project Name"},
        "Industry",
        JoinKind.LeftOuter),
    IndustryNames = List.Skip( Table.ColumnNames(MergeWithIndustry{0}[Industry]) ),
    ExpandIndustry = Table.ExpandTableColumn(
        MergeWithIndustry,
        "Industry",
        IndustryNames),

    MergeQuotedTo = Table.NestedJoin(
        ExpandIndustry, {"Project Name"},
        QuotedTo, {"Project Name"},
        "QuotedTo",
        JoinKind.LeftOuter),
    QuotedToNames = List.Skip( Table.ColumnNames(MergeQuotedTo{0}[QuotedTo]) ),
    ExpandQuotedTo = Table.ExpandTableColumn(
        MergeQuotedTo,
        "QuotedTo",
        QuotedToNames)
in
    ExpandQuotedTo

View solution in original post