Combine data from multiple rows into a single row

Jilly · ‎May 06 2019

Hi Excel community

I have a dataset that looks something like this

Incident # Name of person reporting Role of person reporting Location of Incident

123 John Smith VMP 4W

123 Mary Burns Nurse 4W

123 Jack John Physio 4W

789 Joy Charm Nurse 3F

789 Kevin Bacon Administator 3F

456 William Hunt VMP 2A

I would like to only have one row per incident. The real dataset has many more columns, most of which have unique results in each column but 3 questions give the user the option of multiple answers for the same question (Name of reporter, Role of reporter) which forces the results over multiple rows. The problem is there could be up to 10 people in the name/role section so no two incidents are guaranteed to result in the same number of rows.

How can I force the rows that allow multiple responses onto a single row with the responses as separate columns rather than rows (or even one column with Name1 +Role1; Name2 + Role2; etc.)?

Jan Karel Pieterse · ‎May 07 2019

If you don't really need the information from those three columns which are causing the repeat of rows, you could just use Data, Remove Duplicates and exclude those three columns. Note that Excel will keep the FIRST row of every duplicate, so you may want to consider sorting the data prior to removing duplicates so the rows you want to keep are at the top.

Jilly · ‎May 07 2019

@Jilly , that could be combined to this

by Power Query. Steps are in attached file.

Jilly · ‎May 07 2019

@Sergei Baklan

Thanks Sergei. You are a genius!

Jilly · ‎May 07 2019

@Jan Karel Pieterse

Thanks for the reply. Unfortunately I do need to keep all the info. Sergei has given me the answer I need. Hopefully helps others.

Cheers

Jilly · ‎May 08 2019

@Jilly

Hi again Serge. I managed to follow the query you set up when I applied it to the full dataset. I have one problem. When I get to the stage where I split the concatenated column where a semicolon exists, it only splits the cell into 2 columns even if there are more than 2 semicolons (more than 2 combinations of person and role). Could you please tell me where the code is forcing the output to 2 columns only?

Jilly · ‎May 08 2019

@Sergei Baklan

Hi again Serge. I managed to follow the query you set up when I applied it to the full dataset. I have one problem. When I get to the stage where I split the concatenated column where a semicolon exists, it only splits the cell into 2 columns even if there are more than 2 semicolons (more than 2 combinations of person and role). Could you please tell me where the code is forcing the output to 2 columns only?

Jan Karel Pieterse · ‎May 08 2019

If you click the little cogwheel next to the split step in the PQ editor window you should be able to change the settings of that step. Look near the bottom of the settings window to see what is selected for "Split at"

Geoff1951 · ‎May 09 2019

Hi Jilly

My take on your query is that you need to create a second reporting worksheet that combines the data in your first sheet. That way, your original data is able to be added to from your original database in the form you are importing it, and you just switch over to the report worksheet to see the information displayed as you want it.

I would add a helper column to the right of the data in the data sheet as column E with the following formula

=IF(A2<>A1,B2,E1&" , "&B2)

In the reporting sheet, list the report numbers in column A, and in column B type the formula

=INDEX(Sheet1!$E$1:$E$200,MATCH($A1+1,Sheet1!$A$1:$A$200,0)-1,1)

You can add more columns as you need to capture other data if required

Sergei Baklan · ‎May 09 2019

@Jilly , in addition to @Jan Karel Pieterse comment. I assumed reporters are in one cell separated by semicolons. If you'd like to split them on separate cells you shall take care about variable number of persons reported per incident (thus variable number of columns). One option is to define number of columns which is more than any possible maximum.

Another option, to fit number of columns into actual current maximum, requires couple of line of additional M script code:

    // Last step of previous version
    RemoveUnused = Table.SelectColumns(RenameBack,
        {"Incident #", "Persons reporting", "Location of Incident"}),

    // Number of semicolons in each record for Persons Reporting
    SepsPerRecord = Table.AddColumn(RemoveUnused, "SepsInString",
        each    Text.Length([Persons reporting])-
                Text.Length(Text.Replace([Persons reporting],";","")) ),

    // Max fpr all records number of semicolns 
    MaxSeps = List.Max(SepsPerRecord[SepsInString]),

    // Generate list of sequental column names
    // from "Reporter 1" to "Reporter <MaxSeps+1>"
    ReportersColumnNames = List.Transform({1..MaxSeps+1},
        each "Person " & Number.ToText(_)),

    // Make standard split and substiture automatically
    // generates list with column names on ReportersColumnNames
    // Another option is to set number of columns which is 
    // definetelly more than possible maximum.
    SplitReporters = Table.SplitColumn(RemoveUnused,
        "Persons reporting",
        Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv),
        ReportersColumnNames)

in
    SplitReporters

Jilly · ‎May 09 2019

@Jan Karel Pieterse

Thanks Jan. That was a quick fix to the problem. I selected the advance option and it allowed me to say how many columns and the default was 2 so I changed it to a number more than what I expect I will need and it has worked perfectly. Thank you

Sergei Baklan · ‎May 11 2019

@Jilly , nonetheless, when you have more time better to shift on dynamic number of columns generation. That's only 3 lines of code. If, for example, now you have max 4 persons who may report incidents, in a while your business will grow and you add couple of more persons in staff - and have to rebuild the query. Or at least define max number as parameter - it will be easier in maintenance.

wildjosher · ‎Aug 14 2019

@Sergei Baklan

I am hoping to achieve a similar outcome, however I was unable to edit your code properly.

I am trying to get this:

To become like this:

Using this Code:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    GroupByEmail = Table.Group({"Email"}, {{"What Section", each _, type table [Email=text, What Section=text]}}),
    ListWithInfo = Table.AddColumn(GroupByEmail, "What Section", each [What Section][What Section]),
    ExtractFromList = Table.TransformColumns(ListWithWhatSection, {"What Section", each Text.Combine(List.Transform(_, Text.From), "; "), type text}),
    SplitEmail = Table.SplitColumn(ExtractFromList, "Email", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Email.1", "Email.2"}),
    ToNumber = Table.TransformColumnTypes(SplitEmail,{{"Email.1", Int64.Type}, {"Email.2", type text}}),
    RemoveUnused = Table.SelectColumns("ID", "Did They Finish Section?")
in
    RemoveUnused

But I must have broken something along the way because I get this error:

Expression.Error: We cannot convert the value "ID" to type Table. Details: Value=ID Type=Type

Sergei Baklan · ‎Aug 14 2019

@wildjosher

I'd suggest bit different approach for such transformation

In query first pivot the table on column What Section with values Did They..

Remove ID if you don't need it and Group by email.

Resulting tables for each record FillDown, FillUp Remove Duplicates to have only one record for each email.

Keep only this resulting column and expand it.

Script is

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    PivotSections = Table.Pivot(
        Source,
        List.Distinct(Source[#"What Section"]),
        "What Section", "Did They Finish Section?"
    ),
    RemoveID = Table.RemoveColumns(
        PivotSections,{"ID"}
    ),
    GroupEmails = Table.Group(
        RemoveID, {"Email"},
        {{"Count",
            each _,
            type table
                [
                    Email=text,
                    Admin Section=text,
                    Internal Section=text,
                    External Section=text,
                    All Sections=text
                ]
        }}
    ),
    FillDown = Table.AddColumn(
        GroupEmails,
        "Custom",
        each Table.FillDown([Count],Table.ColumnNames([Count]))
    ),
    FillUp = Table.AddColumn(
        FillDown,
        "Custom.1",
        each Table.FillUp([Custom],Table.ColumnNames([Custom]))
    ),
    KeepDistinct = Table.AddColumn(
        FillUp,
        "Custom.2",
        each Table.Distinct([Custom.1],{"Email"})
    ),
    OnlyFinalTables = Table.SelectColumns(
        KeepDistinct,
        {"Custom.2"}
    ),
    ExpandThem = Table.ExpandTableColumn(
        OnlyFinalTables,
        "Custom.2",
        {"Email", "Admin Section", "Internal Section", "External Section", "All Sections"},
        {"Email", "Admin Section", "Internal Section", "External Section", "All Sections"}
    )
in
    ExpandThem

and in attached file

Jilly · ‎May 07 2019

@Jilly , that could be combined to this

by Power Query. Steps are in attached file.

View solution in original post

Combine data from multiple rows into a single row

Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Re: Combine data from multiple rows into a single row

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Combine data from multiple rows into a single row