SOLVED

How to Return Column Data Based on All Populated Cells in Row Range and Match to Other Worksheet

Brass Contributor

Hi Everyone,

I need help with an issue I can't seem to find an exact answer for. Basically, I need to return the column data for all populated cells with the letter "X" in the row. Then return the column header to different cells in the row. 

Example:

User IDJob 1Job 2Job 3Job 4Output 1Output 2 
Jonh.SmithX  XJob 1Job 4 
Bob.LangXX  Job 1Job 2 


The formula will be applied to about 500 rows of data with about 30 columns. I would like all headers to be returned if it has an x in the column for each row. I would also like all the data to be next to each other in regards to the columns (e.g. no blank cells between output data). I assume this can be taken care of with one or two formulas that can be copied to the other cells. 

In addition, I have another issue. After I get this data, I want to do something a bit more complex. There's another sheet in which I have all of the classes that each user should be taking based on what their job is. Each user will be taking multiple classes. The next thing I need help with is matching the user's job to the classes they should be taking. The above data will be match to the below data.

 Class 1Class 2Class 3Class 4
Job 1 X X
Job 2X X 
Job 3 XX 
Job 4XXX 


The other sheet contains data that shows which class a user should be taking denoted by the letter "X" in each row. 

The final output should be something like this:

User IDJob 1Job 2Job 3Job 4Output 1Output 2Output 3Output 4Output 5
Jonh.SmithX  XJob 1Job 4Class 2Class 4Class 3
Bob.LangXX  Job 1Job 2Class 1Class 2Class 3


Another output format could be:

 

User IDOutput 1Output 3Output 4Output 5
Jonh.SmithJob 1Class 2Class 4

Class 3
 Job 4Class 1Class 2 
Bob.LangJob 1Class 2

Class 4
 Job 2Class 1Class 4

Note: In this case, duplicate values are fine. 


I am not an advanced user of Excel so any help is greatly appreciated.

16 Replies

In your example I can see that there are two jobs for each participant as well.  It appears that each job is only associated with two classes.  Is this realistic?  Or do you have a more precise data set that can display a better representation of your data.

 

For example:

 

1. How far does your data go column wise for jobs? 

2.  How far does your data go column wise for classes?

3. What is the max number of jobs?

4. What is the max number of classes for an individual job?

 

I've spent a fair amount of time on issues like this in the past only to find that the representation of data provided by the end user was not accurate which caused further questions.  In order to potentially avoid this issue would it be possible for you to confirm that this is the case or provide a more data.  There may be a formula that can give you what you need but it's more likely that the only way to have a simple solution would be to implement some kind of more complex logic using VBA code.  I believe it may save some headaches for yourself as well as other people replying to your inquiry if you could please provide a little more information.

I don't have the spreadsheet in front of me at the moment, but a good estimate would be about 60 jobs and 45 classes. I saw a similar situation to mine before and the gentleman who answered it used helping columns to solve for the first part (adding the jobs, column headers, to the row in consecutive cells). He used a complex index function to search if a cell contained a value and then added the column header data to the cell where the function was written. He did this for each value in the row. Unfortunately, my data is a bit different than that of the person he was helping. I don't know how to go about accomplishing the second part. In the first sheet, it has a user's data and the letter "X" to denote which job that person has. There are probably 60 or so columns for jobs. In the second sheet, the jobs are the row headers and the classes are the column headers. There are probably about 45 classes. Each job has about 5 classes or less. 

http://www.fiscal.ca.gov/training-academy/documents/SummaryRolesCourseMatrix20171220.xlsx

 

This is the document from which I'll be pulling courses depending on a user's roles. However, I will not be using all of these roles and courses. Not all of the user's have all of the jobs listed. 

best response confirmed by Anthony Smith (Brass Contributor)
Solution

Hi Anthony,

 

That could be done by Power Query. Here is the simple mockup to illustrate idea based on your initial pictures, for the real data queries will be bit more complex.

Here on the left are source tables and resulting one are on the right

image.png

First query generates list of roles for users, here it dynamically split on columns

let

    Source = Excel.CurrentWorkbook(){[Name="User"]}[Content],
    UnpivotJobs = Table.UnpivotOtherColumns(Source, {"User ID"}, "Attribute", "Value"),
    RemoveValues = Table.RemoveColumns(UnpivotJobs,{"Value"}),
    GroupRows = Table.Group(RemoveValues, {"User ID"}, {{"Role", each [Attribute]}}),
    MaxInList = Table.AddColumn(GroupRows, "ListMax", each List.Count([Role])),
    ElementNumber = List.Max(MaxInList[ListMax]),
    ListColumns = List.Transform({1..ElementNumber}, each "Role " & Number.ToText(_)),
    ExtractElements = Table.TransformColumns(GroupRows,
        {"Role", each Text.Combine(List.Transform(_, Text.From), ","), type text}),
    SplitElements = Table.SplitColumn(ExtractElements, "Role",
        Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), ListColumns)

in
    SplitElements

For the second table we first create connection query which maps roles on lists of classes

let
    Source = Excel.CurrentWorkbook(){[Name="Job"]}[Content],
    UnpivotClasses = Table.UnpivotOtherColumns(Source, {"Job"}, "Attribute", "Value"),
    RemoveValue = Table.RemoveColumns(UnpivotClasses,{"Value"}),
    GroupRows = Table.Group(RemoveValue, {"Job"}, {{"Class", each [Attribute]}})
in
    GroupRows

and here we take users with roles and merge with that query

let

    Source = Excel.CurrentWorkbook(){[Name="User"]}[Content],
    UnpivotJobs = Table.UnpivotOtherColumns(Source, {"User ID"}, "Job", "Value"),
    RemoveValues = Table.RemoveColumns(UnpivotJobs,{"Value"}),
    MergeWithClasses = Table.NestedJoin(RemoveValues,{"Job"},ClassesForRoles,
{"Job"},"Classes",JoinKind.LeftOuter), ExpandClasses = Table.ExpandTableColumn(MergeWithClasses, "Classes", {"Class"}, {"Class"}), ExtractClasses = Table.TransformColumns(ExpandClasses, {"Class",
each Text.Combine(List.Transform(_, Text.From), ", "), type text}), SortUsers = Table.Sort(ExtractClasses,{{"User ID", Order.Descending},
{"Job", Order.Ascending}}) in SortUsers

Here classes are as lists, could be split dynamically same way as roles in first query. Sample file is attached.

Sergei-

 

Great example of how robust Power Query can be in a situation like the below described.  Thanks for jumping in...I had thought of a few different ways to attack the issue using VBA, but I believe this to be a much simpler solution.

Matt, thank you. I believe that could be done with VBA and even with formulas, but PQ is naturally for tasks like this.

Hi Sergei,

 

Thank you so much for the help. For this query, do both worksheets have to be in the same workbook? 

My job currently uses Microsoft Excel 2013, and I am unable to install the Power Query add-on. It looks as though I will not be able to do this.
In order to replace all of the "X's" with column headers, I just used an IF statement and deleted all of the in between blank cells. I just need to figure out how to assign the classes based on roles using a formula.

It's a shame you can't use Power Query, it's ideal for this situation.  Maybe try something like this (see attached file):

Job_Class.png

Thank you, that was really helpful. I was actually working on organizing my data into that format already. For some reason, the first class (the one in the second column) wouldn't appear using the formula. Luckily, the class is the same for all jobs so I just copied it down. I'm not sure if I am entering the range incorrectly, but it seems to be fine.

I got it to work by moving my row headers to the end. What part of the formula causes this discrepancy?

Again, thank you for all of your help. 

Hey Anthony-

 

Glad you were able to get it working.  I used the Column() function as well as an offset value in order to auto increment the column reference in the Vlookup() function.  You can adjust the offset value or just hard code it.  It's really up to you.  The Column() function returns the number of the Column the formula is in.  So for example if the formula was in Column A it will return 1, Column B returns 2 .....etc.

OK, that makes sense. Changing the number of preceding columns worked. Now I know why. So I have one more thing if you wouldn't mind assisting me. The issue I am currently having is there are many duplicate classes for each person because many of the roles require the same classes. It's probably complicated to try and remove the duplicate classes since they're assigned to the roles. Instead of doing this, is there a way to just remove the roles and keep the classes for each person while removing the duplicates? In regards to presentation, the duplicate classes make hard to digest. 

I was able to accomplish what I wanted by moving the roles column to the end of the pivot table. However, I would like to do the same thing but keep the roles column at the beginning. 

Another option would be to take your data:

 

1. Copy > PasteSpecial > Transpose

2. Remove Duplicates for each column > Data Tab > Remove Duplicates

3. Copy > PasteSpecial > Transpose to get it back in the same format

 

If it's possible to post a non-sensitive file of what you are currently working with I'm happy to assist.  It's often much easier to help with an example.

 

1 best response

Accepted Solutions
best response confirmed by Anthony Smith (Brass Contributor)
Solution

Hi Anthony,

 

That could be done by Power Query. Here is the simple mockup to illustrate idea based on your initial pictures, for the real data queries will be bit more complex.

Here on the left are source tables and resulting one are on the right

image.png

First query generates list of roles for users, here it dynamically split on columns

let

    Source = Excel.CurrentWorkbook(){[Name="User"]}[Content],
    UnpivotJobs = Table.UnpivotOtherColumns(Source, {"User ID"}, "Attribute", "Value"),
    RemoveValues = Table.RemoveColumns(UnpivotJobs,{"Value"}),
    GroupRows = Table.Group(RemoveValues, {"User ID"}, {{"Role", each [Attribute]}}),
    MaxInList = Table.AddColumn(GroupRows, "ListMax", each List.Count([Role])),
    ElementNumber = List.Max(MaxInList[ListMax]),
    ListColumns = List.Transform({1..ElementNumber}, each "Role " & Number.ToText(_)),
    ExtractElements = Table.TransformColumns(GroupRows,
        {"Role", each Text.Combine(List.Transform(_, Text.From), ","), type text}),
    SplitElements = Table.SplitColumn(ExtractElements, "Role",
        Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), ListColumns)

in
    SplitElements

For the second table we first create connection query which maps roles on lists of classes

let
    Source = Excel.CurrentWorkbook(){[Name="Job"]}[Content],
    UnpivotClasses = Table.UnpivotOtherColumns(Source, {"Job"}, "Attribute", "Value"),
    RemoveValue = Table.RemoveColumns(UnpivotClasses,{"Value"}),
    GroupRows = Table.Group(RemoveValue, {"Job"}, {{"Class", each [Attribute]}})
in
    GroupRows

and here we take users with roles and merge with that query

let

    Source = Excel.CurrentWorkbook(){[Name="User"]}[Content],
    UnpivotJobs = Table.UnpivotOtherColumns(Source, {"User ID"}, "Job", "Value"),
    RemoveValues = Table.RemoveColumns(UnpivotJobs,{"Value"}),
    MergeWithClasses = Table.NestedJoin(RemoveValues,{"Job"},ClassesForRoles,
{"Job"},"Classes",JoinKind.LeftOuter), ExpandClasses = Table.ExpandTableColumn(MergeWithClasses, "Classes", {"Class"}, {"Class"}), ExtractClasses = Table.TransformColumns(ExpandClasses, {"Class",
each Text.Combine(List.Transform(_, Text.From), ", "), type text}), SortUsers = Table.Sort(ExtractClasses,{{"User ID", Order.Descending},
{"Job", Order.Ascending}}) in SortUsers

Here classes are as lists, could be split dynamically same way as roles in first query. Sample file is attached.

View solution in original post