Forum Discussion

Kyle_Thompson's avatar
Kyle_Thompson
Copper Contributor
Feb 27, 2025
Solved

Removing Duplicates from an Array based on Multiple Criteria

I have several spreadsheets which represent the output from a lighting audit done at several local high schools. Each space and fixture type within the school has its own row, so there are close to 1000 lines for each location. However, there are duplicates in cases where spaces are similar. For example, many classrooms use the same fixture as each other and require the same LED retrofit kit. I would like to condense the information we received as much as possible without losing the information I need to calculate energy savings.

The columns that I want to sort by are "EX Code", which is a code that describes the existing fixtures, "PR Code", a code describing the work to be done, and "Usage Type", which describes the type of space and determines the fixture's annual operating hours.

I can do this to eliminate duplicates from one column: 

=SORT(UNIQUE('Energy Audit - Sorted'!EXCode))

However, there are cases where one EX Code may apply to multiple Usage Types or PR Codes. For example, the "ECFLP26SC6" (an existing CFL fixture) was used all over the school in 12 different Usage Types, each of which has a different associated Annual Operating Hours.

 

The process I used was:

  1. Start with clean sheet
  2. type 

=SORT(UNIQUE('Energy Audit - Sorted'!EXCode))

into column B,

  1. set column C equal to column B,
  2. copy column C and paste special-> paste values into column D,
  3. type 

=SORT(UNIQUE(INDEX(FILTER(EnergyAudit_Sorted[[Usage Type]:[EX Code]],'Energy Audit - Sorted'!EXCode=$D2,),,1)))

into column E,

  1. insert rows as needed wherever I got a #SPILL error.

Now, I have a sheet associating EX Codes with Usage Types, but I would have to repeat that process for the PR Codes and then repeat it again on the other spreadsheets. I'm looking for a more intelligent way to accomplish this task. I tried using FILTER with an AND function for the "include" argument, but it gave a #VALUE error:

=FILTER(EnergyAudit_Sorted[[Usage Type]:[EX Code]], AND('Energy Audit - Sorted'!EXCode=$D2, EnergyAudit_Sorted[Usage Type]=$E2))

 

Is there a correct syntax for the FILTER or UNIQUE functions to sort by multiple criteria? Or is there another function I can use to automate this?

 

Thank you,

-Kyle Thompson

 

  • Kyle_Thompson's avatar
    Kyle_Thompson
    Copper Contributor

    I have since found a solution, described here: How to Filter with Multiple Criteria in Excel (4 Easy Ways)

     

    The correct syntax for the formula I showed earlier would be:

    =FILTER(EnergyAudit_Sorted[[Usage Type]:[EX Code]], ('Energy Audit - Sorted'!EXCode=$D2) * (EnergyAudit_Sorted[Usage Type]=$E2))

     

    The '*' operator performs the AND function in this case.

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    One way to remove dupes while considering multiple columns is to use GROUPBY like this:

    =GROUPBY(HSTACK(EnergyAudit_Sorted[EX Code],EnergyAudit_Sorted[Usage Type]),,,,0)

     

    All you need to do is provide the columns in the rows argument of GROUPBY. HSTACK is used because those two columns aren't next to each other.  The commas indicate arguments are being skipped. The '0' supresses the totals. The results are sorted A-Z by default.

    Here's another example for EX Code and PR:

    =GROUPBY(HSTACK(EnergyAudit_Sorted[EX Code],EnergyAudit_Sorted[PR Code]),,,,0)

    I'm not entirely certain of the desired return but this may give you enough to go on.

    • Kyle_Thompson's avatar
      Kyle_Thompson
      Copper Contributor

      Thanks for your input, I'll try to work with those functions!

       

      To clarify the desired return:

      • A row where ALL THREE values (EX Code, PR Code, and Usage Type) are identical would be considered a duplicate row.
      • Any other row, where less than all three values match, would be unique. 

      For example:

      There are (5) rows for which EX Code = "EW4232N", Usage Type = "Class (HS)", and PR Code = "4TLED10C-2". Those would be condensed into (1) row. 

      There are (9) rows for which EX Code = "EW4232N", Usage Type = "Storage (Active)", and PR Code = "4TLED10C-2". This would be condensed into (1) row but would be separate from the row where the Usage Type was "Class (HS)".

Resources