Forum Discussion

noneofyourbiznez's avatar
noneofyourbiznez
Copper Contributor
Jul 31, 2017

Select unique rows with multiple columns and include only A column group member with highest B value

Hi. I have a file with over a million rows in SHEET 1, plus 50 columns. Many of the rows are duplicate IDs (column A), but for different years (column B). I want to select only the most recent entry for each ID. I want to populate SHEET 2 with the same 50 columns, but with no duplicate IDs in column A. And each column A unique ID should always have been selected based on having the largest Column B value. For Column C: Column AX, the values should all be for the same row as the corresponding ID in Column A. Example below:

BEFORE

 

 

AFTER

Much appreciated if someone can provide assistance! 

 

 

 

1 Reply

  • Anonymous's avatar
    Anonymous

    Hello,

     

    maybe, this could be done in PowerQuery. I have uploaded a sample file including a Get & Transform Query. For creating the query, I transformed the data range to a table first. Please note the use of the Table.Buffer function applied after the sorting and which I inserted manually to the M-Code.

     

    As I don't have a Mac, I could not test it in Excel for Mac. And, in case you have data rows where the ID's and years are equal, the query will take the first occurence them.

     

    If you are not able to adapt the query for your data or it does not work, I think uploading a sample file with all columns in it and some sample data rows, would be good to get more assistance here.

     

    Best,

    Mourad

Resources