Select unique rows with multiple columns and include only A column group member with highest B value

Copper Contributor

Hi. I have a file with over a million rows in SHEET 1, plus 50 columns. Many of the rows are duplicate IDs (column A), but for different years (column B). I want to select only the most recent entry for each ID. I want to populate SHEET 2 with the same 50 columns, but with no duplicate IDs in column A. And each column A unique ID should always have been selected based on having the largest Column B value. For Column C: Column AX, the values should all be for the same row as the corresponding ID in Column A. Example below:

BEFOREBEFORE

 

 

AFTERAFTER

Much appreciated if someone can provide assistance! 

 

 

 

1 Reply

Hello,

 

maybe, this could be done in PowerQuery. I have uploaded a sample file including a Get & Transform Query. For creating the query, I transformed the data range to a table first. Please note the use of the Table.Buffer function applied after the sorting and which I inserted manually to the M-Code.

 

As I don't have a Mac, I could not test it in Excel for Mac. And, in case you have data rows where the ID's and years are equal, the query will take the first occurence them.

 

If you are not able to adapt the query for your data or it does not work, I think uploading a sample file with all columns in it and some sample data rows, would be good to get more assistance here.

 

Best,

Mourad