Jul 30 2017
09:50 PM
- last edited on
Jul 12 2019
10:49 AM
by
TechCommunityAP
Jul 30 2017
09:50 PM
- last edited on
Jul 12 2019
10:49 AM
by
TechCommunityAP
Hi. I have a file with over a million rows in SHEET 1, plus 50 columns. Many of the rows are duplicate IDs (column A), but for different years (column B). I want to select only the most recent entry for each ID. I want to populate SHEET 2 with the same 50 columns, but with no duplicate IDs in column A. And each column A unique ID should always have been selected based on having the largest Column B value. For Column C: Column AX, the values should all be for the same row as the corresponding ID in Column A. Example below:
Much appreciated if someone can provide assistance!
Jul 31 2017 01:30 AM - edited Jul 31 2017 07:47 AM
Hello,
maybe, this could be done in PowerQuery. I have uploaded a sample file including a Get & Transform Query. For creating the query, I transformed the data range to a table first. Please note the use of the Table.Buffer function applied after the sorting and which I inserted manually to the M-Code.
As I don't have a Mac, I could not test it in Excel for Mac. And, in case you have data rows where the ID's and years are equal, the query will take the first occurence them.
If you are not able to adapt the query for your data or it does not work, I think uploading a sample file with all columns in it and some sample data rows, would be good to get more assistance here.
Best,
Mourad