Forum Discussion
Power Query - Large Data Set Question
Hello - I combined a handful of .xlsx files and there's about 5 million rows total. I'm trying to identify duplicate "project IDs" (the long strings below in rows 22-29), basically what this filter is showing is all of the project IDs that have 3, 4, 5, 6, etc. duplicates all the way through 14. That's exactly what I'm looking for, there's about 180k project IDs I was able to get. The problem is the project IDs that have 1 duplicate, which is the number 2 in this filter because I did a group by in power query by the project ID and it counts the number of matching rows it has, so 2 means it has 1 duplicate. When I filter on this it runs over the 1 million excel row limit, I was wondering if anybody had an idea of how I could get around this problem?
1 Reply
You may hit one of the classic pain points with Excel which is 1,048,576 row limit. With 5 million rows and about 180k project IDs, you are already pushing Excel beyond what it was designed for. Power Query can crunch that much data, but once you load it back into the Excel grid, you hit the ceiling.
You may consider this as workaround:
Since you specifically want the project IDs with exactly 1 duplicate (count = 2):
- Do the grouping in Power Query.
- Filter for Count = 2.
- Load the results into the Data Model (not the sheet).
- Use a PivotTable to explore or export to CSV if you need the raw list.
This way, you bypass the worksheet limit entirely while still working inside Excel.