Forum Discussion
Too large dataset - extract data within date range
To extract the data within a certain date range in Excel, the best solution is with PQ as mentioned by Mr. Riny_van_Eekelen in his post.
However, if there is no basic knowledge of PQ, here is an alternative solution:
You can use a combination of filtering and copying the filtered data to a new sheet.
Here is how you can do it:
- Open your Excel file and select the dataset that you want to filter (including the header row).
- Go to the "Data" tab in the Excel ribbon and click on the "Filter" button. This will add filter dropdowns to each column header.
- Click on the filter dropdown for the "date" column.
- In the filter dropdown, select "Date Filters" (or a similar option depending on your Excel version) and then choose "Custom Filter."
- In the "Custom AutoFilter" dialog box, select "is greater than or equal to" from the first dropdown and enter your desired start date (e.g., 7/17/2022) in the input box.
- Click "OK" to apply the filter. This will show only the rows where the date is from 7/17/2022 onward.
- Select all the visible rows (excluding the header row) by clicking on the row number on the left-hand side and dragging down to the last visible row.
- Right-click on the selected rows and choose "Copy" or press Ctrl+C.
- Go to a new sheet in your Excel workbook where you want to paste the filtered data.
- Right-click on the first cell of the new sheet and choose "Paste" or press Ctrl+V. This will paste the filtered data into the new sheet.
- If needed, you can remove the filter from the original dataset by clicking on the filter button again and selecting "Clear Filter."
Now you should have a new sheet with only the data within the specified date range. You can work with this filtered dataset in Excel before importing it into your statistical software.
I hope this helps!
- SergeiBaklanJun 18, 2023Diamond Contributor
Unfortunately that doesn't work on Step 1. Excel opens the file and show only first million of rows ignoring the rest, we have no access to extra records from the grid.
- NikolinoDEJun 18, 2023Platinum Contributor
If the user is working with a data set of millions of rows that exceeds Excel's row limit, alternative methods would have to be used to extract the data within the specific date range.
One approach, as already mentioned by Mr. Riny_van_Eekelen , is PQ.
That would also be the best approach if the data set is to be created with Excel.
Other approaches such as Python or database management systems (DBMS) are far beyond my scope of knowledge.
However, if the data set does not exceed Excel's row limit, it would be a possible alternative, not the most elegant as PQ, but theoreticaly it would be an alternative :).
- Riny_van_EekelenJun 18, 2023Platinum Contributor
NikolinoDE But the OP clearly indicated that there were too many rows to display on the grid.