Forum Discussion

confooseddesi89's avatar
confooseddesi89
Copper Contributor
Jun 17, 2023

Too large dataset - extract data within date range

I have a large dataset. Here is a sample of what the data look like:

 

idid_2surveytypedatefromtocreatedupdatedquestion_numquestionanswer
4570091679391Nutrition5/21/2022 0:005/21/2022 15:005/28/2022 2:005/21/2022 0:125/21/2022 15:48159Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit

How much would you say you like the taste and generally enjoy drinking…

Bottled or pre- made sugary tea or coffee drinks (for example, Snapple ice tea or Starbucks Frappuccino)
0
4570091679391Nutrition5/21/2022 0:005/21/2022 15:005/28/2022 2:005/21/2022 0:125/21/2022 15:48160Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit

How much would you say you like the taste and generally enjoy drinking…

Sports drinks
0
4570091679391Nutrition5/21/2022 0:005/21/2022 15:005/28/2022 2:005/21/2022 0:125/21/2022 15:48161Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Caffeinated energy drinks
0
4564743679491Nutrition5/20/2022 0:005/20/2022 12:005/26/2022 23:005/20/2022 0:105/20/2022 12:46161Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Caffeinated energy drinks
0
4564743679491Nutrition5/20/2022 0:005/20/2022 12:005/26/2022 23:005/20/2022 0:105/20/2022 12:46160Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Sports drinks
0
4564743679491Nutrition5/20/2022 0:005/20/2022 12:005/26/2022 23:005/20/2022 0:105/20/2022 12:46159Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Bottled or pre- made sugary tea or coffee drinks (for example, Snapple ice tea or Starbucks Frappuccino)
0
4564743679491Nutrition5/20/2022 0:005/20/2022 12:005/26/2022 23:005/20/2022 0:105/20/2022 12:46155Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



100% fruit juice
0
4564743679491Nutrition5/20/2022 0:005/20/2022 12:005/26/2022 23:005/20/2022 0:105/20/2022 12:46158Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Diet soft drinks and all other artificially sweetened drinks
0
4564743679491Nutrition5/20/2022 0:005/20/2022 12:005/26/2022 23:005/20/2022 0:105/20/2022 12:46156Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Sweetened juice drinks (for example, fruit ades, lemonade, punch, and orange drinks)
0
4564743679491Nutrition5/20/2022 0:005/20/2022 12:005/26/2022 23:005/20/2022 0:105/20/2022 12:46157Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Regular soft drinks (soda pop)
0
4556257679691Nutrition5/18/2022 0:005/18/2022 15:005/25/2022 2:005/18/2022 0:115/18/2022 22:07161Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Caffeinated energy drinks
0
4556257679691Nutrition5/18/2022 0:005/18/2022 15:005/25/2022 2:005/18/2022 0:115/18/2022 22:07160Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Sports drinks
0
4556257679691Nutrition5/18/2022 0:005/18/2022 15:005/25/2022 2:005/18/2022 0:115/18/2022 22:07159Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Bottled or pre- made sugary tea or coffee drinks (for example, Snapple ice tea or Starbucks Frappuccino)
0
4556257679691Nutrition5/18/2022 0:005/18/2022 15:005/25/2022 2:005/18/2022 0:115/18/2022 22:07158Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Diet soft drinks and all other artificially sweetened drinks
0
4587204679591Nutrition5/27/2022 0:005/24/2022 10:005/30/2022 21:005/24/2022 0:115/27/2022 19:08161Please choose a number between 0 and 10 where 0 is not at all and 10 is quite a bit



How much would you say you like the taste and generally enjoy drinking…



Caffeinated energy drinks
0

 

There are too many rows for Excel to display; when I open the file, I receive the message "this data set is too large for the excel grid. If you save this workbook, you'll lose data that wasn't loaded." I need to work with the data in Excel before importing it into my statistical software, so this is an issue. However, I don't need all the data, only the dates where the column "date" is from 7/17/2022 0:00 onward (the date column ranges from August of 2021 to April of 2023). I've tried using a PivotTable to extract this dataset with only these dates, but even with Googling how to use them, I got stuck as I have no experience with them. Can anyone assist? Thanks

  • confooseddesi89 

    "I've tried using a PivotTable to extract this dataset with only these dates, but even with Googling how to use them, I got stuck as I have no experience with them."

     

    That could work if you have Power Pivot for your version of Excel. Open it, depends on your source select From Database or From Other sources and follow the wizard to import the data. After loading the data into data model you have two ways - create PivotTable from data model (as usual) or create linked-back table in the grid. For the latest on the ribbon Data->Existing Connections->Connections in this Workbook->Open (your connection)->Table.

     

    The pro of the data model using is its performance which is extremely high. Data size on practice is limited only by your computer memory. 

  • peiyezhu's avatar
    peiyezhu
    Bronze Contributor
    Re:receive the message "this data set is too large for the excel grid.
    I am afraid you should pass all data to database or .txt file first with other language like python, c# or golang.


    Then query with sql or some command line like awk.

    Or handle these datas with these computer languages directly.
  • NikolinoDE's avatar
    NikolinoDE
    Gold Contributor

    confooseddesi89 

    To extract the data within a certain date range in Excel, the best solution is with PQ as mentioned by Mr. Riny_van_Eekelen in his post.

    However, if there is no basic knowledge of PQ, here is an alternative solution:

    You can use a combination of filtering and copying the filtered data to a new sheet.

    Here is how you can do it:

    1. Open your Excel file and select the dataset that you want to filter (including the header row).
    2. Go to the "Data" tab in the Excel ribbon and click on the "Filter" button. This will add filter dropdowns to each column header.
    3. Click on the filter dropdown for the "date" column.
    4. In the filter dropdown, select "Date Filters" (or a similar option depending on your Excel version) and then choose "Custom Filter."
    5. In the "Custom AutoFilter" dialog box, select "is greater than or equal to" from the first dropdown and enter your desired start date (e.g., 7/17/2022) in the input box.
    6. Click "OK" to apply the filter. This will show only the rows where the date is from 7/17/2022 onward.
    7. Select all the visible rows (excluding the header row) by clicking on the row number on the left-hand side and dragging down to the last visible row.
    8. Right-click on the selected rows and choose "Copy" or press Ctrl+C.
    9. Go to a new sheet in your Excel workbook where you want to paste the filtered data.
    10. Right-click on the first cell of the new sheet and choose "Paste" or press Ctrl+V. This will paste the filtered data into the new sheet.
    11. If needed, you can remove the filter from the original dataset by clicking on the filter button again and selecting "Clear Filter."

    Now you should have a new sheet with only the data within the specified date range. You can work with this filtered dataset in Excel before importing it into your statistical software.

     

    I hope this helps!

    • SergeiBaklan's avatar
      SergeiBaklan
      MVP

      NikolinoDE 

      Unfortunately that doesn't work on Step 1. Excel opens the file and show only first million of rows ignoring the rest, we have no access to extra records from the grid.

      • NikolinoDE's avatar
        NikolinoDE
        Gold Contributor

        SergeiBaklan 

        If the user is working with a data set of millions of rows that exceeds Excel's row limit, alternative methods would have to be used to extract the data within the specific date range.

        One approach, as already mentioned by Mr. Riny_van_Eekelen , is PQ.

        That would also be the best approach if the data set is to be created with Excel.

        Other approaches such as Python or database management systems (DBMS) are far beyond my scope of knowledge.

         

        However, if the data set does not exceed Excel's row limit, it would be a possible alternative, not the most elegant as PQ, but theoreticaly it would be an alternative :).

         

  • Riny_van_Eekelen's avatar
    Riny_van_Eekelen
    Platinum Contributor

    confooseddesi89 Seems like you need to look into Power Query. Connect to a data source with over a million records and filter the information you need. No problem. Are you familiar with Power Query?

Resources