Forum Discussion

Jason Loscar's avatar
Jason Loscar
Copper Contributor
Aug 25, 2017
Solved

How to group rows of data with text

We get data from a client that list sales detail by entity, branch, and customer (The data can easily be more than 700,000 lines. For some transactions there are two or more lines for the same invoice. All of the main data is the same in all three lines until you get to the area with amounts. It is because One line lists the total materials and the other line would list the freight and the 3rd line might list the tax the invoice number is the same for all 3 lines. I would like to find a quick way of condensing those three lines into one line based on Invoice number. See attached example.

Any help on how to do this efficiently would be greatly appreciated. I tried pivot tables and they do not display the data in the desired format. I also tried the "Consolidate" under Data and that works, but because you have to choose a "Function" like sum etc... it does not pull down the "text" information in the rows it only pulls down numbers and adds them wich I do not want it to do for items like zip codes etc... once again not giving the desired outcome.

I am ssuming that I may have to forego the Prime_account and Sub_account columns because if 2 of the 3 cells have a code in it there is no way to express that in the desired one line/cell. 

  • Hi Jason,

     

    Most logical for will be to use PivotTable. Not sure what's wrong with formatting - I played a bit, see in Sheet2. In you sample data i change addresses - they are different for every record, i guess one invoice shall be on one address.

     

    Prime/Sub account - yes, it shall be ignored or repeated in row data for each record. Another point you have mix of numbers and text in same column(s), desirably to have in one format.

     

    Another option is to transform the range with Get&Transform (Excel 2016) or PowerPivot (previous versions). Not sure if you familiar with it, just in case make a sample in next sheet. In that case all transformations could be done using this tool.

     

     

4 Replies

  • SergeiBaklan's avatar
    SergeiBaklan
    Diamond Contributor

    Hi Jason,

     

    Most logical for will be to use PivotTable. Not sure what's wrong with formatting - I played a bit, see in Sheet2. In you sample data i change addresses - they are different for every record, i guess one invoice shall be on one address.

     

    Prime/Sub account - yes, it shall be ignored or repeated in row data for each record. Another point you have mix of numbers and text in same column(s), desirably to have in one format.

     

    Another option is to transform the range with Get&Transform (Excel 2016) or PowerPivot (previous versions). Not sure if you familiar with it, just in case make a sample in next sheet. In that case all transformations could be done using this tool.

     

     

    • Jason Loscar's avatar
      Jason Loscar
      Copper Contributor

      Hi Sergei,

       

      Thank you for your reply. The pivot table can work, but the issue arrises when I try to creat the pivot table with more than 700,000 lines of data, it comes up with a message saying it ran out of memory etc... so I could break the excel file in half and try the pivot table... I was hoping that there would be a way to use the consolidate feature and have it pull down the text cells... Thanks again for your help. 

       

      Jason

      • SergeiBaklan's avatar
        SergeiBaklan
        Diamond Contributor

        Hi Jason,

         

        On which version of Excel are you? If on 2010 or later you may separate your files with source records and with pivot table. With that few millions of row shall not be a big issue. Here, for example, 50 millions of rows are analazing in Pivot Table http://www.masterdataanalysis.com/ms-excel/analyzing-50-million-records-excel/

         

        Idea is to load your source records in first file into the data model of the second. Using Power Query (aka Get & Transform in Excel 2016), even without any transformation. Just two clicks: Get data from file and after that Load to data model.

         

        After that create PivotTable within second file.

         

        If make some note complex transformations within the Power Query (Group by Invoices) you don't need the PivotTable at all, Power Query returns you resulting table. This option is more preferable.

         

        In more details that's for exmple here

        https://blogs.technet.microsoft.com/josebda/2017/02/12/loading-csvtext-files-with-more-than-a-million-rows-into-excel/

        https://www.youtube.com/watch?v=KLQqGtYydUc

         

        However, consolidate also could work. Since you consolidate data for same invoices you shall insert column with invoices numbers right before columns with figures to sum (move or copy the column to here).

         

        Next, copy the sheet with you data (select sheet tab and copy). In new sheet select entire range (or even entire sheet), in Ribbon Data->Remove Duplicates. Select column with invoices numbers to remove duplicates.

         

        Next, in second sheet select names of your columns starting with invoices numbers till end of all sums (last 6 columns, in yellow in attached file). Click Consolidate, SUM as aggregation, entire range for same columns in first sheet (green in sample) as reference, and, IMPORTANT, both top row and left column as labels. Okay now.

         

        See attached in Sheet3. 

Resources