Forum Discussion

JeanGoodrich's avatar
JeanGoodrich
Copper Contributor
Aug 02, 2022
Solved

Merged subcells and displaced data

I have a data file produced as output by a third-party program (i.e., I'm stuck with it, as is). Data input into Column E is usually one line/one cell in size. However, I have some data appearing in Column E in sub-cells. When I try to sort or filter the data (based on Column A), Excel says it cannot perform the action because "all merged cells need to be the same size." Of course, the data in Column E is what is of interest, and it must stay associated with "User" identified in Column A.

 

In addition, some of the data that should have gone into Column E is actually appearing in Column A, 1-3 lines below the "User" it is associated with.

 

How can I clean up this data so I can sort or filter based on "User" in Column A?

 

ANY suggestions would be greatly appreciated, as this complete data set is nearly 500,000 lines long.

Thank you!

 

Jean Goodrich

 

  • Patrick2788's avatar
    Patrick2788
    Aug 03, 2022

    JeanGoodrich 

     

    Attached is a possible LAMBDA solution. Please note my solution accounts for 6 columns. I see there were a few entries in the column right of Skill so I included it in the calculations.

     

    Fill-in gaps in the data:

     

    =LAMBDA(arr,col_num,SCAN("",INDEX(arr,SEQUENCE(ROWS(arr)),col_num),LAMBDA(a,v,IF(v="",a,v))))

     

    Piece it all together:

     

    =LAMBDA(arr,CHOOSE({1,2,3,4,5,6},FILL(arr,1),FILL(arr,2),FILL(arr,3),FILL(arr,4),INDEX(TEXT(arr,";;;General"),SEQUENCE(ROWS(arr)),5),INDEX(TEXT(arr,";;;General"),SEQUENCE(ROWS(arr)),6)))

     

    I avoided using insider functions which would have made it simpler.

     

    The end result (all you need to give CLEANUP is the entire range, including header row):

     

     

10 Replies

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    JeanGoodrich 

    The way I look at your data, you need to unmerge all cells then fill in the blanks with the value above so you can sort or filter.  This becomes tedious if you have to do this each time.

     

    If you have access to SCAN or MAP, something can be drawn up.

      • Patrick2788's avatar
        Patrick2788
        Silver Contributor

        JeanGoodrich 

         

        Attached is a possible LAMBDA solution. Please note my solution accounts for 6 columns. I see there were a few entries in the column right of Skill so I included it in the calculations.

         

        Fill-in gaps in the data:

         

        =LAMBDA(arr,col_num,SCAN("",INDEX(arr,SEQUENCE(ROWS(arr)),col_num),LAMBDA(a,v,IF(v="",a,v))))

         

        Piece it all together:

         

        =LAMBDA(arr,CHOOSE({1,2,3,4,5,6},FILL(arr,1),FILL(arr,2),FILL(arr,3),FILL(arr,4),INDEX(TEXT(arr,";;;General"),SEQUENCE(ROWS(arr)),5),INDEX(TEXT(arr,";;;General"),SEQUENCE(ROWS(arr)),6)))

         

        I avoided using insider functions which would have made it simpler.

         

        The end result (all you need to give CLEANUP is the entire range, including header row):

         

         

  • Riny_van_Eekelen's avatar
    Riny_van_Eekelen
    Platinum Contributor

    JeanGoodrich It's difficult to visualise how the data exactly looks like, but you seem to get "garbage in" and you need make something pretty out of it. Try connecting to the date with Power Query. That will deal with merged cells, and perhaps it will also take care of the data that is shifted into column A rather than column E in Excel. But I can't tell for sure without seeing the data. Can you somehow share (Onedrive, Dropbox or similar) a part of the output file produced by the "other program". That is, not all 500K lines. By the way, what format is it? Does it come out as an xlsx file, or txt or perhaps csv?

    • JeanGoodrich's avatar
      JeanGoodrich
      Copper Contributor

      Riny_van_Eekelen , thank you for your response. And hello!

       

      I have uploaded a sample file to Dropbox:

      https://www.dropbox.com/scl/fi/cxjfcb4elus6igo2afooe/Example.Problem.xlsx?dl=0&rlkey=7dbwhlpzrirjfedwkvuioiw09

       

      It is definitely a garbage-in situation, and it will take longer to fix the third-party software than I have to provide useable data. I'm just starting to learn Pivot Tables but I don't trust my ability with that yet.

       

      I need to keep the relationship of User to Skill without changing the number of Users. The merged cells keep me from sorting by Care Type.

       

      If you have any suggestions, I would be very grateful!

       

      Thank you,

      Jean

       

      • Riny_van_Eekelen's avatar
        Riny_van_Eekelen
        Platinum Contributor

        JeanGoodrich Thanks for the file. It seems that in this small example, all is pretty neat up to and including row 34. Power Query (PQ) can resolve this with a few clicks, without the need to un-merge cells first and the use of complicated  formulas.

        Then the garbage kicks in. I can guess how this should have to be reorganised (which data should go where in order to create the same structure as the data in rows 2 to 34), but can't be sure until you explain it. Also important to know is if these "garbage rows" are consistent (for instance, there are always six garbage rows with data constantly in the wrong place), but I fear that neither PQ nor Patrick2788 's LAMBDA solution can easily fix that. By the way, how often does this garbage kick-in?

         

        But, just to be sure that we are really in the beginning of the process. Does this xlsx file come directly from the 3rd-party system? Or did it perhaps produce a csv file that you just double-clicked on, then opened in Excel looking like the file you shared?

         

Resources