Forum Discussion

Patrick2788's avatar
Patrick2788
Silver Contributor
Sep 18, 2022

Stacking The Beatles in Excel: An Exercise in 3D stacking

The setup:  You're provided with an Excel workbook with 12 sheets for 12 Beatles albums. All 12 sheets have the same fields, but the fields are not in the same order in each sheet.

 

The goal: Roll up 12 sheets and order the columns uniformly in the array.

 

I'll step you through my solution and then open up the discussion for anyone that would like to give this a try.

 

First, a named item for the 3D reference:

=PleasePleaseMe:LetItBe!$A$1:$F$31

 

Next, HSTACK and SORT gathers the track listings:

=SORT(HSTACK(Albums3D),,,1)

This is a snip of what the stack looks like:

 

The biggest step is the re-stack:

=LET(AlbumsNoHeader,DROP(StackedAlbums,1),stack,HSTACK(TOCOL(TAKE(AlbumsNoHeader,,12),,1),TOCOL(CHOOSECOLS(AlbumsNoHeader,SEQUENCE(,12,13,1)),,1),TOCOL(CHOOSECOLS(AlbumsNoHeader,SEQUENCE(,12,25,1)),,1),TOCOL(CHOOSECOLS(AlbumsNoHeader,SEQUENCE(,12,37,1)),,1),TOCOL(CHOOSECOLS(AlbumsNoHeader,SEQUENCE(,12,49,1)),,1),TOCOL(TAKE(AlbumsNoHeader,,-12),,1)),UNIQUE(stack,,1))


Finally, putting it all together:

=VSTACK(UNIQUE(TAKE(StackedAlbums,1),1),ReStack)

 

I'm most interested in the "restacking".  It seems heaven and earth are being moved to sort things out and wonder if there's a more direct way of doing it.

 

Formulas only.  Please no VBA or PowerQuery.

 

20 Replies

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    Another go at this for 2025!

    **Edited - cleaned it up a bit more.

    StackNSortλ=
    
    LAMBDA(three_dimensional_ref,
    LET(
        stack, HSTACK(three_dimensional_ref),
        sorted, SORT(stack, , , 1),
        wrapped, WRAPCOLS(TOCOL(sorted, , 1), ROWS(sorted) * (SHEETS()-1)),
        header, TAKE(wrapped, 1),
        result, UNIQUE(FILTER(wrapped, (TAKE(wrapped, , 1) <> 0))),
        result
    ))

     

    • djclements's avatar
      djclements
      Bronze Contributor

      Well done!

      I've been doing a lot of table transformations with INDEX-SEQUENCE algorithms recently, so I took a crack at it:

      =LAMBDA(ref_3d,
         LET(
            a, HSTACK(ref_3d),
            b, TAKE(a,1),
            h, ROWS(a)-1,
            w, COLUMNS(a),
            m, SHEETS(ref_3d),
            n, w/m,
            k, SEQUENCE(h*m,,0),
            x, DROP(SORTBY(a,QUOTIENT(SEQUENCE(,w,0),n),1,b,1),1),
            y, INDEX(x,MOD(k,h)+1,QUOTIENT(k,h)*n+SEQUENCE(,n)),
            VSTACK(UNIQUE(SORT(b,,,1),1),FILTER(y,TAKE(y,,1)<>""))
         )
      )

      It's a little longer, but should hold up well performance-wise.

      If any one of the worksheets were missing a field, though, I think an unpivot-repivot method might be the way to go. Something like:

      =LET(
         a, HSTACK(Albums3D),
         b, DROP(a,1),
         w, COLUMNS(a),
         i, SEQUENCE(ROWS(b)),
         j, QUOTIENT(SEQUENCE(,w,0),w/SHEETS(Albums3D)),
         t, b<>"",
         f, LAMBDA(x,TOCOL(IFS(t,x),2)),
         DROP(PIVOTBY(HSTACK(f(j),f(i)),f(TAKE(a,1)),f(b),SINGLE,0,0,,0),,2)
      )

      Or:

      =LET(
         a, HSTACK(Albums3D),
         b, DROP(a,1),
         v, TOCOL(b),
         w, COLUMNS(a),
         i, SEQUENCE(ROWS(b)),
         j, QUOTIENT(SEQUENCE(,w,0),w/SHEETS(Albums3D)),
         f, LAMBDA(x,TOCOL(IF({1},x,b))),
         DROP(PIVOTBY(HSTACK(f(j),f(i)),f(TAKE(a,1)),v,SINGLE,0,0,,0,,v<>""),,2)
      )

      I'm not really sure if there's any difference in efficiency between the two. I'd like to think the second one is faster because it removes the blank records in one swoop with the [filter_array] argument of PIVOTBY, whereas the first method removes the blank records for every instance of TOCOL (4 times), but I didn't run any tests with larger datasets.

      Cheers!

      • Patrick2788's avatar
        Patrick2788
        Silver Contributor

        Looks good! No reduction tricks needed in 2025. Creative use of PIVOTBY to aggregate as scalars with SINGLE.

        In re: unpivot/re-pivot -  Melt from Pandas might interest you. Could be re-created in Lambda certainly:

         

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      I can't even open the file without crashing Excel!

      I had better check whether it is Excel or the files.

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor
    The more I try to approach this task from a fresh angle, the more I run into everyone else's solutions! I've played with sequencing and scanning but ultimately it goes back to solutions already posted.
    • Patrick2788's avatar
      Patrick2788
      Silver Contributor

      Patrick2788 

      This is my attempt to approach this from a different angle. Although, you'll see everyone's solutions have inspired me in a way.

       

      The 3D reference. Same as 1st attempt:

      =PleasePleaseMe:LetItBe!$A$1:$F$31

       

      The header pulled from the Let it Be sheet because that's the column order I'm going for:

      =LetItBe!$A$1:$F$1

       

      The 3D stack. This time with VSTACK:

      =VSTACK(Albums3D)

       

      This is where my solution is different.  This formula pulls the header from each album and repeats it 31 times.  I've added SHEETS() so Magical Mystery Tour can be added!

      =CHOOSEROWS(SEQUENCE(SHEETS()-1,,1,31),INT(SEQUENCE((SHEETS()-1)*31,,1,1/31)))

       

      This formula checks the column position by each row and strings together an array:

      =LET(c,CHOOSEROWS(VSTACK(Albums3D),SeqRows),BYROW(c,LAMBDA(row,TEXTJOIN(",",,XMATCH(header,row)))))

      Which looks like this:

       

      Next, the above array is reduced to a scalar, split, and wrapped:

      =WRAPROWS(TEXTSPLIT(TEXTJOIN(",",,strArr),","),6)

       Which looks like this:

       

      Lastly, the above array is fed to INDEX, some filtering is done, and the header is added:

      =LET(tracks,INDEX(StackedAlbums,SEQUENCE((SHEETS()-1)*31),arr),f,FILTER(tracks,(TAKE(tracks,,-1)>0)*(ISNUMBER(TAKE(tracks,,-1)))),VSTACK(header,f))

       

  • Patrick2788 

    I seem to be falling behind the curve here!

    My formula using the thunk array was fine, but it was tedious forming the array in the first place.  I also felt I was missing out on the new VSTACK 3D capability.  I started the refactoring by defining thunks from the stacked array

     

    LET(
       a, SEQUENCE(12),
       stacked, VSTACK(Albums3D),
       tracksϑ, MAP(a, LAMBDA(a₁,
          LAMBDA(TAKE(DROP(stacked,31*(a₁-1)),31)))),
    // and so on

     

    Then I realised that, since I know there are 31 rows set aside for the tracks within each album, I might as well use a Lambda function with the album number as a parameter.  I therefore changed to

     

    = LET(
       stacked, VSTACK(Albums3D),
       Hdrsλ,   LAMBDA(a₁, TAKE(DROP(stacked,31*(a₁-1)),1)),
       Tracksλ, LAMBDA(a₁, TAKE(DROP(stacked,1+31*(a₁-1)),30)),
    //and then ...
       a, SEQUENCE(,12),
       expanded, REDUCE(Header, a,
          LAMBDA(rollup,a₁, 
            LET(
               hdr,       Hdrsλ(a₁),
               tracks,    Tracksλ(a₁),
               col,       XMATCH(Header,hdr),
               reordered, CHOOSECOLS(tracks, col),
               VSTACK(rollup,reordered)
            )
          )
        );
       FILTER(expanded, ISTEXT(TAKE(expanded,,1)))
    )

     

    The question then is 'Why do my formulas continue to get larger whereas everyone else's gets shorter?'

  • SergeiBaklan's avatar
    SergeiBaklan
    Diamond Contributor

    Patrick2788 

    More straightforward

    =LET(
        header, {"Album","Length","No.","Side","Title","Vocals"},
        stack, VSTACK(Albums3D),
        k, SEQUENCE( ROWS( stack ) ),
        titleRows, SCAN(1, k, LAMBDA(a,v,
                  IF( AND( ISNA( XMATCH( CHOOSEROWS(stack,v),header) ) ), a, v ) )  ),
        raw, DROP( REDUCE("", k, LAMBDA(a,v,
              LET( order, XMATCH( header,   CHOOSEROWS( stack, CHOOSEROWS(titleRows,v) ) ),
                   VSTACK( a, CHOOSECOLS( CHOOSEROWS( stack, v),   order  ) ) )
                )  ), 1 ),
        return, FILTER( raw, ( CHOOSECOLS(raw,1) <> 0 ) *
                             ( CHOOSECOLS( raw,1 ) <> INDEX( header,1)  ) ),
       VSTACK( header, return ) )
    • mtarler's avatar
      mtarler
      Silver Contributor

      SergeiBaklan  inspired me to step it up to be even more simple:

      =LET(in,Albums3D,
      headers,LetItBe!$A$1:$F$1,
      arr,HSTACK(in),
      VSTACK(headers,DROP(REDUCE("",headers,LAMBDA(a,i,
          HSTACK(a,TOCOL(FILTER(DROP(arr,1),INDEX(arr,1,)=i,""),3,1)))),,1))
      )

       

      • SergeiBaklan's avatar
        SergeiBaklan
        Diamond Contributor

        mtarler 

        Thank you, but unfortunately TOCOL(...,3) doesn't work since we have blank cells within data. For example no Vocal for some records. TOCOL() removes them, thus we have incorrect result.

  • Patrick2788 

    I have made some progress approaching the problem by building an array of thunks.

    Worksheet formula
    = REDUCE(Header, Albums,
        LAMBDA(rollup,album, 
          LET(
            hdr,       TAKE(album(),1),
            tracks,    DROP(album(),1),
            columns,   XMATCH(Header, hdr),
            reordered, CHOOSECOLS(tracks, columns),
            VSTACK(rollup, reordered)
          )
        )
      );
    
    Albums 
    = VSTACK(
        LAMBDA(PleasePleaseMe[#All]), 
        LAMBDA(WithTheBeatles[#All]),
        LAMBDA(AHardDaysNight[#All]),
        LAMBDA(BeatlesForSale[#All]),
        LAMBDA(Help[#All])
    );
    

    The setup steps to form the array of thunks are somewhat tedious and error-prone even though the resulting worksheet formula is OK.

    • Patrick2788's avatar
      Patrick2788
      Silver Contributor

      PeterBartholomew1 

      With this task, it seems it's inevitable CHOOSECOLS/CHOOSEROWS will be used. It's a matter of prepping the data to get to that point. The thunks certainly get the job done. I must say CHOOSECOLS/CHOOSEROWS seems to be very quick for all that's being asked (Much better than using INDEX).

       

      This task would be easier with a little bit more flexibility!

       

      The current list of functions that can handle a 3D reference is still limited even while adding in TOCOL/TOROW and TEXTJOIN to those listed in the link.

      Create a 3-D reference to the same cell range on multiple worksheets (microsoft.com)

      • mtarler's avatar
        mtarler
        Silver Contributor

        Patrick2788 Here is my approach.  I did create 1 lambda to perform an iterative VStack operation:

         

        =LET(in,PleasePleaseMe:LetItBe!$A$1:$F$31,
        headers,LetItBe!$A$1:$F$1,
        hCount,COLUMNS(headers),
        hCountSeq,SEQUENCE(,hCount),
        allcols,HSTACK(in),
        allheaders, TAKE(allcols,1),
        allHcount, COLUMNS(allheaders),
        colIndex, hCount*INT(SEQUENCE(,allHcount,0,1/hCount))+MATCH(allheaders,headers,0),
        cSorting,SORT(VSTACK(colIndex,allcols),1,1,1),
        merge,VSTACK(headers,stackN(DROP(cSorting,2),hCount)),
        FILTER(merge,CHOOSECOLS(merge,1)<>0))

         

        so in line 11 is this stackN Lambda function:

         

        LAMBDA(arr,n,
        let(c,columns(arr),
        if(c>n,
           VSTACK(CHOOSECOLS(arr,sequence(,n)),stackn(CHOOSECOLS(arr,SEQUENCE(,columns(arr)-n,n+1)),n)),
           arr)
        ))

        TBH I haven't studied your approach yet and curious to see how you solved it.

        In my case I do use 2 inputs: the 3-d range and an input for the headers so you can pick which format you want, but that could be removed and automatically use the 1st page or something.

        other than that I basically create an array "colindex" that give a 1,2,3... for the column headers based on the requested 'headers' order and then add an offset so it can be used to sort all the columns so all 'tabs' are in the same order.  Then my stackN Lambda function does the Vstack.

         

Resources