Stacking The Beatles in Excel: An Exercise in 3D stacking

Valued Contributor

The setup:  You're provided with an Excel workbook with 12 sheets for 12 Beatles albums. All 12 sheets have the same fields, but the fields are not in the same order in each sheet.

 

The goal: Roll up 12 sheets and order the columns uniformly in the array.

 

I'll step you through my solution and then open up the discussion for anyone that would like to give this a try.

 

First, a named item for the 3D reference:

=PleasePleaseMe:LetItBe!$A$1:$F$31

 

Next, HSTACK and SORT gathers the track listings:

=SORT(HSTACK(Albums3D),,,1)

This is a snip of what the stack looks like:

Patrick2788_0-1663523372516.png

 

The biggest step is the re-stack:

=LET(AlbumsNoHeader,DROP(StackedAlbums,1),stack,HSTACK(TOCOL(TAKE(AlbumsNoHeader,,12),,1),TOCOL(CHOOSECOLS(AlbumsNoHeader,SEQUENCE(,12,13,1)),,1),TOCOL(CHOOSECOLS(AlbumsNoHeader,SEQUENCE(,12,25,1)),,1),TOCOL(CHOOSECOLS(AlbumsNoHeader,SEQUENCE(,12,37,1)),,1),TOCOL(CHOOSECOLS(AlbumsNoHeader,SEQUENCE(,12,49,1)),,1),TOCOL(TAKE(AlbumsNoHeader,,-12),,1)),UNIQUE(stack,,1))

Patrick2788_1-1663523523022.png


Finally, putting it all together:

=VSTACK(UNIQUE(TAKE(StackedAlbums,1),1),ReStack)

 

I'm most interested in the "restacking".  It seems heaven and earth are being moved to sort things out and wonder if there's a more direct way of doing it.

 

Formulas only.  Please no VBA or PowerQuery.

 

14 Replies

@Patrick2788 

I have made some progress approaching the problem by building an array of thunks.

Worksheet formula
= REDUCE(Header, Albums,
    LAMBDA(rollup,album, 
      LET(
        hdr,       TAKE(album(),1),
        tracks,    DROP(album(),1),
        columns,   XMATCH(Header, hdr),
        reordered, CHOOSECOLS(tracks, columns),
        VSTACK(rollup, reordered)
      )
    )
  );

Albums 
= VSTACK(
    LAMBDA(PleasePleaseMe[#All]), 
    LAMBDA(WithTheBeatles[#All]),
    LAMBDA(AHardDaysNight[#All]),
    LAMBDA(BeatlesForSale[#All]),
    LAMBDA(Help[#All])
);

The setup steps to form the array of thunks are somewhat tedious and error-prone even though the resulting worksheet formula is OK.

@Peter Bartholomew 

With this task, it seems it's inevitable CHOOSECOLS/CHOOSEROWS will be used. It's a matter of prepping the data to get to that point. The thunks certainly get the job done. I must say CHOOSECOLS/CHOOSEROWS seems to be very quick for all that's being asked (Much better than using INDEX).

 

This task would be easier with a little bit more flexibility!

 

The current list of functions that can handle a 3D reference is still limited even while adding in TOCOL/TOROW and TEXTJOIN to those listed in the link.

Create a 3-D reference to the same cell range on multiple worksheets (microsoft.com)

@Patrick2788 Here is my approach.  I did create 1 lambda to perform an iterative VStack operation:

 

=LET(in,PleasePleaseMe:LetItBe!$A$1:$F$31,
headers,LetItBe!$A$1:$F$1,
hCount,COLUMNS(headers),
hCountSeq,SEQUENCE(,hCount),
allcols,HSTACK(in),
allheaders, TAKE(allcols,1),
allHcount, COLUMNS(allheaders),
colIndex, hCount*INT(SEQUENCE(,allHcount,0,1/hCount))+MATCH(allheaders,headers,0),
cSorting,SORT(VSTACK(colIndex,allcols),1,1,1),
merge,VSTACK(headers,stackN(DROP(cSorting,2),hCount)),
FILTER(merge,CHOOSECOLS(merge,1)<>0))

 

so in line 11 is this stackN Lambda function:

 

LAMBDA(arr,n,
let(c,columns(arr),
if(c>n,
   VSTACK(CHOOSECOLS(arr,sequence(,n)),stackn(CHOOSECOLS(arr,SEQUENCE(,columns(arr)-n,n+1)),n)),
   arr)
))

TBH I haven't studied your approach yet and curious to see how you solved it.

In my case I do use 2 inputs: the 3-d range and an input for the headers so you can pick which format you want, but that could be removed and automatically use the 1st page or something.

other than that I basically create an array "colindex" that give a 1,2,3... for the column headers based on the requested 'headers' order and then add an offset so it can be used to sort all the columns so all 'tabs' are in the same order.  Then my stackN Lambda function does the Vstack.

 

@mtarler OK so i looked at your solution and modified my recursive Lambda to work with it:

Stack
=LAMBDA(arr,
    LET(
        c, COLUMNS(arr),
        uniqCols, COLUMNS(UNIQUE(chooserows(arr, 1), 1)),
        copies, c / uniqCols,
        IF(
            uniqCols > 1,
            hstack(
                vstack(INDEX(arr, 1, 1), tocol(drop(arr, 1, copies - c), 3, 1)),
                stack(drop(arr, , copies))
            ),
            vstack(INDEX(arr, 1, 1), tocol(drop(arr, 1), 3, 1))
        )
    )
)

and then called by:

=LET(in,PleasePleaseMe:LetItBe!$A$1:$F$31,
Stack(SORT(HSTACK(in),1,1,1) )
)

 

@mtarler 

This is a fascinating approach! I had some fun testing this solution. I added some dummy rows and extended your 3D reference to:

=PleasePleaseMe:LetItBe!$A$1:$F$60000

It took a minute but Excel completed the calculation!  I tried 65K and above but no dice - crash.  Handling 720,000 rows is impressive.

Since the solution is using recursion I wonder if more tabs would significantly diminish the performance (i.e. 120 tabs but 6000 rows).
I am still just learning these new text functions as you might notice I switched from CHOOSECOLS to DROP between the 2 LAMBDA functions and have to wonder if either is usable which is more or less efficient. For example:
CHOOSECOLS(arr, SEQUENCE(,n))
vs
DROP(arr, n-COLUMNS(arr))

@Patrick2788  Here is a 3rd solution combining the above 2.  So this recursive Lambda is real slick in that it can take any 2-d array with column headers (not necessarily sequential or grouped) and stack all the columns corresponding to the requested set of Headers for the output:

 

StackCommon 
= LAMBDA(arr, headers,
    IF( COLUMNS(headers) > 1,
        HSTACK(
            VSTACK(
                INDEX(headers, 1),
                TOCOL(FILTER(DROP(arr, 1), INDEX(arr, 1, ) = INDEX(headers, 1), ""),3,1)
            ),
            stackcommon(arr, DROP(headers, , 1))
        ),
        VSTACK(
            headers,
            TOCOL(FILTER(DROP(arr, 1), INDEX(arr, 1, ) = headers, ""), 3, 1)
        )
    )
)

 

and calling formula is:

 

=LET(in,PleasePleaseMe:LetItBe!$A$1:$F$31,
headers,LetItBe!$A$1:$F$1,
StackCommon(HSTACK(in),headers)
)

 

@Patrick2788 

More straightforward

=LET(
    header, {"Album","Length","No.","Side","Title","Vocals"},
    stack, VSTACK(Albums3D),
    k, SEQUENCE( ROWS( stack ) ),
    titleRows, SCAN(1, k, LAMBDA(a,v,
              IF( AND( ISNA( XMATCH( CHOOSEROWS(stack,v),header) ) ), a, v ) )  ),
    raw, DROP( REDUCE("", k, LAMBDA(a,v,
          LET( order, XMATCH( header,   CHOOSEROWS( stack, CHOOSEROWS(titleRows,v) ) ),
               VSTACK( a, CHOOSECOLS( CHOOSEROWS( stack, v),   order  ) ) )
            )  ), 1 ),
    return, FILTER( raw, ( CHOOSECOLS(raw,1) <> 0 ) *
                         ( CHOOSECOLS( raw,1 ) <> INDEX( header,1)  ) ),
   VSTACK( header, return ) )

@Sergei Baklan  inspired me to step it up to be even more simple:

=LET(in,Albums3D,
headers,LetItBe!$A$1:$F$1,
arr,HSTACK(in),
VSTACK(headers,DROP(REDUCE("",headers,LAMBDA(a,i,
    HSTACK(a,TOCOL(FILTER(DROP(arr,1),INDEX(arr,1,)=i,""),3,1)))),,1))
)

 

@mtarler 

Thank you, but unfortunately TOCOL(...,3) doesn't work since we have blank cells within data. For example no Vocal for some records. TOCOL() removes them, thus we have incorrect result.

@Sergei Baklan good catch.  simple fix:

=LET(in,Albums3D,
headers,LetItBe!$A$1:$F$1,
arr,HSTACK(in),
stacked,DROP(REDUCE("",headers,LAMBDA(a,i,
    HSTACK(a,TOCOL(FILTER(DROP(arr,1),INDEX(arr,1,)=i,""), 0,1)))),,1),
VSTACK(headers,FILTER(stacked,CHOOSECOLS(stacked,1)<>0,""))
)

@Patrick2788 

I seem to be falling behind the curve here!

My formula using the thunk array was fine, but it was tedious forming the array in the first place.  I also felt I was missing out on the new VSTACK 3D capability.  I started the refactoring by defining thunks from the stacked array

 

LET(
   a, SEQUENCE(12),
   stacked, VSTACK(Albums3D),
   tracksϑ, MAP(a, LAMBDA(a₁,
      LAMBDA(TAKE(DROP(stacked,31*(a₁-1)),31)))),
// and so on

 

Then I realised that, since I know there are 31 rows set aside for the tracks within each album, I might as well use a Lambda function with the album number as a parameter.  I therefore changed to

 

= LET(
   stacked, VSTACK(Albums3D),
   Hdrsλ,   LAMBDA(a₁, TAKE(DROP(stacked,31*(a₁-1)),1)),
   Tracksλ, LAMBDA(a₁, TAKE(DROP(stacked,1+31*(a₁-1)),30)),
//and then ...
   a, SEQUENCE(,12),
   expanded, REDUCE(Header, a,
      LAMBDA(rollup,a₁, 
        LET(
           hdr,       Hdrsλ(a₁),
           tracks,    Tracksλ(a₁),
           col,       XMATCH(Header,hdr),
           reordered, CHOOSECOLS(tracks, col),
           VSTACK(rollup,reordered)
        )
      )
    );
   FILTER(expanded, ISTEXT(TAKE(expanded,,1)))
)

 

The question then is 'Why do my formulas continue to get larger whereas everyone else's gets shorter?'

The more I try to approach this task from a fresh angle, the more I run into everyone else's solutions! I've played with sequencing and scanning but ultimately it goes back to solutions already posted.

@Patrick2788 

This is my attempt to approach this from a different angle. Although, you'll see everyone's solutions have inspired me in a way.

 

The 3D reference. Same as 1st attempt:

=PleasePleaseMe:LetItBe!$A$1:$F$31

 

The header pulled from the Let it Be sheet because that's the column order I'm going for:

=LetItBe!$A$1:$F$1

 

The 3D stack. This time with VSTACK:

=VSTACK(Albums3D)

 

This is where my solution is different.  This formula pulls the header from each album and repeats it 31 times.  I've added SHEETS() so Magical Mystery Tour can be added!

=CHOOSEROWS(SEQUENCE(SHEETS()-1,,1,31),INT(SEQUENCE((SHEETS()-1)*31,,1,1/31)))

 

This formula checks the column position by each row and strings together an array:

=LET(c,CHOOSEROWS(VSTACK(Albums3D),SeqRows),BYROW(c,LAMBDA(row,TEXTJOIN(",",,XMATCH(header,row)))))

Which looks like this:

Patrick2788_0-1664041512429.png

 

Next, the above array is reduced to a scalar, split, and wrapped:

=WRAPROWS(TEXTSPLIT(TEXTJOIN(",",,strArr),","),6)

 Which looks like this:

Patrick2788_1-1664041653615.png

 

Lastly, the above array is fed to INDEX, some filtering is done, and the header is added:

=LET(tracks,INDEX(StackedAlbums,SEQUENCE((SHEETS()-1)*31),arr),f,FILTER(tracks,(TAKE(tracks,,-1)>0)*(ISNUMBER(TAKE(tracks,,-1)))),VSTACK(header,f))