Forum Discussion

Patrick2788's avatar
Patrick2788
Silver Contributor
May 22, 2022

Unpivot Monthly Data with a Formula

Note: I know how to unpivot data using PowerQuery and vba.  What I'm interested in with this exercise is using a formula to unpivot monthly data.  I'm interested in using some of the newest functions available and pushing Excel to the limits.

 

Constraints for this exercise: The source data on the BravoCompany sheet must not be altered in any way.  Named items may be created as needed.  The solution must be 1 formula.  Use of the fill handle is prohibited.  No helper columns.

 

The Solution:  When I first created this project a few weeks ago, my first thought was to use MAP to run through the range, concatenate Name-Month-Val, and then use TEXTSPLIT to distribute them across 3 columns.  I ran into the array of arrays limitation.  The next attempt was successful. I used HSTACK and MAP to stack the arrays.  I still felt I could improve upon that solution.

 

My solution is in the attached workbook.  I'm interested in any creative solutions.  Particularly ones that don't stack arrays or that can somehow pull this off with MAKEARRAY.

 

Including PeterBartholomew1 in the discussion who may find this interesting.

 

13 Replies

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    Patrick2788 

    With this post being bumped, I thought it'd be a good idea to revisit and refine.

     

    This effort is me trying to avoid the standard REDUCE/VSTACK method.

     

    'UnPivot
    =LAMBDA(range,LET(
        header, {"Name", "Month", "Val"},
        top, TAKE(range, 1),
        filtered, FILTER(range, LEFT(top) <> "Q"),
        c, SEQUENCE(COLUMNS(filtered) - 1, , 1, 0),
        agent, TOCOL(DROP(CHOOSECOLS(filtered, c), 1)),
        r, SEQUENCE(ROWS(agent) / COUNT(c), , 1, 0),
        months, TOCOL(DROP(CHOOSEROWS(filtered, r), , 1)),
        values, TOCOL(DROP(filtered, 1, 1)),
        VSTACK(header, HSTACK(agent, months, values))
    ))

     

  • Patrick2788 

    My current feeling is that both thunks and MAKEARRAY need to be approached with a certain degree of caution.  In the present instance, I decided to use concatenated strings to hold information relating to an item within the crosstab.  The code to the point of getting a list of packed records is the same whether I go on to use HSTACK or MAKEARRAY

    = LET(
        packed,   nBravo&","&dates&"|"&rBravo,
        mask,     LEFT(dates,1)<>"Q",
        filtered, FILTER(packed,mask),
        list,     TOCOL(filtered),
        Unpackλ(list)
      )

    Using HSTACK

    UnpackHSλ = LAMBDA(list,
        LET(
            dimensions, TEXTBEFORE(list, "|"),
            d₁,  TEXTBEFORE(dimensions, ","),
            d₂,  TEXTAFTER(dimensions, ","),
            val, VALUE(TEXTAFTER(list, "|")),
            HSTACK(d₁, d₂, val)
        )
    );

    and using MAKEARRAY

    UnpackMAλ = LAMBDA(list,
        MAKEARRAY(ROWS(list),3,
            LAMBDA(r, f,
                LET(
                    item, INDEX(list, r),
                    SWITCH(f,
                        1,TEXTBEFORE(TEXTBEFORE(item, "|"), ","),
                        2,TEXTAFTER(TEXTBEFORE(item, "|"), ","),
                        3,VALUE(TEXTAFTER(item, "|"))
                    )
                )
            )
        )
    );

    (I do not do 'concise'!)

    A key point of interest is that the solution with HSTACK took 10ms to refresh whereas MAKEARRAY took 690msThat is a massive difference.

     

     

    • Patrick2788's avatar
      Patrick2788
      Silver Contributor

      PeterBartholomew1 

      Thank you for sharing. The more examples I see with MAKEARRAY, it seems to confirm the function is not be used for re-shaping data.  Its best use is in generating random data (I'd argue there are other simpler options available to doing the same task).

       

      Re: calculation times - I have a working knowledge of VBA but would much rather do a task at the sheet level if possible (Especially with the wealth of new functions being released).  The way I was told to write code was to minimize 'touching the sheet' (e.g. write data to cells rather than copy/paste, avoid .select, etc.).  I'd like to understand where MAKEARRAY's calculation time is coming from. My guess is the use of INDEX and having to 'touch the sheet' more than the HSTACK solution.

      • mtarler's avatar
        mtarler
        Silver Contributor
        I agree that INDEX should take many more cycles than HSTACK. Basically you are evaluating/executing on every element while HSTACK presumably acts on the entire array(s). I presume you could 'test' this somewhat by comparing a 100,000x2 grid compared to a 2x100,000 grid since the former should show significant improvement using HSTACK but MAY (depending on the coding) show much less improvement on the latter.
      • PeterBartholomew1's avatar
        PeterBartholomew1
        Silver Contributor

        SergeiBaklan

        It gets worse than that!

        If one simple focusses on the values (not the agent names or dates),

        = TOCOL(CHOOSECOLS(rBravo,f))

        the formula takes 0.7 ms.  On the other hand, if I use MAKEARRAY to calculate a thunk containing the agent, date and value triples for each value, then filter and unpivot the thunks, and finally use a second MAKEARRAY to extract the triple from each thunk

        = LET(
              packedϑ, MAKEARRAY(200,16,
                 LAMBDA(r,f,
                    LAMBDA(CHOOSE({1,2,3}, 
                       INDEX(nBravo,r),
                       INDEX(dates,f), 
                       INDEX(rBravo,r,f)
                    )
                 )
              )),
              mask,      LEFT(dates,1)<>"Q",
              filteredϑ, FILTER(packedϑ,mask),
              listϑ,     TOCOL(filteredϑ),
              MAKEARRAY(2400,3,
                 LAMBDA(r₀,c,INDEX(INDEX(listϑ,r₀,1)(),c))
              )
           )

         takes 10,032 ms!     

        DO NOT USE THIS!

        I would rather like to hear an authoritative account of where the CPU cycles went!

         

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor
    Both solutions are excellent considering the constraints. MAKEARRAY's drawback seems to be that it needs the help of INDEX unless you're using it to create random data. It's a bit easier to use HSTACK/VSTACK where needed. Stacking seems to be the Excel Team's answer to the array of arrays limitation.
  • mtarler's avatar
    mtarler
    Silver Contributor

    Patrick2788  here is another, but again I wouldn't call it 'creative' as much as brute force...

    =LET(rc,ROWS(nBravo),
         cc,COLUMNS(dates),
    ungroup,MAKEARRAY(rc*cc,3,
                 LAMBDA(r,c,CHOOSE(c,
                     INDEX(nBravo,QUOTIENT(r-1,cc)+1),
                     INDEX(dates,MOD(r-1,cc)+1),
                     INDEX(rBravo,QUOTIENT(r-1,cc)+1,MOD(r-1,cc)+1)
                 ))),
        FILTER(ungroup,ISERROR(SEARCH("Total",INDEX(ungroup,,2)))))
  • flexyourdata's avatar
    flexyourdata
    Iron Contributor

    Patrick2788 

     

    I believe this will work:

     

    =LET(
      dat,$A$2:$Q$202,
      h,INDEX(dat,1,),
      filt,FILTER(dat,LEFT(h,1)<>"Q"),
      fr,ROWS(filt),
      fc,COLUMNS(filt),
      outr,(fr-1)*(fc-1)+1,
      out,MAKEARRAY(outr,3,
            LAMBDA(r,c,
              LET(
              colindex,MOD(r-2,fc-1)+2,
              rowindex,QUOTIENT(r-2,fc-1)+2,
                IF(
                  r=1,INDEX({"Agent","Month","Value"},1,c),
                  CHOOSE(c,
                    INDEX(filt,rowindex,1),
                    INDEX(filt,1,colindex),
                    INDEX(filt,rowindex,colindex)
                  )
                )
              )
            )
          )
    ,out
    )

Resources