May 22 2022 10:47 AM
Note: I know how to unpivot data using PowerQuery and vba. What I'm interested in with this exercise is using a formula to unpivot monthly data. I'm interested in using some of the newest functions available and pushing Excel to the limits.
Constraints for this exercise: The source data on the BravoCompany sheet must not be altered in any way. Named items may be created as needed. The solution must be 1 formula. Use of the fill handle is prohibited. No helper columns.
The Solution: When I first created this project a few weeks ago, my first thought was to use MAP to run through the range, concatenate Name-Month-Val, and then use TEXTSPLIT to distribute them across 3 columns. I ran into the array of arrays limitation. The next attempt was successful. I used HSTACK and MAP to stack the arrays. I still felt I could improve upon that solution.
My solution is in the attached workbook. I'm interested in any creative solutions. Particularly ones that don't stack arrays or that can somehow pull this off with MAKEARRAY.
Including @Peter Bartholomew in the discussion who may find this interesting.
May 22 2022 11:47 AM
I believe this will work:
=LET(
dat,$A$2:$Q$202,
h,INDEX(dat,1,),
filt,FILTER(dat,LEFT(h,1)<>"Q"),
fr,ROWS(filt),
fc,COLUMNS(filt),
outr,(fr-1)*(fc-1)+1,
out,MAKEARRAY(outr,3,
LAMBDA(r,c,
LET(
colindex,MOD(r-2,fc-1)+2,
rowindex,QUOTIENT(r-2,fc-1)+2,
IF(
r=1,INDEX({"Agent","Month","Value"},1,c),
CHOOSE(c,
INDEX(filt,rowindex,1),
INDEX(filt,1,colindex),
INDEX(filt,rowindex,colindex)
)
)
)
)
)
,out
)
May 22 2022 12:54 PM
@Patrick2788 here is another, but again I wouldn't call it 'creative' as much as brute force...
=LET(rc,ROWS(nBravo),
cc,COLUMNS(dates),
ungroup,MAKEARRAY(rc*cc,3,
LAMBDA(r,c,CHOOSE(c,
INDEX(nBravo,QUOTIENT(r-1,cc)+1),
INDEX(dates,MOD(r-1,cc)+1),
INDEX(rBravo,QUOTIENT(r-1,cc)+1,MOD(r-1,cc)+1)
))),
FILTER(ungroup,ISERROR(SEARCH("Total",INDEX(ungroup,,2)))))
May 22 2022 01:42 PM
May 23 2022 01:34 AM
My current feeling is that both thunks and MAKEARRAY need to be approached with a certain degree of caution. In the present instance, I decided to use concatenated strings to hold information relating to an item within the crosstab. The code to the point of getting a list of packed records is the same whether I go on to use HSTACK or MAKEARRAY
= LET(
packed, nBravo&","&dates&"|"&rBravo,
mask, LEFT(dates,1)<>"Q",
filtered, FILTER(packed,mask),
list, TOCOL(filtered),
Unpackλ(list)
)
Using HSTACK
UnpackHSλ = LAMBDA(list,
LET(
dimensions, TEXTBEFORE(list, "|"),
d₁, TEXTBEFORE(dimensions, ","),
d₂, TEXTAFTER(dimensions, ","),
val, VALUE(TEXTAFTER(list, "|")),
HSTACK(d₁, d₂, val)
)
);
and using MAKEARRAY
UnpackMAλ = LAMBDA(list,
MAKEARRAY(ROWS(list),3,
LAMBDA(r, f,
LET(
item, INDEX(list, r),
SWITCH(f,
1,TEXTBEFORE(TEXTBEFORE(item, "|"), ","),
2,TEXTAFTER(TEXTBEFORE(item, "|"), ","),
3,VALUE(TEXTAFTER(item, "|"))
)
)
)
)
);
(I do not do 'concise'!)
A key point of interest is that the solution with HSTACK took 10ms to refresh whereas MAKEARRAY took 690ms. That is a massive difference.
May 23 2022 03:12 AM
@Peter Bartholomew , why I'm not surprised...
May 23 2022 06:43 AM
Thank you for sharing. The more examples I see with MAKEARRAY, it seems to confirm the function is not be used for re-shaping data. Its best use is in generating random data (I'd argue there are other simpler options available to doing the same task).
Re: calculation times - I have a working knowledge of VBA but would much rather do a task at the sheet level if possible (Especially with the wealth of new functions being released). The way I was told to write code was to minimize 'touching the sheet' (e.g. write data to cells rather than copy/paste, avoid .select, etc.). I'd like to understand where MAKEARRAY's calculation time is coming from. My guess is the use of INDEX and having to 'touch the sheet' more than the HSTACK solution.
May 23 2022 08:28 AM
It gets worse than that!
If one simple focusses on the values (not the agent names or dates),
= TOCOL(CHOOSECOLS(rBravo,f))
the formula takes 0.7 ms. On the other hand, if I use MAKEARRAY to calculate a thunk containing the agent, date and value triples for each value, then filter and unpivot the thunks, and finally use a second MAKEARRAY to extract the triple from each thunk
= LET(
packedϑ, MAKEARRAY(200,16,
LAMBDA(r,f,
LAMBDA(CHOOSE({1,2,3},
INDEX(nBravo,r),
INDEX(dates,f),
INDEX(rBravo,r,f)
)
)
)),
mask, LEFT(dates,1)<>"Q",
filteredϑ, FILTER(packedϑ,mask),
listϑ, TOCOL(filteredϑ),
MAKEARRAY(2400,3,
LAMBDA(r₀,c,INDEX(INDEX(listϑ,r₀,1)(),c))
)
)
takes 10,032 ms!
DO NOT USE THIS!
I would rather like to hear an authoritative account of where the CPU cycles went!
May 23 2022 08:45 AM
May 23 2022 02:35 PM
Charles could explain all details, but look, you call few thousand times values to memory, use slow FILTER and another few thousand times make calls on per-value basis. It shall be slower than call few times large blocks.
May 24 2022 06:42 AM
Sergei, perhaps I should send the file to Charles. @Patrick2788 's solution is the fastest. I was deliberately trying different methods against the problem even where I suspected they would not be efficient. I was not expecting to get differences in excess of 10,000 though! Even if I were calculating 2400 terms of an array, selecting 1, and repeating the 2400 selecting the second etc. I would not expect the times to be that bad.
The attached file contains timing information for the methods posted here but with no study of size effects.
May 24 2022 02:07 PM
Hi Peter, I had a quick look at this after following your other link. Perhaps a simplified version might help highlight the source of the performance issues:
=LET(thunks,IF(SEQUENCE(2400),LAMBDA({1,2,3})),
MAKEARRAY(2400,3,LAMBDA(r,c,INDEX(INDEX(thunks,r,1)(),1,c))))
Entering the above formula in any cell returns a 2400 x 3 array and takes 10+ seconds to calc. However this single row version calculates near instantaneously:
=LET(thunks,IF(SEQUENCE(1),LAMBDA({1,2,3})),
MAKEARRAY(2400,3,LAMBDA(r,c,INDEX(INDEX(thunks,1,1)(),1,c))))
It seems that the thunk array might be being re-evaluated for each element of the MAKEARRAY result matrix - presumably this is due to 'lazy evaluation'. I've tended to steer clear of thunks so far but remain hopeful performance will improve over time.
Aug 16 2023 01:17 PM
I just learned another more efficient way which I'm posting here in case I forget it,
=LET(thunks, IF(SEQUENCE(2400), LAMBDA({1, 2, 3})), MAP(IF(SEQUENCE(, 3), thunks),IF(SEQUENCE(2400), SEQUENCE(, 3)),LAMBDA(thunk, i, INDEX(thunk(),i))))
The MAP method takes only 0.04s compared to over 10s using MAKEARRAY. It seems placing a LAMBDA around 'thunks' in MAKEARRAY wastes many cycles. I was also surprised to find sorting rows on the same size array using,
=LET(
thunks, BYROW(
Data,
LAMBDA(row, LET(sorted, SORT(row, , , 1), LAMBDA(sorted)))
),
MAP(
IF(SEQUENCE(, COLUMNS(Data)), thunks),
IF(SEQUENCE(ROWS(Data)), SEQUENCE(, COLUMNS(Data))),
LAMBDA(thunk, i, INDEX(thunk(), i))
)
)
versus the more standard REDUCE/VSTACK method,
=DROP(
REDUCE(
0,
SEQUENCE(ROWS(Data)),
LAMBDA(a, i, VSTACK(a, SORT(CHOOSEROWS(Data, i), , , 1)))
),
1
)
was around 10 times quicker!
[~0.1s vs ~1s with Data=RANDARRAY(2400,3,0,100,1)]
Aug 19 2023 10:34 AM
With this post being bumped, I thought it'd be a good idea to revisit and refine.
This effort is me trying to avoid the standard REDUCE/VSTACK method.
'UnPivot
=LAMBDA(range,LET(
header, {"Name", "Month", "Val"},
top, TAKE(range, 1),
filtered, FILTER(range, LEFT(top) <> "Q"),
c, SEQUENCE(COLUMNS(filtered) - 1, , 1, 0),
agent, TOCOL(DROP(CHOOSECOLS(filtered, c), 1)),
r, SEQUENCE(ROWS(agent) / COUNT(c), , 1, 0),
months, TOCOL(DROP(CHOOSEROWS(filtered, r), , 1)),
values, TOCOL(DROP(filtered, 1, 1)),
VSTACK(header, HSTACK(agent, months, values))
))