A statement of truth or a wasted opportunity. Lambda helper functions reluctant to return 2D arrays.

Silver Contributor

I remember @JoeMcDaid stating that 2D arrays are the only data types held in Excel; rows columns and scalars are merely special cases with the last being a 1x1 array.

image.png

So why is it such a problem bringing related rows together the form a 2D array?  There are a number of tricks and workarounds, but should that be necessary?  By way of example I show an old data file with the individual results from a cross-country championships.

image.png

(first 20 of 300)

The exercise is to rank the teams by adding the positions of the first 4 of their 8 runners to get a point count (lowest wins).  What I require is an output with the teams in result order showing the names of the 4 scoring members of the team.

image.png

The formula looks to be a monster at first sight

 

 

= LET(
  \0, "Extract source data",
      pos,  --LEFT(rawdata,7),
      name, (MID(rawdata,8,22)),
      team, TRIM(MID(rawdata,31,26)),
      time,--("00:"&RIGHT(rawdata,5)),
      teamList, UNIQUE(team),
  \1, "Build Lambda functions",
      SCOREλ, LAMBDA(t,
         LET(
            pts, FILTER(pos, team=t),
            SUM(INDEX(pts,{1,2,3,4}))
         )),
      MEMBERSλ, LAMBDA(t,
         LET(
            m, FILTER(name, team=t),
            CONCAT(m)
         )),
  \2, "Identify scoring members of teams in order",
      teamScore, MAP(teamList, SCOREλ),
      teamOrder, SORTBY(teamList, teamScore),
      scoring, MAP(teamOrder, MEMBERSλ),
      MID(scoring,{1,23,45,67},22))

 

 

but the section following name '\0' and the text comment is simply extracting individual fields from the source data and could easily be separated off within a helper range.  Similarly the lines following '\1' defines named Lambda functions to calculate the team score given the name of a single team and to generate the array of scoring competitors.

The meat of the calculation is at the end: the teams are scored and ranked by score.  The penultimate line assemble the list of scoring members for each team in order.  My complaint is that each team cannot be treated as an array without triggering an error message.  Instead I have had to concatenate the names to product a column of results.  Only then can I split the column back to give the array I need.   

There are other approaches such as that shown by @tboulden in 

Array of arrays using Lambda helper functions - Microsoft Tech Community

which used REDUCE and a sequence of VSTACK operations.

 

Does the restriction on arrays of arrays serve any purpose given that the spreadsheet is all about the manipulation of 2D arrays?  What do you think? 

@Sergei Baklan 

@lori_m 

@Chris Gross 

69 Replies

@tboulden 

Sooner or later, I may try to generalise your formulae to cover more degrees of freedom.

image.png

First, however, I think I need to take time out to review the progress we (mainly you) have made.  I need to analyse the lessons learnt and would like to be sure I can apply them when the situation arises.

@Peter BartholomewI think I'm working my way around to what your contact had suggested for passing thunks, please see below. I've not figured out how best to incorporate the changing F value, and I seem to have screwed up the calc somewhere, but that's probably an error on my part. This is based on an ACCUMULATE I've built to use an array of LAMBDAs, but in this case I'm only passing in the single LAMBDA.

=LAMBDA(init_value,funcs,[iterations],
     LET(
         fxns,IF(TYPE(funcs)=64,funcs,CHOOSE({1},funcs)),
         iter,IF(ISOMITTED(iterations),1,iterations),
         fxns_array,
             MAKEARRAY(iter*ROWS(fxns),COLUMNS(fxns),
                 LAMBDA(i,j,INDEX(fxns,MOD(i-1,ROWS(fxns))+1,j))
             ),
         thunks,SCAN(LAMBDA(init_value),fxns_array,LAMBDA(acc,fn,LET(carry,acc(),curr,fn(carry),LAMBDA(curr)))),
         CHOOSE({1,2},MAP(thunks,LAMBDA(thunk,INDEX(thunk(),1,1))),MAP(thunks,LAMBDA(thunk,INDEX(thunk(),2,1))))
     )
)({0;2},LAMBDA(xv,xv+Δt*MMULT(CHOOSE({1,2;3,4},0,1,-κ,-Coef),xv)+Δt*CHOOSE({1;2},0,F/Mass)),12)

@tboulden 

I revisited the earlier depreciation example and am posting associated Python code in the hope that it might help make sense of these different approaches.

from itertools import accumulate
from numpy import array, reshape, transpose

n=9
bal=60000
r=[0.9,0.88]

#1. 2d array
scan1=accumulate(array([r]*n),lambda acc,r:acc*r,initial=[bal]*2)
print(transpose(list(scan1)))

#2. Generator
f=lambda rate:accumulate([rate]*n,lambda acc,r:acc*r,initial=bal)
scan2=list(map(f,r))
print([list(scan2[0]),list(scan2[1])])

#3. Composition
scan3=accumulate([bal]+[lambda x:x*r[0]]*n
       +[lambda x:bal]+[lambda x:x*r[1]]*n,lambda acc,f:f(acc))
print(reshape(list(scan3),(2,n+1)))

All three methods return the same results as in the workbook,

[[60000, 54000.0, 48600.0, 43740.0, 39366.0, 35429.4, 31886.46, 28697.81, 25828.03, 23245.23],
[60000, 52800.0, 46464.0, 40888.32, 35981.72, 31663.92, 27864.25, 24520.54, 21578.07, 18988.70]]

Notes:
#1. is not (yet) an option in Excel since initial_value is restricted to a scalar. Hopefully there may be some similar option available in future.

#2. In Python 'map' and 'accumulate' return an object reference to be passed to list(). I suppose the Excel equivalent would pass LAMBDA(f) to MAP() as a 'thunk' to be evaluated with ARRAYMAKE()?

#3. This follows your approach of composing an array of lambdas. In Excel I guess one may be able to leverage the row-major order to generate the results and then transpose the result set?

@lori_mThanks for this, had to play with these in a Jupyter notebook because Python doesn't live in my head. I confused myself because I had created an ACCUMULATE function that overshadowed my understanding of itertools' accumulate. Will update once I have more answers to your scenarios, but what I've determined so far is I'll definitely have to get accustomed to thunking vector/tuple-equivalents in Excel to be able to work with the LAMBDA helpers. More to come!

@tboulden 

The following code defines a function  'scanv' for a vector accumulation by iterating the equation x₁=Ax₀+b. The results match the particle dynamics worksheet and the sequence of Fibonacci pairs.

from itertools import accumulate
from numpy import array, transpose

scanv = lambda A,B,x0:accumulate(B,lambda x1,b:x1@A+b,initial=array(x0))

#particle dynamics
A = [[1,-0.05],[0.1,0.99]]
B = [0,0.1]*transpose([[1]*11+[0]*29+[1]*21+[0]*40])
x0 = [0,2]
print(array(list(scanv(A,B,x0))))

#fibonacci
A = [[0,1],[1,1]]
B = [[0,0]]*30
x0 = [0,1]
print(array(list(scanv(A,B,x0))))

 

It looks like you have built an Excel equivalent using lambdas which is quite some feat - I'm just getting a bit lost trying to deconstruct it!

@tboulden @lori_m 

Thank you so much for taking this conversation forward.  I know some others have looked at the discussion but hesitated to get involved.  At times you are pushing me to my limits and beyond.  I plan to keep the discussion for future reference!

 

In recent days/weeks, I have tried to refactor a number of workbooks to exploit the new Lambda helper functions, that implement ideas that Lori first introduced me to.  The challenge of producing true 2D arrays rather than copying single rows seems to reoccur.  Whilst true 2D arrays are comparatively unusual, I would estimate that about 80% of solutions have array calculations conducted in parallel and would benefit from the ability to handle arrays of array (Joe did say "now that arrays are more prominent, the limitation is more apparent").  An example of this is the 12 month rolling forecast demonstrated by Danielle Stein-Fairhurst.

Building a Rolling Forecast in Excel - YouTube

I did manage to produce a fully-dynamic rolling forecast using MAKEARRAY and a Lambda function FORECASTλ that called the inbuilt FORECAST.ETS function by row and column indices (line item number and period counter) but it was far from straightforward and I really could do with a 'How to' guide if I am going to get beyond trial and error. 

 

I have yet to achieve the objective with 'thunks', so I am clearly falling short in my understanding.  My most successful approach has been to build a Lambda function that will return a single value from the desired array of results.  MAKEARRAY (or to a lesser extent, MAP) can then return the 2D array but it is far from fool-proof (unless that is just me bringing a finer grade of foolishness to the problem!)

 

 

image.png

I haven't attached Danielle's workbook without getting permission but have, instead, attached a simpler workbook which I used for development purposes.

@Peter Bartholomew  Let me know if this helps re: thunks; or anyone following along in the audience who may not know what we're talking about.

 

So as not to cloud the process with anything overly useful, I've defined PEEL to peel an array like an onion:

=LAMBDA(array,
    LET(
        rows_,ROWS(array),
        cols_,COLUMNS(array),
        INDEX(array,SEQUENCE(rows_-2,,2,1),SEQUENCE(,cols_-2,2,1))
    )
)

We can use this in REDUCE with no problem because REDUCE only returns the final result, unconstrained by the dimensions of the 2nd parameter, unlike MAP, BYROW/BYCOL, SCAN.

PEEL.pngREDUCE.png

 

But if we try it in SCAN, we get an error because the results break the dimensions of 2nd parameter.

SCAN_error.png

 

If we thunk the results instead, we can get all 4 back; they show as errors, but the arrays are inside the thunked LAMBDAs.

thunks.png

If we define EXPAND like so:

=LAMBDA(array_of_arrays,indx,
    LET(
        target_thunk,INDEX(array_of_arrays,indx,1),
        target_thunk()
    )
)

Then we can use it to select and return a thunked array.

EXPAND.png

Forgot to attach workbook.

@tboulden 

Clever use of REDUCE with array parameters! That would never have crossed my mind.

 

It wasn't immediately clear to me exactly what the PEEL function was doing, maybe using SEQUENCE(n,n) for the screen clipping could assist others like me who do not yet have access to these new functions.

 

In any case, this looks very promising for working around the array truncation stumbling block that @Peter Bartholomew cites in the original post. 

I wonder if the 'thunk' method could also be made to work with SCAN?

For reference, this notebook code returns the same results as your sample file 

# array of arrays
peel = lambda x,_:x[1:-1,1:-1]
scanp=accumulate([0]*4,peel,initial=np.eye(9))
list(scanp)
# array of thunks
peelt = lambda x,_:lambda :x()[1:-1,1:-1]
scant=accumulate([0]*4,peelt, initial=lambda :np.eye(9))
list(scant)[3]()

 [np.eye = MUNIT]

@lori_mPlease see attached, modified EXPAND slightly to handle 2d arrays of thunks. You'll note that for most of these, SCAN omits the init_value; this can be overcome if absolutely necessary, but is an added complication. I've also included a version of reshape that I tried basing off np.reshape, but it still needs some polishing.

=LAMBDA(array,[aRows],[aCols],
    LET(
        aRows_,IF(OR(ISOMITTED(aRows),aRows=""),1,INT(aRows)),
        aCols_,IF(OR(ISOMITTED(aCols),aCols=""),COUNTA(array),INT(aCols)),
        rows_,ROWS(array),
        cols_,COLUMNS(array),
        seq,SEQUENCE(aRows,aCols,0,1),
        r_,1+QUOTIENT(seq,cols_),
        c_,1+MOD(seq,cols_),
        reshaped,IFERROR(INDEX(array,r_,c_),""),
        result,FILTER(reshaped,BYROW(reshaped,LAMBDA(row_,NOT(AND(row_=""))))),
        result
    )
)

 

I'm enjoying these side-by-side comparisons, I've got another one for you, earlier this year I wrote a LAMBDA to extract subarrays of size [r x c] from a given array of size [m x n] in row-major order. Originally it was returning all subarrays, but since we hadn't figured out how to handle arrays-of-arrays, I was just using ARRAYTOTEXT, or subbing in an appropriate function to get back a single value.

subarrays.jpg

 

Now that we are thunking the arrays, I can provide an index list, and operate on specific subarrays. What would this look like in Python?

subthunk.jpg

Sorry, keep forgetting about the attachment.

@tboulden 

Wow!  If this were an art class, I am still working through Picasso's cubist period, whilst you guys are moving through Neoclassism and into the Surreal.   Despite that, I have made a little progress and come up with a Lambda function that appends a seasonally adjusted forecast to a table of actuals.  The prompt shows 

= FORECASTSλ(timeline, actuals, forecastPeriods, fill)

and the code reads

= LAMBDA(vec,arr,n,fill,
    MAKEARRAY(ROWS(arr), n+COLUMNS(arr),
      LAMBDA(row,col,
        LET(
          M,  COLUMNS(arr),
          k,  SEQUENCE(1,M),
          X,  INDEX(vec,col),
          X0, INDEX(vec,k),
          Y0, INDEX(arr,row,0),
          IF(col>M,
             FORECAST.ETS(X,Y0,X0),
             IF(fill, INDEX(Y0,col), "")))))
   )

At the moment, I find the creation of such formulae requires Herculean effort but no doubt it will come more naturally in time.

 

Looking through the discussion, I think we need @lori_m to have an insider beta licence to bring his Python experience back to the fold of Excel.  Your frequent contributions have been missed both here and on Chandoo.  I have memories of a number of 'shortest formula' competitions with a quite remarkable degree of innovation.

 

I worked through the previous @tboulden workbook with the reducing array thunks.  Can a thunk refer to a previously calculated array, just waiting to be called, or should it be thought of as a recipe for calculating the array should it ever be required?

 

One further thought.  It looks like we are not going to get native support for arrays of arrays or arrays of ranges within the foreseeable future (no doubt for good reason, including the potential loss of backward compatibility), but I wondered whether a single formula could be co-opted to the cause.  MAKEARRAY and derived Lambdas such as VSTACK are already capable of packing arrays into an array 'container' with padded spaces.

 

At the moment a '0' parameter is illegal within MAKEARRAY, but it would be reasonable to interpret it in much the same manner as a '0' in the INDEX function (i.e. 'I want it all').  If a number of row arrays were to be packed into a 2D array, the '0' would be interpreted as 'make the width just as long as it needs to be to accommodate the longest row'.

 

Any thoughts?

@Peter Bartholomew 
Thanks for hosting these lively discussions. I have learned a number of useful general techniques from the examples and suggestions in these posts. These threads also provide a useful way to feedback to the community now uservoice is not available. And yes, it would be useful to have access to lambda functionality for testing, with any luck it won't be too much longer to wait until they go current!

 

@tboulden
Nice solutions, It will take some time to fully grasp these techniques! Interesting exercise to generate subarrays, one option:

r,c,n=2,3,5
A = np.arange(n*n).reshape(n,n)+1
p = np.mgrid[0:n-r+1,0:n-c+1].reshape(2,-1)
list(map(lambda i,j:A[i:i+r,j:j+c],p[0],p[1]))

I believe a more elegant solution exists using 3D array manipulations without lambda/map but it eludes me at present.

 

In Excel I wonder if unions of ranges could be constructed of the form ref1,ref2,...,refn via REDUCE so that individual areas could be selected with INDEX. OFFSET might also work though it is volatile,

=SUM(SUBTOTAL(9,OFFSET(A,{0,1,1},{0,1,2},2,3)))

As Peter says arrays of ranges are unsupported at present but out of interest, can MAP return an array of references using INDEX that can be manipulated this way?

@Peter BartholomewRe: Can a thunk refer to a previously calculated array, just waiting to be called, or should it be thought of as a recipe for calculating the array should it ever be required?

 

I think I recall hearing Andy/Simon/someone along the way mention lazy evaluation?? And I think I recall that meaning that it stores the LAMBDA, but only evaluates it when needed. It's also possible that my brain made it up sometime during my research on lambda calculus implementations.

@Peter BartholomewI went down a rabbit hole trying to play with MAKEARRAY since you mentioned it can't have a 0 value. I learned that you can omit both of the first two parameters, and that led me even further trying to see what possibly could go in for those parameters, even providing SEQUENCE formulae, etc. I was hoping to reverse-engineer MAKEARRAY based off the similarity to MAP of 2 parameters, but what I developed with MAP had sensical behavior, but I couldn't quite figure out what was going on with MAKEARRAY; exhibited on the MAKEARRAY tab of attached. I'm not sure I follow how to make it work with the 0 parameter. If the thunks aren't your cup of tea, check out the array_storage tab where I've put together a storage scheme to get dimensions of arrays and flatten them, and can revive the array from the storage array.

 

@lori_m  I was able to use the union (,) operator, but could only get it working with thunks, see the union tab. I've not messed with union/intersection much, so not fully following your breadcrumbs, and my google searches didn't help much. Any references you can point me to, or suggest a specific INDEX construction? I wasn't sure what to do with OFFSET either.

@tboulden 

My thought was that one might be able to select specific sub areas with

=INDEX((B6:D7,C7:E8,D7:F8),,,i)

and then apply MAP with i={1,2,3} maybe within a name? Analogously one can define a name 'subarrays' as

=EVALUATE({"B6:D7","C7:E8","D7:F8"})

and then use SUBTOTAL(9,subarrays) as in the previous OFFSET formula.

 

In Python instead of applying map to 2D arrays one can use array broadcasting in 3D, e.g.

arange(3)+arange(2)[:,None]+arange(3)[:,None,None]

returns:

[[[0, 1, 2],
[1, 2, 3]],

[[1, 2, 3],
[2, 3, 4]],

[[2, 3, 4],
[3, 4, 5]]]