SOLVED

Ways of performing Accumulation with Dynamic Arrays

Silver Contributor

I would welcome your review of the following ideas, to identify errors, spot improvements or advise if I am following ideas that are already standard fare for others.  My objective is to develop a series of workbooks that are comprised entirely of dynamic arrays; no more 'and fill down'.  The problems I identified when DA first appeared were the problem of using dynamic arrays for accumulations without allowing circular references and problems handling 'arrays of arrays' (sum each column for example).  Recursion and lambda functions appear to offer a way forward, but I do not find it intuitive. 

 

image.png

 

"The purpose of the proposed approach is to accumulate a sequence of values as a dynamic array. Examples include the flow variables in a running balance or series returned from a difference equation. The key feature of the calculation is that each term depends upon the preceding value of the same array; this makes Excel report the calculation as a circular reference.

 

The calculation developed here uses recursion to work around this problem by exploiting the fact that the recursion introduces a new set of operands at every level of the stack. In order to avoid replicating the flow variables at every level of the recursion stack, they are passed as a lambda function rather than an array. A similar consideration also applies to the results. Rather than accumulating the results as they are calculated, assembly is left to the tail end of the calculation as control is returned back though the stack.

image.png

 

Unfortunately the obvious linear sequence of calls from term to term can easily fail because of the limit on the size of the operand stack.  In order to circumvent this problem, I have traversed the data as a binary tree, as shown in the figure. Although twice as many nodes are visited in such a scheme, the maximum levels of recursion for N values is log₂(N).

 

= LAMBDA(λfunction,startValue,r,p,n,
      IF(n>1,
         LET(
            ℓ, n - QUOTIENT(n,2),
            b₁, XACCλ(λfunction,startValue,r,p,ℓ),
            b₂, XACCλ(λfunction,MAX(b₁),r,p+ℓ,n-ℓ),
            balance, COMBINEλ(b₁, b₂),
           balance),
         λfunction(p)+(1+r)*startValue)
   )

 

I have tested the Lambda function for up to 16,000 terms (pretty much the width of an Excel sheet) and it has worked, but is slow. The approach took 15sec, more than 1000 times slower than Charles Williams's FastExcel paid add-in!  That said, for more reasonably sized timelines the calculation worked well.

 

Note: Other approaches to the accumulation of flow variables (sometimes known as corkscrews from the way a hand-calculation may be laid out) include the use of MMULT and SUMIFS. In each case, the idea is to calculate the balances directly from the flow variables without any recursive calls to previously calculated values. In the case of MMULT, the calculation requires an upper triangular matrix of 1's to be assembled. Although there is no formal limit of array size in Excel, 16000 x 16000 struck me as uncomfortably large."

39 Replies

@Charles Williams 

You may be pleased to know that your FastExcel functions absolutely thrashed recursion when it came to 16,000 time periods; but then it should.  I like VSTACK and HSTACK but, If I had to pick one function from your add-in it would be ACCUMULATE.

Very nice, @Peter Bartholomew! I went down a rabbit hole trying to use thunks and trampolining to get over the stack depth issue with recursion, but with little success. I think I may be misunderstanding something essential, however the examples all used a WHILE loop to prevent running out of stack and trying to replicate that with LAMBDA is self-defeating. This is an interesting approach with traversing a binary tree, will look at it in more depth.

 

On a related note: https://www.youtube.com/watch?v=7tFq-9Zvk3M Check 54:19 for Andy Gordon's fold implementation. Also, he mentions LAMBDA editor should be getting released to a small group early, and he invites folks to reach out if interested.

Details: https://the-au-forml-lab.github.io/colloquium_talks/Gordon.html

@tboulden 

Sorry, I appear to have missed some of your posts.

LAMBDA: Recursive Lists and Understanding Recursion Limits of LAMBDA - Microsoft Tech Community

You have been busy!  I still find the functional calculus notation pretty opaque.  Even where I think I understand the concept, expressing it in the lambda notation and manipulating the formula does not come naturally.

 

The idea of the binary tree evolved during the work.  At first my idea was simply to accumulate over single years and then nest the annual result to apply the formula to multiple years.  I then decided that it is not necessary to identify such natural grouping within the dataset and instead to simply bisect the it to the required depth.  Only then did I realise I do not need to actually split the array but, rather, I could merely pass a pointer and array length to the internal Lambda function.  The result was the binary tree but that wasn't in mind at the outset!

 

Thanks for the link to Andy Gordon's talk.  I had not seen some of the later material before and noted his invitation to make contact, but no contact details were provided.  Do you have an email address for Andy to request access to the Lambda editor?  I sent an email to the seminar organisers to forward so we will see what comes of that.

In addition Sergei's LinkedIn link, I used the microsoft.com address found here: https://www.microsoft.com/en-us/research/people/adg/.

@Peter Bartholomew 

Thanks for sharing, I will be able to test when LAMBDA goes live - whenever that will be...  I'm not familiar with the details but this appears to be similar to the pipelined binary tree section of Prefix sum - Wikipedia - all the more impressive that you have come up with this type of implementation independently.

 

As discussed elsewhere, I hope we get capability for running totals built in soon - functions like map, reduce, fold and scan have been mentioned in a recent MS research podcast.

@lori_mI tried to ask about map/fold/etc. timing as part of the YouTube livestream I linked to up-thread, however the moderator didn't include part 2 of my question, only the LAMBDA Playground/Editor was addressed. Perhaps if Andy replies to me about LAMBDA playground I can ask about the timing of those as well.

@tboulden @Sergei Baklan 

Thanks for the leads. I will give it a bit before following up to avoid the risk of duplication.

@lori_m 

Thanks for the Wikipedia reference, there does appear to be several point of connection.  Even the statement "Prefix summation or partial summation form linear operators on the vector spaces of finite or infinite sequences; their inverses are finite difference operators" reminded me of discussions with Charles Williams where I was keen that the ACCUMULATE and DIFF functions would be capable of acting as inverses of one another, just as the matrix forms do.

image.png 

Mind you, I also wanted the multiplicative factor associated with growth to be a matrix but that was a step too far in terms of usability.  I thought it might be fun to see a factor of

image.png

give rise to the Fibonacci series.

 

I gather both you and @Sergei Baklan are holding off until something more mature and stable appears in the area of Excel's Lambda functions.  I have got some catching up to do.  Somehow, I think maths degrees from over half a century ago and Fortran IV programming experience may not have placed me in an ideal position for getting to grips with arcane features of the world of functional programming.

 

With a bit of luck the higher order functions of map/scan/reduce etc. will make life easier though!

@Peter Bartholomew 

It took me some time to understand the points you raise here. I think I get it now, translating into functional programming terminology one might want as an accumulator,

 

LAMBDA(x,y,x+z*y)

 

with z = 1.1, say, for compound interest. One could also extend to array inputs using,

 

LAMBDA(x,y,x+MMULT(z,y))

 

Setting z = {1,1;1,0} together with an input list consisting of pairs {1;0}, {0;0}, {0;0}, {0;0}, ...

would give rise to a sequence of Fibonacci pairs {1;1}, {2;1}, {3;2}, {5;3}, ...

Since arrays cannot be nested they would need to be passed as individual arguments though, one possible approach for that was mentioned here.

I realise since the arrays are only one dimensional they could be stacked which I guess was the intention for the ACCUMULATE function. So one could write a function that accumulates arrays in one dimension or other, e.g.

=VSCAN({0,1;0,0;0,0;0,0},LAMBDA(x,y,x+MMULT(y,{0,1;1,1})))

would return {1,1;1,2;2,3;3,5}

@lori_m 

Yes, though I apologise for causing you to devote time chasing that particular rabbit down its burrow!  It was just a thought that the second-order difference equation could be reduced to first-order matrix form by introducing an additional variable.

@Peter Bartholomew@lori_m 

 

I've been a little busy playing with melding Andy Gordon's fold implementation and Peter's tree traversal (_tt), and have put together versions of reduce/fold/scan (left and right) using both methodologies; well, almost, I've gone cross-eyed on doing the tree traversal for scan, so tabling it for now. Please see attached! I've even got reduce/fold on columns and rows!

 

Based on my studying, we basically only need one of the three to get the other two; I've done this with reduce_tt using fold_tt. I think we should be able to make scan from reduce and Peter's COMBINEλ, however its going slightly wrong. I'm sure its an oversight on my part somewhere, but I thought I might let others play with these while I uncross my eyes.

 

Sleep helped sort out the scan=reduceLeft & COMBINEλ problem; needed to update reduceLeft to have a check for AND(a=FALSE,TYPE(a)<>64) like in scanLeft and that worked! Maybe I'll get scan_tt today!

@tboulden 

I have downloaded your file and plan to work through it.  Some education regarding the higher-order functions will not go amiss! 

 

As things stand though, it is my belief that Excel still needs a high performance ACCUMULATE function as a built-in function in order to achieve the goal of fully-dynamic modelling with Excel.  Corkscrews represent a very common construct in financial modelling and suggesting the solution lies in recursive programming is not a way forward.

@Peter Bartholomew 

Great news! I got the LAMBDA Editor add-in as a beta-tester on Friday and have been re-working my original functions using the editor; I hope you were able to get on the list for it as well! Note that if you haven't, it works on same principle to your VBA solution, but with a proper editor and a few diffs. It loads its session data to a hidden sheet and you can export/import to JSON file. Not sure if this is best way for sharing, but will have to see.

 

I've also implemented @lori_m Fibonacci matrix calc example using scan from above (see "Examples" tab). I'm starting to see how powerful fold can be, I've implemented my reduce/scan functions leveraging fold as the workhorse. And you can do lots of other tasks just by providing the right function to fold/reduce/scan. Will update later with more of these extended examples.

@tboulden 

I got as far as the point of  making registry changes but was struggling with the installation process so I will return to the task when I get some more time.  What you have achieved is pretty amazing; have you come across these higher order functions before in other coding contexts?  Something that completely threw me was the output from Paste Names

image.png

I was at a loss to follow the syntax!

Fortunately the JKP name manager worked a bit better (tried that today)

image.png

The comment block appears to have been highjacked for some technical purpose!  Nothing appears as a prompt when writing formulas containing Lambda functions.

@tboulden 

(... continued)

As an aside, I added the Greek λs simply as a flag to assist me in reading the formulas.  I also note that you appear to have tidied up the MAKEARRAY function and I recognise the code fragments showing where you have incorporated the binary tree.

Looking at the amortisation problem, I note that you have calculated the repayment using the PMT function at each step of the calculation.  I think this opens up the possibility of taking interest rate variability into account with the repayment being adjusted based on the remaining repayment period and the outstanding balance.

Overall, I have some serious work to do to try to keep up with you!

@Peter BartholomewSorry to hear about the difficulties with the LAMBDA editor install! To answer your question, I've not had any "functional" experience prior to this; I put it in scare quotes because I think I'm using functional ingredients in an imperative style. I get stuck on something and step away, then come back excited again when someone shows me something that I didn't think of! Then I learn until my tools can't fix what I've built and step away again! Case in point: I promised extended examples using fold, and they are attached, however they've shown me there is a bug in my approach and I can't resolve it. I'm inclined to leave init as blank, but my logic doesn't handle it well and it bloats my result; I think I've seen having a "natural zero" for the initial value, but I think some of the helper functions for my folds may not have one. I'm about ready to start from scratch again since I think my commitment to my recent progress is hindering my continued progress.

 

1 best response

Accepted Solutions
best response confirmed by Peter Bartholomew (Silver Contributor)
Solution

@Peter Bartholomew 

 

This has been a great thread! Thanks to all the contributors. We just announced a new wave of lambda functions that I think will greatly help these scenarios. In particular, REDUCE, SCAN, BYROW and BYCOL.

 

View solution in original post