Forum Discussion
Ways of performing Accumulation with Dynamic Arrays
I would welcome your review of the following ideas, to identify errors, spot improvements or advise if I am following ideas that are already standard fare for others. My objective is to develop a series of workbooks that are comprised entirely of dynamic arrays; no more 'and fill down'. The problems I identified when DA first appeared were the problem of using dynamic arrays for accumulations without allowing circular references and problems handling 'arrays of arrays' (sum each column for example). Recursion and lambda functions appear to offer a way forward, but I do not find it intuitive.
"The purpose of the proposed approach is to accumulate a sequence of values as a dynamic array. Examples include the flow variables in a running balance or series returned from a difference equation. The key feature of the calculation is that each term depends upon the preceding value of the same array; this makes Excel report the calculation as a circular reference.
The calculation developed here uses recursion to work around this problem by exploiting the fact that the recursion introduces a new set of operands at every level of the stack. In order to avoid replicating the flow variables at every level of the recursion stack, they are passed as a lambda function rather than an array. A similar consideration also applies to the results. Rather than accumulating the results as they are calculated, assembly is left to the tail end of the calculation as control is returned back though the stack.
Unfortunately the obvious linear sequence of calls from term to term can easily fail because of the limit on the size of the operand stack. In order to circumvent this problem, I have traversed the data as a binary tree, as shown in the figure. Although twice as many nodes are visited in such a scheme, the maximum levels of recursion for N values is log₂(N).
= LAMBDA(λfunction,startValue,r,p,n,
IF(n>1,
LET(
ℓ, n - QUOTIENT(n,2),
b₁, XACCλ(λfunction,startValue,r,p,ℓ),
b₂, XACCλ(λfunction,MAX(b₁),r,p+ℓ,n-ℓ),
balance, COMBINEλ(b₁, b₂),
balance),
λfunction(p)+(1+r)*startValue)
)
I have tested the Lambda function for up to 16,000 terms (pretty much the width of an Excel sheet) and it has worked, but is slow. The approach took 15sec, more than 1000 times slower than Charles Williams's FastExcel paid add-in! That said, for more reasonably sized timelines the calculation worked well.
Note: Other approaches to the accumulation of flow variables (sometimes known as corkscrews from the way a hand-calculation may be laid out) include the use of MMULT and SUMIFS. In each case, the idea is to calculate the balances directly from the flow variables without any recursive calls to previously calculated values. In the case of MMULT, the calculation requires an upper triangular matrix of 1's to be assembled. Although there is no formal limit of array size in Excel, 16000 x 16000 struck me as uncomfortably large."
This has been a great thread! Thanks to all the contributors. We just announced a new wave of lambda functions that I think will greatly help these scenarios. In particular, REDUCE, SCAN, BYROW and BYCOL.
39 Replies
- JoeMcDaid
Microsoft
This has been a great thread! Thanks to all the contributors. We just announced a new wave of lambda functions that I think will greatly help these scenarios. In particular, REDUCE, SCAN, BYROW and BYCOL.
- lori_mIron Contributor
- tbouldenIron Contributor
Here's my updated version using the new helper functions, plus a few convenient LAMBDAs to facilitate some tasks.
- PeterBartholomew1Silver ContributorThanks. I have saved the file in a new folder ready for when the functions come live!
- PeterBartholomew1Silver Contributor
I haven't found as much time as I had hoped for to work with the Lambda editor. How about everyone else? I find the editor works well and both error checking and the context specific prompts also work well. That said, I find there is a curious disconnect between the add-in's side panel and the worksheet formulas one is trying to construct. I have yet to develop a work practice that allows me to progressively develop the formula, checking each step of the calculation as it is created. When I am in a VBA environment, the ability to step through the calculation using F8 is a lifesaver -- which probably demonstrates my shortcomings as a coder.
I do wonder whether the environment for creating Lambda functions on the worksheet could also be improved. After all, creating formula-local names using LET is now second nature. Would it be possible to allow workbook-scoped names to be defined in a similar manner?
Does anyone have insight as to the reason that the performance of the Lambda functions can be underwhelming? It could simply be the lack of 35 years of optimisation but I also wonder whether the memory management required to support recursion is the source of problems. Certainly Charles William's ACCUMULATE function was lightyears ahead in performance and, since converting flows to balances is so central to modelling, I would like to see it as an Excel built-in function.
Talking of FastExcel, Charles Williams has given me permission to upload a newly developed runtime version of the addin, if anyone is interested in trying it. It appears that the addin and the Excel workbook should be packaged as a zipfile and used within the same folder,
- tbouldenIron ContributorTrue to my word, I've gone through with the unary reworking, and I think the only reason I have the patience for it is because of the editor. I think there are some issues and bugs, but I've not made a concerted effort to catalog them methodically. And I've had my Windows crash this week, seemingly beyond repair, so I'm out of commission for a few more days.
I agree that it would be very nice to be able to step through much like in VBA; even having the Evaluate formula step-through would be a welcome improvement.
As for performance, I would think it is probably due to the feature being strapped on rather than properly integrated. I also wonder if the Calc.ts work paves the way for this optimization, but may rely on being Javascript/web-based, and its not super compatible with legacy Excel recalc paradigm? Uneducated musings on my part.
And I'd love to try out FastExcel once I get a working machine again! How do I sign up?- PeterBartholomew1Silver Contributor
Putting the ingredients together to create a slightly different dish!
The Lambda editor and Calc.ts should be a pretty good fit. The editor checks syntax but if that is OK why not a form to accept a set of test parameters and display results on a small grid?
As for FastExcel it is possible to download a time limited test copy. What I was suggesting was slightly different. I would use my license to download a runtime version of the add-in and use zip to package it along with the Excel file for distribution. Apparently the idea is that the add-in works on the target machine, provided it is held in the same folder and the macro is run to load it on open.
- tbouldenIron Contributor
PeterBartholomew1re: "Of all the MAPS/FOLDS/SCANS that might be technically possible, I wonder which form the 'core irreducible set' that must be present to meet the major demands of user problems. Whichever make the transition to built-in functions will need careful presentation if it is to fit into standard practice." -- Based on my experimentation, and the varying implementations I've tried, I suspect we'll end up with element-wise, and row-wise/column-wise versions, and we'll need a proper conception of an empty array to work with. I think much of my trouble with my previous fold examples was trying to work around not having an empty array primitive as a "natural zero".
Taking myself at my word and reworking my fold/reduce/scans for a 3rd time, and after lori_m's COLMAP example, I've done implementations of vMap and hMap doing row-wise and column-wise mapping respectively. I may have been trying to squeeze too much out of fold and many of my failing examples have yielded to an application of one of these maps. I'm pretty happy that I got bubble-sort figured out finally though.
If I rework things again, it may be to take what I think may be the more functional approach and make everything unary so that I can get better acquainted with currying/partial application. I've gotten some inspiration recently from whoever is posting https://rosettacode.org/wiki/Category:Excel and https://rosettacode.org/wiki/Fibonacci_sequence#LAMBDA! Could be fun to pick through some of the things https://rosettacode.org/wiki/Reports:Tasks_not_implemented_in_Excel and give them a swing.
As for the LAMBDA editor, thus far I'm really enjoying it! Whereas previously I tried not to clutter things up too much, I've not minded the explosion of functions that do "one" thing that really are part of a swiss-army knife approach that makes more intractable problems easier.
And as SergeiBaklan mentions, usability will be the ultimate arbiter; I'm having fun with it now, but nothing in my work life relies on it, so we'll see what the team can make for us. I know that Peter's tree traverse has essentially removed the stack depth issues, but it does slow thing down considerably when getting into larger tree structures. Nothing to do but play around until the next thing is brought down from the mountain-top.
- lori_mIron Contributor
Thanks for the links and sample workbook - all sounds like great progress given what is available currently.
I now also have access to Lambda Editor but, as I'm restricted to current channel on corporate licence, it's not much use without LAMBDA! I presume one can make names containing lambdas appear as functions in tooltips and also create function categories for them by opening name manager from a macro sheet (ctrl+f11) or by using the VBE Names properties.
And interesting what you say about empty arrays being a stumbling block. An alternative might be to use an array containing a single blank value which can be returned as an intermediate result eg if blank=SORT(,) then ISBLANK(blank) returns {TRUE}. The INDEX function can preserve blank values like these in results but many other functions can't which makes them tricky to manipulate however.
- SergeiBaklanDiamond Contributor
I believe we shall have something more civilized to check missing/optional parameters. Excel team has lot of feedback from pre-lambda stage, hope many of wishes will be implemented in v.1.0 (or 2.0 if exclude production from versioning).
- lori_mIron Contributor
Thanks for sharing, I will be able to test when LAMBDA goes live - whenever that will be... I'm not familiar with the details but this appears to be similar to the pipelined binary tree section of Prefix sum - Wikipedia - all the more impressive that you have come up with this type of implementation independently.
As discussed elsewhere, I hope we get capability for running totals built in soon - functions like map, reduce, fold and scan have been mentioned in a recent MS research podcast.
- PeterBartholomew1Silver Contributor
Thanks for the Wikipedia reference, there does appear to be several point of connection. Even the statement "Prefix summation or partial summation form linear operators on the vector spaces of finite or infinite sequences; their inverses are finite difference operators" reminded me of discussions with Charles Williams where I was keen that the ACCUMULATE and DIFF functions would be capable of acting as inverses of one another, just as the matrix forms do.
Mind you, I also wanted the multiplicative factor associated with growth to be a matrix but that was a step too far in terms of usability. I thought it might be fun to see a factor of
give rise to the Fibonacci series.
I gather both you and SergeiBaklan are holding off until something more mature and stable appears in the area of Excel's Lambda functions. I have got some catching up to do. Somehow, I think maths degrees from over half a century ago and Fortran IV programming experience may not have placed me in an ideal position for getting to grips with arcane features of the world of functional programming.
With a bit of luck the higher order functions of map/scan/reduce etc. will make life easier though!
- lori_mIron Contributor
It took me some time to understand the points you raise here. I think I get it now, translating into functional programming terminology one might want as an accumulator,
LAMBDA(x,y,x+z*y)
with z = 1.1, say, for compound interest. One could also extend to array inputs using,
LAMBDA(x,y,x+MMULT(z,y))
Setting z = {1,1;1,0} together with an input list consisting of pairs {1;0}, {0;0}, {0;0}, {0;0}, ...
would give rise to a sequence of Fibonacci pairs {1;1}, {2;1}, {3;2}, {5;3}, ...
Since arrays cannot be nested they would need to be passed as individual arguments though, one possible approach for that was mentioned here.
- tbouldenIron Contributor
lori_mI tried to ask about map/fold/etc. timing as part of the YouTube livestream I linked to up-thread, however the moderator didn't include part 2 of my question, only the LAMBDA Playground/Editor was addressed. Perhaps if Andy replies to me about LAMBDA playground I can ask about the timing of those as well.
- tbouldenIron Contributor
Very nice, PeterBartholomew1! I went down a rabbit hole trying to use thunks and trampolining to get over the stack depth issue with recursion, but with little success. I think I may be misunderstanding something essential, however the examples all used a WHILE loop to prevent running out of stack and trying to replicate that with LAMBDA is self-defeating. This is an interesting approach with traversing a binary tree, will look at it in more depth.
On a related note: https://www.youtube.com/watch?v=7tFq-9Zvk3M Check 54:19 for Andy Gordon's fold implementation. Also, he mentions LAMBDA editor should be getting released to a small group early, and he invites folks to reach out if interested.
- PeterBartholomew1Silver Contributor
Sorry, I appear to have missed some of your posts.
LAMBDA: Recursive Lists and Understanding Recursion Limits of LAMBDA - Microsoft Tech Community
You have been busy! I still find the functional calculus notation pretty opaque. Even where I think I understand the concept, expressing it in the lambda notation and manipulating the formula does not come naturally.
The idea of the binary tree evolved during the work. At first my idea was simply to accumulate over single years and then nest the annual result to apply the formula to multiple years. I then decided that it is not necessary to identify such natural grouping within the dataset and instead to simply bisect the it to the required depth. Only then did I realise I do not need to actually split the array but, rather, I could merely pass a pointer and array length to the internal Lambda function. The result was the binary tree but that wasn't in mind at the outset!
Thanks for the link to Andy Gordon's talk. I had not seen some of the later material before and noted his invitation to make contact, but no contact details were provided. Do you have an email address for Andy to request access to the Lambda editor? I sent an email to the seminar organisers to forward so we will see what comes of that.
- tbouldenIron ContributorIn addition Sergei's LinkedIn link, I used the microsoft.com address found here: https://www.microsoft.com/en-us/research/people/adg/.
- PeterBartholomew1Silver Contributor
You may be pleased to know that your FastExcel functions absolutely thrashed recursion when it came to 16,000 time periods; but then it should. I like VSTACK and HSTACK but, If I had to pick one function from your add-in it would be ACCUMULATE.