SOLVED

Ways of performing Accumulation with Dynamic Arrays

Silver Contributor

I would welcome your review of the following ideas, to identify errors, spot improvements or advise if I am following ideas that are already standard fare for others.  My objective is to develop a series of workbooks that are comprised entirely of dynamic arrays; no more 'and fill down'.  The problems I identified when DA first appeared were the problem of using dynamic arrays for accumulations without allowing circular references and problems handling 'arrays of arrays' (sum each column for example).  Recursion and lambda functions appear to offer a way forward, but I do not find it intuitive. 

 

image.png

 

"The purpose of the proposed approach is to accumulate a sequence of values as a dynamic array. Examples include the flow variables in a running balance or series returned from a difference equation. The key feature of the calculation is that each term depends upon the preceding value of the same array; this makes Excel report the calculation as a circular reference.

 

The calculation developed here uses recursion to work around this problem by exploiting the fact that the recursion introduces a new set of operands at every level of the stack. In order to avoid replicating the flow variables at every level of the recursion stack, they are passed as a lambda function rather than an array. A similar consideration also applies to the results. Rather than accumulating the results as they are calculated, assembly is left to the tail end of the calculation as control is returned back though the stack.

image.png

 

Unfortunately the obvious linear sequence of calls from term to term can easily fail because of the limit on the size of the operand stack.  In order to circumvent this problem, I have traversed the data as a binary tree, as shown in the figure. Although twice as many nodes are visited in such a scheme, the maximum levels of recursion for N values is log₂(N).

 

= LAMBDA(λfunction,startValue,r,p,n,
      IF(n>1,
         LET(
            ℓ, n - QUOTIENT(n,2),
            b₁, XACCλ(λfunction,startValue,r,p,ℓ),
            b₂, XACCλ(λfunction,MAX(b₁),r,p+ℓ,n-ℓ),
            balance, COMBINEλ(b₁, b₂),
           balance),
         λfunction(p)+(1+r)*startValue)
   )

 

I have tested the Lambda function for up to 16,000 terms (pretty much the width of an Excel sheet) and it has worked, but is slow. The approach took 15sec, more than 1000 times slower than Charles Williams's FastExcel paid add-in!  That said, for more reasonably sized timelines the calculation worked well.

 

Note: Other approaches to the accumulation of flow variables (sometimes known as corkscrews from the way a hand-calculation may be laid out) include the use of MMULT and SUMIFS. In each case, the idea is to calculate the balances directly from the flow variables without any recursive calls to previously calculated values. In the case of MMULT, the calculation requires an upper triangular matrix of 1's to be assembled. Although there is no formal limit of array size in Excel, 16000 x 16000 struck me as uncomfortably large."

39 Replies

@tboulden 

I love your final thought of

"my commitment to my recent progress is hindering my continued progress"!


I am now up and going with the Lambda editor. It was just a case of not noticing SHARED FOLDER under Get Add-ins. I now need time to run through the set-piece examples to bring it together with the results of my own experimentation.

 

At the moment, I find a strange contrast between basic spreadsheet development and the world of Lambda functions. The first assumes one is too idle to define ones terms and can handle only the most primitive computing concepts but, by way of contrast, has an elephantine memory for 'tips and tricks' that allow one to apply 'go faster stripes' to the solution process.  The second appears so convoluted that I am finding it tricky to thread my Fortran IV trained brain through the twists and turns.

 

Of all the MAPS/FOLDS/SCANS that might be technically possible, I wonder which form the 'core irreducible set' that must be present to meet the major demands of user problems.  Whichever make the transition to built-in functions will need careful presentation if it is to fit into standard practice.

@Peter Bartholomew @tboulden 

MAP and FOLD could be combined to give SCAN though ideally scalable versions of all these functions would be available. For comparison Power Query and Python both have Accumulate available as a standard function capable of processing large lists via buffers / iterators.  Many useful functions could be derived from these three.

 

As a typical example, to sum over columns one might use something like,

=COLMAP( SUM, TABLE )

where SUM or LAMBDA( x, SUM( x ) ) is the aggregation function and

COLMAP:=LAMBDA(f,T,MAP(LAMBDA(i,f(INDEX(T,,i)),SEQUENCE(,COLUMNS(T)))))

@lori_m 

As a comment, Power Query is optimised to work with tables, not with lists. If something could be done by not elegant sequence of table transformations and with elegant list function, the performance will be dramatically decreased in the latest case.

Afraid we could be here in the same situation, elegant lamdas functional programming could kill performance compare to legacy formulas.

As for lambdas editor the question is not how good examples it provides but how suitable this editor will be for average Joe for editing, maintenance, etc. 

@Sergei Baklan 

Thanks for the clarification that list aggregation in PQ is quite inefficient - that's good to be aware of. I do not really know PQ well and using List.Buffer for running totals is likely still not that much of an improvement.


Python - which I am more familiar with - uses iterators for list operations. The itertools accumulate function processes a million rows in a fraction of a second and I'm hoping we could see similar performance in Excel if the propose new functions are well optimised. For reference, the suggested formula above was prototyped in Python using:

 

 

from numpy import array, shape, sum
colmap=lambda f,T:map(lambda i:f(T[:,i]),range(T.shape[0]))

>>>A=array([[1,2],[3,4]])
>>>list(colmap(sum,A))
[4,6]

 

 

@lori_m 

I only would like to say that everything depends on implementation. As for today, and from my point of view, lambdas in Excel is not the best solution from performance point of view if you have another alternatives. Perhaps it will be improved, but it takes months or years, who knows.

@Peter Bartholomewre: "Of all the MAPS/FOLDS/SCANS that might be technically possible, I wonder which form the 'core irreducible set' that must be present to meet the major demands of user problems. Whichever make the transition to built-in functions will need careful presentation if it is to fit into standard practice." -- Based on my experimentation, and the varying implementations I've tried, I suspect we'll end up with element-wise, and row-wise/column-wise versions, and we'll need a proper conception of an empty array to work with. I think much of my trouble with my previous fold examples was trying to work around not having an empty array primitive as a "natural zero".

 

Taking myself at my word and reworking my fold/reduce/scans for a 3rd time, and after @lori_m's COLMAP example, I've done implementations of vMap and hMap doing row-wise and column-wise mapping respectively. I may have been trying to squeeze too much out of fold and many of my failing examples have yielded to an application of one of these maps. I'm pretty happy that I got bubble-sort figured out finally though.

 

If I rework things again, it may be to take what I think may be the more functional approach and make everything unary so that I can get better acquainted with currying/partial application. I've gotten some inspiration recently from whoever is posting LAMBDAs on Rosetta Code  and seem to know what they're doing! Could be fun to pick through some of the things not yet implemented and give them a swing.

 

As for the LAMBDA editor, thus far I'm really enjoying it! Whereas previously I tried not to clutter things up too much, I've not minded the explosion of functions that do "one" thing that really are part of a swiss-army knife approach that makes more intractable problems easier.

 

And as @Sergei Baklan mentions, usability will be the ultimate arbiter; I'm having fun with it now, but nothing in my work life relies on it, so we'll see what the team can make for us. I know that Peter's tree traverse has essentially removed the stack depth issues, but it does slow thing down considerably when getting into larger tree structures. Nothing to do but play around until the next thing is brought down from the mountain-top.

 

 

@tboulden 

Thanks for the links and sample workbook - all sounds like great progress given what is available currently.

 

I now also have access to Lambda Editor but, as I'm restricted to current channel on corporate licence, it's not much use without LAMBDA! I presume one can make names containing lambdas appear as functions in tooltips and also create function categories for them by opening name manager from a macro sheet (ctrl+f11) or by using the VBE Names properties.

 

And interesting what you say about empty arrays being a stumbling block. An alternative might be to use an array containing a single blank value which can be returned as an intermediate result eg if blank=SORT(,) then ISBLANK(blank) returns {TRUE}. The INDEX function can preserve blank values like these in results but many other functions can't which makes them tricky to manipulate however.

@lori_m 

I believe we shall have something more civilized to check missing/optional parameters. Excel team has lot of feedback from pre-lambda stage, hope many of wishes will be implemented in v.1.0 (or 2.0 if exclude production from versioning).

I haven't found as much time as I had hoped for to work with the Lambda editor.  How about everyone else?  I find the editor works well and both error checking and the context specific prompts also work well.  That said, I find there is a curious disconnect between the add-in's side panel and the worksheet formulas one is trying to construct.  I have yet to develop a work practice that allows me to progressively develop the formula, checking each step of the calculation as it is created.  When I am in a VBA environment, the ability to step through the calculation using F8 is a lifesaver -- which probably demonstrates my shortcomings as a coder.

 

I do wonder whether the environment for creating Lambda functions on the worksheet could also be improved.  After all, creating formula-local names using LET is now second nature.  Would it be possible to allow workbook-scoped names to be defined in a similar manner?

 

Does anyone have insight as to the reason that the performance of the Lambda functions can be underwhelming?  It could simply be the lack of 35 years of optimisation but I also wonder whether the memory management required to support recursion is the source of problems.  Certainly Charles William's ACCUMULATE function was lightyears ahead in performance and, since converting flows to balances is so central to modelling, I would like to see it as an Excel built-in function.

 

Talking of FastExcel, Charles Williams has given me permission to upload a newly developed runtime version of the addin, if anyone is interested in trying it.  It appears that the addin and the Excel workbook should be packaged as a zipfile and used within the same folder,

True to my word, I've gone through with the unary reworking, and I think the only reason I have the patience for it is because of the editor. I think there are some issues and bugs, but I've not made a concerted effort to catalog them methodically. And I've had my Windows crash this week, seemingly beyond repair, so I'm out of commission for a few more days.

I agree that it would be very nice to be able to step through much like in VBA; even having the Evaluate formula step-through would be a welcome improvement.

As for performance, I would think it is probably due to the feature being strapped on rather than properly integrated. I also wonder if the Calc.ts work paves the way for this optimization, but may rely on being Javascript/web-based, and its not super compatible with legacy Excel recalc paradigm? Uneducated musings on my part.

And I'd love to try out FastExcel once I get a working machine again! How do I sign up?

@tboulden 

Putting the ingredients together to create a slightly different dish!  

The Lambda editor and Calc.ts should be a pretty good fit.  The editor checks syntax but if that is OK why not a form to accept a set of test parameters and display results on a small grid?

 

As for FastExcel it is possible to download a time limited test copy.  What I was suggesting was slightly different.  I would use my license to download a runtime version of the add-in and use zip to package it along with the Excel file for distribution.  Apparently the idea is that the add-in  works on the target machine, provided it is held in the same folder and the macro is run to load it on open.

 

Ahh, that makes sense then, would love to see ACCUMULATE in action!

I've got a new Windows install running in a VM, and I've still got access to LAMBDA and the LAMBDA editor, so more to come!

@tboulden 

The zip file should contain the original accumulation file posted on this thread but with an 'on open' macro that loads one of the two add-ins.  There is also a Word file containing instructions/explanation and a text copy of the macro. 

 

If you wish to try other functions then let me know and I will add them to a fresh copy of the workbook; assuming you get it going in the first place, that is.

 

@tboulden 

[2006.14706] Will Dynamic Arrays finally change the way Models are built? (arxiv.org)

In the referenced EuSpRIG paper, I drew attention to problems of accumulation and column sums as being the major bottlenecks impeding progress towards fully-dynamic (financial or engineering) models.  Charles Williams solves each with elegance but it is a shame to have to rely upon add-ins to perform tasks that should be native functionality.

@tboulden 


From the lack of any further communication, I assume my experiment with the FastExcel runtime add-in didn't work. Sorry about that; it was my first attempt at using it.

Hi Peter, I apologize for my rudeness, it was successful! I've found my Windows install in a VM lacking however and that discouraged me from doing much; a replacement laptop with Windows finally shipped this week and I promise I'll do better once I have a working machine and can really examine how ACCUMULATE functions.
best response confirmed by Peter Bartholomew (Silver Contributor)
Solution

@Peter Bartholomew 

 

This has been a great thread! Thanks to all the contributors. We just announced a new wave of lambda functions that I think will greatly help these scenarios. In particular, REDUCE, SCAN, BYROW and BYCOL.

 

@Peter Bartholomew @lori_m 

Here's my updated version using the new helper functions, plus a few convenient LAMBDAs to facilitate some tasks.

Thanks. I have saved the file in a new folder ready for when the functions come live!

@JoeMcDaid @tboulden 

 

Thanks for the update, so I guess for Peter's original set up something like this might work,

=SCAN(loan, -repayment#, LAMBDA(bal, pmt, bal * (1 + rate) + pmt))