Forum Discussion
Your thoughts on serializing arrays to manage state in Excel (by way of inventory tracking)
I really need to see the formula in the context of its workbook. You are doing a number of things that I wouldn't do, but that does not mean they are wrong; they are worth exploring. I think you are calculating running totals for a number of products. The calculations I have experience of are simpler in that only one product stream is considered at a time, but more complex in that they match outputs to inputs on a FIFO basis.
I normally move outside the table to perform such calculations, using arrays and spill ranges rather than the inherent replication of table formulas. There seems to be some benefit to your approach in that persistent storage is used to hold the prior step of a cumulative calculation. The downside would appear to be that the serialised data representing the evolving state could become cumbersome, so it appears there are swings and roundabouts to be evaluated.
I would suggest you use your serialised data to hold the current state but record as little accumulated data as possible. Since I would be using an array calculation, I would have access to every step returned by the calculation, but that means the entire calculation is processed each time data is added. I am sorry that my ability to visualise and evaluate your proposal has proved to be somewhat limited.
Hi Peter,
Thanks for replying. I've attached a primitive example for you - at this point I haven't actually implemented it in a meaningful data set (though I have demonstrated it works across thousands of lines of data).
The motivation is overcoming the limitations of Excel's requirement that every cell return a value, i.e. you can't return a thunk or any other fancy container/wrapper to a cell and then reference that cell for its value. This would, theoretically, allow you to persist the intermediate calculations in involved formulas without having to solve the array-of-arrays problem (or resort to the Name Manager to name intermediate calculation results).
Obviously some refinement is needed (such as using better row/column delimiters), and if taken seriously, data compression techniques could be applied automatically (removing all but the most essential data and encoding what remains to the extent reasonable). Done correctly, with about 30k characters available in a string, you can save quite a lot of information in one of Excel's most efficient data structures (strings are saved as entire discrete values and given a single lookup key).
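To make the mechanics concrete, here is a minimal sketch of the idea in Excel Labs module syntax. The names SERIALISEλ and DESERIALISEλ and the "|" / ";" delimiters are illustrative placeholders, not the ones in the attached workbook:

```
// Join each row's cells with ";" and the rows with "|".
SERIALISEλ = LAMBDA(array,
    TEXTJOIN("|", FALSE, BYROW(array, LAMBDA(r, TEXTJOIN(";", FALSE, r))))
);

// Split columns on ";" and rows on "|" to recover the array.
DESERIALISEλ = LAMBDA(state,
    TEXTSPLIT(state, ";", "|")
);
```

Note that everything comes back as text, so numeric columns need a VALUE() coercion on the way out.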
A bigger application of the idea would be assigning individual lot numbers to every inflow of a security and being able to update each lot's quantity and remaining cost basis according to whatever rule the consuming formula wants: specific identification this time, FIFO that time, LIFO another time, cost averaging another still. Because the entire lot state is persisted, a single outflow transaction can relatively simply impact all affected lots, whether that be one or a thousand. For instance, if you are using an average method over 15 lots of IBM shares, one formula could simply search the prior state for IBM lots and adjust their remaining shares/value on a pro rata basis. Because the state was persisted, updating the lots amounts to a simple filter of one table consisting exclusively of the updated information for each remaining lot.
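As a sketch of that pro-rata case, assuming a lot state with ticker / quantity / cost-basis columns, the placeholder DESERIALISEλ above, and illustrative names prevState (the prior serialised state) and sold (shares flowing out):

```
=LET(
    lots,   DESERIALISEλ(prevState),
    ticker, CHOOSECOLS(lots, 1),
    qty,    VALUE(CHOOSECOLS(lots, 2)),
    cost,   VALUE(CHOOSECOLS(lots, 3)),
    hit,    ticker = "IBM",
    total,  SUM(FILTER(qty, hit)),
    scale,  IF(hit, (total - sold) / total, 1),
    HSTACK(ticker, qty * scale, cost * scale)
)
```

Every IBM lot is scaled by the same factor, the other lots pass through untouched, and the result is the complete next state, ready to be re-serialised.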
Even if this type of persistence were limited to, say, once every thousand rows, with the in-between steps requiring full calculation/accumulation of everything since the last persisted state, surely figuring out the state after 1,000 transactions is better than figuring it out after 10,000.
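In formula terms, the restart step might look something like this; NEXTSTATEλ, lastCheckpoint, and newTxns are hypothetical names, with NEXTSTATEλ standing for whatever logic applies a single transaction row to a state array:

```
=SERIALISEλ(
    REDUCE(
        DESERIALISEλ(lastCheckpoint),
        SEQUENCE(ROWS(newTxns)),
        LAMBDA(state, k, NEXTSTATEλ(state, INDEX(newTxns, k, 0)))
    )
)
```

REDUCE walks only the rows added since the last checkpoint, so the tail of the ledger is all that ever gets reprocessed.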
- PeterBartholomew1 · Sep 24, 2024 · Silver Contributor
I haven't been of much value to you, but you have given me something to think about. Essentially what I do is record transactions (in-flows and out-flows) and perform the entire calculation from time zero each time a further transaction occurs. The array formulae are fast, and this causes no difficulty at first. The point must come, though, where piling through decades of old data cannot be justified.
That would be where your idea of having persistent data to provide a complete description of the current state would be important. This would, in effect, provide a record of a stock-taking exercise and the focus would be the balances rather than the flows.
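Reduced to its simplest form, the from-time-zero style is just a scan over the flows; here Txns[Qty] is an assumed table column of signed quantities:

```
=SCAN(0, Txns[Qty], LAMBDA(balance, qty, balance + qty))
```

FIFO matching carries far more state than a single running balance, which is where your serialised snapshot would earn its keep.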
Since the MAPλ helper function I have used here is something I only completed recently, I have been presumptuous enough to use your problem to further test the functionality by applying it to a set of FIFO calculations corresponding to the individual Lots. The first image shows the result presented as an array of normalised tables, whilst the second has a grouping option selected (cross-tabs are also possible).
An exercise for the future is to create a persistently stored checkpoint that will support ongoing calculation.
- joelb95 · Sep 25, 2024 · Brass Contributor
You've been of help to me on lots of occasions, including this one. If the solution strikes you as meh, it is probably me who is missing the bigger picture. Generally speaking, my solutions solve Excel problems that amount to less than a minute of compute time (if that) at the cost of dozens of hours of thinking about computer science and how to shoehorn it into Excel. What really gets me is that, despite my efforts to use Excel Labs to achieve efficiencies closer to assembly-language techniques, I am always forced to confront that Excel is a fully abstracted language whose internal calculations are non-transparent and whose calculation engine does what it likes with the data/formulas (and I'm sure it makes better choices than I do).
The problem I solved here is one that vexes both of us with no good solution: how can we use thunks (or any other lazy-evaluation answer to the array-of-arrays problem) as return values, rather than keeping them wrapped up in the ether and scope-bound? Serializing to a string is a real possibility, I think, as inelegant as it may be. Given my preference for structured data, this is a legitimate way to have something tantamount to tail recursion happening within a table capable of auto-expanding/contracting/updating, one that can be re-ordered and filtered on a whim. It is really the re-sorting of tables of thousands of lines and more than two dozen columns that gives me anxiety.
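For anyone reading along, the thunk idiom in question is nothing more than a zero-argument wrapper:

```
// A thunk defers an array by capturing it in a parameterless LAMBDA...
THUNKλ = LAMBDA(x, LAMBDA(x));

// ...and calling it with empty parentheses recovers the array:
// =LET(t, THUNKλ(SEQUENCE(1000, 24)), t())
```

The catch is that a cell can return the recovered array but never the thunk itself, which is exactly the gap the serialised string is meant to fill.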
Also, consider that this method allows you to store an array of potentially thousands of cells in a single cell, in a fully resolved form that can instantly be reconverted into a full-blown array, capable of all of your array functions, just by using a simple deserialization formula. I just don't think the Excel Labs modules or the standard Excel features were really designed to handle the sorts of arrays I want them to consider, spanning thousands of cells.
- PeterBartholomew1 · Sep 25, 2024 · Silver Contributor
Coming from an engineering background I am more used to the idea of formulating the numerical analysis problem and then solving it as efficiently as one can within a single computation. The process model by which transactions are processed one at a time as they arise is less familiar territory for me.
Excel, as a functional programming environment, will prevent you from changing the state iteratively as new transactions are processed. The use of a table which expands along with the input data would allow the new state to be recorded (possibly within hidden fields), and you have demonstrated that it is workable. I do have concerns that the demand on memory and processing (encoding/decoding) might be heavy if an entire copy of the data structure is held at every step.
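A sketch of what that calculated column might look like, assuming a State column holding the serialised text, a Txn column for the incoming transaction, and the placeholder helpers from earlier in the thread (OFFSET is volatile, so a non-volatile row index may be preferable in practice, and the first data row would need a seed state instead):

```
=SERIALISEλ(
    NEXTSTATEλ(
        DESERIALISEλ(OFFSET([@State], -1, 0)),
        [@Txn]
    )
)
```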
On the other hand, repeating a complete recalculation from time zero on each occasion a further transaction is appended will also become unworkable as the data grows in volume. It might well be that some compromise solution is called for in which blocks of transactions are processed before the resulting state is stored in order to provide a checkpoint/restart.
As far as thunks are concerned, I see them as simple in-memory references to previously calculated results. Because of their ephemeral nature, I tend to perform quite large blocks of calculation and write back to the grid only the minimum state data required to make sense of the next timestep (opening balances typically, from which the complete model may be built for that timestep).
Although a table would persist calculated data far better than LET variables, that too is limited. Because the values are produced by formulae, the data only persists until the workbook is closed. You have to be able to afford a complete recalculation on workbook open.
Overall, I think the idea of capturing array data as text within one or more cells of a table has its place. I would aim to store the minimum of data required to restart the calculation and possibly only store it at intervals to provide checkpoint restarts if calculation from step to step turns out to be less resource intensive.
I am aware that I have not provided in-depth support for your idea and have even gone off at a tangent, exploring other ideas. I would have liked to be in a position to offer an expert assessment but in reality my comments are somewhat speculative and there are many points at which I would defer to your opinion.
- PeterBartholomew1 · Sep 23, 2024 · Silver Contributor
Your serialization appears to be well implemented and works. The methods we are developing are far removed from one another and the manner in which they scale is likely to be different. I tend to use thunks and perform entire calculations as one. I haven't given much thought to the way in which I could separate out 'historical' data to leave a smaller dynamic formula to bring the calculation up to date. Perhaps I should!
The idea of serialising the data appears to be considerably more complicated than assigning a field to hold the stats of each lot, but it should be more dynamic when new lots are introduced.
I have included the type of calculations I would turn to if faced with a problem such as the one you describe, but it is so different that I doubt it will be of much value to you.