Forum Discussion
SPILLs of external data - word of warning
The OP merits a follow-up. During the summer of 2024, an idea came to me how to fix the breaking spills problem in VBA. I then faced the question whether to continue with the work-around described in the OP or to rephrase my code. I want to share here why that was even a question. Let's start by recalling the pros and cons of spilling into a fixed range:
Pros:
- Everyone knows the conventional fixed range addressing scheme used to access the spill. Unfamiliar readers may be mystified how the fixed range becomes populated out of a single formula in the TLC, but that incomprehension will not prevent them from following the downstream code.
- It is a no-brainer to replace the spill (of external data) with user data within that fixed span. Just type away...
- Excel provides STRONG auditing for the downstream code. Press F2 and you will immediately see the slices of data (rows of horizontal spills or columns of vertical spills) extracted from the spill. Any referencing error becomes immediately obvious and you can use your mouse to correct it.
- If you place a "bookend" token to the right of the TRC (of a horizontal spill), your code is guaranteed to fail with a #SPILL! error when the spill attempts to break out of the range. In this way, you can reliably protect the conventional references from accidentally only capturing parts of a spill.
Cons:
- Your code will error out when the spill hits the bookend. How bad that is depends on the complexity of your downstream code. Our production models are very complex and it transpired that while the #SPILL! error resulted as designed, it quickly morphed into an amorphous #VALUE! in further downstream processing such that the original #SPILL! becomes very hard to trace - in particular by inexperienced Excel users. We also experienced many more breaks in production than I had anticipated; my receiver ranges had been too short. Interestingly, while increasing the size of those receiver ranges reduced the incidence of these errors, the decreasing incidence also reduced familiarity among users with those errors when periodically they did arise and how to resolve them.
- The downstream code consuming these fixed-size slices of variable length spill data must be programmed to be tolerant of blank inputs. The processing of blank inputs will reduce the computational efficiency of your model.
At which point the question pops up "What's the alternative?" And one possible reason for staying with the work-around above is that you might reject them all...
How to slice a spill?
At the heart of this question is the impossibility in Excel to return a range of Spills. It is simply not possible to split a 6xN spill into 6 spills of 1xN each. It is hence not possible to easily reference the 2nd row of a spill out for D400 as D401#.
Put that question to ChatGBT and it will tell you to use INDEX(s#,r,0) to access row r in spill s. This is true enough, so let's review the pros and cons of this approach:
Pros:
- It is computationally efficient. However much INDEX will seem like a function, it isn't. Excel's precompiler will resolve it before the actual computations start.
- The downstream code will only receive good data (ie does not need to deal with blanks).
- We should never expect any #SPILL! errors.
Cons:
- INDEX is tedious to write and in its bulk distracts the reader of a downstream formula from the true purpose of that formula.
- Humans simply aren't good at counting. The row indexing is ok when you want the 2nd row but when you want the 10th - are you quite sure that row isn't #9 or #11? This may be reasonably acceptable to the programmer but to any reader of your code, this is a nightmare.
- Closely tied into the previous point, Excel's auditing is powerless. All you gonna see is an outline of the spill (which INDEX references) whole.
I did not like this answer one little bit and set about writing a Lambda to ease my existence. I was very pleased with my final version, which read
=WithRows(s#, RowLabels, LAMBDA(r, FILTER(r("Cash"), r("Dates")< ...
All my spill rows are prefixed by user-friendly labels such that RowLabels was simply the column immediately in front of the spill data.
Pros:
- Intuitively readable.
- Practically eliminates any mis-referencing.
- While the coding is "wordy", the gumph is all in the preamble, ie the mechanics of addressing the spill do not distract from the algorithm.
- The downstream code will only receive good data (ie does not need to deal with blanks).
- We should never expect any #SPILL! errors
Cons:
- Not strictly native code. While it is intuitive to me, others may not follow.
- Less efficient than INDEX - the MATCHing of the string input is computationally more intensive than the numeric addressing.
- The coding is "wordy".
- While reference errors are virtually impossible, there is no auditing thru F2 either - like with INDEX, you only see s#.
- Requires a map for each spill.
I discarded the ideas of INDEX and WithRows when I recalled the arcane Excel intersection operator: It is possible to address D6 by writing "=D:D 6:6." D:D addresses the entire column D, 6:6 the entire row 6. The space in between is the "intersection operator" - Excel will only address those cells which are spanned by the range before it and after. In this instance, that is only D6.
In 30 years of programming Excel, I never once had the occasion to use it. I have now embraced it to slice my spills. The reference
=$D400# 401:401
will return the 2nd row from the range spilling out of D400.
Pros:
- Totally native reference which gets resolved at the precompilation stage and imposes zero overheads on the actual computation.
- F2 provides perfectly good auditing. Sure, 401:401 is oversized (relative to D400#) but you see the intersection. If you made a reference error, this is now as obvious as with the fixed range. And you can use your mouse to fix it.
- It is reasonably concise. For example, =$D400# 401:401 + $D400# 403:403 is a fairly succinct way to sum the 2nd and 4th row of the spill in D400.
- The downstream code will only receive good data (ie does not need to deal with blanks).
- We should never expect any #SPILL! errors.
- It requires no RowLabels.
Cons:
- It may freak out the average reader who hasn't a clue what's going on.
If I managed to sustain your attention until this point, then I shall reward you with a further thought. You may recall bullet #2 from the Pros of the work-around: How can we construct a spill from manual inputs such that it becomes possible to feed a downstream calculation that uses spill notation?
I wrote a VBA macro to perform this procedure in the spill area itself: Write your input data somewhere, say into B100:AZ105. Put a reference to that fixed range into the TLC of your spill area, ie =B100:AZ105. (The data will spill, obviously.) Now press F2 followed by F9, then Enter. You can now delete B100:AZ105 😉
Happy coding.