SPILLs of external data - word of warning

Iron Contributor

Just as a heads-up to fellow Excel programmers.  SPILLs (via FILTER) are obviously a great technique to import data from external databases.  The syntax to reference the imported results in eg A100:Y103 is the very handy "A100#".  All of this works very well in the presence of these databases. 

 

When we periodically supply outside parties with our production Excels, we do not export these databases (because in my experience we end up spending countless hours supporting the recipient in staging our production environment on their system).  To facilitate the data exchange, we only supply standalone versions of those Excels, ie only workbooks with broken links.

 

Now, here the nub:  When you break the link to the data source, Excel will freeze the imported results in A100:Y103 *)  That is exactly what we want.  So far, so good. 

 

BUT ...

 

The reference "A100#" now produces a #REF! error because that range is no longer a SPILL (but an ordinary range of numbers in the sheet).  To my money, this is an oversight by Microsoft.  Break Links would need to replace all references to "A100#" with the fixed span "A100:Y103".

 

To work around this issue, I put aside an oversized fixed span eg A100:BA103.  To the right of that span in BB100:BB103, I put some text (instructions on what to do with #SPILL errors).  I then put the FILTER call, as before, in A100.  But instead of referencing A100#, I now reference the fixed span A100:BA103, providing code (IFs) to deal with blank cells.

 

So long as the imported data can fit into the fixed span, all is well.  If FILTER spills over the end of the fixed span, the text to the right of it forces a #SPILL error, therefore preventing accidents.

 

*) If you are lucky, that is!  Data/Edit Links/Break Links is arguably the most unreliable feature in Excel and we use a VBA macro to largely replace this functionality, but that's a matter for a separate post.

1 Reply

The OP merits a follow-up.  During the summer of 2024, an idea came to me how to fix the breaking spills problem in VBA.  I then faced the question whether to continue with the work-around described in the OP or to rephrase my code.  I want to share here why that was even a question.  Let's start by recalling the pros and cons of spilling into a fixed range:

 

Pros:

  • Everyone knows the conventional fixed range addressing scheme used to access the spill.  Unfamiliar readers may be mystified how the fixed range becomes populated out of a single formula in the TLC, but that incomprehension will not prevent them from following the downstream code.
  • It is a no-brainer to replace the spill (of external data) with user data within that fixed span.  Just type away...
  • Excel provides STRONG auditing for the downstream code.  Press F2 and you will immediately see the slices of data (rows of horizontal spills or columns of vertical spills) extracted from the spill.  Any referencing error becomes immediately obvious and you can use your mouse to correct it.
  • If you place a "bookend" token to the right of the TRC (of a horizontal spill), your code is guaranteed to fail with a #SPILL! error when the spill attempts to break out of the range.  In this way, you can reliably protect the conventional references from accidentally only capturing parts of a spill.

Cons:

  • Your code will error out when the spill hits the bookend.  How bad that is depends on the complexity of your downstream code.  Our production models are very complex and it transpired that while the #SPILL! error resulted as designed, it quickly morphed into an amorphous #VALUE! in further downstream processing such that the original #SPILL! becomes very hard to trace - in particular by inexperienced Excel users.  We also experienced many more breaks in production than I had anticipated; my receiver ranges had been too short.  Interestingly, while increasing the size of those receiver ranges reduced the incidence of these errors, the decreasing incidence also reduced familiarity among users with those errors when periodically they did arise and how to resolve them.
  • The downstream code consuming these fixed-size slices of variable length spill data must be programmed to be tolerant of blank inputs.  The processing of blank inputs will reduce the computational efficiency of your model.

At which point the question pops up "What's the alternative?"  And one possible reason for staying with the work-around above is that you might reject them all...

 

How to slice a spill? 

 

At the heart of this question is the impossibility in Excel to return a range of Spills.  It is simply not possible to split a 6xN spill into 6 spills of 1xN each.  It is hence not possible to easily reference the 2nd row of a spill out for D400 as D401#.

 

Put that question to ChatGBT and it will tell you to use INDEX(s#,r,0) to access row r in spill s.  This is true enough, so let's review the pros and cons of this approach:

 

Pros:

  • It is computationally efficient.  However much INDEX will seem like a function, it isn't.  Excel's precompiler will resolve it before the actual computations start.
  • The downstream code will only receive good data (ie does not need to deal with blanks).
  • We should never expect any #SPILL! errors.

Cons:

  • INDEX is tedious to write and in its bulk distracts the reader of a downstream formula from the true purpose of that formula.
  • Humans simply aren't good at counting.  The row indexing is ok when you want the 2nd row but when you want the 10th - are you quite sure that row isn't #9 or #11?  This may be reasonably acceptable to the programmer but to any reader of your code, this is a nightmare.
  • Closely tied into the previous point, Excel's auditing is powerless.  All you gonna see is an outline of the spill (which INDEX references) whole.

I did not like this answer one little bit and set about writing a Lambda to ease my existence.  I was very pleased with my final version, which read

 

=WithRows(s#, RowLabels, LAMBDA(r, FILTER(r("Cash"), r("Dates")< ...

 

All my spill rows are prefixed by user-friendly labels such that RowLabels was simply the column immediately in front of the spill data. 

 

Pros:

  • Intuitively readable.
  • Practically eliminates any mis-referencing.
  • While the coding is "wordy", the gumph is all in the preamble, ie the mechanics of addressing the spill do not distract from the algorithm.
  • The downstream code will only receive good data (ie does not need to deal with blanks).
  • We should never expect any #SPILL! errors

Cons:

  • Not strictly native code.  While it is intuitive to me, others may not follow.
  • Less efficient than INDEX - the MATCHing of the string input is computationally more intensive than the numeric addressing.
  • The coding is "wordy".
  • While reference errors are virtually impossible, there is no auditing thru F2 either - like with INDEX, you only see s#.
  • Requires a map for each spill.

I discarded the ideas of INDEX and WithRows when I recalled the arcane Excel intersection operator:  It is possible to address D6 by writing "=D:D 6:6."  D:D addresses the entire column D, 6:6 the entire row 6.  The space in between is the "intersection operator" - Excel will only address those cells which are spanned by the range before it and after.  In this instance, that is only D6.

 

In 30 years of programming Excel, I never once had the occasion to use it.  I have now embraced it to slice my spills.  The reference

 

=$D400# 401:401

 

will return the 2nd row from the range spilling out of D400.

 

Pros:

  • Totally native reference which gets resolved at the precompilation stage and imposes zero overheads on the actual computation.
  • F2 provides perfectly good auditing.  Sure, 401:401 is oversized (relative to D400#) but you see the intersection.  If you made a reference error, this is now as obvious as with the fixed range.  And you can use your mouse to fix it.
  • It is reasonably concise.  For example, =$D400# 401:401 + $D400# 403:403 is a fairly succinct way to sum the 2nd and 4th row of the spill in D400.
  • The downstream code will only receive good data (ie does not need to deal with blanks).
  • We should never expect any #SPILL! errors.
  • It requires no RowLabels.

Cons:

  • It may freak out the average reader who hasn't a clue what's going on.

If I managed to sustain your attention until this point, then I shall reward you with a further thought.  You may recall bullet #2 from the Pros of the work-around: How can we construct a spill from manual inputs such that it becomes possible to feed a downstream calculation that uses spill notation?

 

I wrote a VBA macro to perform this procedure in the spill area itself: Write your input data somewhere, say into B100:AZ105.  Put a reference to that fixed range into the TLC of your spill area, ie =B100:AZ105.  (The data will spill, obviously.)  Now press F2 followed by F9, then Enter.  You can now delete B100:AZ105 😉

 

Happy coding.