Forum Discussion
Is now the time to abandon the concept of relative referencing?
"defaulting it might cause a stir"
Yes, but that is at the core of the argument. Users will follow the path that is most easily discovered. For me, that path leads firmly in the wrong direction, enticing users to follow. Scalar operations do give the user a very direct sense of manually interacting with their numbers. That suits many users, but they should receive some warning/indication that they are opting for legacy methods that deliver little in the context of modern Excel.
They will come to think of being limited to scalar arithmetic as representing 'simplicity'; yet, in reality, the inherent complexity of the problem has simply moved to the task of wading through a sea of semantically limited cell formulas, trying to divine the path of the overall calculation. Such users may seek more knowledge from the 'tips and tricks' genre of multiply plagiarised content, perhaps thinking that memorising over 350 shortcuts will in some mysterious way add wisdom to the proceedings.
With dynamic arrays, I look for 'simplicity' not in the individual scalar operations but in the overall patterns of array calculation. Operations like sorting, filtering, wrapping and grouping emerge along with the concept of arrays; they are not meaningful at the level of scalar thinking. Many problems, such as that of banded bonuses given to salesmen or taxation bands (attached), have array solutions that follow a different path from that adopted by fill-down formulas.
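To give a deliberately trivial illustration (generic names here, not the formula from the attached workbook): where traditional practice enters =B2*$E$1 and fills down, a single dynamic array formula such as
= Quantity * Rate
(with Quantity and Rate as defined names for the input range and the scalar rate) spills every result at once, leaving one formula to read and audit rather than hundreds.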
This process of abstraction, in which the details of a calculation are left to Excel, is a key element in the fight to control the error rates that characterise spreadsheet solutions and give the spreadsheet such a poor reputation. Overall, I would not suggest attempting to force users to adopt modern methods (that would simply drive many casual users away) but neither do I want to signpost a default route that ultimately leads to a dead end.
PeterBartholomew1 My two cents...
I'm more of the attitude that if the situation calls for it, use it. If not, just keep it simple. This is especially true when helping out beginners. If I were to hazard a guess, I would say that the "average" user falls into this category. Intermediate users would have a decent chance at adapting, but people who are still using =SUM(A2+A3) formulas are not very likely to grasp major concept changes. I still feel the "default" behavior should be the path of least resistance, so no one gets left behind.
On the topic of defining names, I say each to their own. For me, less is more (in general). I find spreadsheets that were created with excessive, unnecessary named ranges to be harder to modify, even if I was the one who built them. Too much time is wasted jumping back and forth between the worksheet formulas and Name Manager trying to reverse engineer everything. I would much rather see the cell references directly in the formula, so I know exactly what they're referring to. Having said that, there are plenty of situations where named ranges are essential, and the majority of my spreadsheets have at least a few.
Further to that, when it comes to defining names within the LET function, I'm all for defining each variable to improve readability, especially when first introducing others to LET. However, it's easy to get carried away here too. One thing I'll probably never get on board with is defining the LAMBDA function first for BYROW, BYCOL, MAP, REDUCE, SCAN, etc... it's like putting the cart before the horse, and actually reduces readability in my opinion, as you end up having to backtrack to figure out what's happening, rather than just reading each step in logical order.
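As a generic illustration (the ranges and names here are invented for the example), compare
= LET(
      fn, LAMBDA(row, SUM(row) / COUNT(row)),
      data, A2:D100,
      BYROW(data, fn)
  )
with the steps-in-order alternative
= LET(
      data, A2:D100,
      means, BYROW(data, LAMBDA(row, SUM(row) / COUNT(row))),
      means
  )
In the first version you meet fn before you know what it will be applied to; in the second, the formula reads top to bottom, one step at a time.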
I often find myself suggesting solutions on this forum that I would never use personally, simply because I would never set up my spreadsheets in a manner that calls for them. Rather than trying to explain the pitfalls of certain design choices, I just end up answering the question directly, even if it goes against my own preferred methods. The beauty of Excel is that there are many ways to get the job done. Some methods are easier to understand, while others can be more efficient. In many cases, though, it boils down to personal preference.
By the way, the custom TaxCalcλ function you shared in the TaxBands file is awesome. It's so simple and yet beyond the comprehension of any normal person (looks like thunks were used?). You sir, are in a league of your own. 🙂
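For anyone else following along, my understanding is that a thunk in Excel is just a zero-argument LAMBDA used to wrap an array so it can be carried where nested arrays would otherwise be rejected. A minimal toy example (my own, not Peter's code):
= LET(
      thunkλ, LAMBDA(x, LAMBDA(x)),
      t, thunkλ(SEQUENCE(3)),
      t()
  )
The inner LAMBDA defers the value until it is called with (), at which point the wrapped array spills.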
- PeterBartholomew1 · Mar 06, 2024 · Silver Contributor
If you liked TaxCalcλ, try its big brother
= FIFO.GenerateTableλ()
As it stands, the lack of any parameters would cause it to error but, following Craig Hatmaker's ideas on Component-Based Software Engineering, eusprig-proceedings-2023.pdf (pp 33-58), it actually returns
[shows the function's built-in parameter documentation in place of an error]
With the parameters correctly input and with option=2, the result might be of the form
[shows costs pivoted by output (rows) and input (columns)]
I have also used the same formula for a challenge posted by Ian Hewitt on Chandoo a few years back, which involved the Fe content of ore mined and supplied on a FIFO basis.
At its core, the idea of the calculation is the same as that used for the tax bands. That is, stack two arrays of cumulative quantities and sort in ascending order, then take the differences to assign quantities to individual tranches.
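In skeleton form, for the tax bands (a sketch of the idea with bands, rates and income as assumed defined names, bands sorted ascending; the attached TaxCalcλ differs in detail):
= LET(
      pts,    SORT(VSTACK(bands, income)),
      capped, IF(pts > income, income, pts),
      from,   DROP(capped, -1),
      amount, DROP(capped, 1) - from,
      rate,   XLOOKUP(from, bands, rates, , -1),
      SUM(amount * rate)
  )
Once the income is stacked amongst the cumulative thresholds and the list sorted, the differences give the amount falling within each tranche, and a lookup recovers the marginal rate that applies to it.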
- Craig Hatmaker · Mar 09, 2024 · Iron Contributor
Thanks for the shout-out. Those functions look great, so I guess it's time to package your LAMBDAs into 5g libraries.
BTW - 5g is on fire. In December I presented 5g at the University of Arizona's MECC championship. Last month Danielle Stein Fairhurst presented 5g functions at the Excel Global Summit. Model Citizn just posted an article on 5g. All next week I have 5g workshops for Full Stack Modeller. In April I'm at the University of Tennessee. In May I'm presenting at FMI (Financial Modelling Institute). Finomatic Consultancy is converting their models to 5g... and this all traces back to when you introduced me to your work with array formulas, what, 10 years ago? Thanks for all your help Peter.
- PeterBartholomew1 · Mar 09, 2024 · Silver Contributor
10 years? It must be about that; the FAST Standard was launched in 2014 during a meeting at the ICAEW. As a heavy user of both defined names and CSE arrays, I queried features of the standard, but it was not the occasion to follow up the discussion. In the following year I pretty much trained myself to commit all formulas with CSE and I think we were sharing ideas well before EuSpRIG 2015 (not that CSE arrays are that pleasant to work with). I also got invitations to the ModelOff 2017 and 2018 Bootcamps and pestered Microsoft staff to improve their array offering in much the same manner as Tables had done for list data.
The complete rework of the Excel calculation engine, when it came, was far more than I had expected and has led us to the point where your pre-built modules are even possible. I may need some guidance from you on testing and publication. The algorithms that interest me most at present are those that use array shaping, sorting and filtering as key elements of the solution. Without arrays as the fundamental building block one could be left floundering in a sea of helper cells.
If you are giving any presentations that I could listen in on, could you send me a link/invite?
- djclements · Mar 07, 2024 · Bronze Contributor
PeterBartholomew1 That’s pretty cool too. There’s a lot more to it than TaxCalcλ, so I’ll have to really break it down and see if I can understand what’s going on.
In reference to your SUMIFS example, yes, the named ranges quickly reveal the error. That’s the type of situation where a structured table is the way to go. Admittedly, SUMIFS has always been one of my favorite Excel functions, and even more so now with MS365 as it can easily spill a 2D array of results when using both horizontal and vertical arrays as the criteria. Performance was sometimes an issue with SUMIFS in older versions of Excel, but not so much anymore (unless you’re dealing with huge amounts of data, in which case PQ can be used to reduce the dataset accordingly).
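For example (a generic illustration; amount, product and region here are assumed named ranges):
= SUMIFS(amount, product, UNIQUE(product), region, TRANSPOSE(UNIQUE(region)))
spills a complete two-way summary, with the unique products down the rows and the unique regions across the columns.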
That’s very interesting to hear a bit about your background. I figured it was something technical, but had you pegged as being an ex-computer programmer or engineer of sorts. My background is considerably less impressive, lol. I studied ornamental horticulture, majoring in turfgrass management, but found it difficult to make a living in Canada, as the golf season is only 5 and a half to 6 months long! After a few years of that, I put my math and computer skills to use and found a more stable job as a data entry clerk at an accounting office, which is where I developed my love for Excel. John Walkenbach’s books were a great resource at the time, and really opened my eyes to what Excel was capable of.
I made the jump from Office 2010 to MS365 around 2 years ago out of necessity. Personally, Office 2010 had everything I needed, but clients I previously built spreadsheets for were migrating from Sage 50 to Xero and needed integration with newer add-ins only available in MS365. The first spilled array formula I encountered was when the SCOTT.CHART function was released in mid-late 2022 (as part of Scott’s Add-Ins for Xero) and my initial reaction was “meh… a table would be better”. It intrigued me, though, and after a bit of research online, as well as watching a few YouTube videos, I saw the potential, but still didn’t have very many real-world scenarios to play around with. Joining the discussion here in October 2023 really helped me out in that department, and it’s been a great place to learn, experiment and develop a better understanding of the new dynamic array functions.
Tying all of that back to the topic at hand, I think part of the reason why array functions come naturally for people like you and me, as well as the other contributors here, is the way our minds work. There’s a certain amount of logic required... you need the ability to visualize what the results of an array will look like in order to build a complex, single-cell formula (especially when dealing with nested arrays). This can be learned and achieved through experience but will inevitably come more easily for a small percentage of users. That’s not to say relative vs absolute vs mixed references aren’t a source of confusion… they absolutely are. But I think they’re still easier to learn and understand for the average user. 😉
- PeterBartholomew1 · Mar 04, 2024 · Silver Contributor
I guess it all depends on how one defines 'simplicity'. Whereas others see simplicity in a mass of disconnected formulas because they individually involve no more than primary school arithmetic, I regard that to be an unstructured and complex mess of relationships. Simplicity, for me, is achieved when many disconnected observations can be derived from one guiding principle (in this situation, a single array formula). I accept that it is me that is weird in this respect, an abstract thinker that studied mathematics, theoretical physics and astronomy (over half a century ago) before going on to work as a research scientist in aeronautics. There was a time when I had probably written more matrix formulas than natural language sentences!
Spreadsheets are more about doing than thinking. I wonder how many spreadsheet users could offer a coherent explanation of what is achieved by relative referencing and why the same formula, placed in different locations on the worksheet, gives different answers. I haven't used relative references since 2015, 1704.01142.pdf (eusprig.org), and by now things that spreadsheet users find normal, I see as odd. When I look at a term in a formula, I do not need to know 'where did that come from?'; I want to know 'what does that represent?' The A1 notation tells me the wrong thing; a name just might tell me what I need to know to evaluate the formula.
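(To make the point concrete: =A1+B1 entered in cell C1 is held internally as the offsets =RC[-2]+RC[-1], that is, 'add the two cells to my left', which is exactly why the 'same' formula returns a different answer wherever it is placed.)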
One of the hardest things I find is to persuade an experienced Excel user to stop thinking in terms of 'and fill down' and to see, instead, a single array which may be reordered, reshaped or aggregated to achieve the desired results. There are times when it would be easier working with a complete novice who is free from the expectations of traditional spreadsheeting. Interestingly, though, thus far the responses have been from individuals who are comfortable with traditional spreadsheeting but, at the same time, have fully mastered the opportunities presented by the new. You clearly beat me on versatility!
- Patrick2788 · Mar 04, 2024 · Silver Contributor
It's interesting you mention defined names, which have seen more use since the advent of LET/LAMBDA. I agree that sometimes there can be too many defined names and it becomes difficult to manage them.
There's a long-forgotten feature that was removed with Excel 2007 because it was seldom used: Natural Language Formulas.
This is from the Wayback Machine:
Microsoft Excel 2007 (nee Excel 12) : Deprecated features for Excel 2007 (archive.org)
Natural Language Formulas
This feature allowed people to use the labels of columns and rows on worksheets to refer to the cells adjacent to those cells without explicitly defining them as names. This feature has been disabled by default since Excel 2000 based on customer feedback. In Excel 12 we will completely remove this seldom-used feature from the product. When opening files in Excel 2007 that use this feature, formulas will be converted to use cell references.
Note: This feature could be found in the Excel 11 options dialog with the title “Accept labels in formulas”
Insert Name Label
This command allowed people to more explicitly specify labels for Natural Language Formulas and is being removed as part of that feature.
Maybe I'm misremembering it, but it sounds like this feature would be a lot more useful in 2024 than it was in 2003.
- PeterBartholomew1 · Mar 06, 2024 · Silver Contributor
Interesting. I have seen add-ins along those lines but I didn't know Microsoft had ever tried to change end-user perception in that way. The spreadsheet user base appears to be very resistant to supposed 'improved' practice!
For me, it was always the opposite; I would simply 'glaze over' at formulas such as
= SUMIFS(Trading!DK5:DK12025, Trading!DI5:DI12025, "banana", Trading!DJ5:DJ12025, "Japan")
I don't think I even read the references after the first few; it was just encrypted 'stuff' that I couldn't be bothered to think about. My only strategy for maintaining attention was to replace the direct references by names derived from any adjacent annotation (assuming there was any). That way I might get to see
= SUMIFS(revenue, country, "banana", product, "Japan")
at which point even I might begin to suspect that all is not right with the formula. By the time I had understood the workbook, I had completely refactored it and then had the embarrassment of confessing to the author that I no longer had a copy of their workbook but that I had found some errors along the way!