Array Formulas / Array Concepts Summary (desired and attempted)

Iron Contributor

Array formulas / formulas dealing with arrays are great but hard to learn for an average user. Unfortunately, a comprehensive and formal description of rules and Excel behaviours regarding array formulas is not well documented or not easily available. There are websites which try to explain things and Microsoft's own support pages 

explain some guidelines but overall, the information is scattered. Some books can be found dedicated to Array Formulas, but a book is too lengthy an investment in terms of time at least, and is a last resort.

 

1) If any good resources exists which comprehensively document the rules and Excel behaviours governing array formulas in a summarised way, pls. share!

 

2) In a personal bid to summarise the common rules on a single page with demo examples, i have created attached file. The focus is to document how the array formulas / formulas dealing with arrays behave when,

 

a) The Inputs are : a range constant / an array constant / range from a formula / array from a formula ...

b) Given to : formulas which return an array / a single value ...

c) Typed in : single cell / multiple cells ...

d) Making use of : no operator / @ operator / CSE (ctrl+shift+enter).

 

The file is a personal work for personal use, but i felt that it can be of some use to other members here as a reference (only). Expert members are welcome to review it, correct it, value add, because it still seems incomplete and hence this post. Their inputs can help the community.

 

Yes, the attached file summary looks bit congested but it was a goal to give a single glance perspective to things. I hope it is not so complicated if each demo formula is seen one at a time, and by referring to the cell color which matches the rule it demonstrates. How the formulas work is not described in detail as one would do in a manual or training material because the file intent is neither of those. It is only to put in single place (for personal use) the common concepts taken from references included below.

 

Thanks,

 

References

https://support.microsoft.com/en-us/office/guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4...

 

https://support.microsoft.com/en-us/office/dynamic-array-formulas-and-spilled-array-behavior-205c6b0...

 

https://support.microsoft.com/en-us/office/implicit-intersection-operator-ce3be07b-0101-4450-a24e-c1...

 

https://exceljet.net/glossary/broadcasting

7 Replies

@amit_bhola 

I'm not sure if there are any good resources which document the rules and Excel behaviours governing array formulas more reliably than those you have listed in the post. My understanding, based on developing function libraries in the Excel C API, is that array formula behavour is essentially dictated by the function signature.

 

Worksheet function parameters and results can be assigned a data type that may be a single value, array or reference. Examples of each type are given in the table below. Note some functions like INDEX may take either ranges or arrays as parameters or results.

 

Param / Result | Value          Array            Reference
----------------------------------------------------------
Value          | ABS(_)         INDEX(_,,_)      INDIRECT(_)
Array          | SUM(_)         TRANSPOSE(_)     INDEX(_,)
Reference      | N(_)           ROW(_)           OFFSET(_,,)

 

Underscores may be replaced with range or arrays and results analysed with the insert function dialog or evaluate function tools. Depending on Excel version, coercion may be required to obtain an array result for some of these functions. .

 

This link may provide further insight:
https://techcommunity.microsoft.com/t5/excel/hyperlink-function-do-not-support-dynamic-arrays/m-p/20...
Comprehensive function listings by data type and further links are here:

https://chandoo.org/forum/threads/excel-function-reference.35686/
https://github.com/lhem/excel/blob/master/FuncList.xlsx

@amit_bhola 

I suspect you have taken on too much in trying to present a complex topic with a large number of exceptions in a single table.  You will have benefitted in its production but, for someone looking for information, it makes for difficult reading.

 

The first thing I would suggest is to deal with Excel 365 and legacy spreadsheets separately; they are very different beasts despite the fact that 365 is backward compatible.  I would also recommend developing the ideas using mainstream functions before addressing the 'odd-ball' functions such as INDIRECT.  There, @lori_m's work on function signature should be valuable.

 

One function that you could use is XLOOKUP which is capable of returning a range reference or an array (as is INDEX).  As an example, the returned values of

= XLOOKUP({"Feb";"March"}, textRange, numericRange)
and
= XLOOKUP("Feb", textRange, numericRange):XLOOKUP("March", textRange, numericRange)

may look similar but nesting within ISREF( ) or using it within COUNTIFS( )

= COUNTIFS(
    XLOOKUP({"Feb";"March"},textRange,numericRange),
    ">5")

= COUNTIFS(
    XLOOKUP("Feb", textRange, numericRange):XLOOKUP("March",textRange,numericRange), 
    ">5")

gives very different outcomes.

p.s. The names textRange and numericRange are simply your ranges held within cells A1:B4, but I haven't used direct referencing since 2015.

@Peter Bartholomew , @lori_m 

 

Thanks for your insights and resources.

 

I understand that there is always some chance of encountering exceptions in behaviours and then the pre vs. post Dynamic Arrays Excel changes.. there are many dimensions to cover. So it is just too ambitious (perhaps futile) an attempt to try to put everything on one screen and still keep it sensible/legible/clear to understand.

 

The other forum discussion shared by you (which not coincidentally was an active discussion between you!) needs quite some time to go through. My work keeps me busy, so i hope to follow up at some time in future.

 

Thanks,

 

PS : For the HYPERLINK function post for which @lori_m provided a link, i found using the Evaluate Function tool that the formula arguments are calculated in different order when implicit intersection operator @ is used vs. not used in one of the range arguments. That might explain why his solution worked internally. I am yet to come across a documentation which mentions this. If it is really so, then pages like below support page might need an update(?) with advent of DAE
https://support.microsoft.com/en-us/office/the-order-in-which-excel-performs-operations-in-formulas-...

 

=HYPERLINK(A1:A3,B1:B3)

amit_bhola_0-1628448840710.png

=HYPERLINK(@A1:A3,B1:B3)

 

amit_bhola_1-1628448907853.png

 

 

 

@amit_bhola 

It seems the evaluate formula tool skips over single cell references which can be confusing. One option is to put parentheses around arguments to access the intermediate results. This was mentioned in an example in some wiki notes accompanying the previous link, https://github.com/lhem/excel/wiki/Functions

 

@Peter Bartholomew

I just noticed I also used the XLOOKUP function as an example in the preceding link and, indeed, since a single value is expected for first parameter the result of those formulas is an array of references which is not supported. This is a similar situation to the HYPERLINK example - I suppose now one may use the MAP function as a workaround without relying on relative referencing.

@lori_m 

"now one may use the MAP function as a workaround without relying on relative referencing"

I see it as more than a workaround.  It should mean that now, at last, I can ditch the idea of relative referencing that I have regarded as a crap concept for a number of years!

 

I have toyed with putting together am 'Excel for heretics' course in which the A1 (and R1C1) notations are introduced as legacy methods of referencing individual records of a table/list or elements of an array (an array being an ordered list addressed by index) before deprecated further use in 365.

 

Quite often, the use of MAP is unnecessary because, with pairwise lifting, the same result is achieved with an array formula.  Overall, I am very happy with the new helper functions.  I put together an amortisation table with interest rates that can vary from year to year:

= SCAN(principal, period#,
      LAMBDA(balance,p,
         LET(
            MPR, INDEX(rate#, p),
            remaining, 12*duration + 1 - p,
            flow, PMT(MPR, remaining, balance),
            (1+MPR)*balance+flow
         )
      )
   )

and then summarised the results by year

= BYROW(INDEX(interest#, SEQUENCE(duration,12)), LAMBDA(a,SUM(a)) )

All far more elegant that direct relative cell referencing!

@Peter Bartholomew 

Indeed, my take is that MAP and SCAN fill a large gap in array calculation, allowing power users and developers to construct array solutions where DA evaluation falls short. I remember MAP was suggested in a discussion you initiated before LAMBDA even appeared on the scene. Perhaps there'll be a more compact approach if/when nested arrays are supported in future.

 

@amit_bhola 
In addition to the links you give above, I came across a nice overview by Joe McDaid,
https://powerusers.microsoft.com/t5/2019-MSBizAppsSummit-Gallery/Microsoft-Excel-Advanced-spreadshee...

 

Function signature is mentioned (29:15-32:10) and is fairly easy to understand in the context of new DA evaluation though reference types are not covered in that presentation (to keep things simple i suppose). Notable examples of functions taking reference parameters include the N function and former analysis toolpak functions. These functions can resolve an array of returns from OFFSET or INDIRECT though range parameters may need coercing to arrays to obtain required results.

 

The rules for legacy implicit intersection evaluation are more complex. Functions which default to a scalar result are underlined in the previous function listing and in general require CSE whether or not the parameter is a range or an array. To deliver an array of results, one may pass an array/reference result as a reference parameter e.g. N(INDEX(_,)) so as to override implicit intersection evaluation of function parameters that may return arrays e.g. CELL(_). This latter behaviour is really not intuitive at all - early in the presentation even refers to old style formula combinations of this kind as 'Black Magic'!

@lori_m 

I was aware of @JoeMcDaid's presentation and had found it added clarity to some of the non-standard techniques I had adopted in legacy Excel.  I had schooled myself to commit all formulas with CSE but it hardly provided a great user experience.  What I had discovered was that moving the calculation to name manager simplified the use of sequences of array formula (despite an MS help page that suggested that the uses of named functions within names was not recommended).

 

The insight, given by the presentation, that I was simply using defined Names to evade implicit intersection, was really helpful.  The dynamic array behaviour that emerged was better than anything I had hoped for.  I hadn't got beyond the point of 'Insert Array' (to parallel 'Insert Table') in my thinking but allowing array calculation to roam free was way better.