Forum Discussion
Working with Arrays of Ranges
Interesting scenario and method.
I played around a little looking to see if it was possible to avoid the use of thunks by combining references into arrays and had partial success. If the names of the ranges are Table1,..,Table4 we can define a multi-area range: Tables=Table1,Table2,Table3,Table4 and try the following formula to convert this to an array of ranges.
=SUMIF(LAMBDA(T,MAP(SEQUENCE(AREAS(T)),LAMBDA(a,DROP(INDEX(T,,,a),{0}))))(Tables),"<>")
Strangely, this appears to work fine in Monthly Enterprise Channel but not Beta Channel.
An alternative that was more successful was to build a multi-area reference via REDUCE as shown
- SergeiBaklanDec 24, 2024Diamond Contributor
Nice idea with areas. In general we may avoid explicit markers and don't care about tables location. If only to keep existing layout for each table.
=LET( Tables, (Table1,Table2,Table3,Table4), n, SEQUENCE( AREAS(Tables) ), bar, 1000000, city, LAMBDA(i, OFFSET( TAKE(INDEX(Tables,,,i),1,1),-2, ) ), qty, LAMBDA(i, COUNTIF( DROP(INDEX(Tables,,,i), ,1), ">" & bar ) ), VSTACK( {"Countries","Cities > 1m"}, HSTACK( MAP( n, city ), MAP( n, qty ) ) ) )
and have not thousands of tables to care about performance.
- lori_mDec 23, 2024Iron Contributor
Related to the DICTIONARYλ suggestion, I wonder if it might be worth defining a nestable dictionary such as,
DICT = LAMBDA(array, LAMBDA([key], IF(ISOMITTED(key), array, XLOOKUP(key, VALUETOTEXT(TAKE(array, , 1)), TAKE(array, , -1)) ) ) );
which, like in python, returns a dictionary from a two column array. e.g. one could try
=countries()
=countries("United Kingdom")()
=countries("France")("Paris")where
countries =DICT( WRAPROWS(VSTACK( "United Kingdom", DICT(Table1), "Germany", DICT(Table2), "France", DICT(Table3), "Italy", DICT(Table4) ),2) );
Though I'd still tend to side with the earlier PIVOTBY suggestion.
- PeterBartholomew1Dec 20, 2024Silver Contributor
Hi Lori
I think my experimentation may have more significance in terms of my working than I realised!
The pairing of a text string and a thunk allows me to build a data dictionary that provides a mechanism to pass complex data structures between Lambda function (in particular to return multiple data objects from a sub-function).
So far I have written a DICTIONARYλ constructor function that builds the data structure
/* NAME: DICTIONARYλ REVISIONS: Date Developer Description Dec 19 2024 Peter Bartholomew Original Development */ DICTIONARYλ = LAMBDA( _key01, object01, [_key02], [object02], ... // Check inputs - Omitted required arguments Help?, OR( ISOMITTED(_key01), ISOMITTED(object01), ), NULLOMMITTED, LAMBDA(x, IF(ISOMITTED(x), "", x)), _key02, NULLOMMITTED(_key02), ... // Procedure // Build array of unique keys uniqueKeys, VSTACK(_key01, _key02,_...), // Build array of thunked arrays (also applicable to scalars) objectArray, VSTACK(LAMBDA(object01), LAMBDA(object02), ... ), // Combine unique keys with references to data objects result, FILTER(HSTACK(uniqueKeys, objectArray), uniqueKeys<>""), CHOOSE(Help? + 1, Result, Help) ) );
as well as generic and dictionary-specific functions to return data from dictionaries
GETλ = LAMBDA(dict, LAMBDA(key, LET( uniqueKeys, TAKE(dict,,1), objects, DROP(dict,,1), result, XLOOKUP(key, uniqueKeys, objects, "keyword not found"), (@result)() ) )); GETSERIESDATAλ = GETλ(seriesDataDictionary) ;
and the ability to return a list of keys
KEYSλ = LAMBDA(dict, TAKE(dict,,1));
Within a functional programming environment, each dictionary is write once, read many since all objects are immutable, but I should be able to combine two directories to give a fresh data object.
That might sound extravagant in terms of memory commitment, but since I am only storing thunks (references to memory allocated to a LET variable) the burden might be modest.
The Excel failure to return arrays of arrays makes it a little difficult to dump an image of the dictionary to the worksheet for checking or to provide persistence but I think my MAPλ function should work.
I have addressed these thoughts to you, as a Python programmer, in the hope that you might have additional insight that you could draw upon to educate me🙂! Of course, others are welcome to contribute.
- lori_mDec 21, 2024Iron Contributor
Hi Peter - I think a natural way to define a data dictionary in Excel is as an (n x 2)-array consisting of key/value pairs. Given definitions as in the earlier attachment, a data dictionary could be defined pairwise,
dic = WRAPROWS(VSTACK( "United Kingdom", LAMBDA(Table1), "Germany", LAMBDA(Table2), "France", LAMBDA(Table3), "Italy", LAMBDA(Table4) ),2)
or alternatively as a pair of lists,
dic = LET( keys, FILTER(TAKE(allocatedRange,,1),ISTEXT(markers)), values, StructureDataλ(markers, allocatedRange), HSTACK(keys, values) )
In either case TAKE(dic,,1) returns the keys and (@VLOOKUP("United Kingdom",dic,2,0))() returns the values corresponding to the key. This appears much like the GETλ / KEYSλ functions in your comment. However, I wouldn't advise creating DICTIONARYλ as I don't know of any way to specify an extensible list of arguments within a lambda definition - the first method above seems like a reasonable compromise in my view.
For comparison the equivalent definitions in Python would be,
dic={"United Kingdom":Table1,"Germany":Table2,"France":Table3,"Italy":Table4}
or otherwise utilising tuples,
dic = dict(zip( ("United Kingdom","Germany","France","Italy"), (Table1,Table2,Table3,Table4) ))
dic.keys() and dic.get("United Kingdom") return keys / values. A custom data dictionary could be defined by passing arguments using unpacking operators (*args / **kwargs) - something Excel currently lacks
- PeterBartholomew1Dec 22, 2024Silver Contributor
I see your point about DICTIONARYλ, in that it limits the VSTACK formulas it uses rather than increasing the scope of application. I set it up for a data dictionary length longer than required and discarded the pairs that were not populated using IFOMITTED. This is very limited functionality but it more than met my current requirements.
At the moment, the function that I am planning to convert calculates 24 time series across 360 time periods. To pass the results back, it stacks the 24 individual arrays into an 24 x 360 array which is returned to the calling function. There, yet another copy of the data follows as each series of the array is singled out with CHOOSEROWS and assigned a new name. Mistakes at that stage are possible and may not be detected immediately. Then it remains to use VSTACK once more to arrange the arrays to match the user-expectation for the output, with tastefully placed blank rows and some arrays appearing more than once.
My hope is that, by turning the arrays into a data dictionary that uses thunks, I am only replicating references to the data, not the arrays themselves. I plan to generate the presentation layer by referencing arrays from the returned data dictionary directly, based upon the keywords that come as part of the dictionary.
Hopefully I will have reduced both the data storage requirement and the likelihood of error. These are all very new ideas for me and I certainly haven't worked through the pros and cons!
- PeterBartholomew1Dec 18, 2024Silver Contributor
Wow, I would bet that those are a couple of formulas that no one has tried to write before! What do you mean by 'not working' ? Does the formula error or does it crash Excel? I have noticed a number of occasions where the beta channel code hesitates, crashes, reopens, and crashes a second time. The recovered file contains areas of #N/A in place of formulas. Out of curiosity what is
LAMBDA(a,DROP(INDEX(T,,,a),{0}))
intended to do? My guess it that it was intended as the means to extract an individual range reference. I agree that, since the concept of multi-area ranges is a native part of Excel it makes sense to evaluate their applicability.
In my exploration of 'array of ranges', though, I wasn't setting out to avoid thunks. More the opposite, I am increasingly comfortable with the concept, and look for opportunity to deploy them. I see parallels with Named Formulas which can be defined in Name Manager, but are not evaluated until such time as they are used within a formula. They may never be evaluated but, each time they are, it will be a fresh calculation. The alternative is to evaluate the Name in a Range and then reference the Range. The price of the calculation is paid at that point but its multiple reuse is cheap.
Similarly, I believe placing a formula within LAMBDA turns it into a named function that may never need to be evaluated. The catch is that if the function is used within MAKEARRAY to extract its elements term by term, the price of the calculation will be paid at every step. What I attempt, is to use LET to force the calculation and only then to create a thunk to reference the contents of the variable. I am then able to pass the thunk as a reference to an area of memory, through a chain of nested function calls as a reference rather than an array.
I was driven to such strategies by necessity (explaining to a lay audience why an array of thunks might be the best thing since sliced bread is not something to be undertaken lightly), but it could even be considered as a way of improving calculation efficiency. If the array is used at the final step of a recursive formula, do you need a copy of the array at each step or would it be better to store it as a reference at each step and only evaluate it at the final step. Now I really am speculating beyond my competence!
Meanwhile, I am quite happy thunking along on the crest of a wave 😜.
- lori_mDec 18, 2024Iron Contributor
DROP(range,{0}) appears to have the effect of 'boxing' the range within a 1x1 array, i.e. range -> {range}, thus preventing it from being dereferenced in some situations. Conversely the @ operator has the effect of 'unboxing' the range, i.e. {range} -> range. I'd need to track down some other examples where lambda helper functions dereference range inputs and this trick was required.
Not that I have anything against thunks about which you will certainly be more informed than me. To be honest my use of spreadsheets hasn't progressed beyond copy/paste exercises of late so I thought it would be an interesting challenge to see if any of the old techniques could be leveraged here. One thought was that it may be possible to improve efficiency through manipulation of references. I haven't looked to see what the limit on areas is but I'd think there are scenarios where forming a union of range references can be more efficient than forming a union of range arrays particularly given hstack/vstack support multi-area ranges as parameters.
PS. Clarification to earlier formula: this is designed to return an array of sums for a multi-area range but currently throws #CALC! on beta version.