Forum Discussion

PeterBartholomew1's avatar
PeterBartholomew1
Silver Contributor
Dec 09, 2024

Working with Arrays of Ranges

OK, so strictly speaking an array of ranges is an error in Excel right from the outset.  However, I set up an array of tables contained within the named Range  'allocatedRange', here shown with a red outline.

I then set up a function that would return the range corresponding to a country

SelectDataλ
= LAMBDA(country,
    LET(
        countries,   FILTER(TAKE(allocatedRange, , 1), markers = "Country:"),
        recordNum,   FILTER(SEQUENCE(ROWS(allocatedRange)), markers = "Country:"),
        recordEnd,   VSTACK(DROP(recordNum - 1, 1), ROWS(allocatedRange)),
        rangeRows,   recordEnd - recordNum,
        countryArrϑ, MAP(
            recordNum,
            rangeRows,
            LAMBDA(start, rows, LAMBDA(TRIMRANGE(TAKE(DROP(allocatedRange, start), rows))))
        ),
        XLOOKUP(country, countries, countryArrϑ)()
    )
)

The start and row count of each table is determined by using the string "Country:" as a marker and differencing row numbers.  A combination of TRIMRANGE/TAKE/DROP picks out each range in turn and assembles them into an array by turning the range references into thunks (TYPE=128).  The function SelectDataλ is used to look up any specific country and return the corresponding range.

To demonstrate that the function indeed returns ranges, the COUNTIFS function is used to count the number of cities within each country that have a population exceeding 100 000.

= LET(
    countries, FILTER(TAKE(allocatedRange,,1), markers="Country:"),
    countLarge, MAP(countries,
      LAMBDA(country,
        COUNTIFS(TAKE(SelectDataλ(country),,-1), ">1000000")
      )
    ),
    HSTACK(countries, countLarge)
  )

The point of this post is to introduce the idea of treating tables as elements of an array, so allowing further tables to be inserted or removed dynamically.  TRIMRANGE ensures that each range is sized correctly and MAP is used to analyse each table in turn.  Whilst Excel throws hissy fits at being asked to store arrays of arrays, arrays of ranges, or nested arrays, it is perfectly happy to store arrays of functions.  When evaluated, each element of the function array is perfectly free to return an array or even a range.

The effect is to permit Excel to process 'arrays of Tables' faultlessly.

16 Replies

  • The ideas proposed in this discussion appear to be working well.  The data dictionary allows arrays of ranges to be stored (as would the multi-area range) but the ranges may be mixed with calculated arrays.  It is possible to return a data dictionary from a LAMBDA function as a single array object.

    A specific use examined in Dynamic Let Function | Microsoft Community Hub allows the contents of multiple LET variables to be returned and scrolled through as and aid to debugging.

    The format introduced in this discussion allows an in-memory data dictionary to be written to a worksheet to persist the data beyond the LAMBDA/LET environment in which it was created.  Functions are provided to enable any other Lambda function to reload the dictionary from the worksheet if needed.

    Since the data dictionary contains only references to data in memory and not the actual data arrays it should be possible to replicate it or pass it to other functions with relatively low overhead (this is speculation not, as yet, backed by evidence).  Due to immutability, it is not possible to append a data item to a dictionary, but multiple dictionaries may be stacked to form a new, more comprehensive, dictionary.

    The image shows the result of simply returning a dictionary (here formed from a set of LET variables); the result of creating a persistent copy using a function;

    = DB.PERSISTλ(TESTλ(a,b))

    and a single item selected from the re-read dictionary by a spinner setting (this final object is included merely to demonstrate that the data has been re-read from the worksheet).

  • lori_m's avatar
    lori_m
    Iron Contributor

    Interesting scenario and method. 

    I played around a little looking to see if it was possible to avoid the use of thunks by combining references into arrays and had partial success. If the names of the ranges are Table1,..,Table4 we can define a multi-area range: Tables=Table1,Table2,Table3,Table4 and try the following formula to convert this to an array of ranges. 

    =SUMIF(LAMBDA(T,MAP(SEQUENCE(AREAS(T)),LAMBDA(a,DROP(INDEX(T,,,a),{0}))))(Tables),"<>")

    Strangely, this appears to work fine in Monthly Enterprise Channel but not Beta Channel.

    An alternative that was more successful was to build a multi-area reference via REDUCE as shown

     

     

    • SergeiBaklan's avatar
      SergeiBaklan
      Diamond Contributor

      Nice idea with areas. In general we may avoid explicit markers and don't care about tables location. If only to keep existing layout for each table.

      =LET(
        Tables,    (Table1,Table2,Table3,Table4),
        n,         SEQUENCE( AREAS(Tables) ),
        bar,       1000000,
        city,      LAMBDA(i, OFFSET(  TAKE(INDEX(Tables,,,i),1,1),-2,  ) ),
        qty,       LAMBDA(i, COUNTIF( DROP(INDEX(Tables,,,i), ,1), ">" & bar ) ),
        VSTACK(
          {"Countries","Cities > 1m"},
          HSTACK(
             MAP( n, city ),
             MAP( n, qty  )
          )
        )
      )

      and have not thousands of tables to care about performance.

    • lori_m's avatar
      lori_m
      Iron Contributor

      Related to the DICTIONARYλ suggestion, I wonder if it might be worth defining a nestable dictionary such as, 

      DICT = LAMBDA(array, LAMBDA([key],
          IF(ISOMITTED(key),
              array,
              XLOOKUP(key, VALUETOTEXT(TAKE(array, , 1)), TAKE(array, , -1))
              )
          )
      );

      which, like in python, returns a dictionary from a two column array. e.g. one could try

      =countries()
      =countries("United Kingdom")() 
      =countries("France")("Paris")

      where

      countries =DICT(
          WRAPROWS(VSTACK(
             "United Kingdom", DICT(Table1),
             "Germany", DICT(Table2),
             "France", DICT(Table3),
             "Italy", DICT(Table4)
         ),2)
      );

      Though I'd still tend to side with the earlier PIVOTBY suggestion. 

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      Hi Lori

      I think my experimentation may have more significance in terms of my working than I realised!

      The pairing of a text string and a thunk allows me to build a data dictionary that provides a mechanism to pass complex data structures between Lambda function (in particular to return multiple data objects from a sub-function).

      So far I have written a DICTIONARYλ constructor function that builds the data structure

      /*  NAME:           DICTIONARYλ
          REVISIONS:      Date        Developer           Description  
                          Dec 19 2024 Peter Bartholomew   Original Development
      */
      DICTIONARYλ 
      = LAMBDA( _key01, object01, [_key02], [object02],  ...
          // Check inputs - Omitted required arguments
          Help?, OR(
              ISOMITTED(_key01),
              ISOMITTED(object01),
          ),
      
          NULLOMMITTED, LAMBDA(x, IF(ISOMITTED(x), "", x)),
          _key02, NULLOMMITTED(_key02),
          ...
      
          //  Procedure
          //  Build array of unique keys
              uniqueKeys, VSTACK(_key01, _key02,_...),
          
          //  Build array of thunked arrays (also applicable to scalars)
              objectArray, VSTACK(LAMBDA(object01), LAMBDA(object02), ...  ), 
      
          //  Combine unique keys with references to data objects
              result, FILTER(HSTACK(uniqueKeys, objectArray), uniqueKeys<>""),
             
          CHOOSE(Help? + 1, Result, Help)
          )
      );

      as well as generic and dictionary-specific functions to return data from dictionaries

      GETλ
      = LAMBDA(dict, LAMBDA(key,
          LET(
              uniqueKeys, TAKE(dict,,1),
              objects,    DROP(dict,,1),
              result,     XLOOKUP(key, uniqueKeys, objects, "keyword not found"),
              (@result)()
          )
      ));
      
      GETSERIESDATAλ
      = GETλ(seriesDataDictionary)
      ;

      and the ability to return a list of keys

      KEYSλ = LAMBDA(dict, TAKE(dict,,1));

      Within a functional programming environment, each dictionary is write once, read many since all objects are immutable, but I should be able to combine two directories to give a fresh data object.

      That might sound extravagant in terms of memory commitment, but since I am only storing thunks (references to memory allocated to a LET variable) the burden might be modest.

      The Excel failure to return arrays of arrays makes it a little difficult to dump an image of the dictionary to the worksheet for checking or to provide persistence but I think my MAPλ function should work.

      I have addressed these thoughts to you, as a Python programmer, in the hope that you might have additional insight that you could draw upon to educate me🙂!  Of course, others are welcome to contribute.

      • lori_m's avatar
        lori_m
        Iron Contributor

        Hi Peter - I think a natural way to define a data dictionary in Excel is as an (n x 2)-array consisting of key/value pairs. Given definitions as in the earlier attachment, a data dictionary could be defined pairwise,

        dic = WRAPROWS(VSTACK(
               "United Kingdom", LAMBDA(Table1),
               "Germany", LAMBDA(Table2),
               "France", LAMBDA(Table3),
               "Italy", LAMBDA(Table4)
        ),2)

        or alternatively as a pair of lists,

        dic = LET(
            keys, FILTER(TAKE(allocatedRange,,1),ISTEXT(markers)),
            values, StructureDataλ(markers, allocatedRange),
            HSTACK(keys, values)
        )

        In either case TAKE(dic,,1) returns the keys and (@VLOOKUP("United Kingdom",dic,2,0))() returns the values corresponding to the key. This appears much like the GETλ / KEYSλ functions in your comment. However, I wouldn't advise creating DICTIONARYλ as I don't know of any way to specify an extensible list of arguments within a lambda definition - the first method above seems like a reasonable compromise in my view. 

        For comparison the equivalent definitions in Python would be,

        dic={"United Kingdom":Table1,"Germany":Table2,"France":Table3,"Italy":Table4}

        or otherwise utilising tuples,

        dic = dict(zip(
            ("United Kingdom","Germany","France","Italy"),
            (Table1,Table2,Table3,Table4)
        ))

        dic.keys() and dic.get("United Kingdom") return keys / values. A custom data dictionary could be defined by passing arguments using unpacking operators (*args / **kwargs) - something Excel currently lacks

  • djclements's avatar
    djclements
    Bronze Contributor

    Good day Peter,

    While I cannot advocate for this type of table/data structure, I do appreciate the TRIMRANGE demonstration as it is still currently unavailable to me on the business channel. The concept of working with an array of ranges is also very useful and is something I would highly recommend to anyone looking to up their game.

    There was a SUMIFS riddle posted on another forum a few weeks ago, and this concept worked beautifully for solving it. Even after broadcasting a vector of thunked ranges across multiple rows/columns, they still returned a range object when recalled later. I've attached a copy of the file if anyone's interested, which includes the link to my post explaining how it works. Incidentally, IFNA can also be used as a broadcasting tool instead of IF({1},...).

    I tried to play around with your scenario a bit, but without TRIMRANGE at my disposal, I was forced to rewrite it using other methods.

    Kind regards.

    • Patrick2788's avatar
      Patrick2788
      Silver Contributor

      The way I see the problem in the attached workbook is that it can be solved with PIVOTBY but there are two obstacles:

      1. The data is already pivoted
      2. The row labels needed to be swapped

       

      For both obstacles I have generalized Lambda functions: UnPivotMλ and FindReplaceλ. The solution is then:

      =LET(
          row_labels, FindReplaceλ(lstFruits, includes, lstBundles),
          flat, UnPivotMλ(row_labels, dates, data),
          r, TAKE(flat, , 1),
          c, CHOOSECOLS(flat, 2),
          v, TAKE(flat, , -1),
          PIVOTBY(r, c, v, SUM, , , , 0)
      )

      UnPivotMλ uses shaping functions and sequencing while FindReplaceλ is essentially a recursive REGEXREPLACE. I'd drop the code here, but I don't mean to derail Peter's post!

  • SergeiBaklan's avatar
    SergeiBaklan
    Diamond Contributor

    Off topic. Not to forget OfficeScript tried to repeat that particular sample on it

    function main(workbook: ExcelScript.Workbook) {
    
    	const tables = workbook.getTables()
    	const target = workbook.getActiveWorksheet().getRange("L5:M5")
    	const bar = 1000000
    
    	let i = 0
    	target.setValues([["Countries", "Cities > 1m"]])
    	for (let table of tables) {
    			target
    				.getOffsetRange(++i, 0)
    				.setValues( [[
    					table.getRange()
    						.getCell(-1, 0)
    						.getValue()
    						.toString(),
    					table.getRangeBetweenHeaderAndTotal()
    						.getLastColumn()
    						.getValues()
    						.filter(x => x[0] > bar).length
    					]] )
    	}
    }
    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      It would appear to do the job!

      The Excel world is getting more confusing.  There always were those for whom the Excel grid was merely a space from which values could be read into VBA for processing and returned for display.  Then we were told that was to end, and everything would be OfficeScript.  Yet I see less focus on OfficeScript than on Python or PowerQuery M.  DAX also appears to be a somewhat niche artform. 

      I really should broaden my skillset but at the moment it is an interesting challenge to see whether the Excel formula language can do every calculation that can be performed in Turing machines.  Python has NumPY and SciPY to support calculation but, for Excel formulas, one has to write one's own library of algorithms.  Whether one regards that as service to enhance Excel, or merely a waste of time replicating code that already exists elsewhere, is a matter for debate 🤔

       

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    This appears to be a more flexible XLOOKUP/Thunk demo.  I recall a workbook from a few years ago where XLOOKUP was being used to unpack thunks and return tables.  The catch was the tables had to be VSTACK'd and thunked - every one of them.  It made for an exciting demo but would be difficult to find a practical use for the technique.  Nevertheless, I studied it and made my own workbook until I could internalize it.

    A few observations:

    1. I think what is fascinating with this workbook is how you've managed to avoid converting ranges to arrays.  TAKE/DROP are interesting because the result of each function is accepted by the range-only functions.  The part I wouldn't have guessed would hold up is when countryArrϑ is unpacked with XLOOKUP then later used in COUNTIFS!
    2. There's no declared Thunk function in your module but I see where countryArrϑ is being thunked.  Very subtle and smooth.
    3. I messed around a bit with trying SelectDataλ with 3D references but received errors because I used VSTACK on my references and made arrays out of ranges. Could this function handle 3 references? 

       

     

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      In a sense, the array of 2D ranges down the sheet is a representation of a 3D range.  If we turn attention to 3D ranges of the form

      range3D
      = Sheet1:Sheet3!$K$3:$K$20

      I suspect that life is going to get a whole lot trickier.  

      It should be possible to return an array of thunked arrays, but that would not allow the use of TRIMRANGE or any of the xIFS functions.

      = LET(
          TRIMARRλ, LAMBDA(column, LAMBDA(FILTER(column, NOT(ISBLANK(column)),""))),
          stacked,  HSTACK(range3D),
          thunkArr, TOCOL(BYCOL(stacked, TRIMARRλ)),
          MAP(thunkArr, LAMBDA(ϑ, AVERAGE(ϑ())))
        )

      would apply the AVERAGE function to each sheet individually.  Another option might be to use INDIRECT which can return ranges.  For example

      =INDEX(
          MAP(
              TOCOL(Sheet1:Sheet3!$A$1) & "!K3:K20",
              LAMBDA(string,
                  LAMBDA(TRIMRANGE(INDIRECT(string)))
              )
          ),
          2,
          1
      )( )

      returns the trimmed range on sheet 2. 

      Note: The range Sheet1:Sheet3!$A$1 contains the sheet names

      = TEXTAFTER(CELL("filename",A1), "]")

      Nothing too elegant but, as yet, we are simple sounding out the limits of what is possible, though I concede that may be limited by my lack of understanding.

  • I have updates the file to show the thunk array as #CALC! errors when output to the worksheet and also evaluated to show their content (Edit does not seem to be an option on the original post).

Resources