Forum Discussion

data_junky's avatar
data_junky
Copper Contributor
Feb 21, 2025

Using Scan function with Oddly shaped arrays

I have two arrays of different shares.

One array is a set of values. The other is a set of growth rates.

The challenge is to apply the scan function to get the an array of values that are increased by the growth rates assuming compounding. Below is as snapshot of the problem and the desired solution.

 

  • rkarpisz's avatar
    rkarpisz
    Copper Contributor

    A less technical/more naive solution (i.e., one that works specifically for this situation) can be for you to use SCAN with the ISOMITTED function in the LAMBDA to handle the blank areas of the array, so that they function as a 0% increase.

    =SCAN(value,rates,LAMBDA(val,rate,IF(ISOMITTED(rate),val,val*(1+rate))))

    To mask the unchanged values, if you always have the "upper triangular" form, then you can use MAP on the above array: return "-" if the result is equal to the original value.

    =MAP(SCAN(value,rates,LAMBDA(val,rate,IF(ISOMITTED(rate),val,val*(1+rate)))),LAMBDA(result,IF(result=value,"-",result)))

    Forgive me, I don't write technically for others often, so I hope you can interpret the above with my explanation without using LET to put things step-by-step.

  • data_junky  More on creating jagged arrays.

    A thunk is no more than a function that doesn't require any input parameters.  Like any other function, it contains nothing more than a formula.  In that, it is similar to a defined name which also refers to a formula as text.  Whereas the defined name is evaluated whenever it is referenced within a cell, a function requires a parameter string to trigger its evaluation.  Since a thunk does not require parameters, it will evaluate to give the same result every time it is used.  The formula may required substantial processing effort, in which case one wouldn't wish to call it too often, or it could be as simple a task of returning a pre-evaluated LET variable, in which case the function provides no more than a reference to a region of memory.

    Thunks are useful because they allow the programmer to manipulate data structures that are beyond the scope of the Excel calculation engine.  These include data structures such as an array of ranges, or arrays of arrays where the latter might include ragged right arrays or arrays comprising a mix or row and column arrays.

    To form a thunk, one performs the required calculation and then encloses the calculated result within a further LAMBDA.  Whilst the array of arrays is treated as an error, the equivalent array of thunks is valid and can be further processed by Excel formulas.  The catch is that, once one has the solution as an array of thunks, Excel will be capable of evaluating any element of the array, but error when asked to evaluate the result as a whole.  There are a number of ways of circumventing the problem.

    1. MAKEARRAY.  This will return a 2D array from a column of thunks containing rows (say).  Each row is returned by index multiple times and a single term is returned until the entire array is built up.
    2. REDUCE/VSTACK. This will work row by row, at each step stacking the newly returned thunk at the end of the array built so far.  Each of the increasing sized arrays exists within memory until the calculation is complete and the evaluation returns from the recursive process.  Memory management issues can make the evaluation very slow.
    3. EVALTHUNKARRλ.  This is a module I have included within the attached workbook.  It takes a column of thunks and uses WRAPROWS to form a two column array of thunks.  MAP allows each pair to be processed in isolation.  The thunk pairs are evaluated, stacked and converted back into a new thunk.  One now as a new array of thunks, but it comprises only half the number of terms.  Eventually the entire problem is reduced to a single thunk that contains the solution to the problem.  At worst the process requires 20 such steps.  That may require significant computational effort, but is not as bad as REDUCE/VSTACK which may require 100 000s such steps.

    The final formula uses MAPλ.  The calculation is the same as 3. (above) but the process of forming and stacking thunks is hidden from the user.  This helper function does some pre-processing of the user-specified function before passing it in modified form to the in-build MAP helper function.  Each time the modified function is applied, it first applies the original user-function and then converts the result to a thunk.  Once MAPλ has run through the recursion it will have generated an array of thunks.  As before, EVALTHUNKARRλ will evaluate each thunk within the array and stack the results appropriately.

    Developing this little lot was itself an iterative process that involved stops for the fried braincells to recuperate, but the aim was always to hide the complexity from sight within helper functions that can be used without further examination .  The one thing that JoeMcDaid could do to help is to create a native version of EVALTHUNKARRλ (or perhaps it would be better named just STACK) that does not rely upon recursive helper functions but instead simply iterates its way through the stacking process from one to the array size.

    Note: in the extreme, result arrays might be far more than simple 2D heterogeneous arrays.  It should be possible to create an array of complete financial or engineering models, comprising headers, white space and multiple blocks of data.  All that is required is that the result fits on one sheet since dynamic arrays do not spill from sheet to sheet.  There may be other resource problems that prevent such solutions but I have no way of telling (so far, so good)!

    = LET(
        k, SEQUENCE(n),
        triangle, MAP(k, LAMBDA(a, THUNK(EXPAND(0,,a,0)))),
        EVALTHUNKARRλ(triangle)
      )
    = MAPλ(SEQUENCE(n), LAMBDA(a, EXPAND(0,,a,0)))

     

      • PeterBartholomew1's avatar
        PeterBartholomew1
        Silver Contributor

        In a way it does work!  It returns an array of 5 functions, any one of which will generate some 0s if executed.  Picking one out

        INDEX(triangle,5,1)()

        will return the 5th row for example.

        What you cannot do is evaluate the array of functions directly

        = triangle()
        
        because TYPE(triangle) = 64 (array)
        whereas
        = TYPE(INDEX(triangle,5,1)) = 128 (function / rich data type)

        and can be evaluated

  • data_junky's avatar
    data_junky
    Copper Contributor

    Thank you PeterBartholomew1 Patrick2788 djclements .

    I'm admittedly new to the thunk thing but intrigued.  I though I might be able to get the intermediate paddding of zero using a thunk, but alas, getting errors.  Anyone have some pointers on how best to fix?




  • djclements's avatar
    djclements
    Bronze Contributor

    Another possibility, using NA() as the reset point for each new row, then DROP to remove the helper column(s):

    =DROP(SCAN(,HSTACK(IF({1,0},NA(),array1),1+array2),LAMBDA(a,v,IFNA(a,1)*v)),,2)*ISNUMBER(array2)

    See attached...

  • If it is the ragged array of rates that is important to you, it is possible to select each row as a range object and use TRIMRANGE to reduce it to size.

    = LET(
        valuesϑ, THUNK(values),
        rateϑ,   BYROW(rate, LAMBDA(row, THUNK(TRIMRANGE(row)))),
        resultϑ, MAP(values, rateϑ, LAMBDA(v,r, THUNK(SCAN(v, 1+r(), PRODUCT)))),
        EVALTHUNKARRλ(resultϑ)
      )

    The catch is that the natural stacking of the smaller dimensioned results is to the left of the result array.

    Again, this relies upon thunks for referencing arrays of arrays.

     

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    My solution utilizes SCAN within MAKEARRAY (the black sheep of Lambda helper functions!).

    =LET(
        h, ROWS(array2),
        w, COLUMNS(array2),
        UpperPyrScan, LAMBDA(i, j,
            LET(
                upper_pyr, j >= i,
                initial, INDEX(array1, i),
                arr, CHOOSEROWS(array2, i),
                acc, SCAN(initial, arr, LAMBDA(a, v, a * (1 + v))),
                IF(upper_pyr, INDEX(acc, , j), 0)
            )
        ),
        MAKEARRAY(h, w, UpperPyrScan)
    )

     

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      Hi Patrick

      You got away without thunks which may well the cause of fear and distress even for advanced Excel users!

      I think the downside may be that your black sheep revisits the SCAN feed for each column j evaluation?

      I am tending to use thunks ever more as the means to pass references to arrays of arrays [or even arrays of arrays of arrays ...!]

      • Patrick2788's avatar
        Patrick2788
        Silver Contributor

        Yes, indeed, SCAN is being run quite often with my approach. I had considered using a mesh grid (to borrow a Python term) of i-j coordinates and going at it without MAKEARRAY, but the result wouldn't be much different.  I think other ways to solve this would be with a recursive SCAN but if the results are to be stacked, it would not be all that different than a typical REDUCE/VSTACK combo, in my opinion.

         

        In terms of performance, I wonder how often data_junky encounters these upper triangular arrangements and how large the data sets are? If the data is mostly rows that have been summarized (not in the 10s of thousands of rows), then maybe the best solution is the one that makes the most sense to the author of the workbook.

  • The are at least two problems here.  The first is how to apply SCAN to multiple rows as an array operation.  My solution is to reference the rates array column by column using thunks.  This means the results will be returned as a row of thunks which need to be evaluated and stacked (I have written a utility function EVALTHUNKARRλ for that).

    The other problem is how to deal with ragged array problems.  In this case I was able to duck the problem because interpreting blanks as escalate by 0% gives the required numerical values.  Masking the unwanted values provides the result.

    = LET(
        PRODUCTλ, LAMBDA(accϑ,fϑ, THUNK(accϑ() * fϑ())),
        valuesϑ,  THUNK(values),
        factorsϑ, BYCOL(1 + rate, THUNK),
        resultϑ,  SCAN(valuesϑ, factorsϑ, PRODUCTλ),
        result,   EVALTHUNKARRλ(resultϑ),
        IF(ISNUMBER(rate), result, "")
      )

    Even as it is, following it is not going to be straightforward!

     

    A version of Excel MAP helper function that will return an array of arrays

     

Resources