Forum Discussion

PeterBartholomew1's avatar
PeterBartholomew1
Silver Contributor
Jan 18, 2025

What do you think of thunks?

OK, so the most likely response by far is going to be "I don't".

 

However, I tried one of Omid Motamedisedeh's regular challenges and found it a suitable problem for exploring some of the lesser known byways of modern Excel.  The challenge is to pick out locally maximum values from a rolling range.

What I did was to write a function that used MAP to select one cell at a time, using DROP to remove the  initial cells and TAKE to return a range of 5 cells with the active cell in the middle.  The direct route to solving the stated problem would be to calculate the maximum value within each range immediately, but I was interested in the more general problem of "could I return the array of ranges in a form that would support further analysis?"  As shown, the following formula

ROLLINGRANGEλ
= LAMBDA(values, n,
    LET(
        rows, SEQUENCE(ROWS(values)),
        MAP(rows,
            LAMBDA(k, 
                LET(
                    m, QUOTIENT(n, 2),
                    rng, TAKE(DROP(values, k - m - 1), MIN(k + m, n)),
                    rng
                )
            )
        )
    )
  );

gives and array of ranges error, but simply by enclosing the 'rng' variable within a further LAMBDA 

...
                LET(
                    m, QUOTIENT(n, 2),
                    rng, TAKE(DROP(values, k - m - 1), MIN(k + m, n)),
                    LAMBDA(rng)
                 )

will cause Excel to return an array of functions, each one of which would return a range if evaluated.

In the attached workbook, a number of formulae are based upon this array of functions

= ROLLINGRANGEλ(dataValues, 5)

= MAP(ROLLINGRANGEλ(dataValues, 5), LAMBDA(ϑ, ROWS(ϑ())))

= MAP(ROLLINGRANGEλ(dataValues, 5), LAMBDA(ϑ, ISREF(ϑ())))

= MAP(ROLLINGRANGEλ(dataValues, 5), LAMBDA(ϑ, AVERAGE(ϑ())))

= MAP(ROLLINGRANGEλ(dataValues, 5), dataValues, LAMBDA(ϑ,v, MAX(ϑ())))

= LET(
    rollingMax, MAP(ROLLINGRANGEλ(dataValues, 5), LAMBDA(ϑ, MAX(ϑ()))),
    FILTER(Data, rollingMax=dataValues)
  )

The first simply returns #CALC! errors on the worksheet which is the normal response to a Lambda function defined on the grid.  The second formulas uses ROWS to show that the ranges are not all the same size, the third shows the returned objects to be range references and not simply arrays of numbers, the forth is a rolling average while the fifth is a rolling MAX. 

The final formula returns the solution to the problem, being a filtered list.

The purpose of this post is to demonstrate that Excel, which started out as a 'simple' spreadsheet program, now contains a very different programming environment that shares a function library and uses the grid for input/output but, other than that, has very little in common with 'normal' spreadsheet practice!

A related survey can be found at

https://www.linkedin.com/feed/update/urn:li:activity:7285432559902068736/

  • This thread is mainly to illustrate technique, not to solve concrete task. However. Perhaps I missed something, but all solutions take 2 records back and 2 records forward for the range. Not dates. What if we take the range for dates (assuming they are not sequential and some could be missed), i.e. [Date] -2, not two records back ?  

    • djclements's avatar
      djclements
      Bronze Contributor

      Introducing new parameters to a problem often means considering a new approach. For example:

      =FILTER(Data, MAXIFS(Data[Value], Data[Date], ">=" & Data[Date] - 2, Data[Date], "<=" & Data[Date] + 2) = Data[Value])

      However, I do respect PeterBartholomew1 for his quest to find a generalized approach for any given situation. My answer to the question, "What do you think of thunks?", remains the same: I think they're great for some situations and overkill for others. I would highly encourage anyone who's taken an interest in this sort of thing to explore the concepts that Peter has shared, but to also explore alternative methods as well. The more you know, the better your chances will become for finding an appropriate solution to any problem. ;)

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      I have my prejudices, but I would expect my approach to become stronger.  If, say, XLOOKUP were used to identify first and last cells in each range, then the range ":" operator would return a variable length range for each mapped value.  I could put the search criterial within a Lambda function to modularise the problem and then return the ranges within thunks (ranges, especially of varying size would throw an error within MAP).

      I can then perform the calculations on the individual time slices of the data table with a second application of MAP (or, better, a custom version of MAP that expands the thunks automatically before passing the range to the function specified by the formula

      MAPϑ
      = LAMBDA(ϑ, FNλ, 
          MAP(ϑ, LAMBDA(ϑ, FNλ(ϑ())))
      );

      The strategy would be to break the calculation into a hierarchy of Lambda functions, each one with a well defined task.  In essence, I propose to use thunks as a 'pass by reference' strategy, as opposed to 'pass by value'.

      • m_tarler's avatar
        m_tarler
        Bronze Contributor

        Isn't this statement the 'essence' of Thunks?

        In essence, I propose to use thunks as a 'pass by reference' strategy, as opposed to 'pass by value'.

    • m_tarler's avatar
      m_tarler
      Bronze Contributor

      I would think Daniel's approach could be easily adapted to that:

      =FILTER(C3:D25,MAP(C3:C25,LAMBDA(x,AND(XLOOKUP(x,C3:C25,D3:D25)>N(FILTER(D3:D25,(x-2<C3:C25)*(x+2>C3:C25),0)))))))

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    This is a fun challenge because there's several ways to solve this one. ROLLINGRANGEλ is subtle and efficient! Took me a few minutes to see where you hid the thunking.  One might argue this is a "black box" but this isn't the type of function most would be tinkering with to get into trouble.  I tested the rolling max at 50,000 and 100,000 elements and the calc time was very fast.

    My solutions are going to seem pedestrian in comparison:

    =LET(
        n, 2,
        i, SEQUENCE(ROWS(dataValues)),
        j, SEQUENCE(, n ^ 2 + 1, -n, 1),
        M, INDEX(dataValues, i + j),
        keep, BYROW(
            M,
            LAMBDA(each_row,
                LET(val, INDEX(each_row, , n + 1), max_val, MAX(TOROW(each_row, 2)), val = max_val)
            )
        ),
        FILTER(Data, keep)
    )

    My novice solution with pandas:

    #Set interval and day offset.
    n = xl("interval") ** 2 + 1
    day_offset = f"{n}D" #Converts to '5D'
    
    #Set table as data frame and set 'Date' column as index.
    df = pd.DataFrame(xl("Data[#All]", headers=True))
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)
    
    # Calculate rolling max for 'n' days before and after each day (designed as 'center').
    rolling_max = df.rolling(day_offset, min_periods=1, center=True).max()
    rolling_max

     

  • m_tarler's avatar
    m_tarler
    Bronze Contributor

    So I really want to learn Thunks better but still in the old mindset I guess.  For this problem my solution for excel is using the MMULT() as I find that Excel tends to be most efficient with matices over looping.  As such, my solution is:

    =LET(in_vals,dataValues, in_roll_width, N1,
    expanded_vals,VSTACK(EXPAND(0,in_roll_width,,0),in_vals,EXPAND(0,in_roll_width,,0)),
    _count,SEQUENCE(ROWS(expanded_vals)),
    _greater, expanded_vals>TRANSPOSE(expanded_vals),
    _diag, ABS(_count - TRANSPOSE(_count))<=in_roll_width,
    max_locations, --(DROP(DROP(MMULT(_greater*_diag,SEQUENCE(ROWS(expanded_vals),,1,0)),in_roll_width),-in_roll_width)=2*in_roll_width),
    out, FILTER(Data,max_locations),
    out)

    so basically creating a grid of which values are > than others then multiply by grid of the moving window, then using MMULT to do a sum by row and finally check if that sum indicates that value is > than the X values on each side.

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      Hi Matt

      The formula works fine.  It is something of a curiosity, given that during my career I probably wrote more matrix formulas than I did natural language paragraphs, that I don't use matrix multiplication more now.  It is an extremely fast function even working well with1000x1000 arrays where it still kept up with a 1000 step SCAN.  Mind you, explaining matrix multiplication to a lay audience can almost be as bad as explaining a thunk!

      Rather perversely, I used a data dictionary comprising thunks to scroll though the variables of your LET function and examine the matrices.

      =LET(
          in_vals,       dataValues,
          in_roll_width, I1,
          expanded_vals, VSTACK(
              EXPAND(0, in_roll_width, , 0),
              in_vals,
              EXPAND(0, in_roll_width, , 0)
          ),
          _count,   SEQUENCE(ROWS(expanded_vals)),
          _greater, expanded_vals > TRANSPOSE(expanded_vals),
          _diag,    ABS(_count - TRANSPOSE(_count)) <= in_roll_width,
          max_locations, --(
              DROP(
                  DROP(
                      MMULT(_greater * _diag, SEQUENCE(ROWS(expanded_vals), , 1, 0)),
                      in_roll_width
                  ),
                  -in_roll_width
              ) = 2 * in_roll_width
          ),
          out, FILTER(Data, max_locations),
          dict, DB.DICTIONARYλ(
              "expanded_vals", expanded_vals,
              "count",        _count,
              "greater",      _greater,
              "diag",         _diag,
              "maxLocations",  max_locations,
              "out",           out
          ),
          DB.GETVλ(dict)(idx)
      )

      The 'dict' object can be returned as the result of the LET, or here I used a spinner to do the scrolling.

      I have been trying to build generic capability but it is only too easy to miss the specific.  Surveying the horizon is one thing, but it is not of much value if I trip over my shoelaces!

      • PeterBartholomew1's avatar
        PeterBartholomew1
        Silver Contributor

        Something that I forgot to do:  I added the data dictionary to help me understand Matt's formula (these days I often have problems understanding my own formulas, never mind someone else's!)  I showed the added code, but perhaps it would have been of greater help to include the version of the workbook in which the LET data dictionary has been inserted so that others can decide whether it is useful or not.

        My envisaged use case is that the dictionary would build as the LAMBDA/LET module is developed as a development/debug aid.  Once one is sure of the functionality of the module, most of the objects would be removed from the reporting.  Only the objects to be returned by the module would be left in.

        The advantage of this use of the data dictionary is that it allows multiple data objects to be returned to the calling module, so permitting code to be modularised in a more rational structure.

        All: I would love to know your thoughts on the idea of the data dictionary.

  • djclements's avatar
    djclements
    Bronze Contributor

    Good day Peter,

    As always, I appreciate the demonstration and hope others do too. Personally, I find thunks to be quite useful for highly complex scenarios, as they can be an elegant way of handling nested arrays; however, it's not necessarily the first method I would reach for in any given situation. Honestly, the number of scenarios I've come across where thunks were required, or where thunks were the most efficient solution, have been few and far between, but that hasn't stopped me from experimenting with them in the least. ;)

    For this particular challenge, there's a couple of methods that come to mind. For starters, MAP with OFFSET would work, although it seems like a bit of a cheat considering the table data range begins on row 3, leaving exactly 2 rows above to spare (if there weren't enough rows above the data range to complete the task, this wouldn't work):

    =FILTER(Data,MAP(dataValues,OFFSET(dataValues,-2,0),OFFSET(dataValues,2,0),LAMBDA(v,b,a,MAX(b:a)=v)))

    Another option could be BYROW with INDEX and a SEQUENCE matrix:

    =FILTER(Data,BYROW(INDEX(VSTACK(0,0,dataValues,0,0),SEQUENCE(ROWS(Data),,0)+SEQUENCE(,5)),MAX)=dataValues)

    On the topic of thunks, though, you may find this interesting... when it comes to returning the results for multiple functions in a single array (e.g. ROWS, ISREF, MAX, etc.), you can do so with a single MAP function:

    =LET(
        arrϑ, ROLLINGRANGEλ(dataValues, 5),
        func, HSTACK(ISREF,ROWS,AVERAGE,MAX),
        MAP(IFNA(func,arrϑ),IFNA(arrϑ,func),LAMBDA(fn,ϑ,fn(ϑ())))
    )

    Plus, the extra column on the end in your sample file:

    =LET(
        arrϑ, ROLLINGRANGEλ(dataValues, 5),
        func, HSTACK(ISREF,ROWS,AVERAGE,MAX),
        calc, MAP(IFNA(func,arrϑ),IFNA(arrϑ,func),LAMBDA(fn,ϑ,fn(ϑ()))),
        HSTACK(calc,IF(TAKE(calc,,-1)=dataValues,dataValues,""))
    )

    Fun, fun, fun! Have a good one!

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      Hi David

      You raise some really interesting points.

      "Honestly, the number of scenarios I've come across where thunks were required, or where thunks were the most efficient solution, have been few and far between"

      My thoughts are that, as with any computational framework, there tend to be a spectrum of methods from methods of choice though to methods of last resort.  That is not new.  For me OFFSET tends to be a method of last resort because of its volatility, yet I like the function's syntax and the order of calculation when it appears in a formula allows it to achieve results that would otherwise be problematic.

      The same might be said for thunks.  They are something of an untested curiosity, yet they provide a uniform approach to dealing with arrays of arrays (that is normal spreadsheet solutions).  The question is "where do they sit on the spectrum of methods?"  Some of the properties of thunks are not new but, rather, simply inherited from the Defined Names that have been around from 1992 or earlier.

      For example, one could define a Name 'maxOccurrences' to be 

      = MAX(COUNTIFS(array, array))

      As it stands, it is simply the definition of a computationally intensive formula.  It is not evaluated until it is used by a grid formula.  The catch is that the formula is re-evaluated from scratch every time it is referenced.  Something similar goes for the thunked version:

      = maxOccurrencesλ
      
      where
      
      maxOccurrencesλ = LAMBDA(MAX(COUNTIFS(array#, array#)))

      The difference here is that merely being referenced by a grid formula is not sufficient to force its evaluation.  That only happens when it is provided with its parameter string

      = maxOccurrencesλ()

      Where I am going with this somewhat rambling discourse is that thunks and, in particular, arrays of thunks could be the method of choice for dealing with problems involving arrays of arrays with problem-specific approaches such as MAKEARRAY being the methods of last resort.

      Before signing off, I ought to confess to a weakness in the argument.  At present there is no inbuilt function that will persist that calculation beyond the in-memory LAMBDA scope in which it was created, by outputting the result to the grid.  The following function does the job. but one shouldn't be dipping into a user-provided Lambda to achieve core functionality!

      p.s. Some of your recommendations found their way into my GIST.  Thank you!

      /*  FUNCTION NAME:  EVALTHUNKARRλ
          DESCRIPTION:    Called by modified helper functions to stack the contents of any array of thunks  */
      /*  REVISIONS:      Date            Developer           Description
                          14 Oct 2024     Peter Bartholomew   Extracted for MAPλ as a separate module  
      */
      EVALTHUNKARRλ = LAMBDA(thunkArrayϑ,
          LET(
              m, ROWS(thunkArrayϑ),
              n, COLUMNS(thunkArrayϑ),
              h, SEQUENCE(CEILING.MATH(LOG(n,2),1)),
              recombinedRowsϑ, IF(
                  n > 1,
                  BYROW(thunkArrayϑ, LAMBDA(thunkRowϑ, @REDUCE(thunkRowϑ, h, JOINPAIRSλ(1)))),
                  thunkArrayϑ
              ),
              k, SEQUENCE(CEILING.MATH(LOG(m,2),1)),
              recombinedϑ, IF(
                  m > 1, 
                  REDUCE(recombinedRowsϑ, k, JOINPAIRSλ(0)), 
                  recombinedRowsϑ
              ),
              result, IFNA((@recombinedϑ)(), ""),
              result
          )
      );
      
      /*  FUNCTION NAME:  JOINPAIRSλ
          DESCRIPTION:    Called by EVALTHUNKARRλ to stack the contents of thunks pairwise  */
      /*  REVISIONS:      Date            Developer           Description
                          09 May 2024     Peter Bartholomew   Original Development  
                          16 May 2024     Peter Bartholomew   Test for unpaired thunk in binary tree
                          30 Aug 2024     Peter Bartholomew   Modify to stack horizontally or vertically
                          15 Nov 2024     David Clements      Efficiency improvements / simplification
      */
      
      JOINPAIRSλ = LAMBDA([by_col], LAMBDA(thunkArray, [k],
          LET(
              STACKλ,    IF(by_col, HSTACK, VSTACK),
              alternate, WRAPROWS(thunkArray, 2),
              MAP(
                  TAKE(alternate, , 1),
                  DROP(alternate, , 1),
                  LAMBDA(ϑ₁, ϑ₂,
                      LET(
                          x₁, ϑ₁(), 
                          x₂, ϑ₂(), 
                          v,  IF(TYPE(ϑ₂)=16, x₁, STACKλ(x₁, x₂)), 
                          LAMBDA(v)
                      )
                  )
              )
          )
      ));

      A version of Excel MAP helper function that will return an array of arrays

      • djclements's avatar
        djclements
        Bronze Contributor

        Hi Peter,

        Just to clarify: my comment about thunks rarely being "the most efficient solution" shouldn't be misconstrued as being "inefficient". I've actually found thunks to be very efficient in many, if not all cases. Your custom LAMBDA functions are by far the best generalized approach I've seen for handling nested arrays, completely destroying the standard DROP-REDUCE-STACK method.

        Having said that, whenever an array manipulation/reshaping method exists for a particular problem, thunks tend to come in second or third when timed over a larger dataset (but not by much). I believe this to be, in part, due to the fact that thunks require at least two passes over an array: once to generate the thunks, and again to call and return the results.

        To give them a fair shake, though, it may be prudent to rate thunks on a scale from 1 to 10 in different categories (e.g. efficiency, flexibility, adaptability, difficulty, etc.). I might give them an 8 or 9 in efficiency, but definitely a 10 in flexibility and adaptability. Difficulty level would depend on the person, but for the average user, it's probably a 10.

        I admit, the alternative methods I often share are anything but generalized, and it's unrealistic to think the average user would have the same interest in exploring and remembering each one, especially considering many of them would also likely be ranked high in difficulty. For me, though, the challenge is half the fun! Where the average user may find comfort in an easily adaptable, generalized solution, I think I would find monotony, lol.

        Regarding the OFFSET function, I totally agree with you. I only included it as an option because it actually proved to be the fastest solution when tested with 100K rows of data. I had started with an INDEX equivalent first:

        =LET(d,Ranges!D:D,r,ROW(Data),m,MIN(r),n,MAX(r),FILTER(Data,MAP(dataValues,INDEX(d,m-2):INDEX(d,n-2),INDEX(d,m+2):INDEX(d,n+2),LAMBDA(v,b,a,MAX(b:a)=v))))

        But found it to be a touch faster when converted to OFFSET. When I saw Daniel's variation, I immediately realized that I hadn't reduced my own version down to its simplest form, which would've been:

        =FILTER(Data,MAP(dataValues,LAMBDA(v,MAX(OFFSET(v,-2,,5))=v)))

        Which also means the original version could've been simplified as:

        =FILTER(Data,MAP(dataValues,LAMBDA(v,MAX(INDEX(Ranges!D:D,ROW(v)-2):INDEX(Ranges!D:D,ROW(v)+2))=v)))

        Not a big deal, though. If this were a real-world scenario, I'd likely sacrifice the 40 or so milliseconds and go with the second formula I shared (BYROW over an INDEX-SEQUENCE+SEQUENCE array) to avoid the OFFSET function altogether.

        Cheers!

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      Hi David

      I was aware that for the specific challenge I could avoid thunks altogether but my focus was more to answer the question "can thunks provide a unified approach to dealing with a number of array problems as one moves from scalar, to array, to array of arrays, ... ?"  Ultimately, it could be a case of "why restrict yourself to one financial model?" when you could map them to a changing set of assumptions and create an array of models, rather like data tables on steroids!

      Along the way, it would be nice if a single technique were to provide an efficient way of addressing headaches such as handling arrays of ranges, arrays of arrays, ragged arrays, nested arrays, cartesian products of 2D arrays.  The solution may be something other than thunks, but at the moment they are looking promising and it is certainly better than the dreaded #CALC! error that tells you that the solution to your problem is not handled by Excel.

      The think that really caught my attention was your idea of applying an array of functions to each data set as a single operation.  Although I would have said the idea "could well be possible", that is not the same thing as adjusting one's thinking sufficiently to exploit the concept with any degree of confidence! 

      Well done.

       

  • peiyezhu's avatar
    peiyezhu
    Bronze Contributor

     

    sql:

    create temp table aa as 

    SELECT rowid old_rowid,*, max(`Value`) OVER (

      ORDER BY rowid ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING

    ) AS group_max FROM RollingMax;

    select * from aa;

    //list max value of each range BETWEEN 2 PRECEDING AND 2 FOLLOWING

    select Date,Value from aa where group_max=Value order by old_rowid;

    //filter Value equal group_max

     

     

     

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      I have seen your posts on the use of sql for calculations far from the simple extraction, filtering and joining of text from tables that I associate with scope for sql.  What is not obvious to me is where the scripts are loaded, in particular within which app?

    • peiyezhu's avatar
      peiyezhu
      Bronze Contributor

       

      select Date,Value from (SELECT rowid old_rowid,*, max(`Value`) OVER (

        ORDER BY rowid ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING

      ) AS group_max FROM RollingMax) where group_max=Value order by old_rowid;

       

      2 lines merge to one row

  • It appears one can't edit the original post so I am replying to myself in order to update the workbook.

    The key difference is that, although I have demonstrated the results of 9 distinct, but related, formulas, there are only two formula cells.

    By bringing the calculation together within a single module, it became obvious that there was a lot of repetition, involving evaluating thunks and applying a in-built function to the result, functions that could be tidied up by writing a further Lambda function

    CALCλ
    = LAMBDA(values, n,
        LET(
            // Call ROLLINGRANGEλ to extract a rolling range of n cells from the range 'values'
            arrayϑ, ROLLINGRANGEλ(values, n),
            // Expand the thunks to give a 2D range for demonstrtion purposes
            array, TRANSPOSE(EVALTHUNKARRλ(TOROW(arrayϑ))),
            // demonstrate that the thunk array arrayϑ contains ranges and not simply arrays
            isref?, MAPϑ(arrayϑ, ISREF),
            // demonstrate that the ranges are of differing dimensions
            rows, MAPϑ(arrayϑ, ROWS),
            // calculate the average value over each range to give a rolling average
            avrg, MAPϑ(arrayϑ, AVERAGE),
            // calculate the maximum value over each range to give a rolling maximum
            rmax, MAPϑ(arrayϑ, MAX),
            // compare each value to its rolling maximum
            isMax?, values = rmax,
            // Stack reults for presentation
            HSTACK(date, array, isref?, rows, avrg, rmax, isMax?)
        )
    );

    which now use the function in order to combining the thunk evaluation with a sequence of aggregations

    MAPϑ
    = LAMBDA(arrayϑ, FNλ,
        MAP(arrayϑ, LAMBDA(ϑ, FNλ(ϑ( ))))
    );

    I realise that this style of programming is not going mainstream anytime soon, but the fact that it is possible and appears to work well should be of interest.

Resources