SOLVED

Alternative formulas for expanding a column of row thunks.

Silver Contributor


The two options that I tried are

Expandλ = LAMBDA(ϑ,
    LET(
        n₁, ROWS(ϑ),
        n₂, MAX(MAP(ϑ,Columnsλ)),
        MAKEARRAY(n₁,n₂,
            LAMBDA(i₁,i₂,
                INDEX(EXPAND(INDEX(ϑ,i₁,1)(),1,n₂,""),i₂)
            )
        )
    )
);

which expands the array of thunks, ϑ, using the MAKEARRAY helper function: an inner INDEX selects a specific thunk from the array and dereferences it and then, once the row array has been extracted and padded to n₂ columns, a further INDEX picks a single value for output.
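
Columnsλ is not shown in the post; presumably it is a one-line helper that dereferences a thunk and counts its columns, along these lines (my assumption):

Columnsλ = LAMBDA(aϑ, COLUMNS(aϑ()));

so that MAX(MAP(ϑ, Columnsλ)) gives n₂, the width of the longest wrapped row.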

 

The second option is

Expand2λ = LAMBDA(arrϑ,
    IFERROR(
        DROP(
            REDUCE(0,arrϑ,
                LAMBDA(result,aϑ,
                    VSTACK(result,aϑ())
                )
            ), 
        1),
    "")
);

which uses REDUCE to run through the array of thunks, expanding each in turn and appending it to the growing array formed of the prior rows. DROP then removes the seed row, and IFERROR converts the #N/A padding that VSTACK introduces when rows differ in length into empty strings.

 

As further background, the relevance of the problem is that the array of thunks offers an escape from the "Nested Arrays are not Supported" problem, which makes life pretty difficult when the problem specifies a nested array as its solution.  The problem I have used to generate the array of thunks is based on a random array, and the objective is to filter out duplicates and sort row by row.
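
For concreteness, the column of row thunks might be generated along these lines (a sketch using my own names; note that BYROW will happily return a single LAMBDA per row even though it refuses to return an array):

fnRowλ  = LAMBDA(x, SORT(UNIQUE(x, TRUE), , , TRUE));
Thunksλ = LAMBDA(array, BYROW(array, LAMBDA(r, LET(v, fnRowλ(r), LAMBDA(v)))));

ϑ = Thunksλ(randomArray) is then a column of thunks, one per row, each wrapping a de-duplicated, sorted row of varying length.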

 

Do you have a preference? To be honest, I am not totally enamoured with either solution (though I keep coming back to them) and I have doubts as to their performance as the problem size increases.

10 Replies
best response confirmed by Peter Bartholomew (Silver Contributor)
Solution

@Peter Bartholomew 

The second variant is much faster on a 200x90 array.


 

The second variant gives approximately the same time on a 1400x900 array.


I tested on a =SEQUENCE( n, m ) array and on a slow PC.

 

The first variant works on a per-element basis; the second works with arrays loaded into memory as one object. My guess.

@Sergei Baklan 

Thanks for that.  It confirms my observations.


Apart from being slower, the first solution grows as O(n²), so even at a couple of hundred rows it passes into the unusable zone.  The REDUCE and VSTACK approach should be good for over 1000 rows.  Sad, because I liked the first solution.

 

I think that removing the nested-array constraint should be rising to the top of the 'to-do' list.  If not universally throughout Excel, then at least for BYROW and BYCOL, which should be able to stack arrays just like VSTACK and HSTACK.

Is the Excel team considering lifting this constraint in the future?

@Peter Bartholomew 

In addition, I tried to compare your second variant (sortRowsA below) with a simple iteration by row number:

fnRow=LAMBDA( x, SORT( UNIQUE(x,1),,,1) ); 

sortRowsA=LAMBDA( array, fn,
    LET(
        fakeVector, LAMBDA( vector, LAMBDA(vector) ),
        setRow, LAMBDA( x, fakeVector( fn(x) ) ),
        set2D,  BYROW( array, setRow ),
        DROP(
            REDUCE( 1, set2D, LAMBDA( a, v, VSTACK( a, v() ) ) ),
            1
        )
    )
);

sortRowsB=LAMBDA( array, fn,
    REDUCE(
        "",
        SEQUENCE( ROWS(array) ),
        LAMBDA( a, k,
            IF(
                SEQUENCE(k) = k,
                fn( INDEX(array, k, ) ),
                a
            )
        )
    )
);
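
For reference, both variants are called the same way, e.g. (my example):

= sortRowsA( SEQUENCE(200, 90), fnRow )
= sortRowsB( SEQUENCE(200, 90), fnRow )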

On the same array, the average calculation times were:

varA - 773 ms

varB - 1303 ms (i.e. roughly twice as slow)

 

For comparison, Solution 1 on the same array gives 57070 ms (i.e. 57 seconds) on average. That is a huge difference.

 

@Patrick2788 

I don't think so. That would mean a new calc engine for Excel. Building expansion into existing functions means making it universal, i.e. workable in all possible scenarios and on different platforms. I could be wrong, but that is a significant investment.

@Sergei Baklan 

I have a suspicion that 'lazy calculation' (which seems to be the case for thunks) causes the search and filter operations to be repeated for each evaluation along the row.  It is easy to visualise the thunk as containing the values it wraps, but the reality appears to be that it contains the formulas that allow the values to be recalculated as needed.  In the first solution, for instance, INDEX(ϑ,i₁,1)() is dereferenced once for each of the n₂ output columns, so whatever work each row thunk wraps is repeated n₂ times.  If the calculation is computationally expensive, one can easily get into trouble timewise!

@Patrick2788 

To underline @Sergei Baklan's reply, the last information I had came from @JoeMcDaid last September.

It was the first response to my discussion "A statement of truth or a wasted opportunity. Lambda helper functions reluctant to return 2D arrays...."

When dynamic arrays appeared I was suddenly set free from the rigidity of CSE (many of my workbooks comprised nothing but CSE formulae, and re-dimensioning used to be a nightmare).  I rapidly found the lack of array break-up made accumulation difficult (corkscrews in finance), and having to use MMULT for row or column sums was painful.  Now I have SCAN and BYCOL, I should be happy, but now the 'nested array problem' keeps rearing its head!  Thunks do provide some sort of answer, but not one I would care to cover in a basic training course!
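
For example, a column of row totals that once needed a matrix product with a vector of ones,

= MMULT( array, SEQUENCE(COLUMNS(array), 1, 1, 0) ),

can now be written

= BYROW( array, LAMBDA(r, SUM(r)) ).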

 

I think I would settle for BYROW and BYCOL acting as wrappers in which nested arrays were supported.  With a bit of luck that removes issues of backward compatibility, because Excel 2016 does not support the functions anyway.
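
To make the wish concrete (my example): today a formula such as

= BYROW( array, LAMBDA(r, SORT(UNIQUE(r, TRUE), , , TRUE)) )

fails with #CALC! because each row returns an array; under the proposal, BYROW would stack the row results just as VSTACK does.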

 

I've only read through page 1 of that thread but must say your idea for GLOBAL is exciting. I hope someone was taking notes.

@Patrick2788 

I think back in the day of the XLM macro language quite complicated definitions could be made on macro sheets.  My thought here was that, since formula-scoped names can now be defined using LET, why not extend that to sheet or global names?

 

To work on the definition, one would select the cell, which would automatically load the formula into the AFE (just as it does now with the formula bar).  Once edited, it would be visible back on the sheet rather than being hidden away behind the defined Names interface.  I wouldn't like to guarantee that all users even know that the Name 'environment' exists, so having critical parts of the calculation hidden away as 'code' is not ideal.

Because I have doubts about the efficiency of forming and then expanding thunk arrays, I have tried to reformulate an accumulation problem so that the 2D results array grows throughout the calculation, using REDUCE and HSTACK.  I have yet to perform any timing runs to see whether the problem of O(n²) calculation time is reduced.
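
The general shape is something like this (a sketch only, not the actual workbook formula; fn combines the previous result column with the next column of input):

Accumulateλ = LAMBDA(array, fn,
    DROP(
        REDUCE( 0, SEQUENCE(COLUMNS(array)),
            LAMBDA(acc, k,
                HSTACK( acc, fn( TAKE(acc, , -1), INDEX(array, , k) ) )
            )
        ),
        , 1
    )
);

With fn = LAMBDA(prior, x, prior + x), for instance, each output column is the running total of the input columns to date, which is the corkscrew pattern.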

 

The other characteristic of the test problems that may well confuse is that the Lambda functions that perform the core calculation are passed into the formula as a parameter.  Radically different calculations can thus be conducted within the same formula/code framework.
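
For example, the sortRowsA driver above will sort each row descending rather than removing duplicates if the parameter is switched (my illustration):

= sortRowsA( data, fnRow )
= sortRowsA( data, LAMBDA(x, SORT(x, , -1, TRUE)) )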

 

To date, most of my effort has been committed to "Will this work?".  "Will it work efficiently as the problem size grows?" is entirely another matter.
