Recursive Lambda: Unpacking LET function

TTodorov
Copper Contributor
May 08, 2024
djclements
Formula works just PERFECT! 🙂
Many thanks!
djclements
Silver Contributor
May 03, 2024
TTodorov Try the updated workbook. The only changes I made to the formula were the "results" (sorted by Object, Position and Priority, then filtered to remove duplicate "keys").
SampleTable5.xlsx41 KB
TTodorov
Copper Contributor
May 03, 2024
djclementsMany thanks, formula works perfect, seems to be fastest one.
Would be possible to "remove duplicates"?
I mean, now formula duplicates positions by objects. If I create sorted table by positions and objects, I see duplicates:
In case of duplicates, I want to keep just line with priority = 0
rachel
Iron Contributor
Apr 21, 2024
PeterBartholomew1
Thanks a lot for demonstrating THUNK!
I played around with your file and figured that THUNK cannot do its magic without INDEX.
So I write a "CartesianProductλ_New" and tweak your VSTACK(index_1, index2, index3) into an array formula INDEX(listsϑ, {1,2,3}):

It is next to useless but I figured you might be interested.
CartesianProduct_index+array formula.xlsx19 KB

Silver Contributor

Apr 15, 2024

TTodorov Iterative methods, such as REDUCE with VSTACK or HSTACK, typically suffer from performance issues when the total number of iterations is in the thousands. As a general rule, it's best to avoid these methods whenever possible. Alternative methods may seem longer, but usually perform better overall.

If I've understood all of your instructions correctly, the following formula should produce the desired results, with no noticeable "lag" in the calculation time (based on the current size of the sample table); plus, it will continue to work even when adding new blank rows to the bottom of the table:

=LET(
    table; SampleTable; delim; ", ";
    lookup_table; SORT(ObjetsPositions; SEQUENCE(; COLUMNS(ObjetsPositions)));
    lookup1; CHOOSECOLS(lookup_table; 1);
    return1; CHOOSECOLS(lookup_table; 2);
    lookup2; UNIQUE(return1);
    return2; MAP(lookup2; LAMBDA(v;
        TEXTJOIN(delim;; FILTER(lookup1; return1 = v)))
    );
    array1; delim & CHOOSECOLS(table; 1) & delim;
    col_count1; (LEN(array1) - LEN(SUBSTITUTE(array1; delim; ))) / LEN(delim) - 1;
    col_nums1; SEQUENCE(; MAX(col_count1));
    row_nums1; TOCOL(IFS(col_count1 >= col_nums1; SEQUENCE(ROWS(array1))); 2);
    unpivot1; TOCOL(TEXTBEFORE(TEXTAFTER(array1; delim; col_nums1); delim); 2);
    change2; MAP(unpivot1; INDEX(table; row_nums1; 2); LAMBDA(a;b;
        IF(b = "For all"; XLOOKUP(--a; lookup2; return2; 0); b))
    );
    array2; delim & change2 & delim;
    col_count2; (LEN(array2) - LEN(SUBSTITUTE(array2; delim; ))) / LEN(delim) - 1;
    col_nums2; SEQUENCE(; MAX(col_count2));
    row_nums2; TOCOL(IFS(col_count2 >= col_nums2; SEQUENCE(ROWS(array2))); 2);
    tbl_rowId; INDEX(row_nums1; row_nums2);
    unpivot2; TOCOL(TEXTBEFORE(TEXTAFTER(array2; delim; col_nums2); delim); 2);
    change1; MAP(INDEX(unpivot1; row_nums2); unpivot2; INDEX(table; tbl_rowId; 3); LAMBDA(a;b;c;
        IF(a = "99"; LET(x; XLOOKUP(--b; lookup1; return1; 0;; -1); TEXT(IF(c = "PERS-0000"; MIN(x; 4); x); "@")); a))
    );
    results; HSTACK(
        change1;
        unpivot2;
        CHOOSEROWS(CHOOSECOLS(table; 3; 9); tbl_rowId)
    );
    VSTACK(
        CHOOSECOLS(SampleTable[#Headers]; 1; 2; 3; 9);
        UNIQUE(results)
    )
)

Note: the tbl_rowId variable allows you to pull the corresponding data from any column in the source table using either the INDEX or CHOOSEROWS function.

Please see attached...

SampleTable4.xlsx42 KB

SergeiBaklan

Diamond Contributor

Apr 15, 2024

TTodorov

I missed you need ID as well, slightly corrected previous formula

=LET(
  source, CHOOSECOLS(SampleTable, {3,1,2} ),
  sortPositions, SORT(ObjetsPositions, {1,2} ),

  pers, "PERS-0000",
  maxPers, 4,
  id, LAMBDA(line, INDEX(line,1,1)),
  pos, LAMBDA(line, INDEX(line,1,2)),
  obj, LAMBDA(line, INDEX(line,1,3)),

  splitter, LAMBDA(line,
    LET(
        a, TEXTSPLIT(pos(line),,", "),
        b, TEXTSPLIT(obj(line),, ", "),
        LAMBDA(
            LET(
                data,HSTACK(
                    TOCOL( IF(SEQUENCE(,ROWS(b) ),  a ) ),
                    TOCOL( TRANSPOSE( IF(SEQUENCE(,ROWS(a) ),  b ) ) )
                ),
                HSTACK(
                    IF( SEQUENCE(ROWS(data) ), id(line)),
                    data
                )
            )
        )
    )
),

unpacker,
LAMBDA(arrayOfThunks,
    LET(
        n, ROWS(arrayOfThunks),
        first, INDEX(arrayOfThunks, 1, 1)(),
        rest, INDEX(arrayOfThunks, SEQUENCE(n - 1, , 2), 1),
        IF(
            n = 1,
            first,
            REDUCE(first, rest, LAMBDA(a,v, VSTACK(a, v())))
        )
    )
),

getAll, LAMBDA(v,
    TEXTJOIN(", ",,
        FILTER(
            ObjetsPositions[Object],
            ObjetsPositions[Position] = VALUE(v)
        )
    )
),

rawSplit, unpacker( BYROW(source, splitter )),

setNine, LAMBDA(line,
    LAMBDA(
        LET(
            id, INDEX(line,1,1),
            pos, INDEX(line,1,2),
            obj, INDEX(line,1,3),
            maxPos, IF( TRIM(pos)="99",
                INDEX(
                    CHOOSECOLS(sortPositions,2),
                    XMATCH( VALUE(obj),
                        CHOOSECOLS(sortPositions,1),
                        0,
                        -1
                    )
                ), pos ),
            HSTACK(
                id,
                IF( id=pers,
                    MIN(pos, maxPers),
                    pos
                ),
            obj
        )
    ))
),
getNine, unpacker( BYROW(rawSplit, setNine )),
setAll, LAMBDA(line,
    LAMBDA(
        HSTACK( id(line), pos(line),
            IF(obj(line)="For all",
                getAll(pos(line)),
                obj(line)
            )
        )
    )
    ),

withAll, unpacker( BYROW(getNine, setAll ) ),
res, VSTACK(
    {"ID","Position","Object"},
    unpacker( BYROW(withAll, splitter ) )
),
UNIQUE(res)
)

Didn't catch which 4 columns you need to remove duplicates.

SampleTable.xlsx108 KB

TTodorov
Copper Contributor
Apr 14, 2024
Hi all,
PeterBartholomew1, SergeiBaklan I'm respected a lot from you knowledges in Excel, many thanks for your time and shared solutions.
rachelagain want to thank you for kind support with solutions.
djclementsthank you for suggestions and shared variants.

I started with simple task - to try to identify duplicates and prevent mistakes in future data transformations in Power Query. To do that, I realized that I need a list. That was my first challenge.
Seems easy, even for me, I created some kind of list. But realized, that I need not only splitting, but also replacing in same time.

And I realized that I have not enough knowledges to do that with formulas. I solve this challenge with Power Query, but I need result without need of refreshing.

I have data table like this:
Key columns are Objects, Positions, ID and Priority.
I also have table ObjetsPositions:

In data table, I need to handle these replacements:
1. Challenge "For all". This mean that I need to replace "For all" with all objects (from table ObjetsPositions) which have same positions (from data table).
2. Chanllenge "99". This mark "last" of positions. Seems easy, but something here, I realized that I need to get information from other two columns - ID and Priority.
a) ID = "PARK-0000". In this case, I need to replace 99 with last position for each Object.
b) ID = "PERS-0000". In this case, if the store have more than 4 positions, I need to replace 99 with 4. In other case, I need to replace 99 with last position for each Object.

Facing with this challenges, rachel helped me with solutions. Many thanks rachel! 🙂
First solutions works good with 2 key columns, but in with more columns, probably because of Excel limitation, there is and errors. After that rachel adapted it and with recursive Lambda in named range ListPositionsAndObjects:
=LAMBDA( data; LET( pairs; DROP( REDUCE( ""; SEQUENCE(ROWS(data); 1; 1; 1); LAMBDA( list; idx; VSTACK( list; TOCOL( LEFT(INDEX(data; idx; 3); 9) & "@" & TEXTSPLIT( INDEX(data; idx; 1); ", "; ; TRUE; 0; "" ) & "@" & TRANSPOSE( TEXTSPLIT( INDEX(data; idx; 2); ", "; ; TRUE; 0; "" ) ) & "@" & INDEX(data; idx; 5) ) ) ) ); 1; 0 ); DROP( REDUCE( ""; pairs; LAMBDA( list; pair; VSTACK( list; TEXTSPLIT(pair; "@") ) ) ); 1; 0 ) ) )
And after that with one more formula:
=LET( objects_by_position; ListPositionsAndObjects(SampleTable); ids; INDEX(objects_by_position; ; 1); positions; INDEX(objects_by_position; ; 2); objects; INDEX(objects_by_position; ; 3); replaced_positions; VALUE(IF(TRIM(positions) <> "99"; positions; IF(ids = "PERS-0000"; IF(MAXIFS(ObjetsPositions[Position]; ObjetsPositions[Object]; objects) < 4; MAXIFS(ObjetsPositions[Position]; ObjetsPositions[Object]; objects); "4"); MAXIFS(ObjetsPositions[Position]; ObjetsPositions[Object]; objects)))); replaced_objects; IF(objects <> "For all"; objects; MAP(replaced_positions; LAMBDA(x; TEXTJOIN(", "; TRUE; FILTER(ObjetsPositions[Object]; ObjetsPositions[Position] = x))))); ListPositionsAndObjects(HSTACK(replaced_positions; replaced_objects; ids)) )
I received desired list. But realized that for identifying duplicates I need to use at least 4 columns, not 3.

After that SergeiBaklan shared also solution (with 2 columns, without additional logic). Many thanks Sergei! 🙂

Seems that due to complexity of the task, finding duplicates with formulas also take time. I didn't measured to compare with result from Power Query, but calculations take time. 🙂

Will continue to try to find a way (with 4 and more columns), in any way! 🙂

Till then, as attached file you can find sample table with solutions from rachel and SergeiBaklan.

Again, want to thank you for sharing knowledges and solutions! 🙂

Best regards
Todor
SampleTable.xlsx98 KB
PeterBartholomew1
Silver Contributor
Apr 14, 2024
rachel
Sorry, I mis-read the requirement. To return two columns requires a slightly different approach. The one I chose differs slightly from the path taken by SergeiBaklan in that I assemble a grid over which I could broadcast the values. So, for a single cell list
ThunkedCartesianProductλ =LET( list₁, TEXTBEFORE(combinedLists, "|"), list₂, TEXTAFTER(combinedLists, "|"), arr₁, VALUE(TEXTSPLIT(list₁, , ",")), arr₂, VALUE(TEXTSPLIT(list₂, ",")), grid, SIGN(arr₁ + arr₂), THUNK(HSTACK( TOCOL(IF(grid, arr₁)), TOCOL(IF(grid, arr₂)) )) )
The worksheet formula is unaltered from that I show above
= LET( listsϑ, MAP(inputList, ThunkedCartesianProductλ), VSTACK( INDEX(listsϑ,1,1)(), INDEX(listsϑ,2,1)(), INDEX(listsϑ,3,1)() ) )
Both formulas could be a little more concise but, for me, that is not a priority.

As for thunks, they are Lambda functions that do not require arguments but return a specific array when invoked with a null parameter string. The function I use to create a thunk from an array is
THUNK(x) = LAMBDA(x, LAMBDA(x))
Excel will accept an array of thunks for later calculation whereas it throws a wobbly when faced with an array of arrays!
CartesianProduct.xlsx20 KB
rachel
Iron Contributor
Apr 14, 2024
Thanks for the wiki!
SergeiBaklan
Diamond Contributor
Apr 14, 2024
rachel

I guess PeterBartholomew1 uses

Thunk = lambda(x, lambda(x) )

Instead we may use directly LAMBDA( TOCOL(...) )

Just in case, thunks are explained here Thunk - Wikipedia

Forum Discussion

Recursive Lambda: Unpacking LET function

Resources