Forum Discussion
Recursive Lambda: Unpacking LET function
Yest another problem in which the solution is a nested array. Surely, a perfectly normal situation for a dynamic array calculation engine. The basic requirement is to allow the manipulation of multidimensional arrays held within a 2D grid (higher dimensions are represented by nesting). To fail here is rather like PowerPivot throwing an error if more than one dimension table is present! It undermines the purpose of the calculation.
Forming a cartesian product corresponding to any individual cell in the list is easy, combining them is ridiculously difficult!
"ThunkedCartesianProductλ"
= LET(
list₁, TEXTBEFORE(combinedLists, "|"),
list₂, TEXTAFTER(combinedLists, "|"),
arr₁, TEXTSPLIT(list₁, , ","),
arr₂, TEXTSPLIT(list₂, ","),
THUNK(TOCOL(TRIM(arr₁) & "-" & TRIM(arr₂)))
)The formula
= MAP(inputList, ThunkedCartesianProductλ)gives the result, but as an array of 3 thunks. Sure I can expand any array of thunks by recursive bisection, I could even do it by recursively combining blocks of 16. The point is that to do so is a pain and I shouldn't have to do it!
Since there were only 3 data cells one could obtain the result explicitly using
= LET(
listsϑ, MAP(inputList, ThunkedCartesianProductλ),
VSTACK(
INDEX(listsϑ,1,1)(),
INDEX(listsϑ,2,1)(),
INDEX(listsϑ,3,1)()
)
)but it wouldn't look so good for an array of a thousand thunks!
The expected results are actually in two columns, (not joined by "-").
Like this:
So for CartesianProduct, better use SergeiBaklan 's version, it removes TEXTJOIN TEXTBEFORE etc completely.
Also, what is "THUNK"?
=HSTACK(
TOCOL(IF(SEQUENCE(, ROWS(list2)), list1)),
TOCOL(TRANSPOSE(IF(SEQUENCE(, ROWS(list1)), list2)))
)
- TTodorovMay 08, 2024Copper Contributor
djclements
Formula works just PERFECT! 🙂
Many thanks!
- djclementsMay 03, 2024Silver Contributor
TTodorov Try the updated workbook. The only changes I made to the formula were the "results" (sorted by Object, Position and Priority, then filtered to remove duplicate "keys").
- TTodorovMay 03, 2024Copper Contributor
djclementsMany thanks, formula works perfect, seems to be fastest one.
Would be possible to "remove duplicates"?
I mean, now formula duplicates positions by objects. If I create sorted table by positions and objects, I see duplicates:In case of duplicates, I want to keep just line with priority = 0
- rachelApr 21, 2024Iron Contributor
Thanks a lot for demonstrating THUNK!
I played around with your file and figured that THUNK cannot do its magic without INDEX.
So I write a "CartesianProductλ_New" and tweak your VSTACK(index_1, index2, index3) into an array formula INDEX(listsϑ, {1,2,3}):
It is next to useless but I figured you might be interested.
- djclementsApr 15, 2024Silver Contributor
TTodorov Iterative methods, such as REDUCE with VSTACK or HSTACK, typically suffer from performance issues when the total number of iterations is in the thousands. As a general rule, it's best to avoid these methods whenever possible. Alternative methods may seem longer, but usually perform better overall.
If I've understood all of your instructions correctly, the following formula should produce the desired results, with no noticeable "lag" in the calculation time (based on the current size of the sample table); plus, it will continue to work even when adding new blank rows to the bottom of the table:
=LET( table; SampleTable; delim; ", "; lookup_table; SORT(ObjetsPositions; SEQUENCE(; COLUMNS(ObjetsPositions))); lookup1; CHOOSECOLS(lookup_table; 1); return1; CHOOSECOLS(lookup_table; 2); lookup2; UNIQUE(return1); return2; MAP(lookup2; LAMBDA(v; TEXTJOIN(delim;; FILTER(lookup1; return1 = v))) ); array1; delim & CHOOSECOLS(table; 1) & delim; col_count1; (LEN(array1) - LEN(SUBSTITUTE(array1; delim; ))) / LEN(delim) - 1; col_nums1; SEQUENCE(; MAX(col_count1)); row_nums1; TOCOL(IFS(col_count1 >= col_nums1; SEQUENCE(ROWS(array1))); 2); unpivot1; TOCOL(TEXTBEFORE(TEXTAFTER(array1; delim; col_nums1); delim); 2); change2; MAP(unpivot1; INDEX(table; row_nums1; 2); LAMBDA(a;b; IF(b = "For all"; XLOOKUP(--a; lookup2; return2; 0); b)) ); array2; delim & change2 & delim; col_count2; (LEN(array2) - LEN(SUBSTITUTE(array2; delim; ))) / LEN(delim) - 1; col_nums2; SEQUENCE(; MAX(col_count2)); row_nums2; TOCOL(IFS(col_count2 >= col_nums2; SEQUENCE(ROWS(array2))); 2); tbl_rowId; INDEX(row_nums1; row_nums2); unpivot2; TOCOL(TEXTBEFORE(TEXTAFTER(array2; delim; col_nums2); delim); 2); change1; MAP(INDEX(unpivot1; row_nums2); unpivot2; INDEX(table; tbl_rowId; 3); LAMBDA(a;b;c; IF(a = "99"; LET(x; XLOOKUP(--b; lookup1; return1; 0;; -1); TEXT(IF(c = "PERS-0000"; MIN(x; 4); x); "@")); a)) ); results; HSTACK( change1; unpivot2; CHOOSEROWS(CHOOSECOLS(table; 3; 9); tbl_rowId) ); VSTACK( CHOOSECOLS(SampleTable[#Headers]; 1; 2; 3; 9); UNIQUE(results) ) )Note: the tbl_rowId variable allows you to pull the corresponding data from any column in the source table using either the INDEX or CHOOSEROWS function.
Please see attached...
- SergeiBaklanApr 15, 2024Diamond Contributor
I missed you need ID as well, slightly corrected previous formula
=LET( source, CHOOSECOLS(SampleTable, {3,1,2} ), sortPositions, SORT(ObjetsPositions, {1,2} ), pers, "PERS-0000", maxPers, 4, id, LAMBDA(line, INDEX(line,1,1)), pos, LAMBDA(line, INDEX(line,1,2)), obj, LAMBDA(line, INDEX(line,1,3)), splitter, LAMBDA(line, LET( a, TEXTSPLIT(pos(line),,", "), b, TEXTSPLIT(obj(line),, ", "), LAMBDA( LET( data,HSTACK( TOCOL( IF(SEQUENCE(,ROWS(b) ), a ) ), TOCOL( TRANSPOSE( IF(SEQUENCE(,ROWS(a) ), b ) ) ) ), HSTACK( IF( SEQUENCE(ROWS(data) ), id(line)), data ) ) ) ) ), unpacker, LAMBDA(arrayOfThunks, LET( n, ROWS(arrayOfThunks), first, INDEX(arrayOfThunks, 1, 1)(), rest, INDEX(arrayOfThunks, SEQUENCE(n - 1, , 2), 1), IF( n = 1, first, REDUCE(first, rest, LAMBDA(a,v, VSTACK(a, v()))) ) ) ), getAll, LAMBDA(v, TEXTJOIN(", ",, FILTER( ObjetsPositions[Object], ObjetsPositions[Position] = VALUE(v) ) ) ), rawSplit, unpacker( BYROW(source, splitter )), setNine, LAMBDA(line, LAMBDA( LET( id, INDEX(line,1,1), pos, INDEX(line,1,2), obj, INDEX(line,1,3), maxPos, IF( TRIM(pos)="99", INDEX( CHOOSECOLS(sortPositions,2), XMATCH( VALUE(obj), CHOOSECOLS(sortPositions,1), 0, -1 ) ), pos ), HSTACK( id, IF( id=pers, MIN(pos, maxPers), pos ), obj ) )) ), getNine, unpacker( BYROW(rawSplit, setNine )), setAll, LAMBDA(line, LAMBDA( HSTACK( id(line), pos(line), IF(obj(line)="For all", getAll(pos(line)), obj(line) ) ) ) ), withAll, unpacker( BYROW(getNine, setAll ) ), res, VSTACK( {"ID","Position","Object"}, unpacker( BYROW(withAll, splitter ) ) ), UNIQUE(res) )Didn't catch which 4 columns you need to remove duplicates.
- TTodorovApr 14, 2024Copper Contributor
Hi all,
PeterBartholomew1, SergeiBaklan I'm respected a lot from you knowledges in Excel, many thanks for your time and shared solutions.
rachelagain want to thank you for kind support with solutions.
djclementsthank you for suggestions and shared variants.
I started with simple task - to try to identify duplicates and prevent mistakes in future data transformations in Power Query. To do that, I realized that I need a list. That was my first challenge.
Seems easy, even for me, I created some kind of list. But realized, that I need not only splitting, but also replacing in same time.
And I realized that I have not enough knowledges to do that with formulas. I solve this challenge with Power Query, but I need result without need of refreshing.
I have data table like this:
Key columns are Objects, Positions, ID and Priority.
I also have table ObjetsPositions:
In data table, I need to handle these replacements:
1. Challenge "For all". This mean that I need to replace "For all" with all objects (from table ObjetsPositions) which have same positions (from data table).
2. Chanllenge "99". This mark "last" of positions. Seems easy, but something here, I realized that I need to get information from other two columns - ID and Priority.
a) ID = "PARK-0000". In this case, I need to replace 99 with last position for each Object.
b) ID = "PERS-0000". In this case, if the store have more than 4 positions, I need to replace 99 with 4. In other case, I need to replace 99 with last position for each Object.
Facing with this challenges, rachel helped me with solutions. Many thanks rachel! 🙂
First solutions works good with 2 key columns, but in with more columns, probably because of Excel limitation, there is and errors. After that rachel adapted it and with recursive Lambda in named range ListPositionsAndObjects:
=LAMBDA( data; LET( pairs; DROP( REDUCE( ""; SEQUENCE(ROWS(data); 1; 1; 1); LAMBDA( list; idx; VSTACK( list; TOCOL( LEFT(INDEX(data; idx; 3); 9) & "@" & TEXTSPLIT( INDEX(data; idx; 1); ", "; ; TRUE; 0; "" ) & "@" & TRANSPOSE( TEXTSPLIT( INDEX(data; idx; 2); ", "; ; TRUE; 0; "" ) ) & "@" & INDEX(data; idx; 5) ) ) ) ); 1; 0 ); DROP( REDUCE( ""; pairs; LAMBDA( list; pair; VSTACK( list; TEXTSPLIT(pair; "@") ) ) ); 1; 0 ) ) )And after that with one more formula:
=LET( objects_by_position; ListPositionsAndObjects(SampleTable); ids; INDEX(objects_by_position; ; 1); positions; INDEX(objects_by_position; ; 2); objects; INDEX(objects_by_position; ; 3); replaced_positions; VALUE(IF(TRIM(positions) <> "99"; positions; IF(ids = "PERS-0000"; IF(MAXIFS(ObjetsPositions[Position]; ObjetsPositions[Object]; objects) < 4; MAXIFS(ObjetsPositions[Position]; ObjetsPositions[Object]; objects); "4"); MAXIFS(ObjetsPositions[Position]; ObjetsPositions[Object]; objects)))); replaced_objects; IF(objects <> "For all"; objects; MAP(replaced_positions; LAMBDA(x; TEXTJOIN(", "; TRUE; FILTER(ObjetsPositions[Object]; ObjetsPositions[Position] = x))))); ListPositionsAndObjects(HSTACK(replaced_positions; replaced_objects; ids)) )I received desired list. But realized that for identifying duplicates I need to use at least 4 columns, not 3.
After that SergeiBaklan shared also solution (with 2 columns, without additional logic). Many thanks Sergei! 🙂
Seems that due to complexity of the task, finding duplicates with formulas also take time. I didn't measured to compare with result from Power Query, but calculations take time. 🙂
Will continue to try to find a way (with 4 and more columns), in any way! 🙂
Till then, as attached file you can find sample table with solutions from rachel and SergeiBaklan.
Again, want to thank you for sharing knowledges and solutions! 🙂
Best regards
Todor
- PeterBartholomew1Apr 14, 2024Silver Contributor
Sorry, I mis-read the requirement. To return two columns requires a slightly different approach. The one I chose differs slightly from the path taken by SergeiBaklan in that I assemble a grid over which I could broadcast the values. So, for a single cell list
ThunkedCartesianProductλ =LET( list₁, TEXTBEFORE(combinedLists, "|"), list₂, TEXTAFTER(combinedLists, "|"), arr₁, VALUE(TEXTSPLIT(list₁, , ",")), arr₂, VALUE(TEXTSPLIT(list₂, ",")), grid, SIGN(arr₁ + arr₂), THUNK(HSTACK( TOCOL(IF(grid, arr₁)), TOCOL(IF(grid, arr₂)) )) )The worksheet formula is unaltered from that I show above
= LET( listsϑ, MAP(inputList, ThunkedCartesianProductλ), VSTACK( INDEX(listsϑ,1,1)(), INDEX(listsϑ,2,1)(), INDEX(listsϑ,3,1)() ) )Both formulas could be a little more concise but, for me, that is not a priority.
As for thunks, they are Lambda functions that do not require arguments but return a specific array when invoked with a null parameter string. The function I use to create a thunk from an array is
THUNK(x) = LAMBDA(x, LAMBDA(x))Excel will accept an array of thunks for later calculation whereas it throws a wobbly when faced with an array of arrays!
- rachelApr 14, 2024Iron ContributorThanks for the wiki!
- SergeiBaklanApr 14, 2024Diamond Contributor
I guess PeterBartholomew1 uses
Thunk = lambda(x, lambda(x) )Instead we may use directly LAMBDA( TOCOL(...) )
Just in case, thunks are explained here Thunk - Wikipedia