Forum Discussion
A generalisation of the MAP lambda helper function
- Sep 08, 2024
This is a demonstration of the proposed MAP function used to solve a challenge provided by Oz_Sunray and Wyn Hopkins on data cleansing and vote counting. The key element of my solution is to build a table of validated votes for each voter ID and each candidate (i.e. an array of arrays - much the same as any other problem of interest). The formula is modularised using Lambda functions. They are not that pretty, but neither are they impossible to follow.
"Worksheet formula" = AggregateVotesλ( tblVOTES[VoterID], tblVOTES[[Vote1]:[Vote3]], weight, tblAvailable[AVAILABLE] )
which calls
/* "FUNCTION NAME: AggregateVotesλ" "DESCRIPTION: Aggregates all votes to candidate colors" */ /* "REVISIONS: Date Developer Description" " 08 Sep 2024 Peter Bartholomew Original Development" */ AggregateVotesλ = LAMBDA(voterID, votesCast, weight, available, LET( person, TOROW(UNIQUE(voterID)), votes, MAPλ(person, AssignVotesλ(voterID, votesCast, weight, available)), count, HSTACK(available, BYROW(votes, SUM)), return, TAKE(SORT(count, 2, -1), 4), return ) );
to build and aggregate the table of votes. This, in turn, calls another Lambda function to clean and validate the votes corresponding to a single voterID
/* "FUNCTION NAME: AssignVotesλ" "DESCRIPTION: Validates and assigns votes to candidate colors for a single voterID " */ /* "REVISIONS: Date Developer Description" " 08 Sep 2024 Peter Bartholomew Original Development" */ AssignVotesλ = LAMBDA(voterID, ballots, weight, available, LAMBDA(p, LET( // "In the case of duplicate ballots return first" votesCast, XLOOKUP(p, voterID, ballots), // "Remove invalid and repeated votes" valid, COUNTIFS(available, votesCast), distinct, EXPAND(UNIQUE(FILTER(votesCast, valid), TRUE), , 3, ""), // "Assign votes to candidate colours" assigned, BYROW((distinct = available) * weight, SUM), assigned ) ) );
I took the problem from Wyn's recording
https://www.youtube.com/watch?v=FIcxyLqzWcE
This is a demonstration of the proposed MAP function used to solve a challenge provided by Oz_Sunray and Wyn Hopkins on data cleansing and vote counting. The key element of my solution is to build a table of validated votes for each voter ID and each candidate (i.e. an array of arrays - much the same as any other problem of interest). The formula is modularised using Lambda functions. They are not that pretty, but neither are they impossible to follow.
"Worksheet formula"
= AggregateVotesλ(
tblVOTES[VoterID],
tblVOTES[[Vote1]:[Vote3]],
weight,
tblAvailable[AVAILABLE]
)
which calls
/* "FUNCTION NAME: AggregateVotesλ"
"DESCRIPTION: Aggregates all votes to candidate colors" */
/* "REVISIONS: Date Developer Description"
" 08 Sep 2024 Peter Bartholomew Original Development"
*/
AggregateVotesλ
= LAMBDA(voterID, votesCast, weight, available,
LET(
person, TOROW(UNIQUE(voterID)),
votes, MAPλ(person, AssignVotesλ(voterID, votesCast, weight, available)),
count, HSTACK(available, BYROW(votes, SUM)),
return, TAKE(SORT(count, 2, -1), 4),
return
)
);
to build and aggregate the table of votes. This, in turn, calls another Lambda function to clean and validate the votes corresponding to a single voterID
/* "FUNCTION NAME: AssignVotesλ"
"DESCRIPTION: Validates and assigns votes to candidate colors for a single voterID " */
/* "REVISIONS: Date Developer Description"
" 08 Sep 2024 Peter Bartholomew Original Development"
*/
AssignVotesλ
= LAMBDA(voterID, ballots, weight, available,
LAMBDA(p,
LET(
// "In the case of duplicate ballots return first"
votesCast, XLOOKUP(p, voterID, ballots),
// "Remove invalid and repeated votes"
valid, COUNTIFS(available, votesCast),
distinct, EXPAND(UNIQUE(FILTER(votesCast, valid), TRUE), , 3, ""),
// "Assign votes to candidate colours"
assigned, BYROW((distinct = available) * weight, SUM),
assigned
)
)
);
I took the problem from Wyn's recording
https://www.youtube.com/watch?v=FIcxyLqzWcE
- Patrick2788Dec 31, 2024Silver Contributor
Finally getting to this one!
In re: Wyn's challenge.
Here's my take:
=LET( flat, UnPivotMλ( tblVOTES[VoterID], weight, tblVOTES[[Vote1]:[Vote3]] ), colors, TAKE(UNIQUE(flat), , -1), agg, SORT(GROUPBY(colors, colors, COUNTA, 0, 0), 2, -1), TAKE(agg, 4) ) //UnPivotMλ being: UnPivotMλ = LAMBDA(rows, columns, values, LET( i, ROWS(rows), j, COLUMNS(rows), c, COLUMNS(columns), s, TOCOL(SEQUENCE(, j) * SEQUENCE(c, , 1, 0)), row_labels, WRAPROWS(TOCOL(CHOOSECOLS(rows, s)), j), attribute, TOCOL(CHOOSEROWS(columns, SEQUENCE(i, , 1, 0))), v, TOCOL(values), HSTACK(row_labels, attribute, v) ) )
- djclementsSep 13, 2024Bronze Contributor
PeterBartholomew1 Once again, great job on producing a generalized solution to the nested arrays issue. Hopefully Microsoft will take note of your efforts and come up with a built-in solution in the future.
Your reference to Oz_Sunray's live challenge caught my attention, as I too had come across that video in my YouTube feed. A lambda-free variant to avoid the array of arrays issue for that particular challenge could be:
=LET( names, UNIQUE(Votes[Ballot]), colors, INDEX(Votes, XMATCH(names, Votes[Ballot]), {2,3,4}), names_colors, TOCOL(names & "|" & colors), keys, UNIQUE(names_colors), weight, XLOOKUP(keys, names_colors, TOCOL(IF({1}, {3,2,1}, colors))), TAKE(SORT(HSTACK(Available, MMULT(N(Available = TOROW(TEXTAFTER(keys, "|"))), weight)), 2, -1), 4) )
Note: the weight array can be adjusted as desired (e.g. {1,1,1} instead of {3,2,1}).
This is one situation where INDEX / XMATCH is preferred over XLOOKUP, as it is perfectly capable of returning multiple columns of data for multiple lookup values, whereas XLOOKUP is not.
While I typically seek out alternative methods to overcome the nested arrays issue, complex scenarios can quickly become convoluted, making it difficult for the average user to follow. Your MAPλ function seems to take care of that nicely. Thank you for sharing!
- PeterBartholomew1Sep 14, 2024Silver Contributor
Whilst we are chatting, you mention a 'Lambda free' variant. In general terms I am, at present' travelling in the opposite direction; that is trying to turn every formula into a Lambda. The reasoning goes, a basic formula exposes the functions it uses and, scattered through, the references it makes to data. Conversely, a named Lambda conceals the 'how' but describes the task it is performing and lists its precedents. The 'how' is concealed but it is available by listing the function.
- djclementsSep 15, 2024Bronze Contributor
I've been experimenting with thunks over the past month or so and have a pretty good handle on it now. However, the binary tree method for unpacking a thunked array of arrays is still a bit of a mind twist for me. The examples that both you and lori_m have shared are some of the best resources I've found on this topic, so thank you for that.
By a "lambda-free" variant, what I really meant to say was a solution that spills naturally via lifting or broadcasting, without having to rely on any of the lambda helper functions (BYROW, MAP, REDUCE, etc.). MMULT was a function that I always struggled with but have a better grasp of now as I've been using it with increased frequency. I'm all for defining custom lambda functions, but typically only do so if they will be called more than once.
On a side note, I'm happy to report that I now have access to both GROUPBY and PIVOTBY (after applying the September 10th update on the Current Channel), as well as eta-lambda for all of the lambda helper functions. Still waiting on the new regex functions and Python, but they're not too high on my wish list. I'm more excited about the upcoming TRIMRANGE function and Trim Refs, which will make it much easier to dynamically reference a data input range without having to use structured tables.
As always, your contributions to the community are invaluable, and I look forward to seeing what you come up with next. Cheers!
- PeterBartholomew1Sep 14, 2024Silver Contributor
Hello David. Thank you for your favourable comments on MAPλ. I would welcome others to evaluate the function in terms of its usability. I think its strength is the way in which it hides its complexity but others may be of a different opinion.
The pipe-separated list can work well particularly if, as in this case, you are packing 2 text strings into a variable. Something else I have done in the past is to pad each string with spaces to give a fixed width format. The MID function will return an array of arrays by broadcasting the index. I am, however, getting more comfortable with the idea of storing arrays or the calculation that result in an array within a function (usually a thunk).
In a recent example involving a financial modelling revolver, I turned the debt and cash at each period into a thunk. Given an array of periods extracting the values could present a challenge but, in the event, I realised that using MAP to first return the debt and then a second use of MAP to return the cash worked perfectly.
- Oz_SunraySep 08, 2024MVPVery cool! 👏🏽👏🏽👏🏽