Forum Discussion
Extracting table values if values in two columns have identical values in other rows
This is not a serous recommendation but more a matter of experimentation. The object was to see whether I could use SCAN to pick out the required records. The first problem is that the criterion is based upon two fields. As we have seen, concatenation is a good option, despite being somewhat inelegant. Instead, to get SCAN to work over an array of arrays, I thunked the value pairs, and fed them through SCAN as thunks. Similarly, there a problems with the accumulated values since I need to know both whether the previous record matches the current one and whether it matched its predecessor, in order to determine whether it should be retained or filtered out.
=LET(
sorted, SORT(Table1, {2, 3}),
attrϑ, BYROW(CHOOSECOLS(sorted, 2, 3), THUNK),
initϑ, THUNK(HSTACK("", "", FALSE, FALSE)),
resultϑ, SCAN(initϑ, attrϑ, IsGroupedλ),
boolean, MAP(resultϑ, LAMBDA(rϑ, LET(r, rϑ(), INDEX(r, 1, 4)))),
final, INDEX(INDEX(resultϑ, ROWS(data), 1)(), , 3),
FILTER(sorted, VSTACK(DROP(boolean, 1), final))
)The bulk of the problem-specific comparisons are made in the Lambda function 'IsGroupedλ' which is defined to be
= LAMBDA(accϑ, aϑ₁,
LET(
acc, accϑ(),
a₁, aϑ₁(),
a₁₀, INDEX(acc, 1, 1),
a₂₀, INDEX(acc, 1, 2),
a₁₁, INDEX(a₁, 1, 1),
a₂₁, INDEX(a₁, 1, 2),
b₀, INDEX(acc, 1, 3),
b₁, INDEX(acc, 1, 4),
c, AND(a₁₁ = a₁₀, a₂₁ = a₂₀),
b, OR(c, b₀),
THUNK(HSTACK(a₁, c, b))
)
)The time taken for the resulting calculation was 1000ms which is slow, but not ridiculously so. The formula returns an array of thunks, from which a single value is selected to be the filter criterion.
For the current problem, direct array methods are both more transparent and considerably faster. Despite that, the challenges for me were:
1. Could I get around the array of array limitations of SCAN?
2. Is the performance or the resulting formula acceptable?
My conclusion is that the resulting methods are both possible and usable.
Some tidying is required and the problem-specific features of the Lambdas need to be more contained if they are to be used as 'black box' functions.
With SCAN you mainly answer on first question
Could I get around the array of array limitations of SCAN?
and perfectly demonstrated it.
Performance is 4 times better in sample from your previous post. That's a huge difference, but only if we speak about seconds, not milliseconds.