Mar 07 2023 05:49 AM
I wonder if any of you have figured this out already (and i would like to have feedback about what i have done so far), i have been playing with columns of data and creating easy set theory operations. The addition of VSTACK and TOCOL basically nailed the sum operation as i can just select an entire array and merge it easily (and then apply a UNIQUE if i want to), the subtraction and intersection are not that straightforward though...
My approach was fairly simple and also known already (as shown here How can we perform common set operations (union, intersection, minus) in MS Excel?) but i wanted a solution that was neither VBA or ADD-IN based, and of course, that used dynamic arrays. For the subtraction operation i started with:
=FILTER(list1 , COUNTIF( list2 , list1) = 0)
As the second argument of COUNTIF is the list1 itself the size must be the same so i can apply FILTER to it and get only the elements on list1 that did not appear in list2, a similar solution without COUNTIF but with MATCH would be:
=FILTER(list1 , NOT(ISNUMBER(MATCH(list1 , list2 , 0))))
The approach with MATCH is actually better because it can be substituted with XMATCH and used wildcards and so on, but the main reason is that MATCH can accepct a lookup_vector that is a VSTACK or a TOCOL (an array instead of a range) which COUNTIF cannot handle (for reasons), the thing is now i can have a more general solution:
=
LAMBDA( list1, list2, FILTER( list1, NOT(ISNUMBER(MATCH(list1, list2, 0)))))
(listA , VSTACK(listB, listC))
And with this anonymous lambda i can pass the parameters as ranges or arrays and get the difference between combined sets in a general formula.
Now for the intersection, i have used a similar solution to the difference since i need to get values that are present in both lists at the same time so basically:
COUNTIF Solution:
=FILTER(list1 , COUNTIF( list2 , list1) > 0)
MATCH Solution:
=FILTER(list1 , ISNUMBER(MATCH(list1 , list2 , 0)))
The problem is that i was not able to make a formula that could do intersection with more lists in a general way, sure i could just add more conditions to the filter using boolean logic and repeating the arguments for each list with a * sign, but this is far from ideal compared to the difference solution.
I started to work with some BYROW and BYCOL combinations but as you guys know nested arrays are not supported, i managed to get away with it in Google Sheets using:
=FILTER(list1,
BYROW(
ArrayFormula(
BYCOL(matrix_lists,
LAMBDA(main, COUNTIF(main, list1)))), LAMBDA(matrix, PRODUCT(matrix) > 0)))
The idea is to create a COUNTIF (or MATCH with a few tweaks) result for each column and then do the product of each row of results and FILTER out the ones that had a zero because that means the value does not appear in all lists at the same time. Still i was not capable of implementing this idea in Excel as i dont know how to overcome nested arrays not being supported, does any of you have some ideas to share?
Mar 07 2023 06:55 AM
Perhaps with REDUCE:
=REDUCE("Not in List2",list1,LAMBDA(a,v,IF(OR(v=list2),a,VSTACK(a,v))))
Mar 07 2023 08:40 AM - edited Mar 07 2023 09:10 AM
Dressing it up a bit!
Union: = SetOperateλ(setA, setB, Unionλ);
Intersection: = SetOperateλ(setA, setB, Intersectλ);
Difference1: = SetOperateλ(setA, setB, Diffλ);
Difference2: = SetOperateλ(setB, setA, Diffλ);
Symmetric difference: = SetOperateλ(setA, setB, SymDiffλ);
where
SetOperateλ
= LET(
items, VSTACK(A, B),
distinct, SORT(UNIQUE(items)),
isA, COUNTIFS(A,distinct),
isB, COUNTIFS(B,distinct),
criterion, MAP(isA, isB, Fnλ),
FILTER(distinct, criterion)
)
and
Unionλ = OR(x, y);
Intersectλ = AND(x, y);
Diffλ = AND(x, NOT(y));
SymDiffλ = XOR(x, y);
Mar 07 2023 09:06 AM
File attached
Mar 07 2023 09:15 AM
Mar 07 2023 09:26 AM