Set theory operations with dynamic arrays

Copper Contributor

I wonder if any of you have figured this out already (and i would like to have feedback about what i have done so far), i have been playing with columns of data and creating easy set theory operations. The addition of VSTACK and TOCOL basically nailed the sum operation as i can just select an entire array and merge it easily (and then apply a UNIQUE if i want to), the subtraction and intersection are not that straightforward though...

My approach was fairly simple and also known already (as shown here How can we perform common set operations (union, intersection, minus) in MS Excel?) but i wanted a solution that was neither VBA or ADD-IN based, and of course, that used dynamic arrays. For the subtraction operation i started with:

 

=FILTER(list1 , COUNTIF( list2 , list1) = 0)


As the second argument of COUNTIF is the list1 itself the size must be the same so i can apply FILTER to it and get only the elements on list1 that did not appear in list2, a similar solution without COUNTIF but with MATCH would be:

=FILTER(list1 , NOT(ISNUMBER(MATCH(list1 , list2 , 0))))


The approach with MATCH is actually better because it can be substituted with XMATCH and used wildcards and so on, but the main reason is that MATCH can accepct a lookup_vector that is a VSTACK or a TOCOL (an array instead of a range) which COUNTIF cannot handle (for reasons), the thing is now i can have a more general solution:

=
LAMBDA( list1, list2, FILTER( list1, NOT(ISNUMBER(MATCH(list1, list2, 0)))))

(listA , VSTACK(listB, listC))


And with this anonymous lambda i can pass the parameters as ranges or arrays and get the difference between combined sets in a general formula.

Now for the intersection, i have used a similar solution to the difference since i need to get values that are present in both lists at the same time so basically:

 

COUNTIF Solution:
=FILTER(list1 , COUNTIF( list2 , list1) > 0)

MATCH Solution:
=FILTER(list1 , ISNUMBER(MATCH(list1 , list2 , 0)))

 

The problem is that i was not able to make a formula that could do intersection with more lists in a general way, sure i could just add more conditions to the filter using boolean logic and repeating the arguments for each list with a * sign, but this is far from ideal compared to the difference solution.

I started to work with some BYROW and BYCOL combinations but as you guys know nested arrays are not supported, i managed to get away with it in Google Sheets using:

=FILTER(list1, 
  BYROW( 
   ArrayFormula( 
    BYCOL(matrix_lists, 
     LAMBDA(main, COUNTIF(main, list1)))), LAMBDA(matrix, PRODUCT(matrix) > 0)))

 

The idea is to create a COUNTIF (or MATCH with a few tweaks) result for each column and then do the product of each row of results and FILTER out the ones that had a zero because that means the value does not appear in all lists at the same time. Still i was not capable of implementing this idea in Excel as i dont know how to overcome nested arrays not being supported, does any of you have some ideas to share?

5 Replies

@MarkolaKrai 

Perhaps with REDUCE:

=REDUCE("Not in List2",list1,LAMBDA(a,v,IF(OR(v=list2),a,VSTACK(a,v))))

@MarkolaKrai 

Dressing it up a bit!

 

 

Union:                = SetOperateλ(setA, setB, Unionλ);
Intersection:         = SetOperateλ(setA, setB, Intersectλ);
Difference1:          = SetOperateλ(setA, setB, Diffλ);
Difference2:          = SetOperateλ(setB, setA, Diffλ);
Symmetric difference: = SetOperateλ(setA, setB, SymDiffλ);

 

 

where

 

 

SetOperateλ
= LET(
    items,     VSTACK(A, B),
    distinct,  SORT(UNIQUE(items)),
    isA,       COUNTIFS(A,distinct),
    isB,       COUNTIFS(B,distinct),
    criterion, MAP(isA, isB, Fnλ),
    FILTER(distinct, criterion)
  )

 

 

and

 

 

Unionλ      = OR(x, y);
Intersectλ  = AND(x, y);
Diffλ       = AND(x, NOT(y));
SymDiffλ    = XOR(x, y);

 

 

@MarkolaKrai 

File attached

@Patrick2788 Thank you for the reply, this formula is cleaner than mine for the difference of sets i did not remember to use REDUCE because for some reason my brain always thinks the initial value and the accumulator must handle numbers only. I will be using this solution from now on if i have more than 2 sets.
Hey @Peter Bartholomew thank you for the reply! Huge fan btw i love seeing your beautifully crafted formulas as these that you presented, they are useful of course but i was actually pursuing a general formula that could handle intersection of 3 or more sets without the need of adding more boolean logic nor needing to repeat the formulas one inside the other for every new set