Mar 17 2019 01:53 PM
I have a dataset that involves two columns, Customer ID and Product. The green highlighted is what my final results should be. I need to find the total number of unique customers that only ordered apples only, bananas only, and both.
Mar 17 2019 04:15 PM
My starting point was to get a list of the 5 distinct customers, {"n1";"n2";"n3";"n4";"n5"}. Using modern Dynamic Arrays that was just
= UNIQUE( Customer )
Rather than letting this be a spilt array, I used a named formula 'distinctCustomer'.
From there, the number of customers that bought apples is
= SUM( SIGN( COUNTIFS( Customer, distinctCustomer, Product,"apples" ) ) )
and bananas
= SUM( SIGN( COUNTIFS( Customer, distinctCustomer, Product,"banana" ) ) )
Rather more complicated, the number that bought both is given by the product [the numerical equivalent of an AND]
= SUM( SIGN(
COUNTIFS( Customer, distinctCustomer, Product,"apples" ) *
COUNTIFS( Customer, distinctCustomer, Product,"banana" ) ) )
By the way, my formula gave Apples, 4; Bananas, 2; Both 1.
Mar 17 2019 04:34 PM
Because, at this stage in the game, it is probably cheating to use UNIQUE(), it is also possible to use MATCH to reduce the number of instances of each customer to one
= SUM( IF(
MATCH(Customer, Customer, 0 ) = Index,
SIGN( COUNTIFS( Customer, Customer, Product, "apples" ) ),
0 ) )
Index is a helper column containing the row number. These-days I would want to use
= SEQUENCE( ROWS(Table) )
but meanwhile an index column can be added by hand or by using ROW().