SOLVED

Top n most common entries in an array of numbers and how often they occur

Copper Contributor

I have a 3 column x 1000 row array that contains a variety of numbers in one or more of the columns i.e. for any given row it might have a different entry in each of the 3 columns or it may simply have an entry in 1 column with the other 2 columns being blank. The numbers range from 1111 to 9999 and can be any mix. I want to extract the 10 numbers that occur the most often irrespective of what row or column they're in, put them in a separate column showing the 10 numbers and showing the number of times each number occurs and then sort descending from most common to least common. Sounds simple but I can't work out how to do it. Can anyone help please? Thanks (Office 365)

15 Replies

Try

=LET(Liste,SEQUENCE(MAX(A2:C1001),,MIN(A2:C1001)),TAKE(SORT(VSTACK(Liste,FREQUENCY(A2:C1001,Liste)),2,-1),10))

 

@BF_UK 

best response confirmed by BF_UK (Copper Contributor)
Solution

@BF_UK 

=TAKE(SORT(HSTACK(UNIQUE(TOCOL(B1:D1000)),BYROW(UNIQUE(TOCOL(B1:D1000)),LAMBDA(row,COUNTIF(B1:D1000,row)))),2,-1),10)

You can try this formula which works in my sheet.

top 10 numbers.JPG

@BF_UK 

 

My solution:

 

matrix is the 1000x3 array.

 

'Vector
=TOCOL(matrix)

'Total' Lambda - run each of the numbers through a filter to obtain the counts. Stack, sort, and take top 10.

=LAMBDA(a,v,LET(
    filtered, COUNT(FILTER(vector, vector = v)),
    stack, VSTACK(a, HSTACK(v, filtered)),
    TAKE(SORT(stack, 2, -1), 10)
))

 

Sheet level formula:

=REDUCE({"Number","Count"},UNIQUE(vector),Total)

@BF_UK 

My solution is similar to the others

= LET(
    occurrences, TOCOL(COUNTIFS(array#,array#)),
    valuesList,  TOCOL(array#),
    sorted,      SORT(HSTACK(valuesList, occurrences),{2,1},-1),
    TAKE(UNIQUE(sorted),10)
  )

The most significant differences are that I applied COUNTIFS to the 2D array whilst is was still a Range and only then reduced the results to a column.  I used LET to avoid excessive function nesting but did not place the formula within a LAMBDA despite the fact that I am moving in that direction.

 

I also noticed the use of FREQUENCY as an alternative to COUNTIFS.

= LET(
    list,        TOCOL(array#),
    distinct,    SORT(UNIQUE(list)),
    occurrences, DROP(FREQUENCY(list, distinct),-1),
    sorted,      SORT(HSTACK(distinct, occurrences),{2,1},-1),
    TAKE(sorted, 10)
  )

That also seems to work well.

@BF_UK 

 

This will also discount any null strings ("") present within the range:

 

=LET(
    ζ,B1:D1000,
    TAKE(SORT(IF({0,1},FREQUENCY(ζ,0+(0&ζ)),TOCOL(ζ)),2,-1),10)
)
Nice. You make better use of FREQUENCY than I did. My sorting and removing duplicates for the bins was unnecessary. Since my array was generated using RANDARRAY blanks were not an issue.

@Peter Bartholomew 

 

The OP did state "for any given row it might have a different entry in each of the 3 columns or it may simply have an entry in 1 column with the other 2 columns being blank". I just assumed that some or all of those "blanks" could be null strings.

@JosWoolley 

Another possibility is to use the TOCOL function to remove the blanks

= LET(
    ζ, TOCOL(data,1),
    TAKE(SORT(HSTACK(ζ,FREQUENCY(ζ,ζ)),2,-1),10)
  )

 

@Peter Bartholomew 

 

Unfortunately that only works with 'genuine' blanks. Any null strings present within the range will cause the second column in the output to contain one or more #N/A returns.

 

Regards

Thank you very much for posting a solution to my problem. In the end I used the solution given by  Quadruple_Pawn because they showed a table and a result. No other reason than that, It worked perfectly. I could not have done this on my own so thank you so much for answering my question. It's fantastic to see the Excel community in action. Thank you once again. 
Thank you very much for posting a solution to my problem. In the end I used the solution given by Quadruple_Pawn because they showed a table and a result. No other reason than that, It worked perfectly. I could not have done this on my own so thank you so much for answering my question. It's fantastic to see the Excel community in action. Thank you once again. 
Thank you very much for posting a solution to my problem. In the end I used the solution given by  because they showed a table and a result. No other reason than that, It worked perfectly. I could not have done this on my own so thank you so much for answering my question. It's fantastic to see the Excel community in action. Thank you once again. 

@BF_UK 

My solution shows a 'results matrix':

 

Patrick2788_0-1679345551312.png

 

Thank you very much for posting a solution to my problem. In the end I used the solution given by  because they showed a table and a result. No other reason than that, It worked perfectly. I could not have done this on my own so thank you so much for answering my question. It's fantastic to see the Excel community in action. Thank you once again. 
1 best response

Accepted Solutions
best response confirmed by BF_UK (Copper Contributor)
Solution

@BF_UK 

=TAKE(SORT(HSTACK(UNIQUE(TOCOL(B1:D1000)),BYROW(UNIQUE(TOCOL(B1:D1000)),LAMBDA(row,COUNTIF(B1:D1000,row)))),2,-1),10)

You can try this formula which works in my sheet.

top 10 numbers.JPG

View solution in original post