Quintile division help

Copper Contributor

Hi team, 


Hope you're doing great. So, I have a question. I'm trying to automate a similar dataset, where I try to categorize the skillset based on their scores while making sure that I only calculate for the rows with the same skill name.




Let's just say, I have a dataset with 3 columns, 'Skill', 'Name' & 'Score'. here, what I'm trying to do is, create a new column that categorizes their quintile and assigns A, B, C, or D only based on what their skill is showing. 


Let's take the 'Draw' skill, we've 5 rows for that, so I want scores categorized into a, b, c, and d categories just for that. And then the next skill 'Paint'and so on. 


I'm trying to automate this, so Power Query would also be good, let me know what would be a good way to do so.

6 Replies

@bsrujan022  so the issue I have here is understanding how to break them into A,B,C,D and what you want.  For example I did 2 different versions in the attached.  In 1 case I used the built in PERCENTRANK.EXC function and the other I used my own calculation based on the value and the max-min range.  The methods yield different results and depend on how you want to define the groups.

Hi @mtarler, thank you. So, I was looking for categorizing the names for skills based on their scores into quintiles. So, we have an option in Pivot to categorize based on the Top 10%, 20%, and so on in conditional formatting. Can I do something there so I could get them into A, B, C, D instead of colors?
Hi. did you look at the attached worksheet? Do either or both of those options work for you? If not what did you want different? My point is both of those solutions do what you ask but as you noted, the skill 'Draw' has 5 so how do they get spread into 4 categories? How does top 25% get defined? is it based on rank or value? is it an inclusive or exclusive set? If you don't care about those nuances then either solution will work for you, but the results will be slightly different.
Hi @mtarler. I understand. I checked both of your solutions and proposed to the team. Seems, they're trying to weigh more based on the highest scorers. Since this was just a sample, you're not seeing multiple entries. Actual dataset has like 45 different skills, and over 90000 rows worth of data. When working there, I noticed that classifying based on both your formulas, I was able to see categories divided and somewhat biased to medium categories and not on top scorers. I then remembered, in Pivot table, we've option to format top 25% or bottom 25%. And so on for 50%. If we're able to put the rules in order, we'll get based on who scored highest in top 25%, then next being 25 to 50% and so on.. also, it works when new skills are getting added and actually it gets more dynamic with new rows.

So, is it possible to let's say, top 25% are formatted as Green color, can we create a new column in pivot which can say green is A, Yellow (26-50%) is B, Orange (51-75%) is C and last being D? Just checking since that'll make it easy and format friendly even if we try to add new columns in the raw data in future since lot of people use it.
best response confirmed by bsrujan022 (Copper Contributor)
I'm sorry I'm a bit confused. You are saying the formulas are middle based? I'm also not sure what you mean by pivot table making colors. You can use Conditional Formatting to color cells based on many things including top X%. In pivot tables you can make a calculated field and potentially use that as a way to create columns. As for the formulas I created, they don't care how many cells or how many categories. As for being 'biased' toward the middle or high I don't understand what you mean/want. In terms of rank based grouping that will 'bias' by only 1 or multiple of the same value but on large set shouldn't be noticeable/significant. In terms of value based grouping that will be entirely dependent on the value distribution.
Hmm.. ok.. seems it's hard to share in words until I expose my sensitive dataset. No worries, will try to figure out something. Thank you.