Forum Discussion
Percentiles from frequency table
Hello,
This is a quite common problem to solve in statistics it seems to me, but I can't find an easy solution in Excel. Am I missing something or is this just something that's not incorporated?
The problem is simple: I have a frequency table, and want to calculate percentiles.
Eg.
Score (0-5) / Number of students
0 / 3
1 / 2
2 / 1
3 / 5
4 / 6
5 / 1
To calculate percentiles, you would line the results up as follows:
0 0 0 1 1 2 3 3 3 3 3 4 4 4 4 4 4 5
You can then use the DAX PERCENTILE function (I understand there is no equivalent in M) on such an "expanded" list.
However:
1) I don't see an easy way to convert the dataset into the "expanded" dataset in M or DAX (in Excel there's https://www.got-it.ai/solutions/excel-chat/excel-tutorial/median/learn-how-to-calculate-the-median-from-a-frequency-table-in-excel)
2) Even if I would, in real life the expanded dataset would be humungous - and this only to come to a few crunched numbers in the end. In my case, the range of values (left column) goes from 1 to 53 for 10.000 products, and the frequencies (right column) go up to thousands per value, so we're talking about 10.000 lists of 100's of thousands of numbers each.
I found a mathematical explanation https://astarmathsandphysics.com/ib-maths-notes/probability-and-statistics/1063-quartiles-and-median-from-frequency-tables.html, but it doesn't seem obvious to me to get this done in Excel either...
So I'm hoping there's a formula for this?
You may group by Product without aggregation and apply former procedure to each group
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Grouped Rows" = Table.Group( Source, {"Product"}, {{"Percentile", each [ allValues = Table.AddColumn( _ , "Lists", each List.Numbers([Value],[Frequency],0) ), Percentile = List.Percentile ( List.Combine(allValues[Lists]), 0.5 ) ][Percentile] }} ) in #"Grouped Rows"
or use function instead
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], fnPercentile = (tbl as table) => let allValues = Table.AddColumn( tbl , "Lists", each List.Numbers([Value],[Frequency],0) ), Percentile = List.Percentile ( List.Combine(allValues[Lists]), 0.5 ) in Percentile, #"Grouped Rows" = Table.Group( Source, {"Product"}, {{ "Percentile", each fnPercentile(_) }} ) in #"Grouped Rows"
Here the only we assume predefined column names in functions.
I'd filter Frequency = 0 before grouping.
In attached file is above and "Excel variant".
5 Replies
- SergeiBaklanDiamond Contributor
Not sure about the math, but it's easy to repeat Excel sample. For such table
script which returns median (or 0.5 percentile) is
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], allValues = Table.AddColumn( Source, "Lists", each List.Numbers([Value],[Frequency],0)), Percentile = List.Percentile( // or List.Median List.Combine(allValues[Lists]), 0.5) in Percentile
Performance could be an issue, but that's only to play with actual. Perhaps fixing in memory with Index column or Table.Buffer() could help a bit. Or wrap above with List.Buffer().
- bartvanaIron Contributor
SergeiBaklan That's genius!
I'm trying to implement this in my real life data and have only one problem, the grouping by product. Imagine you would have sample data like this and want to calculate the median by product (see also attached workbook where I added the column):- SergeiBaklanDiamond Contributor
You may group by Product without aggregation and apply former procedure to each group
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Grouped Rows" = Table.Group( Source, {"Product"}, {{"Percentile", each [ allValues = Table.AddColumn( _ , "Lists", each List.Numbers([Value],[Frequency],0) ), Percentile = List.Percentile ( List.Combine(allValues[Lists]), 0.5 ) ][Percentile] }} ) in #"Grouped Rows"
or use function instead
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], fnPercentile = (tbl as table) => let allValues = Table.AddColumn( tbl , "Lists", each List.Numbers([Value],[Frequency],0) ), Percentile = List.Percentile ( List.Combine(allValues[Lists]), 0.5 ) in Percentile, #"Grouped Rows" = Table.Group( Source, {"Product"}, {{ "Percentile", each fnPercentile(_) }} ) in #"Grouped Rows"
Here the only we assume predefined column names in functions.
I'd filter Frequency = 0 before grouping.
In attached file is above and "Excel variant".