SOLVED

Percentiles from frequency table

Hello,

This seems to me a fairly common problem in statistics, but I can't find an easy solution in Excel. Am I missing something, or is this simply not built in?

The problem is simple: I have a frequency table, and want to calculate percentiles.

E.g.:

Score (0-5)    Number of students
0              3
1              2
2              1
3              5
4              6
5              1

To calculate percentiles, you would line the results up as follows:

0 0 0 1 1 2 3 3 3 3 3 4 4 4 4 4 4 5

You can then use the DAX PERCENTILE function (I understand there is no equivalent in M) on such an "expanded" list.

However:

1) I don't see an easy way to convert the dataset into the "expanded" dataset in M or DAX (in Excel there's this "hack")

2) Even if I could, in real life the expanded dataset would be humongous, and all that only to arrive at a few crunched numbers in the end. In my case, the values (left column) range from 1 to 53 for 10,000 products, and the frequencies (right column) go up to thousands per value, so we're talking about 10,000 lists of hundreds of thousands of numbers each.

I found a mathematical explanation here, but it's not obvious to me how to implement it in Excel either...

So I'm hoping there's a formula for this?

5 Replies

@bartvana 

Not sure about the math, but the Excel sample is easy to reproduce in Power Query. For a table like this, with Value and Frequency columns,

image.png

a script that returns the median (the 0.5 percentile) is

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    allValues = Table.AddColumn(
        Source, "Lists",
        each List.Numbers([Value],[Frequency],0)),
    Percentile = List.Percentile(            // or List.Median
        List.Combine(allValues[Lists]), 0.5)
in
    Percentile
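
Since the table screenshot isn't reproduced in the text, here is a minimal self-contained sketch of the same idea for anyone who wants to try it without a workbook. It assumes the sample data from the question, typed in with #table, and the column names Value and Frequency used in the script above. The step name Expanded is just illustrative.

```powerquery
let
    // Sample data from the question, typed in by hand (stands in for Table1)
    Source = #table(
        {"Value", "Frequency"},
        {{0, 3}, {1, 2}, {2, 1}, {3, 5}, {4, 6}, {5, 1}}),
    // One list per row: Value repeated Frequency times
    allValues = Table.AddColumn(
        Source, "Lists",
        each List.Numbers([Value], [Frequency], 0)),
    // Flatten to the "expanded" list 0 0 0 1 1 2 3 ... 5
    Expanded = List.Combine(allValues[Lists]),
    Median = List.Median(Expanded)
in
    Median
```

The expanded list has 18 items and the two middle ones (9th and 10th) are both 3, so the median here should be 3.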

Performance could be an issue, but you can only tell by trying it on the actual data. Fixing the table in memory with an Index column or Table.Buffer() might help a bit, or wrap the combined list in List.Buffer().
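
If the expanded lists turn out to be too large, one possible alternative is to read the percentile off the cumulative frequencies and never build the list at all. The sketch below is not from the thread. fnFreqPercentile is a hypothetical helper that assumes columns named Value and Frequency, at least one positive frequency, and 0 <= p <= 1. It interpolates linearly at the zero-based rank p * (N - 1), which is the Excel PERCENTILE.INC convention; whether that matches the default mode of List.Percentile is something to check against your own data.

```powerquery
let
    // Hypothetical helper: percentile straight from a frequency table
    fnFreqPercentile = (tbl as table, p as number) as number =>
        let
            // Drop empty classes and sort by value
            Sorted = Table.Sort(
                Table.SelectRows(tbl, each [Frequency] > 0),
                {{"Value", Order.Ascending}}),
            Values = List.Buffer(Sorted[Value]),
            Freqs = List.Buffer(Sorted[Frequency]),
            N = List.Sum(Freqs),
            // Running totals: Cum{i} = number of observations <= Values{i}
            Cum = List.Accumulate(
                Freqs, {},
                (state, f) => state & {List.Last(state, 0) + f}),
            // Value at zero-based position k of the (virtual) expanded list
            ValueAt = (k as number) =>
                Values{List.PositionOf(List.Transform(Cum, each _ > k), true)},
            // Fractional rank and its two neighbouring positions
            Rank = p * (N - 1),
            Lo = Number.RoundDown(Rank),
            Hi = Number.RoundUp(Rank),
            Result = ValueAt(Lo) + (Rank - Lo) * (ValueAt(Hi) - ValueAt(Lo))
        in
            Result,

    // Sample data from the question (stands in for Table1)
    Source = #table(
        {"Value", "Frequency"},
        {{0, 3}, {1, 2}, {2, 1}, {3, 5}, {4, 6}, {5, 1}}),
    Median = fnFreqPercentile(Source, 0.5)
in
    Median
```

For the sample data N is 18, the rank is 8.5, and positions 8 and 9 both hold the value 3, so the result is 3, the same as with the expanded list. The work per table depends only on the number of distinct values (at most 53 in the question), not on the size of the frequencies.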

 

@Sergei Baklan That's genius!
I'm trying to apply this to my real-life data and have only one problem: grouping by product. Imagine you have sample data like this and want to calculate the median per product (see also the attached workbook, where I added the column):

bartvana_0-1631359251920.png

 

best response confirmed by bartvana (Iron Contributor)
Solution

@bartvana 

You can group by Product without aggregation and apply the previous procedure to each group:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Grouped Rows" = Table.Group(
        Source, {"Product"},
        {{"Percentile",
            // _ is the sub-table holding the rows of one Product
            each
                [
                    // One list per row: Value repeated Frequency times
                    allValues = Table.AddColumn(
                        _,
                        "Lists",
                        each List.Numbers([Value], [Frequency], 0)
                    ),
                    Percentile = List.Percentile(
                        List.Combine(allValues[Lists]), 0.5
                    )
                ][Percentile]
        }}
    )
in
    #"Grouped Rows"

Or use a function instead:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],

    fnPercentile = (tbl as table) => 
        let
            allValues = Table.AddColumn(
                tbl , "Lists",
                    each List.Numbers([Value],[Frequency],0)
            ),
            Percentile = List.Percentile (
                List.Combine(allValues[Lists]), 0.5
            )                
        in
            Percentile,

    #"Grouped Rows" = Table.Group(
        Source, {"Product"},
        {{ "Percentile", each fnPercentile(_) }}
    )
in
    #"Grouped Rows"

The only assumption here is that the column names (Value and Frequency) are hard-coded in the function.

 

I'd filter out rows with Frequency = 0 before grouping.
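
As a rough sketch of that filtering step, assuming the same Product, Value and Frequency columns (the sample rows and the step name NonZero are made up for illustration):

```powerquery
let
    // Hypothetical sample rows; in the workbook this would be Table1
    Source = #table(
        {"Product", "Value", "Frequency"},
        {{"A", 1, 2}, {"A", 2, 0}, {"A", 3, 4}}),
    // Keep only the values that actually occur
    NonZero = Table.SelectRows(Source, each [Frequency] > 0)
in
    NonZero
```

The grouping step would then take NonZero instead of Source as its input.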

The attached file contains the above and an "Excel variant".

@Sergei Baklan You've been of great help, thank you! I managed to do it.

I like to use the Power Query UI as much as possible, and also go in gradual steps so I can understand afterwards what I did, so I ended up with this to make the lists:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Product", type text}, {"Value", Int64.Type}, {"Frequency", Int64.Type}}),
    //List per Value
    #"Added Custom" = Table.AddColumn(#"Changed Type", "ListPerValue", each List.Numbers([Value], [Frequency], 0), type list),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Frequency"}),
    //List per product (combine lists per value for each product)
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Product"}, {{"ListPerProduct", each List.Combine([ListPerValue]), type list}}),
    //Show list to check
    #"Extracted Values" = Table.TransformColumns(#"Grouped Rows", {"ListPerProduct", each Text.Combine(List.Transform(_, Text.From), ","), type text})
in
    #"Extracted Values"

First a simple "Add custom column", with the List.Numbers function.

Then the grouping: I created a generic Sum grouping using the UI, then replaced the sum function with List.Combine as you showed. The last step is just for checking the result.
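
The query above stops at the lists. A possible final step, sketched here under the same column names, would be to add the percentile as a custom column on top of the grouped table. The sample rows and the step name "Added Median" are hypothetical; List.Percentile is the same call used in the accepted solution.

```powerquery
let
    // Hypothetical sample rows standing in for Table1
    Source = #table(
        {"Product", "Value", "Frequency"},
        {{"A", 0, 3}, {"A", 1, 2}, {"A", 2, 1},
         {"B", 3, 5}, {"B", 4, 6}, {"B", 5, 1}}),
    // List per value
    #"Added Custom" = Table.AddColumn(Source, "ListPerValue",
        each List.Numbers([Value], [Frequency], 0), type list),
    // List per product (combine the lists per value for each product)
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Product"},
        {{"ListPerProduct", each List.Combine([ListPerValue]), type list}}),
    // Median (0.5 percentile) of each product's list
    #"Added Median" = Table.AddColumn(#"Grouped Rows", "Median",
        each List.Percentile([ListPerProduct], 0.5), type number)
in
    #"Added Median"
```

If the checking step that turns ListPerProduct into text is kept, this column would need to be added before it, while the column still holds lists.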

Thanks again!

(Workbook attached for future reference).

@bartvana you are welcome, glad to help.

 
