Forum Discussion

bartvana's avatar
bartvana
Iron Contributor
Sep 11, 2021
Solved

Percentiles from frequency table

Hello,

This is a quite common problem to solve in statistics it seems to me, but I can't find an easy solution in Excel. Am I missing something or is this just something that's not incorporated?

The problem is simple: I have a frequency table, and want to calculate percentiles.

Eg.

Score (0-5) / Number of students

0 / 3

1 / 2

2 / 1

3 / 5

4 / 6

5 / 1

To calculate percentiles, you would line the results up as follows:

0 0 0 1 1 2 3 3 3 3 3 4 4 4 4 4 4 5

You can then use the DAX PERCENTILE function (I understand there is no equivalent in M) on such an "expanded" list.

However:

1) I don't see an easy way to convert the dataset into the "expanded" dataset in M or DAX (in Excel there's https://www.got-it.ai/solutions/excel-chat/excel-tutorial/median/learn-how-to-calculate-the-median-from-a-frequency-table-in-excel)

2) Even if I would, in real life the expanded dataset would be humungous - and this only to come to a few crunched numbers in the end. In my case, the range of values (left column) goes from 1 to 53 for 10.000 products, and the frequencies (right column) go up to thousands per value, so we're talking about 10.000 lists of 100's of thousands of numbers each.

I found a mathematical explanation https://astarmathsandphysics.com/ib-maths-notes/probability-and-statistics/1063-quartiles-and-median-from-frequency-tables.html, but it doesn't seem obvious to me to get this done in Excel either...

So I'm hoping there's a formula for this?

  • bartvana 

    You may group by Product without aggregation and apply former procedure to each group

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Grouped Rows" = Table.Group(
            Source, {"Product"},
            {{"Percentile",
                    each
                    [
                        allValues = Table.AddColumn(
                        _ ,
                        "Lists",
                        each List.Numbers([Value],[Frequency],0)
                        ),
                        Percentile = List.Percentile (
                            List.Combine(allValues[Lists]), 0.5
                        )                
                    ][Percentile]
            }}
        )
    in
        #"Grouped Rows"

    or use function instead

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    
        fnPercentile = (tbl as table) => 
            let
                allValues = Table.AddColumn(
                    tbl , "Lists",
                        each List.Numbers([Value],[Frequency],0)
                ),
                Percentile = List.Percentile (
                    List.Combine(allValues[Lists]), 0.5
                )                
            in
                Percentile,
    
        #"Grouped Rows" = Table.Group(
            Source, {"Product"},
            {{ "Percentile", each fnPercentile(_) }}
        )
    in
        #"Grouped Rows"

    Here the only we assume predefined column names in functions.

     

    I'd filter Frequency = 0 before grouping.

    In attached file is above and "Excel variant".

5 Replies

  • SergeiBaklan's avatar
    SergeiBaklan
    Diamond Contributor

    bartvana 

    Not sure about the math, but it's easy to repeat Excel sample. For such table

    script which returns median (or 0.5 percentile) is

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        allValues = Table.AddColumn(
            Source, "Lists",
            each List.Numbers([Value],[Frequency],0)),
        Percentile = List.Percentile(            // or List.Median
            List.Combine(allValues[Lists]), 0.5)
    in
        Percentile

    Performance could be an issue, but that's only to play with actual. Perhaps fixing in memory with Index column or Table.Buffer() could help a bit. Or wrap above with List.Buffer().

     

    • bartvana's avatar
      bartvana
      Iron Contributor

      SergeiBaklan That's genius!
      I'm trying to implement this in my real life data and have only one problem, the grouping by product. Imagine you would have sample data like this and want to calculate the median by product (see also attached workbook where I added the column):

       

      • SergeiBaklan's avatar
        SergeiBaklan
        Diamond Contributor

        bartvana 

        You may group by Product without aggregation and apply former procedure to each group

        let
            Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
            #"Grouped Rows" = Table.Group(
                Source, {"Product"},
                {{"Percentile",
                        each
                        [
                            allValues = Table.AddColumn(
                            _ ,
                            "Lists",
                            each List.Numbers([Value],[Frequency],0)
                            ),
                            Percentile = List.Percentile (
                                List.Combine(allValues[Lists]), 0.5
                            )                
                        ][Percentile]
                }}
            )
        in
            #"Grouped Rows"

        or use function instead

        let
            Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        
            fnPercentile = (tbl as table) => 
                let
                    allValues = Table.AddColumn(
                        tbl , "Lists",
                            each List.Numbers([Value],[Frequency],0)
                    ),
                    Percentile = List.Percentile (
                        List.Combine(allValues[Lists]), 0.5
                    )                
                in
                    Percentile,
        
            #"Grouped Rows" = Table.Group(
                Source, {"Product"},
                {{ "Percentile", each fnPercentile(_) }}
            )
        in
            #"Grouped Rows"

        Here the only we assume predefined column names in functions.

         

        I'd filter Frequency = 0 before grouping.

        In attached file is above and "Excel variant".

Resources