Discussion Re: Percentiles from frequency table in Excel
https://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742279#M113551
<P><LI-USER uid="863782"></LI-USER> you are welcome, glad to help.</P>
<P> </P>Sat, 11 Sep 2021 12:47:07 GMTSergei Baklan2021-09-11T12:47:07ZPercentiles from frequency table
https://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2741999#M113528
<P>Hello,</P><P>This is a quite common problem to solve in statistics it seems to me, but I can't find an easy solution in Excel. Am I missing something or is this just something that's not incorporated?</P><P>The problem is simple: I have a frequency table, and want to calculate percentiles.</P><P>Eg.</P><P>Score (0-5) / Number of students</P><P>0 / 3</P><P>1 / 2</P><P>2 / 1</P><P>3 / 5</P><P>4 / 6</P><P>5 / 1</P><P>To calculate percentiles, you would line the results up as follows:</P><P>0 0 0 1 1 2 3 3 3 3 3 4 4 4 4 4 4 5</P><P>You can then use the DAX PERCENTILE function (I understand there is no equivalent in M) on such an "expanded" list.</P><P>However:</P><P>1) I don't see an easy way to convert the dataset into the "expanded" dataset in M or DAX (in Excel there's <A href="https://www.got-it.ai/solutions/excel-chat/excel-tutorial/median/learn-how-to-calculate-the-median-from-a-frequency-table-in-excel" target="_blank" rel="noopener">this "hack"</A>)</P><P>2) Even if I would, in real life the expanded dataset would be humungous - and this only to come to a few crunched numbers in the end. In my case, the range of values (left column) goes from 1 to 53 for 10.000 products, and the frequencies (right column) go up to thousands per value, so we're talking about 10.000 lists of 100's of thousands of numbers each.</P><P>I found a mathematical explanation <A href="https://astarmathsandphysics.com/ib-maths-notes/probability-and-statistics/1063-quartiles-and-median-from-frequency-tables.html" target="_blank" rel="noopener">here</A>, but it doesn't seem obvious to me to get this done in Excel either...</P><P>So I'm hoping there's a formula for this?</P>Sat, 11 Sep 2021 07:18:43 GMThttps://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2741999#M113528bartvana2021-09-11T07:18:43ZRe: Percentiles from frequency table
https://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742134#M113537
<P><LI-USER uid="863782"></LI-USER> </P>
<P>Not sure about the math, but it's easy to repeat Excel sample. For such table</P>
<P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="image.png" style="width: 150px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/309716i7FD53EE1D0DF6686/image-size/large?v=v2&px=999" role="button" title="image.png" alt="image.png" /></span></P>
<P>script which returns median (or 0.5 percentile) is</P>
<LI-CODE lang="powerquery">let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
allValues = Table.AddColumn(
Source, "Lists",
each List.Numbers([Value],[Frequency],0)),
Percentile = List.Percentile( // or List.Median
List.Combine(allValues[Lists]), 0.5)
in
Percentile</LI-CODE>
<P>Performance could be an issue, but that's only to play with actual. Perhaps fixing in memory with Index column or Table.Buffer() could help a bit. Or wrap above with List.Buffer().</P>
<P> </P>Sat, 11 Sep 2021 09:00:25 GMThttps://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742134#M113537Sergei Baklan2021-09-11T09:00:25ZRe: Percentiles from frequency table
https://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742228#M113542
<P><LI-USER uid="521"></LI-USER> That's genius!<BR />I'm trying to implement this in my real life data and have only one problem, the grouping by product. Imagine you would have sample data like this and want to calculate the median by product (see also attached workbook where I added the column):</P><P><span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="bartvana_0-1631359251920.png" style="width: 400px;"><img src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/309719iAD0A5DF36941D989/image-size/medium?v=v2&px=400" role="button" title="bartvana_0-1631359251920.png" alt="bartvana_0-1631359251920.png" /></span></P><P> </P>Sat, 11 Sep 2021 11:22:29 GMThttps://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742228#M113542bartvana2021-09-11T11:22:29ZRe: Percentiles from frequency table
https://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742243#M113544
<P><LI-USER uid="863782"></LI-USER> </P>
<P>You may group by Product without aggregation and apply former procedure to each group</P>
<LI-CODE lang="excel-formula">let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(
Source, {"Product"},
{{"Percentile",
each
[
allValues = Table.AddColumn(
_ ,
"Lists",
each List.Numbers([Value],[Frequency],0)
),
Percentile = List.Percentile (
List.Combine(allValues[Lists]), 0.5
)
][Percentile]
}}
)
in
#"Grouped Rows"</LI-CODE>
<P>or use function instead</P>
<LI-CODE lang="excel-formula">let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
fnPercentile = (tbl as table) =>
let
allValues = Table.AddColumn(
tbl , "Lists",
each List.Numbers([Value],[Frequency],0)
),
Percentile = List.Percentile (
List.Combine(allValues[Lists]), 0.5
)
in
Percentile,
#"Grouped Rows" = Table.Group(
Source, {"Product"},
{{ "Percentile", each fnPercentile(_) }}
)
in
#"Grouped Rows"</LI-CODE>
<P>Here the only we assume predefined column names in functions.</P>
<P> </P>
<P>I'd filter Frequency = 0 before grouping.</P>
<P>In attached file is above and "Excel variant".</P>Sat, 11 Sep 2021 11:51:46 GMThttps://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742243#M113544Sergei Baklan2021-09-11T11:51:46ZRe: Percentiles from frequency table
https://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742268#M113548
<P><LI-USER uid="521"></LI-USER> You've been of great help, thank you! I managed to do it.</P><P>I like to use the Power Query UI as much as possible, and also go in gradual steps so I can understand afterwards what I did, so I ended up with this to make the lists:</P><LI-CODE lang="c">let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Product", type text}, {"Value", Int64.Type}, {"Frequency", Int64.Type}}),
//List per Value
#"Added Custom" = Table.AddColumn(#"Changed Type", "ListPerValue", each List.Numbers([Value], [Frequency], 0), type list),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Frequency"}),
//List per product (combine lists per value for each product)
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Product"}, {{"ListPerProduct", each List.Combine([ListPerValue]), type list}}),
//Show list to check
#"Extracted Values" = Table.TransformColumns(#"Grouped Rows", {"ListPerProduct", each Text.Combine(List.Transform(_, Text.From), ","), type text})
in
#"Extracted Values"</LI-CODE><P>First a simple "Add custom column", with the List.Number function.</P><P>Then the grouping by doing a generic sum grouping using the UI, then changing the sum function into List.Combine as you showed. The last step is just for checking the result.</P><P>Thanks again!</P><P>(Workbook attached for future reference).</P>Sat, 11 Sep 2021 12:34:50 GMThttps://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742268#M113548bartvana2021-09-11T12:34:50ZRe: Percentiles from frequency table
https://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742279#M113551
<P><LI-USER uid="863782"></LI-USER> you are welcome, glad to help.</P>
<P> </P>Sat, 11 Sep 2021 12:47:07 GMThttps://techcommunity.microsoft.com/t5/excel/percentiles-from-frequency-table/m-p/2742279#M113551Sergei Baklan2021-09-11T12:47:07Z