SOLVED

How do I find the median of a selected column of values within a worksheet

Copper Contributor

I have a worksheet that has a column of ID names. There are various numbers of line items with the same ID names.  I need to find the median of all values that share an ID name.  I know this should be simple, but I'm not a programmer and I don't think I could piece this together using online help.

10 Replies
best response confirmed by lodonnellma (Copper Contributor)
Solution

@lodonnellma 

 

Is it possible for you to upload a sample (devoid of actual names or other confidential info)? Otherwise, all you'll get is very general and it sounds very much like you would appreciate specific help.

Yes, I can do that. I need to get the de-identified dataset. I'll get that now.

@mathetes 

Here is a sample with some simple formulas I put in for illustration of what I'm trying to do. There are two tabs.  On the first tab (Differences within ID) there's a formula to calculate the differences for each line item, and a formula to calculate the median of the differences. Difference1 and Difference2 are independent of each other.  On the second tab (MedianDifferencesPerID). I just copied the median values 1 & 2 for each ID.  Finally, (I didn't mention this in my first post), I need to determine if any of the differences within an ID are outside of the range of the median value +/- 100,000.  Still interested in helping?  :)  Thanks!

@lodonnellma  this was fun.  here is a formula I used to calculate the conditional median:

 

=AVERAGE(AGGREGATE(14,6,F:F/(--($A:$A=$A2)),ROUNDUP((COUNTIF($A:$A,$A2)+1)/2,0)),AGGREGATE(14,6,F:F/(--($A:$A=$A2)),ROUNDDOWN((COUNTIF($A:$A,$A2)+1)/2,0)))

 

paste that in cell H2 and fill down and fill right into col I

But it will fill every cell with the corresponding value.  I could add an additional condition if you only want the 1st row for that Id to show the median.

That said sheet 2 can use the same basic formula:

 

=AVERAGE(AGGREGATE(14,6,'Difference within ID'!F:F/(--('Difference within ID'!$A:$A=$A2)),ROUNDUP((COUNTIF('Difference within ID'!$A:$A,$A2)+1)/2,0)),AGGREGATE(14,6,'Difference within ID'!F:F/(--('Difference within ID'!$A:$A=$A2)),ROUNDDOWN((COUNTIF('Difference within ID'!$A:$A,$A2)+1)/2,0)))

 

paste that into cell B2 and then copy/fill down and right.

Since you have the median being calculated on sheet 2 do you still need/want it on sheet 1?  

As for detecting if any individual diff is outside of the median +/- 100,000 do you want a column of cells with true/false or true/blank or use conditional formatting to highlight the individual diff cell or maybe the whole row.  There are lots of options.  

EDIT: I corrected a small typo in the second formula and attaching the workbook with the formulas in them.  I also added conditional formatting.  Because your numbers aren't anywhere close to 100,000 I added a cell J2 that will define the range that the conditional formatting will trigger at.  play with it and you will see.

@lodonnellma 

 

Interested, yes. Able to do it? Sigh...afraid not. I thought the Pivot Table would do it, but it's limited to Average, and so far as I could tell there's no readily available calculation for Median among the Pivot Table options, which surprises me. There are Standard Deviation calcs, but no Median.

 

So let's see if somebody else can finish it as you want it. Here's the Pivot Table, just so you can see what it does. FWIW, it's a very easy way to summarize data, and rarely fails to satisfy. This time it does fail.

 

P.S. Ah,, I see @mtarler has done it.

thanks for the interest. My colleague thought a pivot table too. I'm going to have a serious look-see at what @mtarler has done.

@mtarler 

Thank you.  I'm gonna give it some study and try to learn from your very thorough response.  I definitely appreciate your time and effort.  OK to post with additional questions if I need to?  

You are my hero. I'd never be able to reproduce it, but I did plug the formulas into my actual spreadsheet, and it worked. Yippee! I'm gonna look like a genius!

@lodonnellma  you're very welcome.  The aggregate function lets you do some array calculation while ignoring errors (the errors being when the look up in A isn't the same as this row and the false becomes 0 and it is dividing by 0).  I use it 2x for the 'largest' n/2 value and round up and down so if there is an even number it takes avg of those middle 2, otherwise it takes avg of the same middle number.

as for conditional formatting it is relatively straight forward

but sure feel free to ask questions :)

@lodonnellma 

I offer a solution with PQ and a solution with formulas.

 

1 best response

Accepted Solutions
best response confirmed by lodonnellma (Copper Contributor)
Solution

@lodonnellma 

 

Is it possible for you to upload a sample (devoid of actual names or other confidential info)? Otherwise, all you'll get is very general and it sounds very much like you would appreciate specific help.

View solution in original post