Excel 2007 average discrepency

Copper Contributor

I am averaging a group of numbers in Excel 2007 that have 21 columns & 42 rows. When I average all the cells at once I get one result, 59.25183, but when I average each row & then average those results I get an entirely different result, 72.791. Also if I average each column & then average those results it gives a different number, 49.231. In trying to figure out why I created another sheet with the same number of columns & rows & filled it with random numbers. All the cells were filled with numbers & when I did the above all the results were the same. But when I started deleting values in some of the cells the 3 results began to differ. The more I deleted the bigger the difference became between the 3 averages. Can anyone explain why this happens & which result is correct? To reiterate, if all the cells have numbers, the average results are exactly the same, but some cells are empty. the results are not the same. Can anyone explain why this happens?

5 Replies

@Bettypin 

Re: ``All the cells were filled with numbers & when I did the above all the results were the same. But when I started deleting values in some of the cells the 3 results began to differ``

 

That is right.  And the issue has nothing to do with Excel.  It has to do with mathematics.

 

In general, the average of averages of the parts is not the same as the average of the whole.  And the correct answer is the average of the whole. 

 

Alternatively, we can calculate a weighted average, which is the sum of each partial average times a weighting factor.  Each weighting factor is the number of values in the part divided by the number of values in the whole.

 

The exception is:  when each part has the same number of values, then the average of the partial averages equals the average of the whole.

 

Math and algebra might not be your thing.  But perhaps the following will make sense to you, anyway.

 

Consider two partial averages:  (10+20+30)/3 and (40+50)/2.

 

The average of the two partial averages is:  ( (10+20+30)/3 + (40+50)/2 ) / 2

 

That reduces to ( 2*(10+20+30) + 3*(40+50) ) / 12, where 12 = (3*2)*2.

 

In contrast, the average of the whole is (10+20+30+40+50)/5.

 

If the difference is not apparent, calculate these two values:

 

=AVERAGE(AVERAGE(10,20,30), AVERAGE(40,50))

which is the same as

=SUM( SUM(10,20,30)/3, SUM(40,50)/2 ) / 2

 

versus

 

=AVERAGE(10,20,30,40,50)

which is the same as

=SUM(10,20,30,40,50)/5

@Bettypin 

 

When all of the cells are filled in, then the individual row and column averages have the same denominator, so the average of parts is equivalent to the average of the whole.

 

Say you have a 3x3 grid of numbers (X1 through X9). Averaging the rows or columns and then averaging those results can be represented as:

JMB17_0-1618165808138.png

 

If you remember your algebra, fractions with the same denominator can be added, so this is equivalent to:

JMB17_1-1618165996633.png

 

which simplifies to this, and is the exact same as the average of the whole:

 

JMB17_2-1618166164060.png

 

When you start deleting cells, the average function excludes those cells. Say you delete X2, X3, and X4. You then have this, where each average of the parts (fractions in the numerator) have different denominators:

JMB17_3-1618166325789.png

 

Then, getting common denominators and  simplifying, you will see that averaging the parts is not the same as averaging the whole:

JMB17_4-1618166721906.png

 

 

@Joe User Thanks for the reply.  Even though it's been 50 years since my last algebra class your answer was very clear & understandable. I Googled "average of averages" & found some fascinating websites that dealt with this subject & learned about Simpson's Paradox. I was pretty sure it wasn't a problem with Excel but because of another problem with blank cells I wasn't positive. When I first d/l the data in a spreadsheet & sort from high to low by any column that has blank cells it always puts the blank cells in the top so I have to block mark the empty cells & hit delete. Then when I re-sort the high row will be at the top & the blank cells at the bottom as should be. Try as I might I can't find a way to figure out what character is in the cells so I can remove them from the entire sheet all at once.

Thank you once again

If a person can remember how to do it, algebra is very useful. My trouble is that not only do I not remember most of how to do it, I can't even remember that there's a way to do it. If I could remember that much then I could always look it up. Maybe when Mark Twain said, "There's lies, **bleep** lies, and statistics" he was thinking about people like me who can sort of do it but not correctly. One example I saw on the internet showed that David Justice's batting average was better than Derek Jeter year over year but when done correctly overall Derek's was better. It showed me that the interpretation of statistics is not as easy as it looks.
Thanks for the reply

@Bettypin 

Re: ``When I first d/l the data in a spreadsheet & sort from high to low by any column that has blank cells it always puts the blank cells in the top [....] can't find a way to figure out what character is in the cells so I can remove them``

 

I suspect the "blank" (i.e. blank-appearing) cells contain null strings or non-breaking spaces (ASCII 160; HTML nbsp) that you might have copy-and-pasted-value.

 

Confirm that formulas of the form =ISTEXT(A1) return TRUE.

 

If those are the only text in the range, you can delete them with the following procedure.

 

1. Select the range.
2. Press function key f5 (Go To), and click Special.
3. Select Constants, and deselect Numbers, Logicals and Errors.  That is, select only Text.
4. Click OK.
5. Confirm that only the blank-appearing cells are selected.  If so, press Delete.

 

Caveat:  This runs the risk of deleting legitimate text inadvertently.  It would be prudent to make a backup copy of the file first, so that you have something to fall back to if and when you discover any mistaken deletions when it is too late to undo (press ctrl+z).