Apr 11 2021 09:21 AM
Apr 11 2021 09:21 AM
I am averaging a group of numbers in Excel 2007 that have 21 columns & 42 rows. When I average all the cells at once I get one result, 59.25183, but when I average each row & then average those results I get an entirely different result, 72.791. Also if I average each column & then average those results it gives a different number, 49.231. In trying to figure out why I created another sheet with the same number of columns & rows & filled it with random numbers. All the cells were filled with numbers & when I did the above all the results were the same. But when I started deleting values in some of the cells the 3 results began to differ. The more I deleted the bigger the difference became between the 3 averages. Can anyone explain why this happens & which result is correct? To reiterate, if all the cells have numbers, the average results are exactly the same, but some cells are empty. the results are not the same. Can anyone explain why this happens?
Apr 11 2021 10:48 AM - edited Apr 13 2021 07:18 AM
Re: ``All the cells were filled with numbers & when I did the above all the results were the same. But when I started deleting values in some of the cells the 3 results began to differ``
That is right. And the issue has nothing to do with Excel. It has to do with mathematics.
In general, the average of averages of the parts is not the same as the average of the whole. And the correct answer is the average of the whole.
Alternatively, we can calculate a weighted average, which is the sum of each partial average times a weighting factor. Each weighting factor is the number of values in the part divided by the number of values in the whole.
The exception is: when each part has the same number of values, then the average of the partial averages equals the average of the whole.
Math and algebra might not be your thing. But perhaps the following will make sense to you, anyway.
Consider two partial averages: (10+20+30)/3 and (40+50)/2.
The average of the two partial averages is: ( (10+20+30)/3 + (40+50)/2 ) / 2
That reduces to ( 2*(10+20+30) + 3*(40+50) ) / 12, where 12 = (3*2)*2.
In contrast, the average of the whole is (10+20+30+40+50)/5.
If the difference is not apparent, calculate these two values:
which is the same as
=SUM( SUM(10,20,30)/3, SUM(40,50)/2 ) / 2
which is the same as
Apr 11 2021 11:51 AM
When all of the cells are filled in, then the individual row and column averages have the same denominator, so the average of parts is equivalent to the average of the whole.
Say you have a 3x3 grid of numbers (X1 through X9). Averaging the rows or columns and then averaging those results can be represented as:
If you remember your algebra, fractions with the same denominator can be added, so this is equivalent to:
which simplifies to this, and is the exact same as the average of the whole:
When you start deleting cells, the average function excludes those cells. Say you delete X2, X3, and X4. You then have this, where each average of the parts (fractions in the numerator) have different denominators:
Then, getting common denominators and simplifying, you will see that averaging the parts is not the same as averaging the whole:
Apr 12 2021 07:00 PM
@Joe User Thanks for the reply. Even though it's been 50 years since my last algebra class your answer was very clear & understandable. I Googled "average of averages" & found some fascinating websites that dealt with this subject & learned about Simpson's Paradox. I was pretty sure it wasn't a problem with Excel but because of another problem with blank cells I wasn't positive. When I first d/l the data in a spreadsheet & sort from high to low by any column that has blank cells it always puts the blank cells in the top so I have to block mark the empty cells & hit delete. Then when I re-sort the high row will be at the top & the blank cells at the bottom as should be. Try as I might I can't find a way to figure out what character is in the cells so I can remove them from the entire sheet all at once.
Thank you once again
Apr 12 2021 07:12 PM
Apr 13 2021 07:40 AM - edited Apr 13 2021 08:37 AM
Re: ``When I first d/l the data in a spreadsheet & sort from high to low by any column that has blank cells it always puts the blank cells in the top [....] can't find a way to figure out what character is in the cells so I can remove them``
I suspect the "blank" (i.e. blank-appearing) cells contain null strings or non-breaking spaces (ASCII 160; HTML nbsp) that you might have copy-and-pasted-value.
Confirm that formulas of the form =ISTEXT(A1) return TRUE.
If those are the only text in the range, you can delete them with the following procedure.
1. Select the range.
2. Press function key f5 (Go To), and click Special.
3. Select Constants, and deselect Numbers, Logicals and Errors. That is, select only Text.
4. Click OK.
5. Confirm that only the blank-appearing cells are selected. If so, press Delete.
Caveat: This runs the risk of deleting legitimate text inadvertently. It would be prudent to make a backup copy of the file first, so that you have something to fall back to if and when you discover any mistaken deletions when it is too late to undo (press ctrl+z).