Mar 31 2024 08:57 AM
Dear All,
I have been following the instructions on this YouTube video for executing the chi-squared test in Excel. It has been working well the first few times I've used it; however, now I have the #NUM! error that appears when inputting the formula for the chi-squared test statistic - as per printscreen below (you can see the formula I used in the formula bar). Baased on my research so far, it is because the p-value seems to be at zero (even if the p-value may actually be an ultra miniscule number that Excel cannot show). What can I do about this?
Many thanks,
Jenny
Mar 31 2024 10:21 AM
It might be helpful and expedite a resolution if you attach the Excel file. Use the "browse files to attach" link near the bottom of the "message" window.
Mar 31 2024 12:12 PM
Mar 31 2024 01:03 PM
@JennyLMace wrote: "this doesn't seem to be an option available to me"
Obviously, the problem is the p-value. We cannot help with that without access to all of the data and formulas in the Excel file.
(BTW, format G20 as Scientific to see the true p-value. 0.00E+00 is exactly zero. CHISQ.INV.RT can work with p-values as low as 1E-60. But your p-value shouldn't be that low, anyway.)
Upload the Excel file to a file-sharing website such as box.net/files, dropbox.com, onedrive.live.com, etc. (Not Google Sheets.) Then post a view-only download link that does not require that we login to download.
If this forum is "stoopid" enough to disallow file attachments for new users, it might not allow you to post the download link. (Sigh.)
To work-around that, copy and edit the link, removing the first few periods. For example, the "link" for this message is techcommunity microsoft com /t5/forums/replypage/board-id/ExcelGeneral/message-id/225406.
Mar 31 2024 02:22 PM
I still think it would be prudent for you provide the Excel file, working around the "stoopid" forum limitation as I explained previously.
But I might have an answer for you without it. You might confirm it on your own.
Previously, I wrote: "format G20 as Scientific to see the [actual] p-value. 0.00E+00 is exactly zero. CHISQ.INV.RT [with 8 degrees of freedom] can work with p-values as low as 1E-60. But your p-value shouldn't be that low".
Apparently, that last assertion is wrong. But the key is indeed to format the p-value as Scientific, not Number, in order to see its actual value.
Using the video that you cited (which is very good, BTW), I generated random data that actually has a strong dependency on having a college education. My p-value is about 3E-110 (!).
So the p-value certainly can be "that low".
Nevertheless, CHISQ.INV.RT works in my case, probably because the degrees of freedom is 2, not 8.
But my point is: in your case, if there is a strong dependency among the data, perhaps the p-value is indeed so small that CHISQ.INV.RT will return #NUM.
Off-hand, I cannot say whether that can or should not be the case. Perhaps not mathematically. But perhaps it is a numerical compuational limitation.
Again, without the Excel file to look, I can only speculate.
Mar 31 2024 02:50 PM
Apr 01 2024 01:39 AM - edited Apr 01 2024 01:41 AM
I was right: chisq.test returns exactly zero for the p-value because it encounters a numerical computation limitation. But that should not affect your interpretation with respect to the null hypothesis.
And I suspect that SPSS would encounter the same numerical limitation, since most applications use the same internal numerical representation (64-bit binary floating-point).
On the other hand, the #NUM error from chisq.inv.rt seems to be due to an arbitrary limitation of the internal algorithm. SPSS might behave differently.
But since we have the actual and expected data, we can calculate the chisq statistic directly with the following formula in G21:
=SUMPRODUCT( (G4:I8 - G12:I16)^2 / G12:I16 )
That returns 1453.99071.
Aside.... You can replace SUMPRODUCT with SUM in "dynamic-array aware" versions of Excel. That might be clearer.
-----
TMI.... With the chisq statistic in G21, we could calculate the p-value in G20 with the formula =CHISQ.DIST.RT(G21, 8). Of course, we might as well use CHISQ.TEST. But the CHISQ.DIST.RT formula gives us insight into the numerical limitation.
As we arbitrarily increase the chisq statistic from 1452, we see that the p-value gets infinitesimally smaller until it reaches 2.22508E-308 when the chisq statistic is 1452.74596. 2.22507E-308 (2^-1022) is the smallest number that we can represent with 64-bit binary floating-point. (But Excel does not allow us to enter 2.22507E-308 manually, an arbitrary limitation.)
Likewise, we find that 4.24870E-63 is the smallest p-value that works with =CHISQ.INV.RT(G20, 8). With that arbitrary p-value in G20, CHISQ.INV.RT returns the chisq statistic 314.0252679. Obviously, that is significantly less than the numerically-limited value of 1452.74596.
Apr 01 2024 09:26 AM
Apr 01 2024 09:41 AM
Apr 01 2024 12:39 PM
@JennyLMace wrote: "import my data to SPSS to see if a similar error message occurred. I'm pleased to report it did not"
In Excel, the "error message" per se arises when we calculate the chisq statistic based on the p-value alone. That is, CHISQ.INV.RT.
Is that the "error message" that you refer to?
What did you input to SPSS for that calculation: (1) the p-value alone; or (2) the actual and expected values, or the original data?
I am not familiar with SPSS. Even if you __think__ you did #1, is it possible that SPSS did #2 because it is aware of the related data (context of the "session")?
Just for grins, I would be interested in the result when SPSS calculates the chisq statistic without importing any data. IOW, just enter the p-value, ideally with 15 significant digits of precision.
Is that possible in SPSS?
-----
In any case, what p-value does SPSS return for your data?
Exactly zero (0.00E+00), as Excel does?
If not, please show us the SPSS p-value in Scientific notation, ideally with 5 or more decimal places (up to 15 significant digits).
Just curious. Thanks.
Apr 02 2024 02:09 AM
Apr 02 2024 07:38 AM
@JennyLMace wrote: "Sorry Joe, I can't prioritise replying thoroughly right now"
I understand. And thanks; I think you answered my questions sufficiently. Since SPSS is aware of all of the data, it probably calculates the chisq statistic with SUM((actual-expected)^2/expected), which works fine in Excel as well, not with an SPSS equivalent of chisq.inv(pValue,df), which is "numerically-challenged" in Excel (wink). And you confirmed that with such a large chisq statistic, SPSS hits the same numerical limitations as Excel, and it returns a zero p-value.
Bottom line: In Excel, do not use CHISQ.INV.RT(pValue, df) to calculate the chisq statistic, if we have the data. It is unreliable. Use the SUM formula instead. Good to know. Thanks again.