Forum Discussion
MikeJames630
Apr 01, 2024Copper Contributor
#NUM! error in regression analysis p-value
Hey all - I am running a regression analysis in excel for 30 data points and I am getting a #NUM! error in my p-value. what is really odd is if I exclude the last row and only include 29 data points,...
MikeJames630
Apr 01, 2024Copper Contributor
Thanks for suggestion of onedrive. I was struggling to upload the excel file to begin with. I've added the link to onedrive with the file in the post now. Here it is as well https://1drv.ms/f/s!ArvB_gNCvl6UayCd5VRZF-iYXCE?e=iKaf0X
JoeUser2004
Apr 02, 2024Bronze Contributor
With your data, both t-stat and p-value should display a #DIV/0 error for both the x and intercept coefficients because for all x-values, the original y-values in Q6:Q35 exactly equals the estimated y-values, which is calculated by the regression formula estY = m*x + b.
Apparently, the Data Analysis Regression tool arbitrarily replaces those errors with 65535 for t-stat and #NUM for p-value.
We know that 65535 is arbitrary because if it were the actual t-stat, the p-value would be TDIST(65535, 30-2, 2), which returns 3.7449E-116, not an Excel error.
(Also, 65535 is the largest unsigned 16-bit integer. That's a suspicious "coincidence".)
The p-value calculation returns a #DIV/0 error because it references the t-stat result, which is #DIV/0. The p-value is calculated by the formula =TDIST(tStat, n-2, 2), where n is the number of observations in B8.
The t-stat result is #DIV/0 because it is calculated by the formula =coeff / stdErr, and the std err is zero.
The std err is zero because it is calculated, in part, by the formula
=SQRT(SUMPRODUCT((Q6:Q35 - R6:R35)^2)*.../....
The SUMPRODUCT is the sum of the squared difference (SSE) between the original y-values in Q6:Q35 and the estimated y-values estY = m*x + b, which I calculate in R6:R35.
In your case, with m=-1, b=0 and x = -y, estY equals y for all values of x. Consequently, the SSE is zero.
(Aside.... In "dynamic-array aware" versions of Excel, SUMPRODUCT can be replaced with SUM, which might be clearer.)
-----
Just for grins, we can "correct" this behavior with a very small change to just one y-value.
For example, if the first y-value is 0.0004897000001 instead of 0.0004897, but the first x-value continues to be -0.0004897, both the t-stat and p-value are computable because the std err is no longer zero.
Note: I am __not__ suggesting such "corrections" as a work-around. It is merely a "proof of concept".
TMI.... With the y-value change, the resultign zero p-value for the x coefficient is due to numerical limitations of 64-bit binary floating-point, which is how numbers are represented internally. If the t-stat were 4.80E+11, the p-value would be 2.291E-308, which is very close to the smallest possible number (2.225E-308). Since the absolute value of the t-stat for the x coefficient is much larger (6.668E+12), the p-value calculation "underflows" and results in zero.