#NUM! error in regression analysis p-value

JoeUser2004

Bronze Contributor

Apr 02, 2024

With your data, both t-stat and p-value should display a #DIV/0 error for both the x and intercept coefficients because for all x-values, the original y-values in Q6:Q35 exactly equals the estimated y-values, which is calculated by the regression formula estY = m*x + b.

Apparently, the Data Analysis Regression tool arbitrarily replaces those errors with 65535 for t-stat and #NUM for p-value.

We know that 65535 is arbitrary because if it were the actual t-stat, the p-value would be TDIST(65535, 30-2, 2), which returns 3.7449E-116, not an Excel error.

(Also, 65535 is the largest unsigned 16-bit integer. That's a suspicious "coincidence".)

The p-value calculation returns a #DIV/0 error because it references the t-stat result, which is #DIV/0. The p-value is calculated by the formula =TDIST(tStat, n-2, 2), where n is the number of observations in B8.

The t-stat result is #DIV/0 because it is calculated by the formula =coeff / stdErr, and the std err is zero.

The std err is zero because it is calculated, in part, by the formula

=SQRT(SUMPRODUCT((Q6:Q35 - R6:R35)^2)*.../....

The SUMPRODUCT is the sum of the squared difference (SSE) between the original y-values in Q6:Q35 and the estimated y-values estY = m*x + b, which I calculate in R6:R35.

In your case, with m=-1, b=0 and x = -y, estY equals y for all values of x. Consequently, the SSE is zero.

(Aside.... In "dynamic-array aware" versions of Excel, SUMPRODUCT can be replaced with SUM, which might be clearer.)

-----

Just for grins, we can "correct" this behavior with a very small change to just one y-value.

For example, if the first y-value is 0.0004897000001 instead of 0.0004897, but the first x-value continues to be -0.0004897, both the t-stat and p-value are computable because the std err is no longer zero.

Note: I am __not__ suggesting such "corrections" as a work-around. It is merely a "proof of concept".

TMI.... With the y-value change, the resultign zero p-value for the x coefficient is due to numerical limitations of 64-bit binary floating-point, which is how numbers are represented internally. If the t-stat were 4.80E+11, the p-value would be 2.291E-308, which is very close to the smallest possible number (2.225E-308). Since the absolute value of the t-stat for the x coefficient is much larger (6.668E+12), the p-value calculation "underflows" and results in zero.

Forum Discussion

#NUM! error in regression analysis p-value

Resources