Nov 28 2020 07:53 AM - edited Nov 28 2020 07:54 AM
How do I get an exponential regression that minimizes R^2? (The exponential regression trendline fails to minimize R^2, and instead seems to minimize the R^2 of the linear regression on the log of the y-axis.)
Consider the following data, generated by the Python script
import math, random; print("\n".join(f"{n:<2} {math.exp(n)+int(random.uniform(0,100)):<15}" for n in range(1, 12)))
n | eapprox |
1 | 97.71828 |
2 | 101.3891 |
3 | 87.08554 |
4 | 92.59815 |
5 | 194.4132 |
6 | 410.4288 |
7 | 1174.633 |
8 | 3032.958 |
9 | 8198.084 |
10 | 22124.47 |
11 | 59973.14 |
Plotting this on a log-y scale in Excel 2019 with an exponential trendline gives
Doing the same computation in Google sheets gives
Clearly the Excel trendline fails to minimize R^2, presumably because Excel is assuming that a linear regression with unit weights on x, log(y) is the same as an exponential regression with unit weights on x, y (it's not; you need to log-scale the weights/errors as well). How do I correct this issue, and get Excel to give me the actually correct trendline?
Nov 28 2020 11:06 AM
Nov 30 2020 07:48 AM - edited Dec 02 2020 08:32 AM
@JasonGross Re: ``How do I get an exponential regression that minimizes R^2? [....] Doing the same computation in Google sheets gives [in image, effectively] y = 2.07e^(0.931x)``
I must be brief. I might add an image and attach an Excel file later. And I might add a more-explanatory response later. But in a nutshell....
Obviously, we want to maximize, not minimize R^2.
And there appears to be a typo(!) in your image. Given the form of the exponentional trendline equation as y = b*EXP(m*x), for m=0.931 (rounded), b must be 2.207 (rounded), not 2.07 [sic].
So you can mimic the Google chart by entering an intercept value (b) that rounds to 2.207.
Ignore the R^2 that appears in the chart. It is calculated differently than either LINEST or RSQ below, which seems to be how it is calculated for the Google image.
Empirically, I have determined that the intercept can be a value between 2.206561 and 2.207499, rounded to 6 decimal places, in order for it to be displayed as 2.207.
Then the slope (m) is between 0.931499 and 0.931444, truncated to 6 decimal places, which rounds to 0.931.
(In contrast, if the intercept (b) were between 2.065000 and 2.07499, which rounds to 2.07 [sic], the slope (m) would round to 0.940, not 0.931 [sic].)
Caveat: You might get slightly different results when you use the exact given y-values. I must use the y-values as you posted them, which might be rounded due to the cell format.
In Excel, the slope (m) can be calculated with the following normally-entered formula (press Enter as usual) in I11:
=LINEST(LN(B2:B12)-LN(N10), A2:A12, FALSE)
where N10 is the intercept value (b) that you choose, A2:A12 are the x-values (1 to 11), and B2:B12 are the given y-values (97.71828 to 59973.14, as you posted them).
R^2 can be calculated with the following normally-entered formula:
=RSQ(N10*EXP(I11*A2:A12), B2:B12)
where I11 contains the LINEST formula above. That displays 0.999 (rounded), like Google.
But IMHO, that is not the best "trend" formula to use. Again, perhaps I'll post more later.
Gotta run! Well, bike. (smile)