SOLVED

R-Squared for Nonlinear Trendlines in Graphs

Copper Contributor

The r-squared calculation in Excel for nonlinear trendlines previously displayed the quantity developed using the linear regression model of the transformed data. Can anyone point me to the updated equation used to define r-squared in a nonlinear trendline? Below is a example of an exponential trend where the r-squared displayed on the log-linear graph matches the calculated value in cell L7, however, is not consistent with the r-squared in the linear-linear trend despite equivalence of the regression coefficients.

 

jmt188_0-1632996703571.png

6 Replies
I can probably help you. But I am too lazy to retype your data and reverse-engineer your formulas.

Please attach an Excel file (not an image) that demonstrates the problem.

@Joe User Please see the attached file. The y-values include random noise in an effort to ensure r-squared values less than unity. That said, all equations should update each time the worksheet is modified. Thanks for the help.

@jmt188  wrote: ``Please see the attached file. The y-values include random noise``

 

Thanks for the file.  And I'm happy to find someone (else) who knows how to create credible examples with "random noise".  That is my modus operandi.

 

May I suggest an improvement?  See the highlighted changes that I made in the "orig" worksheet in the attached file.  Note: I "protected" the worksheet so that I don't modify it accidentally.  But the password is empty.  So just right-click the tab and click Unprotect, if you wish.

 

In a nutshell, the random data is created "on the side" (column N), and initially, the actual data (column B) references column N with formulas of the form =N2.

 

The advantage is: when we find the example that we want to discuss, copy-and-paste-value column N into column O, and change column B to reference column O with formulas of the form =O2.

 

That ensures that we all see the same example.  And it makes it easy to switch between new random data and one or more snapshots.

 

-----

Back to your question....

 

Unfortunately, I do not understand what you are trying to do in columns F:L.  So I have taken the liberty of redesigning the worksheet.  The following refers to my "new" worksheet.

 

(If I have missed an important point, and you want assistance, feel free to explain the intent of columns F:L.  And I apologize in advance for my denseness.)

 

I will try to be brief, because there is a lot that I could explain, but I want to "rush ahead" to answer your question.

 

LMK if you need more clarification.  I am happy to explain in more detail.  But I get the sense that you are very capable of filling in the details yourself.

 

-----

@jmt188  wrote originally: ``Can anyone point me to the updated equation used to define r-squared in a nonlinear trendline?``

 

There has indeed been a change in how the trendline R^2 is calculated in Office 365 Excel and other recent versions of Excel vs. older versions Excel like 2010, which I use.  I could explain that in more detail.  But I do not believe that is germane to your question.

 

Nevertheless, please LMK what version(s) of Excel you use, in case it is needed for any follow-up discussion.

 

The difference that you see can be explained by the difference highlighted in E4:E5 and compared with M4:M5.

 

For the "log-linear" chart on the right (chart #2), the trendline R^2 is the correlation coefficent (really the coefficient of determination) between the actual data for "y2" in column J and the estimated data in column K.  That can be calculated with the RSQ function, as demonstrated in M5.

 

The estimated data for y2 is based on the formula of the form y = a*x + ln(b).  The LINEST coefficients are calculated in M3:M4.

 

For the "linear-linear" chart on the left (chart #1), the trendline R^2 is the correlation coefficient (again, really the coefficient of determination) between the actual data for "y1" in column B and the estimated data in column C.  That can be calculated with the RSQ function, as demonstrated in E5.

 

The estimated data for y1 is based on the nonlinear formula of the form y = b*exp(a*x).

 

But note that the coefficient of determination for y1 that LINEST returns in E4 is the trendline R^2 for chart #2 and the LINEST CoD and RSQ in M4:M5 for y2.

 

That is because the coefficients for estimated y1 are derived from the same linear formula for estimated y2.

 

That is, the linear equation for deriving "a" and "b" for estimated y1 is of the form ln(y) = a*x + ln(b), and estimated ln(y1) equals estimated y2 because the original data y2 = ln(y1).

 

For estimated y1, the coefficent b is EXP(F3), where F3 is ln(b).

 

I hope that is helpful.  I rushed through it very quickly, I know.  Again, LMK if you want more detail and a slower presentation.

 

best response confirmed by jmt188 (Copper Contributor)
Solution

@jmt188.... Note that I just made some significant edits to my previous response in order to improve(?) the explanation.  If you had trouble following the explanation previously, you might want to reread the edited response.

@Joe User Thank you for identifying the cause of the observed discrepancy between my calculation and the values presented in the graph. I was not aware that Excel had modified the r-squared value associated with nonlinear trendlines to the Pearson Coefficient as calculated in linear-linear space. As noted in your reply, the Pearson Coefficient and Coefficient of Determination are equivalent in the absence of a data transformation. This equivalence, coupled with the modified definition in Excel resulted in differences when reopening older worksheets. Thank you again for clarifying the issue as well as providing some useful recommendations going forward.

@jmt188  wrote: ``I was not aware that Excel had modified the r-squared value associated with nonlinear trendlines to the Pearson Coefficient as calculated in linear-linear space. As noted in your reply, the Pearson Coefficient and Coefficient of Determination are equivalent in the absence of a data transformation.``

 

Just to clarify, I am not saying that the trendline R^2 calculates the square of the Pearson correlation coefficient per se, instead of the CoD.

 

My use of RSQ was a simplification, relying on the equivalence that you mention, which applies to your examples.

 

We know that LINEST R^2 calculates the CoD.  See the help page.

 

And according to my comments in another discussion (click here), I had concluded that the trendline R^2 also calculates the CoD.

 

I believe there have been two changes ("corrections") to the trendline R^2 calculation in Office 365 Excel and more recent versions.

 

1. The trendline R^2 has adopted the different CoD calculation that LINEST uses when a zero intercept is specified. Again, see the LINEST help page.

 

Thus, the trendline R^2 and LINEST R^2 will agree when a zero intercept is specified.

 

2. The trendline R^2 calculation is based on the correlation between the original y and the estimated y generated by the trendline formula, which might not be linear.

 

In contrast, using your chart #1 trendline for example, the Excel 2010 trendline R^2 is based on the correlation between the original y and the estimated ln(y) = a*x + ln(b) instead of the estimated y = b*exp(a*x).

 

(As does the LINEST R^2 (CoD), since it cannot know that the linear relationship between Y and X is derived from a nonlinear relationship.)

 

It is #2 that is relevant to the difference that you observed in the two charts, not the use of RSQ instead of CoD.

 

1 best response

Accepted Solutions
best response confirmed by jmt188 (Copper Contributor)
Solution

@jmt188.... Note that I just made some significant edits to my previous response in order to improve(?) the explanation.  If you had trouble following the explanation previously, you might want to reread the edited response.

View solution in original post