SOLVED

Same XY scatter but different R square

Copper Contributor

I have a set of X and Y, as below.

XY
0.050.28344174663489400
0.51.26762275157997000
13.43248288736777000
29.69228927203065000
3.519.16153253754330000
526.01969057665260000


In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945.

We set the intercept = 0.

 

Is anything wrong with my computer?

How can I fix it?

 

6 Replies

@linnan5945 

I get 0.9945 in Excel 2019 with your data.

S0515.png

What do you see if you open the attached workbook?

@linnan5945

 

I do not have the same Excel versions, but FWIW, I can see the same difference between the R^2 calculated with LINEST and the chart trendline.

 

See the image below.  Also see the Excel file; but caveat: you might get different results if the file is recalculated when you open it in another version of Excel.

 

I cannot offer a dispositive explanation at this time.

 

But the wikipage (click here) explains that there ``are several definitions of R2 that are only sometimes equivalent``.  One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).

 

However, IMHO, the wikipage is "confusing" (read: dubious). I would need more time to vet the information and to test its applicability to the difference that we see.

 

R2 zero intercept 2.jpg

 

 

Wow cool.
So how can I change the trendline to use LINEST result in my excel?

@linnan5945  wrote:  ``Wow cool``

 

I actually have more details to share.  I hope to post later.

 

-----

 

@linnan5945  wrote:  ``So how can I change the trendline to use LINEST result in my excel?``

 

We cannot.  All I can offer is a difficult work-around, that might not be satisfactory.

 

I would let the chart create the trendline label.  Then I would edit it and overwrite the chart R^2 with the LINEST R^2.  The trendline label is just a textbox.

 

You can calculate the LINEST R^2 with the following formula (normally-entered), if you wish:

 

=INDEX(LINEST(B2:B7, A2:A7, FALSE, TRUE), 3, 1)

best response confirmed by allyreckerman (Microsoft)
Solution

Tying up loose ends for posterity....

 

@linnan5945  wrote: ``In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945. We set the intercept = 0.``

 

I wrote: ``the [coefficient of determination] wikipage (click here) explains that there "are several definitions of R^2 that are only sometimes equivalent". One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).``

 

That is, indeed, the reason for the difference, to a degree.

 

Excel RSQ calculates the square of the Pearson correlation coefficient (Excel PEARSON).

 

But LINEST R^2 and the linear trendline R^2 are based on the coefficient of determination (CoD).

 

When "zero intercept" is not specified -- that is, when there is a non-zero intercept -- all 3 statistics are the same, within the limits of floating-point arithmetic.

 

But when "zero intercept" is specified -- that is, when there is no intercept -- CoD differs from the Excel RSQ.

 

And more to the point, there is some debate among statisticians about how to calculate the CoD when a zero intercept is specified.

 

First, note that LINEST R^2 and the linear trendline R^2 are __not__ the square of any number, whereas Excel RSQ is. "R^2" is simply the notation that statisticians use for the CoD. I will use the term "CoD" (duh!) to avoid confusion.

 

The distinction is significant because the general method of calculating the CoD can result in __negative__ values for "R^2", whereas Excel RSQ is always non-negative.

 

The negative "R^2" (CoD) might seem like an error, because the square of any real number cannot be negative. But it is not an error, for the very reason that the CoD is not the square of any number.

 

Nevertheless, some statiscians justify the alternative method of calculating the CoD, in part, in order to avoid the negative CoD.

 

The method of calculating Excel RSQ is described in the RSQ help page (click here).

 

The methods of calculating the CoD is described in the LINEST help page (click here)

 

In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.

 

That formula is always used to calculate the linear trendline R^2, at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 (according to you).

 

That formula is also used to calculate LINEST R^2 when "zero intercept" is not specified (const = TRUE).

 

However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).

 

An explanation of that formula can be found in an academic article (click here), which is a heavy read (i.e. TMI).

 

That is why LINEST R^2 and the linear trendline R^2 differ in Excel 2010, 2013, 2016 and 2019, among others(?).

 

However, I am told that in Office 365 Excel, the calculation of the linear trendline R^2 has been changed to agree with LINEST R^2 when "zero intercept" is specified.

 

So for Office 365 users, the two values of "R^2" (CoD) should be the same, within the limits of floating-point arithmetic.

 

-----

 

From the "coefficient of determination" wikipage....

 

wiki coeff of determ defn.jpg

 

From the LINEST help page ....

 

linest r2 defn 2.jpg

 

From the RSQ help page....

 

excel rsq defn.jpg

 

@Joe User 

This is brilliant.

Now I understand why they are different and find the way to figure it out.

You totally answered my questions and solved my problems.

Thank you so so so so so much.

 

1 best response

Accepted Solutions
best response confirmed by allyreckerman (Microsoft)
Solution

Tying up loose ends for posterity....

 

@linnan5945  wrote: ``In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945. We set the intercept = 0.``

 

I wrote: ``the [coefficient of determination] wikipage (click here) explains that there "are several definitions of R^2 that are only sometimes equivalent". One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).``

 

That is, indeed, the reason for the difference, to a degree.

 

Excel RSQ calculates the square of the Pearson correlation coefficient (Excel PEARSON).

 

But LINEST R^2 and the linear trendline R^2 are based on the coefficient of determination (CoD).

 

When "zero intercept" is not specified -- that is, when there is a non-zero intercept -- all 3 statistics are the same, within the limits of floating-point arithmetic.

 

But when "zero intercept" is specified -- that is, when there is no intercept -- CoD differs from the Excel RSQ.

 

And more to the point, there is some debate among statisticians about how to calculate the CoD when a zero intercept is specified.

 

First, note that LINEST R^2 and the linear trendline R^2 are __not__ the square of any number, whereas Excel RSQ is. "R^2" is simply the notation that statisticians use for the CoD. I will use the term "CoD" (duh!) to avoid confusion.

 

The distinction is significant because the general method of calculating the CoD can result in __negative__ values for "R^2", whereas Excel RSQ is always non-negative.

 

The negative "R^2" (CoD) might seem like an error, because the square of any real number cannot be negative. But it is not an error, for the very reason that the CoD is not the square of any number.

 

Nevertheless, some statiscians justify the alternative method of calculating the CoD, in part, in order to avoid the negative CoD.

 

The method of calculating Excel RSQ is described in the RSQ help page (click here).

 

The methods of calculating the CoD is described in the LINEST help page (click here)

 

In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.

 

That formula is always used to calculate the linear trendline R^2, at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 (according to you).

 

That formula is also used to calculate LINEST R^2 when "zero intercept" is not specified (const = TRUE).

 

However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).

 

An explanation of that formula can be found in an academic article (click here), which is a heavy read (i.e. TMI).

 

That is why LINEST R^2 and the linear trendline R^2 differ in Excel 2010, 2013, 2016 and 2019, among others(?).

 

However, I am told that in Office 365 Excel, the calculation of the linear trendline R^2 has been changed to agree with LINEST R^2 when "zero intercept" is specified.

 

So for Office 365 users, the two values of "R^2" (CoD) should be the same, within the limits of floating-point arithmetic.

 

-----

 

From the "coefficient of determination" wikipage....

 

wiki coeff of determ defn.jpg

 

From the LINEST help page ....

 

linest r2 defn 2.jpg

 

From the RSQ help page....

 

excel rsq defn.jpg

 

View solution in original post