Jun 17 2021 12:33 AM
I have a set of X and Y, as below.
X | Y |
0.05 | 0.28344174663489400 |
0.5 | 1.26762275157997000 |
1 | 3.43248288736777000 |
2 | 9.69228927203065000 |
3.5 | 19.16153253754330000 |
5 | 26.01969057665260000 |
In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945.
We set the intercept = 0.
Is anything wrong with my computer?
How can I fix it?
Jun 17 2021 02:36 AM
I get 0.9945 in Excel 2019 with your data.
What do you see if you open the attached workbook?
Jun 17 2021 10:00 AM - edited Jun 17 2021 08:20 PM
I do not have the same Excel versions, but FWIW, I can see the same difference between the R^2 calculated with LINEST and the chart trendline.
See the image below. Also see the Excel file; but caveat: you might get different results if the file is recalculated when you open it in another version of Excel.
I cannot offer a dispositive explanation at this time.
But the wikipage (click here) explains that there ``are several definitions of R2 that are only sometimes equivalent``. One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).
However, IMHO, the wikipage is "confusing" (read: dubious). I would need more time to vet the information and to test its applicability to the difference that we see.
Jun 17 2021 05:34 PM
Jun 17 2021 08:19 PM
@linnan5945 wrote: ``Wow cool``
I actually have more details to share. I hope to post later.
-----
@linnan5945 wrote: ``So how can I change the trendline to use LINEST result in my excel?``
We cannot. All I can offer is a difficult work-around, that might not be satisfactory.
I would let the chart create the trendline label. Then I would edit it and overwrite the chart R^2 with the LINEST R^2. The trendline label is just a textbox.
You can calculate the LINEST R^2 with the following formula (normally-entered), if you wish:
=INDEX(LINEST(B2:B7, A2:A7, FALSE, TRUE), 3, 1)
Jun 22 2021 08:18 AM - edited Jun 22 2021 11:16 PM
SolutionTying up loose ends for posterity....
@linnan5945 wrote: ``In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945. We set the intercept = 0.``
I wrote: ``the [coefficient of determination] wikipage (click here) explains that there "are several definitions of R^2 that are only sometimes equivalent". One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).``
That is, indeed, the reason for the difference, to a degree.
Excel RSQ calculates the square of the Pearson correlation coefficient (Excel PEARSON).
But LINEST R^2 and the linear trendline R^2 are based on the coefficient of determination (CoD).
When "zero intercept" is not specified -- that is, when there is a non-zero intercept -- all 3 statistics are the same, within the limits of floating-point arithmetic.
But when "zero intercept" is specified -- that is, when there is no intercept -- CoD differs from the Excel RSQ.
And more to the point, there is some debate among statisticians about how to calculate the CoD when a zero intercept is specified.
First, note that LINEST R^2 and the linear trendline R^2 are __not__ the square of any number, whereas Excel RSQ is. "R^2" is simply the notation that statisticians use for the CoD. I will use the term "CoD" (duh!) to avoid confusion.
The distinction is significant because the general method of calculating the CoD can result in __negative__ values for "R^2", whereas Excel RSQ is always non-negative.
The negative "R^2" (CoD) might seem like an error, because the square of any real number cannot be negative. But it is not an error, for the very reason that the CoD is not the square of any number.
Nevertheless, some statiscians justify the alternative method of calculating the CoD, in part, in order to avoid the negative CoD.
The method of calculating Excel RSQ is described in the RSQ help page (click here).
The methods of calculating the CoD is described in the LINEST help page (click here)
In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.
That formula is always used to calculate the linear trendline R^2, at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 (according to you).
That formula is also used to calculate LINEST R^2 when "zero intercept" is not specified (const = TRUE).
However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).
An explanation of that formula can be found in an academic article (click here), which is a heavy read (i.e. TMI).
That is why LINEST R^2 and the linear trendline R^2 differ in Excel 2010, 2013, 2016 and 2019, among others(?).
However, I am told that in Office 365 Excel, the calculation of the linear trendline R^2 has been changed to agree with LINEST R^2 when "zero intercept" is specified.
So for Office 365 users, the two values of "R^2" (CoD) should be the same, within the limits of floating-point arithmetic.
-----
From the "coefficient of determination" wikipage....
From the LINEST help page ....
From the RSQ help page....
Jun 22 2021 08:19 PM
This is brilliant.
Now I understand why they are different and find the way to figure it out.
You totally answered my questions and solved my problems.
Thank you so so so so so much.
Jun 22 2021 08:18 AM - edited Jun 22 2021 11:16 PM
SolutionTying up loose ends for posterity....
@linnan5945 wrote: ``In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945. We set the intercept = 0.``
I wrote: ``the [coefficient of determination] wikipage (click here) explains that there "are several definitions of R^2 that are only sometimes equivalent". One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).``
That is, indeed, the reason for the difference, to a degree.
Excel RSQ calculates the square of the Pearson correlation coefficient (Excel PEARSON).
But LINEST R^2 and the linear trendline R^2 are based on the coefficient of determination (CoD).
When "zero intercept" is not specified -- that is, when there is a non-zero intercept -- all 3 statistics are the same, within the limits of floating-point arithmetic.
But when "zero intercept" is specified -- that is, when there is no intercept -- CoD differs from the Excel RSQ.
And more to the point, there is some debate among statisticians about how to calculate the CoD when a zero intercept is specified.
First, note that LINEST R^2 and the linear trendline R^2 are __not__ the square of any number, whereas Excel RSQ is. "R^2" is simply the notation that statisticians use for the CoD. I will use the term "CoD" (duh!) to avoid confusion.
The distinction is significant because the general method of calculating the CoD can result in __negative__ values for "R^2", whereas Excel RSQ is always non-negative.
The negative "R^2" (CoD) might seem like an error, because the square of any real number cannot be negative. But it is not an error, for the very reason that the CoD is not the square of any number.
Nevertheless, some statiscians justify the alternative method of calculating the CoD, in part, in order to avoid the negative CoD.
The method of calculating Excel RSQ is described in the RSQ help page (click here).
The methods of calculating the CoD is described in the LINEST help page (click here)
In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.
That formula is always used to calculate the linear trendline R^2, at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 (according to you).
That formula is also used to calculate LINEST R^2 when "zero intercept" is not specified (const = TRUE).
However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).
An explanation of that formula can be found in an academic article (click here), which is a heavy read (i.e. TMI).
That is why LINEST R^2 and the linear trendline R^2 differ in Excel 2010, 2013, 2016 and 2019, among others(?).
However, I am told that in Office 365 Excel, the calculation of the linear trendline R^2 has been changed to agree with LINEST R^2 when "zero intercept" is specified.
So for Office 365 users, the two values of "R^2" (CoD) should be the same, within the limits of floating-point arithmetic.
-----
From the "coefficient of determination" wikipage....
From the LINEST help page ....
From the RSQ help page....