Forum Discussion

linnan5945's avatar
linnan5945
Copper Contributor
Jun 17, 2021
Solved

Same XY scatter but different R square

I have a set of X and Y, as below.

XY
0.050.28344174663489400
0.51.26762275157997000
13.43248288736777000
29.69228927203065000
3.519.16153253754330000
526.01969057665260000


In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945.

We set the intercept = 0.

 

Is anything wrong with my computer?

How can I fix it?

 

  • Tying up loose ends for posterity....

     

    linnan5945  wrote: ``In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945. We set the intercept = 0.``

     

    I wrote: ``the [coefficient of determination] wikipage (click here) explains that there "are several definitions of R^2 that are only sometimes equivalent". One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).``

     

    That is, indeed, the reason for the difference, to a degree.

     

    Excel RSQ calculates the square of the Pearson correlation coefficient (Excel PEARSON).

     

    But LINEST R^2 and the linear trendline R^2 are based on the coefficient of determination (CoD).

     

    When "zero intercept" is not specified -- that is, when there is a non-zero intercept -- all 3 statistics are the same, within the limits of floating-point arithmetic.

     

    But when "zero intercept" is specified -- that is, when there is no intercept -- CoD differs from the Excel RSQ.

     

    And more to the point, there is some debate among statisticians about how to calculate the CoD when a zero intercept is specified.

     

    First, note that LINEST R^2 and the linear trendline R^2 are __not__ the square of any number, whereas Excel RSQ is. "R^2" is simply the notation that statisticians use for the CoD. I will use the term "CoD" (duh!) to avoid confusion.

     

    The distinction is significant because the general method of calculating the CoD can result in __negative__ values for "R^2", whereas Excel RSQ is always non-negative.

     

    The negative "R^2" (CoD) might seem like an error, because the square of any real number cannot be negative. But it is not an error, for the very reason that the CoD is not the square of any number.

     

    Nevertheless, some statiscians justify the alternative method of calculating the CoD, in part, in order to avoid the negative CoD.

     

    The method of calculating Excel RSQ is described in the RSQ help page (click here).

     

    The methods of calculating the CoD is described in the LINEST help page (click here)

     

    In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.

     

    That formula is always used to calculate the linear trendline R^2, at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 (according to you).

     

    That formula is also used to calculate LINEST R^2 when "zero intercept" is not specified (const = TRUE).

     

    However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).

     

    An explanation of that formula can be found in an academic article (click here), which is a heavy read (i.e. TMI).

     

    That is why LINEST R^2 and the linear trendline R^2 differ in Excel 2010, 2013, 2016 and 2019, among others(?).

     

    However, I am told that in Office 365 Excel, the calculation of the linear trendline R^2 has been changed to agree with LINEST R^2 when "zero intercept" is specified.

     

    So for Office 365 users, the two values of "R^2" (CoD) should be the same, within the limits of floating-point arithmetic.

     

    -----

     

    From the "coefficient of determination" wikipage....

     

     

    From the LINEST help page ....

     

     

    From the RSQ help page....

     

     

6 Replies

  • JoeUser2004's avatar
    JoeUser2004
    Bronze Contributor

    Tying up loose ends for posterity....

     

    linnan5945  wrote: ``In my computer trying Office 2013, 2016, and 2019, all the R square of the linear trend is 0.9886, while in my friends' computers, it's 0.9945. We used SAS and SPSS to calculate, which also showed R2=0.9945. We set the intercept = 0.``

     

    I wrote: ``the [coefficient of determination] wikipage (click here) explains that there "are several definitions of R^2 that are only sometimes equivalent". One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).``

     

    That is, indeed, the reason for the difference, to a degree.

     

    Excel RSQ calculates the square of the Pearson correlation coefficient (Excel PEARSON).

     

    But LINEST R^2 and the linear trendline R^2 are based on the coefficient of determination (CoD).

     

    When "zero intercept" is not specified -- that is, when there is a non-zero intercept -- all 3 statistics are the same, within the limits of floating-point arithmetic.

     

    But when "zero intercept" is specified -- that is, when there is no intercept -- CoD differs from the Excel RSQ.

     

    And more to the point, there is some debate among statisticians about how to calculate the CoD when a zero intercept is specified.

     

    First, note that LINEST R^2 and the linear trendline R^2 are __not__ the square of any number, whereas Excel RSQ is. "R^2" is simply the notation that statisticians use for the CoD. I will use the term "CoD" (duh!) to avoid confusion.

     

    The distinction is significant because the general method of calculating the CoD can result in __negative__ values for "R^2", whereas Excel RSQ is always non-negative.

     

    The negative "R^2" (CoD) might seem like an error, because the square of any real number cannot be negative. But it is not an error, for the very reason that the CoD is not the square of any number.

     

    Nevertheless, some statiscians justify the alternative method of calculating the CoD, in part, in order to avoid the negative CoD.

     

    The method of calculating Excel RSQ is described in the RSQ help page (click here).

     

    The methods of calculating the CoD is described in the LINEST help page (click here)

     

    In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.

     

    That formula is always used to calculate the linear trendline R^2, at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 (according to you).

     

    That formula is also used to calculate LINEST R^2 when "zero intercept" is not specified (const = TRUE).

     

    However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).

     

    An explanation of that formula can be found in an academic article (click here), which is a heavy read (i.e. TMI).

     

    That is why LINEST R^2 and the linear trendline R^2 differ in Excel 2010, 2013, 2016 and 2019, among others(?).

     

    However, I am told that in Office 365 Excel, the calculation of the linear trendline R^2 has been changed to agree with LINEST R^2 when "zero intercept" is specified.

     

    So for Office 365 users, the two values of "R^2" (CoD) should be the same, within the limits of floating-point arithmetic.

     

    -----

     

    From the "coefficient of determination" wikipage....

     

     

    From the LINEST help page ....

     

     

    From the RSQ help page....

     

     

    • linnan5945's avatar
      linnan5945
      Copper Contributor

      JoeUser2004 

      This is brilliant.

      Now I understand why they are different and find the way to figure it out.

      You totally answered my questions and solved my problems.

      Thank you so so so so so much.

       

  • JoeUser2004's avatar
    JoeUser2004
    Bronze Contributor

    linnan5945

     

    I do not have the same Excel versions, but FWIW, I can see the same difference between the R^2 calculated with LINEST and the chart trendline.

     

    See the image below.  Also see the Excel file; but caveat: you might get different results if the file is recalculated when you open it in another version of Excel.

     

    I cannot offer a dispositive explanation at this time.

     

    But the wikipage (click here) explains that there ``are several definitions of R2 that are only sometimes equivalent``.  One such difference is the calculation of the "correlation coefficient" (r) v. "coefficient of determination" (R).

     

    However, IMHO, the wikipage is "confusing" (read: dubious). I would need more time to vet the information and to test its applicability to the difference that we see.

     

     

     

    • linnan5945's avatar
      linnan5945
      Copper Contributor
      Wow cool.
      So how can I change the trendline to use LINEST result in my excel?
      • JoeUser2004's avatar
        JoeUser2004
        Bronze Contributor

        linnan5945  wrote:  ``Wow cool``

         

        I actually have more details to share.  I hope to post later.

         

        -----

         

        linnan5945  wrote:  ``So how can I change the trendline to use LINEST result in my excel?``

         

        We cannot.  All I can offer is a difficult work-around, that might not be satisfactory.

         

        I would let the chart create the trendline label.  Then I would edit it and overwrite the chart R^2 with the LINEST R^2.  The trendline label is just a textbox.

         

        You can calculate the LINEST R^2 with the following formula (normally-entered), if you wish:

         

        =INDEX(LINEST(B2:B7, A2:A7, FALSE, TRUE), 3, 1)

Resources