Forum Discussion

Dave_Gerrard's avatar
Dave_Gerrard
Copper Contributor
Sep 01, 2021
Solved

Difference in calculated correlation coefficient value between Office 2016 and Office 365

When I create a scatter plot in Excel (2016) and add trendline to this data I get a (correct) R^2 vallue of 0.9838.  I also get the same value using the =correl() function.  However, if I load the f...
  • JoeUser2004's avatar
    Sep 01, 2021

    Dave_Gerrard  wrote: ``I get a (correct) R^2 vallue of 0.9838. I also get the same value using the =correl() function. [....] if I load the file into Office365 (online), [...] the R^2 value on the chart changes to 0.997``

     

    (Errata.... I believe you mean RSQ(), not CORREL().  CORREL is "R", the square root of R^2.)

     

    I suspect that you select Set Intercept=0 for the trendline.

     

    And yes, the trendline calculation of "R^2" for zero intercept did change in Office 365, or so I'm told.

     

    But that is a correction, not a bug.

     

    For details, see my last response in the thread at https://techcommunity.microsoft.com/t5/excel/same-xy-scatter-but-different-r-square/td-p/2456776 .

     

    (But rereading it myself, that explanation is contorted and difficult to follow.  Sigh.)

     

    In a nutshell, we can see the difference by looking at the "R^2" returned by LINEST, which is really the "coefficient of determination" (CoD).

     

    Usually, RSQ and CoD are the same.  But they differ under some conditions.  One of those conditions is when a zero intercept is specified.

     

    With your posted data:

     

    =INDEX(LINEST(B1:B5,A1:A5,TRUE,TRUE),3,1) returns 0.983799508754327

     

    =INDEX(LINEST(B1:B5,A1:A5,FALSE,TRUE),3,1) returns 0.997012717209636

     

    which round to 0.9838 and 0.9970 respectively.

     

    Excerpts from my previous explanation:

     

    In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.

     

    [In contrast, see the RSQ help page to see how truly R^2 is calculated.]

     

    That formula is always used to calculate the linear trendline "R^2", at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 [...].

     

    That formula is also used to calculate LINEST "R^2" when "zero intercept" is not specified (const = TRUE).

     

    However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).

    [....]

    I am told that in Office 365 Excel, the calculation of the linear trendline "R^2" has been changed to agree with LINEST "R^2" when "zero intercept" is specified.

     

    ----

     

    That difference in calculation of CoD for "zero intercept" is intentional.  See the LINEST help page.  It is based on an esoteric academic rationale, which not all statistics scientists agree with.

     

Resources