SOLVED

Difference in calculated correlation coefficient value between Office 2016 and Office 365

Copper Contributor

When I create a scatter plot in Excel (2016) and add trendline to this data I get a (correct) R^2 vallue of 0.9838.  I also get the same value using the =correl() function. 

However, if I load the file into Office365 (online), without changing anything myself, the R^2 value on the chart changes to 0.997.    This does not change the value given by the correl() function.  Is there a difference in the way the R^2 value that is shown with the trendline is calculated and displayed between the versions of Excel?  Is this a bug?

I also reported this throught the app at the end of 2020 but I have had no reply or acknowledgement. 

ConcentrationAbsorbance
0.10.323
0.20.521
0.30.981
0.41.243
0.51.478
4 Replies
best response confirmed by Dave_Gerrard (Copper Contributor)
Solution

@Dave_Gerrard  wrote: ``I get a (correct) R^2 vallue of 0.9838. I also get the same value using the =correl() function. [....] if I load the file into Office365 (online), [...] the R^2 value on the chart changes to 0.997``

 

(Errata.... I believe you mean RSQ(), not CORREL().  CORREL is "R", the square root of R^2.)

 

I suspect that you select Set Intercept=0 for the trendline.

 

And yes, the trendline calculation of "R^2" for zero intercept did change in Office 365, or so I'm told.

 

But that is a correction, not a bug.

 

For details, see my last response in the thread at https://techcommunity.microsoft.com/t5/excel/same-xy-scatter-but-different-r-square/td-p/2456776 .

 

(But rereading it myself, that explanation is contorted and difficult to follow.  Sigh.)

 

In a nutshell, we can see the difference by looking at the "R^2" returned by LINEST, which is really the "coefficient of determination" (CoD).

 

Usually, RSQ and CoD are the same.  But they differ under some conditions.  One of those conditions is when a zero intercept is specified.

 

With your posted data:

 

=INDEX(LINEST(B1:B5,A1:A5,TRUE,TRUE),3,1) returns 0.983799508754327

 

=INDEX(LINEST(B1:B5,A1:A5,FALSE,TRUE),3,1) returns 0.997012717209636

 

which round to 0.9838 and 0.9970 respectively.

 

Excerpts from my previous explanation:

 

In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.

 

[In contrast, see the RSQ help page to see how truly R^2 is calculated.]

 

That formula is always used to calculate the linear trendline "R^2", at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 [...].

 

That formula is also used to calculate LINEST "R^2" when "zero intercept" is not specified (const = TRUE).

 

However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).

[....]

I am told that in Office 365 Excel, the calculation of the linear trendline "R^2" has been changed to agree with LINEST "R^2" when "zero intercept" is specified.

 

----

 

That difference in calculation of CoD for "zero intercept" is intentional.  See the LINEST help page.  It is based on an esoteric academic rationale, which not all statistics scientists agree with.

 

@Joe User Thank you very much.   That does explain the values I am getting. 

 

I think it is poor of MS to change the underlying default formula of the trendline "R^2" displayed on the charts without making it clear or renaming it in someway. 

The way I am doing it is to take a single file in Office 2016 and re-open it in Office365. The intercept is NOT set at zero in the first but Office365 then re-calculates the value and displays the altered value in the same manner. Is there a list somewhere of similar alterations that are made between versions?  Where I work, we have several versions running over a large number of people, this kind of thing makes it very difficult to spot when someone has made an error or whether Excel is altering the results.

I will now add this to my list of reasons to be wary of Excel. 

 

[Incidentally: to explain the errata (for anyone following closely) - I was using CORREL()^2  rather than RSQ() (they are equivalent) and forgot to type the "^2"   in my question.  Thanks @Joe User for spotting that too.]

 

@Dave_Gerrard  wrote:  ``The intercept is NOT set at zero in the first but Office365 then re-calculates the value and displays the altered value in the same manner``

 

If you are saying that Office 365 Excel automagically __changed__ your Set Intercept=0 option from not selected to selected, or that it calculates R^2 __as_if__ that option is selected when it __is_not__, that is certainly a defect.

 

IMHO, you should report it.  I believe the most effective way to do that is to use the Feedback feature.  Do not bother posting to excel.uservoice.com.  That is a waste of time, IMHO.

 

(PS.... Oh good:  excel.uservoice.com is now officially defunct.  But do not follow MSFT's suggestion to use techcommunity.microsoft.com -- a total waste of time -- or MSFT Store.)

 

I do not have the Feedback feature in my version of Excel.  But I have read that the best way to ensure that someone at MSFT actually pays attention to it is:

 

1. After clicking Feedback, select "I Don't Like Something". (Not "I have a suggestion".)

 

2. Select all three checkboxes: Attach My Logs to Help Troubleshoot, Include Screenshot, and You Can Contact Me About This Feedback.

 

3. Provide a valid e-mail address that you actually check.

 

Thanks for all the input.   The decision to make the change was not taken lightly, but was done to provide more accurate calculations.

 

For those looking for more details on the changes to trendline calculations, please refer to this article

 

Changes to internal calculations of linear trendlines in a chart - Microsoft Support

 

Johnnie

Microsoft Excel

1 best response

Accepted Solutions
best response confirmed by Dave_Gerrard (Copper Contributor)
Solution

@Dave_Gerrard  wrote: ``I get a (correct) R^2 vallue of 0.9838. I also get the same value using the =correl() function. [....] if I load the file into Office365 (online), [...] the R^2 value on the chart changes to 0.997``

 

(Errata.... I believe you mean RSQ(), not CORREL().  CORREL is "R", the square root of R^2.)

 

I suspect that you select Set Intercept=0 for the trendline.

 

And yes, the trendline calculation of "R^2" for zero intercept did change in Office 365, or so I'm told.

 

But that is a correction, not a bug.

 

For details, see my last response in the thread at https://techcommunity.microsoft.com/t5/excel/same-xy-scatter-but-different-r-square/td-p/2456776 .

 

(But rereading it myself, that explanation is contorted and difficult to follow.  Sigh.)

 

In a nutshell, we can see the difference by looking at the "R^2" returned by LINEST, which is really the "coefficient of determination" (CoD).

 

Usually, RSQ and CoD are the same.  But they differ under some conditions.  One of those conditions is when a zero intercept is specified.

 

With your posted data:

 

=INDEX(LINEST(B1:B5,A1:A5,TRUE,TRUE),3,1) returns 0.983799508754327

 

=INDEX(LINEST(B1:B5,A1:A5,FALSE,TRUE),3,1) returns 0.997012717209636

 

which round to 0.9838 and 0.9970 respectively.

 

Excerpts from my previous explanation:

 

In general, the CoD is calculated by the formula 1 - SSres/SStot, or equivalently SSreg/SStot. SStot is Sigma((Y - avgY)^2), where "Y" is the original data.

 

[In contrast, see the RSQ help page to see how truly R^2 is calculated.]

 

That formula is always used to calculate the linear trendline "R^2", at least in Excel 2010 (which I use) and in 2013, 2016 and 2019 [...].

 

That formula is also used to calculate LINEST "R^2" when "zero intercept" is not specified (const = TRUE).

 

However, when "zero intercept" is specified (const = FALSE), LINEST calculates SStot differently, namely: Sigma(Y^2).

[....]

I am told that in Office 365 Excel, the calculation of the linear trendline "R^2" has been changed to agree with LINEST "R^2" when "zero intercept" is specified.

 

----

 

That difference in calculation of CoD for "zero intercept" is intentional.  See the LINEST help page.  It is based on an esoteric academic rationale, which not all statistics scientists agree with.

 

View solution in original post