SOLVED

What is the difference between the functions correl and pearson?

Copper Contributor

The Microsoft documentation for the functions correl and pearson both say that they calculate correlation coefficients and they both state the algebraic formula that the function uses in its calculation, and those two formulae are identical! So are those functions actually equivalent, or is one (or both) of those documentation pages wrong? If they are not equivalent, what is each one actually doing?

6 Replies
best response confirmed by anMSuser (Copper Contributor)
Solution

@anMSuser 

To my knowledge they have exactly the same math behind, but different implementation. On Excel after 2003 it shall be no difference. Bit more is within this article https://docs.microsoft.com/en-us/office/troubleshoot/excel/statistical-functions-rsq

@Sergei Baklan  - That's very interesting. If I understand that article correctly, both functions yield the same results now, but could have been different prior to Excel 2003 because of round-off errors that could occur in pearson() before improvements in Excel 2003 fixed those errors.

 

So I suppose the only reason to have both functions now is for backward compatibility with old code that might have used one or the other. Is that right?

 

But this raises the question, which, unless I missed it, the article doesn't answer, why Excel had two functions that did essentially the same thing, but one of them sometimes had errors. Why would anyone ever use pearson() then? Was it a lot faster for large problems? The article doesn't say anything about differences in efficiency or any other reason to have used pearson() when it was prone to error.

@anMSuser 

Again, I don't know. At least CORREL() is not marked "This function is available for compatibility..." as some others if you start typing them.

 

Performance - I did small simple test with two arrays on 1 million rows each. Bot CORREL() and PEARSON() calculates practically immediately.

@Sergei Baklan 

You would need to run that test in a version of Excel prior to Excel 2003, because that is when pearson() would sometimes yield erroneous results. It would have made sense to use pearson() back then if it ran a lot faster than correl() and if you knew how to determine that you were not using it in a domain that would yield errors.

@anMSuser 

Perhaps, but performance and errors in some situations are different things.

@Sergei Baklan 

I just took another look at that article you linked to, and it's interesting that the issue is explained (somewhat) in a troubleshooting article about function rsq(), which is just the square of pearson(), and is not explained in the function documentation articles about correl(), pearson(), or rsq().

1 best response

Accepted Solutions
best response confirmed by anMSuser (Copper Contributor)
Solution

@anMSuser 

To my knowledge they have exactly the same math behind, but different implementation. On Excel after 2003 it shall be no difference. Bit more is within this article https://docs.microsoft.com/en-us/office/troubleshoot/excel/statistical-functions-rsq

View solution in original post