Forum Discussion
What is the difference between the functions correl and pearson?
The Microsoft documentation pages for the two functions, https://support.office.com/en-us/article/correl-function-995dcef7-0c0a-4bed-a3fb-239d7b68ca92 and https://support.office.com/en-us/article/pearson-function-0c3e30fc-e5af-49c4-808a-3ef66e034c18, both say that the function calculates a correlation coefficient, and each page states the algebraic formula the function uses. Those two formulae are identical! So are the functions actually equivalent, or is one (or both) of those documentation pages wrong? If they are not equivalent, what is each one actually doing?
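For reference, the formula that both pages state is, as far as I can tell, the usual Pearson product-moment expression, where the barred symbols are the sample means of the two arrays:

```latex
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}
```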
6 Replies
- SergeiBaklan (Diamond Contributor)
To my knowledge, they have exactly the same math behind them but different implementations. In Excel 2003 and later there should be no difference. There is a bit more in this article: https://docs.microsoft.com/en-us/office/troubleshoot/excel/statistical-functions-rsq
- anMSuser (Copper Contributor)
SergeiBaklan - That's very interesting. If I understand that article correctly, both functions yield the same results now, but could have been different prior to Excel 2003 because of round-off errors that could occur in pearson() before improvements in Excel 2003 fixed those errors.
So I suppose the only reason to have both functions now is for backward compatibility with old code that might have used one or the other. Is that right?
But this raises a question that, unless I missed it, the article doesn't answer: why did Excel have two functions that did essentially the same thing when one of them was sometimes inaccurate? Why would anyone ever have used pearson() then? Was it a lot faster for large problems? The article doesn't mention differences in efficiency or any other reason to have used pearson() when it was prone to error.
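To illustrate how two implementations of the same formula can disagree because of round-off, here is a minimal Python sketch. It is not Excel's code. The function names, the sample data, and the offset are all made up for the illustration, and the assumption that an older implementation used a one-pass "sums of squares" form is mine, not something stated in this thread. The two-pass version subtracts the means before multiplying. The one-pass version is algebraically identical but subtracts two huge, nearly equal numbers when the data sit far from zero.

```python
import math


def pearson_two_pass(xs, ys):
    """Mean-centered form: compute the means first, then sum the deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_one_pass(xs, ys):
    """Algebraically equivalent 'calculator' form built from raw sums.

    Hypothetical stand-in for an older implementation (an assumption).
    Returns NaN when round-off makes a variance term non-positive.
    """
    n = float(len(xs))
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x <= 0.0 or var_y <= 0.0:
        return float("nan")
    return (n * sxy - sx * sy) / math.sqrt(var_x * var_y)


# Made-up sample data; adding a constant does not change the true correlation.
base_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
base_y = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0]
OFFSET = 1e10
shift_x = [OFFSET + x for x in base_x]
shift_y = [OFFSET + y for y in base_y]

print("two-pass, small values:  ", pearson_two_pass(base_x, base_y))
print("one-pass, small values:  ", pearson_one_pass(base_x, base_y))
print("two-pass, shifted values:", pearson_two_pass(shift_x, shift_y))
print("one-pass, shifted values:", pearson_one_pass(shift_x, shift_y))
```

On the small values the two forms agree. On the shifted values the two-pass form still gives the same answer, while the one-pass form returns either NaN or a number that has nothing to do with the true correlation.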
- SergeiBaklan (Diamond Contributor)
Again, I don't know. At least CORREL() is not marked "This function is available for compatibility..." the way some other functions are when you start typing them.
Performance: I did a small, simple test with two arrays of 1 million rows each. Both CORREL() and PEARSON() calculate practically immediately.