Forum Discussion
What is the difference between the functions correl and pearson?
- Mar 24, 2020
To my knowledge they have exactly the same math behind, but different implementation. On Excel after 2003 it shall be no difference. Bit more is within this article https://docs.microsoft.com/en-us/office/troubleshoot/excel/statistical-functions-rsq
To my knowledge they have exactly the same math behind, but different implementation. On Excel after 2003 it shall be no difference. Bit more is within this article https://docs.microsoft.com/en-us/office/troubleshoot/excel/statistical-functions-rsq
- anMSuserMar 24, 2020Copper Contributor
SergeiBaklan - That's very interesting. If I understand that article correctly, both functions yield the same results now, but could have been different prior to Excel 2003 because of round-off errors that could occur in pearson() before improvements in Excel 2003 fixed those errors.
So I suppose the only reason to have both functions now is for backward compatibility with old code that might have used one or the other. Is that right?
But this raises the question, which, unless I missed it, the article doesn't answer, why Excel had two functions that did essentially the same thing, but one of them sometimes had errors. Why would anyone ever use pearson() then? Was it a lot faster for large problems? The article doesn't say anything about differences in efficiency or any other reason to have used pearson() when it was prone to error.
- SergeiBaklanMar 24, 2020Diamond Contributor
Again, I don't know. At least CORREL() is not marked "This function is available for compatibility..." as some others if you start typing them.
Performance - I did small simple test with two arrays on 1 million rows each. Bot CORREL() and PEARSON() calculates practically immediately.
- anMSuserMar 25, 2020Copper Contributor
You would need to run that test in a version of Excel prior to Excel 2003, because that is when pearson() would sometimes yield erroneous results. It would have made sense to use pearson() back then if it ran a lot faster than correl() and if you knew how to determine that you were not using it in a domain that would yield errors.