Forum Discussion

Martin349's avatar
Martin349
Copper Contributor
Feb 28, 2020

How to compare two variables with unequal data samples with CORREL function?

Hi

 

I have two variables denoted A and B. 

 

A has 5000 data points comprising data from the last 5000 days, whereas B only has 1000 data points comprising the last 1000 days.


When I use CORREL function between all 5000 data points from A and 1000 data points from B, I get a correlation value of -0,711.

However, when I only take the last 1000 data points from variable A (comprising the last 1000 days) and compare these with the 1000 data points from variable B by using CORREL function, I get a correlation value of 0,981.

 

I don't understand how there can be such a big difference. Should I assume that the correlation value of 0,981 is most correct since there exists a data point for each of the 1000 data points from variable A to be compared to by variable B (and vice versa), or I am missing something here?

 

Thanks for any help πŸ˜„

5 Replies

  • PReagan's avatar
    PReagan
    Bronze Contributor

    Hello Martin349,

     

    When using the CORREL() function, both arrays should have an equal amount of data points. CORREL([5000 data points], [1000 data points]) should return #N/A but you're saying it returns -0.711?

    • Martin349's avatar
      Martin349
      Copper Contributor

      HiPReagan 

      Thanks for your response

       

      Yes, I use the following formula:

      =CORREL('A'!E:E;'B'!E:E)

       

      Where E:E in both cases represents the columns containing the data points.

      • PReagan's avatar
        PReagan
        Bronze Contributor

        Martin349 

         

        You are manipulating giving a misinterpretation of your data by using the arrays "'A'!E:E" and "'B'!E:E". While these arrays are the same size, the arrays actually have different amounts of data points. You should only use the range in which there are data points in the CORREL() function.

Resources