Forum Discussion

davidleal's avatar
davidleal
Iron Contributor
Mar 30, 2023

XMATCH and MATCH return different result for approximate search lookup_array with repetitive values

The following examples produce a different result:

 

=MATCH(FALSE, {TRUE,TRUE,TRUE},-1) -> 3
=XMATCH(FALSE, {TRUE,TRUE,TRUE},1) -> 1

 

They are different functions, but serve for the same purpose. As per documentation of both functions (MATCH and XMATCH), they find the smallest value (index position) in lookup_array that is greater than or equal to lookup_value. XMATCH understands it as the first value of the repetition that satisfies the previous condition and MATCH as the last value.

 

Similarly for numbers:

 

=MATCH(0, {3,2,2},-1) -> 3
=XMATCH(0, {3,2,2},1) -> 2

 

The lookup_array is sorted in descending order for MATCH as it is required per its documentation. For XMATCH it is not required, we keep the same order, expecting to get the same result.

 

Is it the expected behavior,? is it documented somewhere?

 

Thanks,

 

  • JoeUser2004's avatar
    JoeUser2004
    Bronze Contributor

    davidleal  wrote:  ``Is it the expected behavior,? is it documented somewhere?``

     

    Yes and yes.  For XMATCH to have the same behavior as MATCH, do:

     

    =XMATCH(0,{3,2,2},1,-2)

     

    From the XMATCH help page (click here😞

     

     

    With the default search-mode (1), as you used, XMATCH does a linear search from first to last.  So with match-mode -1, XMATCH will find the first value that is an ``exact match or next largest item`` when there are duplicate values that qualify.

     

    But MATCH does a binary search.  So with match-type -1, MATCH will find the last value that is ``the smallest value that is greater than or equal to lookup_value`` when there are duplicate values that qualify.

     

    So with XMATCH, we must explicitly specify search-mode -2 in order to get the same binary search behavior.

    • davidleal's avatar
      davidleal
      Iron Contributor

      JoeUser2004 Thanks it clarifies a lot. Here are some comments:

      1. MATCH documentation doesn't specify the search algorithm used, it is assumed based on the sorting requirement.
      2. XMATCH documentation doesn't specify the algorithm used for the default option, it is indicated the direction, so it is assumed the linear search.
      3. There is no link where we can find some documentation on how the binary/linear algorithm was implemented in Excel. On the web, you can find some simulations but usually not for the case or repeated values. It requires some deep dive into the binary search algorithms. In any case, there is no reference.
      4. As a result of that, I have seen many people stop using XMATCH (even Microsoft indicates it is an improved version) due to performance issues compared to MATCH and the reason is that they don't use the search mode input argument in some scenarios, which results in lower performance, because binary search is faster than linear search for the sorted array.

      In summary in my opinion there is significant room for improvement in MATCH/XMATCH documentation.

       

      Thanks again for your help on this

  • JosWoolley's avatar
    JosWoolley
    Iron Contributor

    davidleal 

     

    Interesting. I also investigated XMATCH some time back and was dismayed to discover that there was no way to replicate one of the most important features of the old MATCH function (which I continue to use for this very reason). That feature relates to the ability to use fast binary searching in order to determine the position of the last value in a range (useful for defining dynamic ranges when one is not using a Table).

     

    For example, with A1 and A10 containing "z" and "a" respectively (and all other cells in that column blank):

     

    =MATCH("Ω",A:A)

     

    (or, equivalently =MATCH("Ω",A:A,1))

     

    will return 10, as desired.

     

    However, there is no way to obtain this result using XMATCH, as far as I can tell.

     

    A huge oversight in the working of XMATCH, in my opinion.

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      JosWoolley 

      For me

      = XMATCH("Ω", A:A, -1)

      worked well (though it is many years since I have used a whole column reference;  I am usually content to take the end of a Table to define a column dynamically and the problem does not arise for array formulas and their spilt ranges).

       

      My recollection is that there is little need for the ±2 parameter in XLOOKUP and XMATCH since the functions are designed to exploit binary search methods and will perform sorting in memory to ensure that apparently linear searches are performed efficiently.

    • davidleal's avatar
      davidleal
      Iron Contributor

      JosWoolley thanks for your response too, I guess the issue you mentioned based on JoeUser2004's response can be explained and to be able to get the same result. The input range you have is not sorted: {"z";"";"";"";"";"";"";"";"";"a"}, so for using lower or equal search we need to have it sorted in ascending oder, therefore:

      =MATCH("Ω",SORT(A:A),1) -> 2

      which makes sense because after sorted "z" is in the second position and "Ω" is bigger than "z" and sorted lookup_array is as follows: "a", "z","",...,"".

       

      An equivalent result in XMATH is obtained as follows:

      =XMATCH("Ω",SORT(M:M),-1,2) -> 2

      to do a comparison we need to test under the same premise, so using binary search in ascending order (2).

       

      For MATCH approximate search (1) if the lookup_array is not in ascending order unexpected results may be obtained.

       

      Thanks for your input,

       

      David

       

      • JosWoolley's avatar
        JosWoolley
        Iron Contributor

        davidleal 

         

        Actually, David, I think you've misunderstood the point. Even when the lookup_array is not sorted, providing the choice of lookup_value is chosen to be greater than any value within lookup_array we can exploit this to return the last non-blank entry in a one-dimensional range.

         

        Regards

Resources