XMATCH and MATCH return different result for approximate search lookup_array with repetitive values

Iron Contributor

The following examples produce a different result:

 

=MATCH(FALSE, {TRUE,TRUE,TRUE},-1) -> 3
=XMATCH(FALSE, {TRUE,TRUE,TRUE},1) -> 1

 

They are different functions, but serve for the same purpose. As per documentation of both functions (MATCH and XMATCH), they find the smallest value (index position) in lookup_array that is greater than or equal to lookup_value. XMATCH understands it as the first value of the repetition that satisfies the previous condition and MATCH as the last value.

 

Similarly for numbers:

 

=MATCH(0, {3,2,2},-1) -> 3
=XMATCH(0, {3,2,2},1) -> 2

 

The lookup_array is sorted in descending order for MATCH as it is required per its documentation. For XMATCH it is not required, we keep the same order, expecting to get the same result.

 

Is it the expected behavior,? is it documented somewhere?

 

Thanks,

 

9 Replies

@davidleal 

 

Interesting. I also investigated XMATCH some time back and was dismayed to discover that there was no way to replicate one of the most important features of the old MATCH function (which I continue to use for this very reason). That feature relates to the ability to use fast binary searching in order to determine the position of the last value in a range (useful for defining dynamic ranges when one is not using a Table).

 

For example, with A1 and A10 containing "z" and "a" respectively (and all other cells in that column blank):

 

=MATCH("Ω",A:A)

 

(or, equivalently =MATCH("Ω",A:A,1))

 

will return 10, as desired.

 

However, there is no way to obtain this result using XMATCH, as far as I can tell.

 

A huge oversight in the working of XMATCH, in my opinion.

@davidleal  wrote:  ``Is it the expected behavior,? is it documented somewhere?``

 

Yes and yes.  For XMATCH to have the same behavior as MATCH, do:

 

=XMATCH(0,{3,2,2},1,-2)

 

From the XMATCH help page (click here:(

 

JoeUser_0-1680155815793.png

 

With the default search-mode (1), as you used, XMATCH does a linear search from first to last.  So with match-mode -1, XMATCH will find the first value that is an ``exact match or next largest item`` when there are duplicate values that qualify.

 

But MATCH does a binary search.  So with match-type -1, MATCH will find the last value that is ``the smallest value that is greater than or equal to lookup_value`` when there are duplicate values that qualify.

 

So with XMATCH, we must explicitly specify search-mode -2 in order to get the same binary search behavior.

@Joe User Thanks it clarifies a lot. Here are some comments:

  1. MATCH documentation doesn't specify the search algorithm used, it is assumed based on the sorting requirement.
  2. XMATCH documentation doesn't specify the algorithm used for the default option, it is indicated the direction, so it is assumed the linear search.
  3. There is no link where we can find some documentation on how the binary/linear algorithm was implemented in Excel. On the web, you can find some simulations but usually not for the case or repeated values. It requires some deep dive into the binary search algorithms. In any case, there is no reference.
  4. As a result of that, I have seen many people stop using XMATCH (even Microsoft indicates it is an improved version) due to performance issues compared to MATCH and the reason is that they don't use the search mode input argument in some scenarios, which results in lower performance, because binary search is faster than linear search for the sorted array.

In summary in my opinion there is significant room for improvement in MATCH/XMATCH documentation.

 

Thanks again for your help on this

@JosWoolley thanks for your response too, I guess the issue you mentioned based on @Joe User's response can be explained and to be able to get the same result. The input range you have is not sorted: {"z";"";"";"";"";"";"";"";"";"a"}, so for using lower or equal search we need to have it sorted in ascending oder, therefore:

=MATCH("Ω",SORT(A:A),1) -> 2

which makes sense because after sorted "z" is in the second position and "Ω" is bigger than "z" and sorted lookup_array is as follows: "a", "z","",...,"".

 

An equivalent result in XMATH is obtained as follows:

=XMATCH("Ω",SORT(M:M),-1,2) -> 2

to do a comparison we need to test under the same premise, so using binary search in ascending order (2).

 

For MATCH approximate search (1) if the lookup_array is not in ascending order unexpected results may be obtained.

 

Thanks for your input,

 

David

 

@davidleal 

 

Actually, David, I think you've misunderstood the point. Even when the lookup_array is not sorted, providing the choice of lookup_value is chosen to be greater than any value within lookup_array we can exploit this to return the last non-blank entry in a one-dimensional range.

 

Regards

@JosWoolley I didn't know what was your intention with the formula, I was trying to explain how you can get the same result following the specifications, but not getting the expected result you are looking for. Now from your post, I see you are looking for the position of the last non-blank cell.

 

Maybe it is a side effect or a glitch in MATCH, because we are not using it according to its specification. For example: 

MATCH(4,{3,2,1}) -> 3

You can permute the array in any order and you get the same result, i.e. the last position. Which is the correct position when we invoke the function according to its specification (ascending order):

 

MATCH(4,{1,2,3}) -> 3​

 

On contrary XMATH produces the correct result:

XMATCH(4,{3,2,1},-1) -> 1

with the above parameters, there is no need to have the array sorted.

 

So maybe rather than a wrong XMATCH behavior is a glitch in MATCH and we are taking advantage of that to get the last non blank cell in a range.

@JosWoolley 

For me

= XMATCH("Ω", A:A, -1)

worked well (though it is many years since I have used a whole column reference;  I am usually content to take the end of a Table to define a column dynamically and the problem does not arise for array formulas and their spilt ranges).

 

My recollection is that there is little need for the ±2 parameter in XLOOKUP and XMATCH since the functions are designed to exploit binary search methods and will perform sorting in memory to ensure that apparently linear searches are performed efficiently.

@Peter Bartholomew 

It returns position of next smaller within the range, not the position of last non-blank, isn't it?

@Sergei Baklan 

True,  I allowed myself to get mesmerised by the repeated values as in

{3,2,2}

Not that they are techniques I have needed for many years now,  I am far more likely to use formulas more tightly tied to the specifics of the problem, such as

= ROWS(Table1)

=COUNT(squares#)