SOLVED

Index & Match based on a substring of the look up value contained in a 2-dimension array

Copper Contributor

Is it possible to update my formula below using XLOOKUP, such that

  1. any substring of the lookup_value (Sheet1:A2) can be matched across the lookup_array (Sheet2:B-J)
  2. the formula ignores empty cells in the lookup_array (ps: all non-empty cells in the lookup_array are unique values, no duplicates)
  3. the formula returns the first match

 

My biggest challenge is doing an Index & Match based on a substring of the lookup_value that is contained anywhere in a 2-dimensional array. The formula below will match a substring of the lookup_value that matches BOTH Criteria 1 [column B] AND Criteria 2 [Column C]. This worked at the time, but now I would like to update it with XLOOKUP and also so that it matches based on ANY of the values from multiple columns, rather than matching exact two columns.

 

 

=INDEX(Sheet2[Category], MATCH(1,COUNTIFS($A2,"*"&Sheet2[Option1]&"*",$A2,"*"&Sheet2[Option2]&"*"),0))

 

 

Ideally, I would like to use the "name" of the lookup_array in the formula so that additional columns in the future can be added without having to change the formula. The correct formula will return "Fruit" for Sheet1:B2 and "Vegetable" for Sheet1:B6.  

excel option.PNG

 

Answers by @Sergei Baklan in previous post I think are so helpful, but I can't modify those answers enough to make it work for this particular case. Thank you everyone for your help.

14 Replies

@UI_c2022 

=INDEX($E$2:$E$6,MATCH(1,MMULT(--(ISNUMBER(SEARCH($F$2:$I$6,A2))),{1;1;1;1}),0))

 

Maybe with this formula as shown in the attached file. 

Thank you @OliverScheurich the SEARCH function is more efficient for sure! For some reason the equation is not working for me, perhaps because I have more than 4 columns in my real dataset. I tried to add {1;1;1;1....1} as many columns as I want it to search but it returns back an error. It may also be because I have empty blank cells in that array. How can I have the SEARCH ignore empty cells in the array?
I checked again, and the formula provided by @OliverScheurich works when there is NO EMPTY cells within the search array. My search array does have empty cells, so I filled them with a value that would not result in a false positive (is unlikely to be found in my search list). This is a great short-term fix. If there is a workaround filling empty cells, I would be happy to learn more.
best response confirmed by UI_c2022 (Copper Contributor)
Solution

If I may, a few suggestions to modifying the formula:
1) change Search(F2:I6, A2) to Search(A2, F2:I6). I believe blank cells in your data will match to A2 and could return incorrect results, but it won't match in the other direction.
2) Instead of hardcoding the array of 1's, you could use TRANSPOSE(COLUMN($F$2:$I$2)) - if you insert columns, the formula should update.
3) Try --(MMULT(...)>0). If you had "red apple" and "green apple" in the same row, mmult would return 2 for that row (searching for "apple"), causing match to fail (as match is looking for "1"). Also, using column() in #2 instead of an array of 1's makes this necessary.
4) You could wrap the formula with Iferror for instances where nothing matches.

Note you will need to use Ctrl+Shift+Enter if you don't have office 365 (because of the transpose function):
=IFERROR(INDEX($E$2:$E$6,MATCH(1,--(MMULT(--(ISNUMBER(SEARCH(A2,$F$2:$I$6))),TRANSPOSE(COLUMN($F$2:$I$2)))>0),0)), "No Match")

@UI_c2022 

=INDEX($E$2:$E$6,MATCH(1,MMULT(--(ISNUMBER(SEARCH($F$2:$P$6,A2)))*NOT(ISBLANK($F$2:$P$6)),{1;1;1;1;1;1;1;1;1;1;1}),0))

 

This formula takes into account empty cells and works for 11 options in my sheet. Unfortunately, when I made my first suggestion, I didn't consider the empty cells.

Thank you for your suggestions @JMB17, the transpose suggestion worked once I also included the NOT(ISBLANK) suggestion by @OliverScheurich . However, I tried to execute suggestions #1 and #3, but the formula would return an error message. I am particularly interested in addressing the issue of MMULT when the search term "Ana's Grocery Market" matches two options under the "grocery category" which has "grocery" and "market" in the same row. I notice that the match fails. How can I fix that?

Thank you @OliverScheurich. The formula works fantastic. I did notice that when the search item meets more than one option in a row, that the match fails. Unfortunately, this will be a common issue in my database because I am trying to categorize based on broad yet similar words that are likely to occur together in real scenarios, thus defeating the purpose of creating the formula in the first place.
Can you upload a sample workbook to see what you have (without any sensitive information)? After clicking on "reply" click "open full text editor" and you should see an option to upload a file. It works fine for me, so I'm not going to be able to re-create your problem without looking at the same thing you are.

@UI_c2022 

As variant ant to minimize hardcoding:

If name with some gap category and options in Sheet2 (without headers) as range

image.png

here

image.png

in B2 it could be

=LET(
   category, INDEX( range,,1 ),
   data,     INDEX( range,
                    SEQUENCE( ROWS( range ) ),
                    SEQUENCE(, COLUMNS( range )-1,2 ) ),
  matching, --ISNUMBER( SEARCH( IF( data= "", UNICHAR(9999), data ), A2) ),
  id,         SUM( SEQUENCE( ROWS( data ) ) *
                   MMULT( matching,
                          SEQUENCE( COLUMNS( data ),,1,0 ) ) ),
IFS( A2 = "", "", id = 0, "not found", 1, INDEX( category, id ) ) )

and drag it down

@UI_c2022 

=INDEX($E$2:$E$6,MATCH(1,--(MMULT(--(ISNUMBER(SEARCH($F$2:$P$6,A2))*NOT(ISBLANK($F$2:$P$6))),ROW(1:11)^0)>0),0))

 

You can try this formula which works for multiple matches in my spreadsheet. The expression ROW(1:11)^0 translates to {1;1;1;1;1;1;1;1;1;1;1} and can easily be adapted to other data ranges. The number of rows (11 in this example) must match the number of options which is the number of columns from F:P in this example. The number of columns must be entered within the expression $F$2:$P$6 in both the SEARCH and ISBLANK function.

After using @OliverScheurich formula for a while I saw two inefficiencies that I was able to address using @JMB17 suggestions above.

Inefficiency #1 was the "ROW(1:11)^0" which I changed to TRANSPOSE(COLUMN($F$2:$I$2). The row option made is such that each row had a different number, for example row 3 would have the formula above translated to "ROW(3:13)" and when these rows would be reordered (maybe ordered A-Z vs Z-A for different reasons), al the formulas were highlighted as inconsistent. This could hide a real error message, and also it was very inefficient. Now that they are frozen to a specific 15 columns, I also do not have to worry about accidently deleting those columns and creating another error.

Inefficiency #2: Not using the "IFERROR" function. When I added new data into the table that had not been indexed yet, the result could not return a match and would return an N/A which other mathematical functions in other tables could not translate and those tables were then broken. Using the IFERROR function allows me to flag those that do not match, while also not breaking other formulas that are depending on the match results.

@UI_c2022 

Since you spoke about XLOOKUP() in very first post of this thread, I believe you are on Excel 365 or 2021. If so why don't use SEQUENCE(), LET(), etc to make formula more transparent and reliable?

I do have Excel 365. Can you explain how SEQUENCE() and LET() make the formular more reliable? How would one implement these in the formula above?

@UI_c2022 

 

After looking over this thread again, I can see I erred in suggesting the search arguments could be switched - really not sure what I was thinking. Sounds like you've already sorted it out, though.

 

One thing that would be an issue going forward would be if you added on to your table by adding rows at the bottom or columns to the right - the range references in the formula won't update unless the row/columns were inserted in the middle of the table. So, along the lines of what Sergei is suggesting to make it more reliable, I think you could convert the table to a structured table, and use the let function to define the data range (all columns after the first column), so if you added another option category later, you shouldn't need to update the formula's range reference(s)).

 

I don't yet have the latest office 365 functions, but I think something like this:

Let(data, index(Table1,1,2):index(Table1,rows(Table1),columns(Table1)),
         searchresults, --(MMULT((ISNUMBER(SEARCH(data,A2)))*(NOT(ISBLANK(data))),TRANSPOSE(COLUMN(data)))>0), 
        IFERROR(INDEX(Table1[Category],MATCH(1,searchresults,0)), "No Match")

 

1 best response

Accepted Solutions
best response confirmed by UI_c2022 (Copper Contributor)
Solution

If I may, a few suggestions to modifying the formula:
1) change Search(F2:I6, A2) to Search(A2, F2:I6). I believe blank cells in your data will match to A2 and could return incorrect results, but it won't match in the other direction.
2) Instead of hardcoding the array of 1's, you could use TRANSPOSE(COLUMN($F$2:$I$2)) - if you insert columns, the formula should update.
3) Try --(MMULT(...)>0). If you had "red apple" and "green apple" in the same row, mmult would return 2 for that row (searching for "apple"), causing match to fail (as match is looking for "1"). Also, using column() in #2 instead of an array of 1's makes this necessary.
4) You could wrap the formula with Iferror for instances where nothing matches.

Note you will need to use Ctrl+Shift+Enter if you don't have office 365 (because of the transpose function):
=IFERROR(INDEX($E$2:$E$6,MATCH(1,--(MMULT(--(ISNUMBER(SEARCH(A2,$F$2:$I$6))),TRANSPOSE(COLUMN($F$2:$I$2)))>0),0)), "No Match")

View solution in original post