SOLVED

Automate: Find matching pairs between 2 columns with comma separated values & then vlookup + delete

Copper Contributor

Hi, 

 

I currently have a list of 80,000 patents (col. A) with their respective backward and forward citations (col. B&C). 

 

I want to write a formula in column D that identifies a match between any of the citation numbers in column B within the corresponding row in column C. E.g. in row 3, there is a match between BWD and FWD citation (US1012) in column B  and column C. 

 

My ideal output in column D is the matching value (i.e US1012). If no matches are found between col.B & C, the corresponding cell in column D can remain blank, as shown. 

 

My main problem (column E) is I want to have a way of identifying if the value in column D (i.e US 1012)  is present in column A (list of original patent IDs). If so, I wanted an automated quick method of  highlighting the corresponding row number in column A containing the matching citation in column D. The intention is to delete these matching patent IDs (i.e delete row 14 US1012). 

 

Overall example, 

Column D identifies US 1012 is present in both column B and C. 

Column E highlights the row containing US1012 in column A. This methodology would be applied to a spreadsheet with 30,000 rows; I will then delete all matching citation rows  (i.e row 14 with US1012 in col A). 

 

Any help is appreciated.

 

Many thanks

 

Screenshot 2024-08-23 at 23.20.44.png

6 Replies
best response confirmed by zchahin (Copper Contributor)
Solution

@zchahin 
It could be done using FILTER function to sort out first the returned values in column D, then using the ISNUMBER and MATCH function from Column D, to identify which Patent Numbers was matched on column D.

Rodrigo__1-1724462641476.png


formula for E2

=LET(
filteredValues, FILTER($D$2:$D$100, $D$2:$D$100<>""),
IF(ISNUMBER(MATCH(A2, filteredValues, 0)), "DELETE", "")
)

Sample file: zchahin.xlsx
 
If you're using an older version of excel
you'll need to add another column to achieve this, (or this could also be done by creating name range)
Rodrigo__2-1724463028959.png

array formula in E2:
=IFERROR(INDEX($D$2:$D$100, SMALL(IF($D$2:$D$100<>"", ROW($D$2:$D$100)-ROW($D$2)+1), ROW(1:1))), "")

 

formula in F2:
=IF(ISNUMBER(MATCH(A2, $E$2:$E$100, 0)), "DELETE", "")

@Rodrigo_ 

 

Excellent - thank you very much for a simple solution to my problem.

 

I wanted to clarify how to adjust the formula in column D so that if there are comma separated values in column C (spreadsheet initially attached contained single patent IDs in column C), it will return in column D the ID of the matching patent (between column B and C).

 

E.g. in the screenshot below citation FR2845889A1 is a match between column B and C but is not displayed in cell D10 (Column D formula you have provided works great when single citation is present in column C, which we are checking for a match in column B).  

@zchahin 

=XLOOKUP(TRUE,ISNUMBER(XMATCH(TEXTSPLIT(C2,"|"),TEXTSPLIT(B2,"|"))),TEXTSPLIT(C2,"|"),"")

Does this formula return the intended result?

matches.jpg

Hi @OliverScheurich 

Works perfectly, thank you!

 

 

Only issue is my formula in column E no longer recognises the values in column D and therefore does not identify that they are present in column A and output delete in the corresponding row in corresponding column E (please see above).

Very grateful for your help so far.

@zchahin 

You are welcome. I think the problem are the spaces for example there is " US1111" in cell D2 and "US1111" in cell A22. This doesn't return a match because of the leading space in cell D2 which isn't in A22 (in your sheet).

 

The reason for the entry in cell D2 is that you have e.g. "US1111 | US1234 | US1019" in columns B and C. In my example this would be "US1111|US1234|US1019" which means i have "|" instead of " | ".

 

With " | " in columns B and C you can use in D2:

=XLOOKUP(TRUE,ISNUMBER(XMATCH(TEXTSPLIT(C2," | "),TEXTSPLIT(B2," | "))),TEXTSPLIT(C2," | "),"")

spaces.jpg

 

In the attached sample file there is a worksheet for both "|" and " | ".

Brilliant, thank you!

I had one final question:

Is it possible to adapt the formula in column D of our initial problem, so that the output in column D is all the unique values from the coresponding column b cell. I.e remove the duplicated citation between column B and C. An e.g from your table would be for cell D2: matching citation in B2 and C2 = US9. Output in column D is US7 (values from column B minus the citation found to be present in both B2 and C2).

Again I am very grateful for your help.
1 best response

Accepted Solutions
best response confirmed by zchahin (Copper Contributor)
Solution

@zchahin 
It could be done using FILTER function to sort out first the returned values in column D, then using the ISNUMBER and MATCH function from Column D, to identify which Patent Numbers was matched on column D.

Rodrigo__1-1724462641476.png


formula for E2

=LET(
filteredValues, FILTER($D$2:$D$100, $D$2:$D$100<>""),
IF(ISNUMBER(MATCH(A2, filteredValues, 0)), "DELETE", "")
)

Sample file: zchahin.xlsx
 
If you're using an older version of excel
you'll need to add another column to achieve this, (or this could also be done by creating name range)
Rodrigo__2-1724463028959.png

array formula in E2:
=IFERROR(INDEX($D$2:$D$100, SMALL(IF($D$2:$D$100<>"", ROW($D$2:$D$100)-ROW($D$2)+1), ROW(1:1))), "")

 

formula in F2:
=IF(ISNUMBER(MATCH(A2, $E$2:$E$100, 0)), "DELETE", "")

View solution in original post