Forum Discussion

zchahin's avatar
zchahin
Copper Contributor
Aug 23, 2024
Solved

Automate: Find matching pairs between 2 columns with comma separated values & then vlookup + delete

Hi, 

 

I currently have a list of 80,000 patents (col. A) with their respective backward and forward citations (col. B&C). 

 

I want to write a formula in column D that identifies a match between any of the citation numbers in column B within the corresponding row in column C. E.g. in row 3, there is a match between BWD and FWD citation (US1012) in column B  and column C. 

 

My ideal output in column D is the matching value (i.e US1012). If no matches are found between col.B & C, the corresponding cell in column D can remain blank, as shown. 

 

My main problem (column E) is I want to have a way of identifying if the value in column D (i.e US 1012)  is present in column A (list of original patent IDs). If so, I wanted an automated quick method of  highlighting the corresponding row number in column A containing the matching citation in column D. The intention is to delete these matching patent IDs (i.e delete row 14 US1012). 

 

Overall example, 

Column D identifies US 1012 is present in both column B and C. 

Column E highlights the row containing US1012 in column A. This methodology would be applied to a spreadsheet with 30,000 rows; I will then delete all matching citation rows  (i.e row 14 with US1012 in col A). 

 

Any help is appreciated.

 

Many thanks

 

  • zchahin 
    It could be done using FILTER function to sort out first the returned values in column D, then using the ISNUMBER and MATCH function from Column D, to identify which Patent Numbers was matched on column D.


    formula for E2

    =LET(
    filteredValues, FILTER($D$2:$D$100, $D$2:$D$100<>""),
    IF(ISNUMBER(MATCH(A2, filteredValues, 0)), "DELETE", "")
    )

    Sample file: zchahin.xlsx
     
    If you're using an older version of excel
    you'll need to add another column to achieve this, (or this could also be done by creating name range)

    array formula in E2:
    =IFERROR(INDEX($D$2:$D$100, SMALL(IF($D$2:$D$100<>"", ROW($D$2:$D$100)-ROW($D$2)+1), ROW(1:1))), "")

     

    formula in F2:
    =IF(ISNUMBER(MATCH(A2, $E$2:$E$100, 0)), "DELETE", "")

6 Replies

  • Rodrigo_'s avatar
    Rodrigo_
    Steel Contributor

    zchahin 
    It could be done using FILTER function to sort out first the returned values in column D, then using the ISNUMBER and MATCH function from Column D, to identify which Patent Numbers was matched on column D.


    formula for E2

    =LET(
    filteredValues, FILTER($D$2:$D$100, $D$2:$D$100<>""),
    IF(ISNUMBER(MATCH(A2, filteredValues, 0)), "DELETE", "")
    )

    Sample file: zchahin.xlsx
     
    If you're using an older version of excel
    you'll need to add another column to achieve this, (or this could also be done by creating name range)

    array formula in E2:
    =IFERROR(INDEX($D$2:$D$100, SMALL(IF($D$2:$D$100<>"", ROW($D$2:$D$100)-ROW($D$2)+1), ROW(1:1))), "")

     

    formula in F2:
    =IF(ISNUMBER(MATCH(A2, $E$2:$E$100, 0)), "DELETE", "")

    • zchahin's avatar
      zchahin
      Copper Contributor

      Rodrigo_ 

       

      Excellent - thank you very much for a simple solution to my problem.

       

      I wanted to clarify how to adjust the formula in column D so that if there are comma separated values in column C (spreadsheet initially attached contained single patent IDs in column C), it will return in column D the ID of the matching patent (between column B and C).

       

      E.g. in the screenshot below citation FR2845889A1 is a match between column B and C but is not displayed in cell D10 (Column D formula you have provided works great when single citation is present in column C, which we are checking for a match in column B).  

      • OliverScheurich's avatar
        OliverScheurich
        Gold Contributor

        zchahin 

        =XLOOKUP(TRUE,ISNUMBER(XMATCH(TEXTSPLIT(C2,"|"),TEXTSPLIT(B2,"|"))),TEXTSPLIT(C2,"|"),"")

        Does this formula return the intended result?

Resources