How do I delete all content except for some words?

Copper Contributor

Hello.  I am looking to clean up a data set and doing so manually is taking a lot of time. 

 

The issue: A column of data contains a string of words in random arrangement.  Please see below.

 

RP2024_0-1711561595883.png

 

Goal: Remove all words from each cell except those that are explicitly chosen. For example I want to be able to choose words such as "Assistant" or "Associate", etc. and have the rest of the words removed.  Here is an example of what I am looking for:

RP2024_1-1711561795632.png

 

Thank you in advance for the advice. 

8 Replies

@RP2024 

 


Goal: Remove all words from each cell except those that are explicitly chosen. For example I want to be able to choose words such as "Assistant" or "Associate", etc. and have the rest of the words removed.  Here is an example of what I am looking for:

 


So, it is really "looking for key words and if found put them next to the cell".

 

 

=IFERROR(LOOKUP(2,1/SEARCH(list_of_keywords,A1),list_of_keywords),"")

 

@Detlef Lewin 

 

Thank you for reframing the question! I am having some difficulty in making the formula work. As a specific example, let's say I am looking for words "Assistant" and "Professor". The data is column starts in A1 and I want the result to be placed in the adjacent column, B1.  However I want the key words appear in a specific order i.e. "Assistant Professor" and not "Professor Assistant" even if the latter was the manner in which the data appeared in column A. Given the above, could you please edit the formula accordingly?

@RP2024 

=IFERROR(TEXTJOIN(" ",TRUE,CHOOSECOLS(SORT(FILTER(keywords_table,ISNUMBER(SEARCH(list_of_keywords,A1))),2),1)),"")]

 

keywords_table includes the list_of_keywords and a number indicating the order.

 

@RP2024 

As variant

=BYROW(texts,
    LAMBDA(txt,
      LET(
        positions, IFERROR(SEARCH(keywords,txt),0),
        order, SORTBY( keywords, positions),
        found, FILTER(order, SORT(positions)),
        IFERROR(TEXTJOIN(" ",, found), "")
      )
) )

for

image.png

Thank you very much! Huge time saving.
Thank you, Sergei! Much appreciated.

@RP2024 , you are welcome