SOLVED

Erase words (from a dictionary) in paragraphs across multiple cells

Copper Contributor

Hi all,

 

I have two sources in two tables:

- a dictionary in column A containing a list of dictionary words (between 100 and 1000 words)

- a list of cells in column C containing copied text information in full paragraphs from a random internet page. So I have e.g. 20 cells in Column C and each cell is filled with a paragraph, each containing between 20 and 100 words or so.

 

I'd like to erase in column C all words that are in the dictionary in Column A.

 

In VBA this would be feasible by programming a double nested loop iterating through each word of each cell in Column C and comparing it with all the words in Column A. That's easy...

 

Now, since VBA macro's are not allowed at work (ban on .xlsm), I'd like to do the same thing in a formula with nested functions.

 

Microsoft has recently added some nice functions, e.g. TextJoin, TextSplit, Filter, ...

 

I 'feel' that it should be possible to have this 'dictionary attack' on any given text by removing any dictionary words, and I'm looking into this myself ongoingly.

 

But if anybody could give me some hints and tips in the meantime, these are certainly more than welcome.

 

Thank you!

 

5 Replies

@Mark_Dekeyser 

It sounds like you have access to the latest functions so this might work for you:

=MAP(paragraphs,LAMBDA(e,TRIM(TEXTJOIN(,,TEXTSPLIT(LOWER(e),LOWER(words))))))

 

@Patrick2788 Thank you Patrick!

That's (almost) the solution I'm looking for: I'd like to erase full words, so not to erase 'bed' from 'embed', but I hope to figure it out from here.

So, I have to learn two new Excel functions, being MAP and LAMBDA.

After a quick scan of these functions; could you please explain the parameter 'e'; I don't get it...

Thank you!

 

best response confirmed by Mark_Dekeyser (Copper Contributor)
Solution

@Mark_Dekeyser 

'e' is assigning a name to a parameter in the array that MAP will check.  Essentially, what MAP allows you to do is to perform a calculation on each 'element' in an array.  If you study MAP, you may also look into the functions mentioned here:

Announcing LAMBDA Helper Functions (microsoft.com)

 

I've updated the sample to account for those anomalies.  CheckWords is a LAMBDA designed to pull out words found in Column A.

 

 

'CheckWords
'This function 'explodes' a cell converting it to an array.  MAP replaces words found in the Words list with a blank.
=LAMBDA(element,TEXTJOIN(" ",,MAP(TEXTSPLIT(element, " "), LAMBDA(word, IF(ISNUMBER(XMATCH(word, words)), "", word)))))

 

  

CheckWords is then run on each 'element' within the paragraphs array.

 

=MAP(paragraphs,LAMBDA(e,CheckWords(e)))

 

 

Wow... That's not only next level, that's simply amazing ;)
Thank you so much! I'm gonna study now ;)
You're welcome! Glad I could help.
1 best response

Accepted Solutions
best response confirmed by Mark_Dekeyser (Copper Contributor)
Solution

@Mark_Dekeyser 

'e' is assigning a name to a parameter in the array that MAP will check.  Essentially, what MAP allows you to do is to perform a calculation on each 'element' in an array.  If you study MAP, you may also look into the functions mentioned here:

Announcing LAMBDA Helper Functions (microsoft.com)

 

I've updated the sample to account for those anomalies.  CheckWords is a LAMBDA designed to pull out words found in Column A.

 

 

'CheckWords
'This function 'explodes' a cell converting it to an array.  MAP replaces words found in the Words list with a blank.
=LAMBDA(element,TEXTJOIN(" ",,MAP(TEXTSPLIT(element, " "), LAMBDA(word, IF(ISNUMBER(XMATCH(word, words)), "", word)))))

 

  

CheckWords is then run on each 'element' within the paragraphs array.

 

=MAP(paragraphs,LAMBDA(e,CheckWords(e)))

 

 

View solution in original post