Forum Discussion
Mark_Dekeyser
Nov 25, 2022Copper Contributor
Erase words (from a dictionary) in paragraphs across multiple cells
Hi all,
I have two sources in two tables:
- a dictionary in column A containing a list of dictionary words (between 100 and 1000 words)
- a list of cells in column C containing copied text information in full paragraphs from a random internet page. So I have e.g. 20 cells in Column C and each cell is filled with a paragraph, each containing between 20 and 100 words or so.
I'd like to erase in column C all words that are in the dictionary in Column A.
In VBA this would be feasible by programming a double nested loop iterating through each word of each cell in Column C and comparing it with all the words in Column A. That's easy...
Now, since VBA macro's are not allowed at work (ban on .xlsm), I'd like to do the same thing in a formula with nested functions.
Microsoft has recently added some nice functions, e.g. TextJoin, TextSplit, Filter, ...
I 'feel' that it should be possible to have this 'dictionary attack' on any given text by removing any dictionary words, and I'm looking into this myself ongoingly.
But if anybody could give me some hints and tips in the meantime, these are certainly more than welcome.
Thank you!
'e' is assigning a name to a parameter in the array that MAP will check. Essentially, what MAP allows you to do is to perform a calculation on each 'element' in an array. If you study MAP, you may also look into the functions mentioned here:
Announcing LAMBDA Helper Functions (microsoft.com)
I've updated the sample to account for those anomalies. CheckWords is a LAMBDA designed to pull out words found in Column A.
'CheckWords 'This function 'explodes' a cell converting it to an array. MAP replaces words found in the Words list with a blank. =LAMBDA(element,TEXTJOIN(" ",,MAP(TEXTSPLIT(element, " "), LAMBDA(word, IF(ISNUMBER(XMATCH(word, words)), "", word)))))
CheckWords is then run on each 'element' within the paragraphs array.
=MAP(paragraphs,LAMBDA(e,CheckWords(e)))
- Patrick2788Silver Contributor
It sounds like you have access to the latest functions so this might work for you:
=MAP(paragraphs,LAMBDA(e,TRIM(TEXTJOIN(,,TEXTSPLIT(LOWER(e),LOWER(words))))))
- Mark_DekeyserCopper Contributor
Patrick2788 Thank you Patrick!
That's (almost) the solution I'm looking for: I'd like to erase full words, so not to erase 'bed' from 'embed', but I hope to figure it out from here.
So, I have to learn two new Excel functions, being MAP and LAMBDA.
After a quick scan of these functions; could you please explain the parameter 'e'; I don't get it...
Thank you!
- Patrick2788Silver Contributor
'e' is assigning a name to a parameter in the array that MAP will check. Essentially, what MAP allows you to do is to perform a calculation on each 'element' in an array. If you study MAP, you may also look into the functions mentioned here:
Announcing LAMBDA Helper Functions (microsoft.com)
I've updated the sample to account for those anomalies. CheckWords is a LAMBDA designed to pull out words found in Column A.
'CheckWords 'This function 'explodes' a cell converting it to an array. MAP replaces words found in the Words list with a blank. =LAMBDA(element,TEXTJOIN(" ",,MAP(TEXTSPLIT(element, " "), LAMBDA(word, IF(ISNUMBER(XMATCH(word, words)), "", word)))))
CheckWords is then run on each 'element' within the paragraphs array.
=MAP(paragraphs,LAMBDA(e,CheckWords(e)))