Forum Discussion

Mark_Dekeyser's avatar
Mark_Dekeyser
Copper Contributor
Nov 25, 2022
Solved

Erase words (from a dictionary) in paragraphs across multiple cells

Hi all,

 

I have two sources in two tables:

- a dictionary in column A containing a list of dictionary words (between 100 and 1000 words)

- a list of cells in column C containing copied text information in full paragraphs from a random internet page. So I have e.g. 20 cells in Column C and each cell is filled with a paragraph, each containing between 20 and 100 words or so.

 

I'd like to erase in column C all words that are in the dictionary in Column A.

 

In VBA this would be feasible by programming a double nested loop iterating through each word of each cell in Column C and comparing it with all the words in Column A. That's easy...

 

Now, since VBA macro's are not allowed at work (ban on .xlsm), I'd like to do the same thing in a formula with nested functions.

 

Microsoft has recently added some nice functions, e.g. TextJoin, TextSplit, Filter, ...

 

I 'feel' that it should be possible to have this 'dictionary attack' on any given text by removing any dictionary words, and I'm looking into this myself ongoingly.

 

But if anybody could give me some hints and tips in the meantime, these are certainly more than welcome.

 

Thank you!

 

  • Patrick2788's avatar
    Patrick2788
    Nov 25, 2022

    Mark_Dekeyser 

    'e' is assigning a name to a parameter in the array that MAP will check.  Essentially, what MAP allows you to do is to perform a calculation on each 'element' in an array.  If you study MAP, you may also look into the functions mentioned here:

    Announcing LAMBDA Helper Functions (microsoft.com)

     

    I've updated the sample to account for those anomalies.  CheckWords is a LAMBDA designed to pull out words found in Column A.

     

     

    'CheckWords
    'This function 'explodes' a cell converting it to an array.  MAP replaces words found in the Words list with a blank.
    =LAMBDA(element,TEXTJOIN(" ",,MAP(TEXTSPLIT(element, " "), LAMBDA(word, IF(ISNUMBER(XMATCH(word, words)), "", word)))))

     

      

    CheckWords is then run on each 'element' within the paragraphs array.

     

    =MAP(paragraphs,LAMBDA(e,CheckWords(e)))

     

     

    • Mark_Dekeyser's avatar
      Mark_Dekeyser
      Copper Contributor

      Patrick2788 Thank you Patrick!

      That's (almost) the solution I'm looking for: I'd like to erase full words, so not to erase 'bed' from 'embed', but I hope to figure it out from here.

      So, I have to learn two new Excel functions, being MAP and LAMBDA.

      After a quick scan of these functions; could you please explain the parameter 'e'; I don't get it...

      Thank you!

       

      • Patrick2788's avatar
        Patrick2788
        Silver Contributor

        Mark_Dekeyser 

        'e' is assigning a name to a parameter in the array that MAP will check.  Essentially, what MAP allows you to do is to perform a calculation on each 'element' in an array.  If you study MAP, you may also look into the functions mentioned here:

        Announcing LAMBDA Helper Functions (microsoft.com)

         

        I've updated the sample to account for those anomalies.  CheckWords is a LAMBDA designed to pull out words found in Column A.

         

         

        'CheckWords
        'This function 'explodes' a cell converting it to an array.  MAP replaces words found in the Words list with a blank.
        =LAMBDA(element,TEXTJOIN(" ",,MAP(TEXTSPLIT(element, " "), LAMBDA(word, IF(ISNUMBER(XMATCH(word, words)), "", word)))))

         

          

        CheckWords is then run on each 'element' within the paragraphs array.

         

        =MAP(paragraphs,LAMBDA(e,CheckWords(e)))

         

         

Resources