Lambda Formula for Replacing Substrings

Copper Contributor

As part of a project I am working on I need to remove the corporate entity type from a large list of companies.  The list of entities is about 40 abbreviations, ranging from GmbH to Inc. (or Inc) and also includes short identifiers such as A and B.  The entity name almost always (always for all intents and purposes) as the final word of the name.  

 

 

=REDUCE(N14;'Entity Abbreviations'!$A$3:$A$77;LAMBDA(a;b;SUBSTITUTE(a; b; OFFSET(b;0;1))))

 

 


I have been using the formula but have encountered several blocking issues:  

 

- Single characters are not being removed
- Short names (the abbreviation for Company, 'Co,' for example) are being removed from the end of the name when occurring as a single string of text but also when part of a larger word ('Communications,' for example).

- My formula isn't removing both punctuated and non-punctuated versions (Inc and Inc.).  This seems to be order dependent, as Inc. will be removed if listed first in my array but all Inc occurrences remain.  If Inc is list

- Removal is leaving me with an issue of orphaned punctuation marks in the form of commas and periods.


Any suggestions are much appreciated, thank you in advance!

2 Replies

@johnvoorheis 

Cosmetic comment, in current formula I'm not sure why do you need OFFSET, empty string is enough.

 

But before writing the formula (or Power Query, or VBA) it's better to define the logic what shall be removed. For example, single character - is that the character surrounded by space or any of punctuation character s(which ones) or start/end of the paragraph. What shall be removed in such case.

Similar for other cases.

Having such logic in any text-like notation it'll be easier with formula.

@SergeiBaklan cosmetic commentary, well received! 
Empty string "" produces the same result as offset, so kudos for the simplification :) 

Re: defining the logic behind the removal, is there a way I could use a formula to specify to only remove these characters (and character sets) when they occur as the final word or standalone character in a cell?  


I've tried denoting a space before " " and after " ", such as " "A" " but to no avail