Forum Discussion
DellMachine
Aug 21, 2023Copper Contributor
How to extract text from a string of text that follows a random pattern
Hi All,
Newbie here . I have an Excel sheet with over 50k rows. I am trying to extract Merchant names from the Transaction ID. Since the Transaction ID doesn't follow any specific patterns I am not able to use the std. if/search. Please refer the below table for reference.
Thanks, appreciate all your help.
Transaction ID | Return Value= Merchant Name |
CColaUK_121324 | CocaCola |
CoCola UK*124214 | CocaCola |
Pepsi USA | Pepsi |
PepcXCDA_ 12346789 | Pepsi |
Prime * ACREFEDF | Prime |
Dr. Pep8423049283 | Dr Pepper |
Pepsi | |
FantaQWOERT | Fanta |
DrPepper_DFSDF |
|
MountainDew*121314324EWR |
|
Sprite_GDFGHDFG |
|
Diet Pepsi_1243124 |
|
Coke Zero_345945873849 |
|
- mtarlerSilver Contributor
I don't think it is possible to exactly succeed as there are way too many potential random variations. If you had a list of potential variations it would be easy (relatively) but I assume you do not. You might play with AI but not sure how will it will do. I do have a nifty NearMatching Lambda function I wrote that will look at the first x characters from 1 array and search the other array and return with item matched the most characters. It can be set to go 'forward' or 'backward' or 'both' as to which array to start with and which to compare to. In the above examples if the transaction was Pep456 it would return a list including both Pepsi and Dr. Pepper if set to forward or both; but only return Pepsi in the reverse (because Dr. Pepper starts with Dr.). Also "CCola" would not find nor be found by "CocaCola" assuming a min match of 3 characters. So although I think it is extremely helpful it is not a silver bullet.
I am attaching the example file and look at column G with this formula:
=TEXTJOIN(", ",TRUE,NearMatches2([@Vendor],ASL,3,"Backward"))
since NearMatches2 returns an array I wrapped it with the TEXTJOIN
the parameters are Array1, Array2, Min # characters, Direction