Forum Discussion

DellMachine's avatar
DellMachine
Copper Contributor
Aug 21, 2023

How to extract text from a string of text that follows a random pattern

Hi All,

Newbie here . I have an Excel sheet with over 50k rows. I am trying to extract Merchant names from the Transaction ID. Since the Transaction ID doesn't follow any specific patterns I am not able to use the std.  if/search.  Please refer the below table for reference. 

Thanks, appreciate all your help. 

 

 

Transaction ID

Return Value= Merchant Name

CColaUK_121324

CocaCola

CoCola UK*124214

CocaCola

Pepsi USA

Pepsi

PepcXCDA_ 12346789

Pepsi

Prime * ACREFEDF

Prime

Dr. Pep8423049283

Dr Pepper

www.Dietpepsi.com

Pepsi

FantaQWOERT

Fanta

DrPepper_DFSDF

 

MountainDew*121314324EWR

 

Sprite_GDFGHDFG

 

Diet Pepsi_1243124

 

Coke Zero_345945873849

 

 

  • mtarler's avatar
    mtarler
    Silver Contributor

    I don't think it is possible to exactly succeed as there are way too many potential random variations. If you had a list of potential variations it would be easy (relatively) but I assume you do not. You might play with AI but not sure how will it will do. I do have a nifty NearMatching Lambda function I wrote that will look at the first x characters from 1 array and search the other array and return with item matched the most characters. It can be set to go 'forward' or 'backward' or 'both' as to which array to start with and which to compare to. In the above examples if the transaction was Pep456 it would return a list including both Pepsi and Dr. Pepper if set to forward or both; but only return Pepsi in the reverse (because Dr. Pepper starts with Dr.). Also "CCola" would not find nor be found by "CocaCola" assuming a min match of 3 characters. So although I think it is extremely helpful it is not a silver bullet.

    I am attaching the example file and look at column G with this formula:

    =TEXTJOIN(", ",TRUE,NearMatches2([@Vendor],ASL,3,"Backward"))

    since NearMatches2 returns an array I wrapped it with the TEXTJOIN

    the parameters are Array1, Array2, Min # characters, Direction

     

Resources