Forum Discussion

Jay_Ridley's avatar
Jay_Ridley
Copper Contributor
Jun 12, 2022

Partial text matching of cell fields between two columns in different sheets to insert another cell

HI excel guru's, 

Im a novice looking for assistance in creating a normalised database (currently using excel first to achieve the normalisation). I have two tables from different sources (excel spread sheets) with relationship to each other. Player_Bio and Player_Stats. Each table has an attribute (column) P_Name, but the text field in one (Player_Bio) gives full name and in the other (Player_Stats) it gives mostly initial plus surname with some full names. (Player_Bio.P_Name) only has +-600 instances and Player_Stats.P_Name) has >13000. Im trying to match (left join(Partial Match)- (Player_Bio.P_Name) to (Player_Stats.P_Name) and at the same time create a new attribute (column) in (Player_Bio.P_Age) that inserts the P_Age from Player_Stats to the matched (P_Name) in Player_Bio. Any assistance on how to do this would be appreciated. I have provided a sample of the two tables below.

Player_Bio Table:

P_NameP_Age
GabrielNul
Aaron RamsdaleNul

 

Player_Stats Table:

P_NameP_Age
G. Pasquale33
Luis GarcĂ­a37
  • mathetes's avatar
    mathetes
    Silver Contributor

    Jay_Ridley 

     

    Those examples (a) don't fit with your descriptions (e.g., Player_Bio gives only one full name and another first only) (b) provide no reliable way to "join" based on the name, given the inconsistencies.

     

    How would you know to connect "Gabriel" with "G. Pasquale"? The latter's name may actually be George Pasquale.

     

    Is there not some other identifier, membership number for example, in those two tables?  ... with thousands of names in a given population, even relying on FULL (and reliable) names is going to lead to ambiguities in some cases, as there are in fact more than one individual named James Smith or Mary Brown. That's the reason that the corporate HR database I managed for a while used Social Security Numbers as the reliable basis for joins such as you want to attempt.

     

    In the absence of some reliable identifier, you're going to be faced with mountains of manual work.

     

Resources