Forum Discussion

J_MacDuff's avatar
J_MacDuff
Copper Contributor
May 25, 2023

Distinguish between words and acronyms in excel

I use an Excel spreadsheet to separate long descriptions from vendors or manufacturers into 35-character blocks in PROPER format for entry into my ERP system.  As the process works now acronyms in the descriptions also are changed to proper, ie: PSI (Per Square Inch) becomes Psi.  Can Excel somehow check the data strings against a dictionary to determine if it is a word or not?  I know this is a lot to ask and I have done considerable research finding no solution.  Any help would be greatly appreciated.

  • SnowMan55's avatar
    SnowMan55
    Bronze Contributor

    J_MacDuff 

    You might eventually find it desirable to use VBA (or another scripting solution) to accomplish this, but the results you can get from a formula might be acceptable. You do not say which version of Excel you are using, so I am going to assume Excel 365 for Windows.


    First, some terminology:

    • "PSI" is an abbreviation (specifically, an initialism), but not an acronym. Microsoft has been a significant force in the misuse of the term "acronym" (which is an abbreviation that has become a pronounceable word, such as sonar).  But there are other types of abbreviations, possibly appearing within your data, so I will refer to PSI and other units of measure and similar as initialisms.
    • It appears that you are using "data string" for the unmodified word/initialism – already split out from the full description – and "block" for the modified word/initialism. But I can't be sure. Please clarify if necessary. Your mention of "35-character blocks" has no special meaning to me. Does that have any relevance (beyond mentioning the maximum length of a word/initialism)?


    So yes, Excel can do a lookup (XLOOKUP, for example) to determine if a string of characters is a word or an initialism. But you will have to supply and maintain the list of words or initialisms for it to use. And I suggest you use a list of initialisms, because that will be shorter and you can be sure that some vendor will accidentally or intentionally misspell words, and possibly include their own abbreviations of words.  A list of initialisms can also help standardize the entries.  See the attached workbook.


    Even then, this can't be perfect, because some vendor may name their product "Psi" or another initialism, and refer to that in a item's description. The description may refer to a company name that should not be proper-cased (e.g., TRW), but those company names could be added to the initialism list.


    Nikolino's solution has no special relation to a language dictionary. It may conflate that meaning with the use of a COM object (a memory construct) that is named Dictionary and is similar to a Collection object. While a Dictionary (or Collection) object could potentially be useful in VBA code, his example includes very inefficient code for that object, re-creating and reloading it for every word that is being checked.  And the reloading is from a worksheet range anyway, so that is likely wasted processing.  About the only advantage I could see for using a Dictionary object is that it can distinguish upper-case and lower-case letters for its stored values. But I do not suspect you need that capability.

     

  • NikolinoDE's avatar
    NikolinoDE
    Gold Contributor

    J_MacDuff 

    In Excel, there is no built-in function specifically designed to check if a string is a valid word or acronym against a dictionary. However, you can utilize custom VBA code and external resources to achieve this.

    Here's an approach you can follow:

    1. Prepare a dictionary of valid words and acronyms that you want to distinguish. This can be a text file or a separate sheet in your Excel workbook. Each entry should be on a new line or in a separate cell.
    2. In Excel, press Alt+F11 to open the Visual Basic for Applications (VBA) editor.
    3. Insert a new module by clicking on "Insert" in the menu bar and selecting "Module."
    4. In the module, write the following VBA code:

     

    Function CheckWordOrAcronym(inputString As String, dictionaryRange As Range) As String
        Dim word As Variant
        Dim dictionary As Object
        
        Set dictionary = CreateObject("Scripting.Dictionary")
        
        ' Load dictionary from the specified range
        For Each word In dictionaryRange
            dictionary(word.Value) = True
        Next word
        
        ' Check if inputString is a word or acronym
        If dictionary.exists(inputString) Then
            CheckWordOrAcronym = "Word"
        Else
            CheckWordOrAcronym = "Acronym"
        End If
    End Function

     

    1. Save the VBA code and close the VBA editor.
    2. In your Excel worksheet, you can use the CheckWordOrAcronym function to determine if a string is a word or an acronym. Assuming the input string is in cell A1, and your dictionary range is in cells B1:B100 (adjust the range as needed), enter the following formula in a neighboring cell:

    =CheckWordOrAcronym(A1, B1:B100)

    The formula will check the input string against the dictionary range and return "Word" if the string is found in the dictionary, or "Acronym" if it is not found.

    Please note that this solution requires maintaining and updating your dictionary manually. Also, keep in mind that the accuracy of the distinction depends on the completeness and quality of your dictionary.

    • J_MacDuff's avatar
      J_MacDuff
      Copper Contributor
      Thanks Nikolino.
      This looks great. I can't wait to implement. I'll post results when it's done.
      • NikolinoDE's avatar
        NikolinoDE
        Gold Contributor

        J_MacDuff 

        As I said before, "In Excel there is no built-in function specifically designed to check if a string is a valid word or acronym against a dictionary. However, you can achieve this using custom VBA code and external resources.”

        There are certainly other ways, to find a suitable solution that might suit you, I just described one way. You know better than anyone here whether one of the trails is right for you or not.

        I would never downgrade or criticize an suggestion to profile my suggestion like some do.

         

        Of course you can also use formulas instead of VBA, but that would be too cumbersome in my opinion. If you are looking for a third-party solution to differentiate between words and acronyms in Excel, you may consider using add-ins or plugins that provide advanced text analysis or natural language processing capabilities.

        Here are two of  third-party or MS AddIns options:

        1. Power Tools for Excel: Power Tools is an Excel add-in that provides various text manipulation and analysis features. It includes functions for identifying and categorizing words, acronyms, and other text patterns. You can explore their features to see if they meet your specific needs.
        2. Ablebits Ultimate Suite: Ablebits Ultimate Suite is a collection of Excel add-ins that enhance Excel's functionality. It offers a variety of tools for data analysis and manipulation, including text processing capabilities. While it may not have a specific feature for distinguishing words and acronyms, it provides functions for advanced text manipulation that could potentially be adapted to your requirements.

        These are just two examples of third-party solutions, that provide advanced text analysis capabilities in Excel. You can learn about their features, compatibility with your version of Excel, and user reviews to determine which solution best suits your specific needs and preferences.

        ...and as I said, in the end you have to decide for yourself what is acceptable to you and what is not, without having to obstinately listen to everyone who is anonymous.

Resources