
Exchange Team Blog

How the wordo list helps us bust typos in Exchange documentation

The_Exchange_Team
Jul 26, 2006

It's amazing what we Exchange editors find in our documentation.

 

When the XDOCS UE team was reorganized in August 2000, many of us were assigned to work on the Exchange 2000 Server SDK. We inherited a mammoth HTML Help file with roughly 4,500 topics.

 

The content, originally written in Microsoft Word version 6, needed to first be converted into XML, which took about 15 people nearly 4 months. After the SDK was reorganized and converted, the SDK editors began the task of editing the individual topics. Research had shown that approximately 60% of our readers who used the SDK spoke English as a second language, and so we implemented strict guidelines for making the language and terminology as clear and consistent as possible.

 

One day, while editing, I noticed a topic that referred to "Widows 2000 Server" instead of Windows 2000 Server. At the next editorial meeting, we joked that this referred to the spouses of the Microsoft employees who worked nights and weekends to deliver Windows 2000.

 

Another day I found an instance of "Exchange Sever". Curious whether there were more, I searched the entire SDK, and discovered some 200 instances of Widows 2000 and more than 100 instances of Exchange Sever.

 

After that, the SDK editors were on the lookout for such malapropisms. We found references to "massages", "pubic folders", and "steaming media".

 

We clearly needed a way to scan the SDK for these sorts of terms. We ran Policheck on the SDK before we released it each quarter. Policheck is a content-scanning tool designed to check for sensitive geopolitical terms, profanity, and trademark terms in Microsoft products. But Policheck didn't target the kinds of errors we were finding.

 

Cathy Anderson, an Exchange editor at the time (and now one of our release managers), noticed that Policheck allows users to create their own term lists. That's when she conceived the idea of the "wordo" list.

 

Cathy's wordo list is used by all Exchange editors. It includes obviously inappropriate malapropisms like "pubic folders" and mistyped Exchange terms that a spelling checker accepts, like "polices", "truss", "covert", and "manger". It also includes commonly misspelled words like "occured".
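To make the idea concrete, here is a minimal sketch of a wordo scan. It is not Policheck and does not use Policheck's term-list format; the `WORDOS` list and the `find_wordos` function are hypothetical names invented for this illustration, and the terms are the ones quoted above.

```python
import re

# Hypothetical wordo list: terms that are almost certainly wrong
# in Exchange documentation, taken from the examples in this post.
WORDOS = ["pubic folders", "polices", "truss", "covert", "manger", "occured"]


def find_wordos(text, wordos=WORDOS):
    """Return (line_number, term) pairs for each wordo found in text."""
    # Whole-word, case-insensitive match. Longer terms are listed first
    # so multi-word entries are tried before shorter ones.
    terms = sorted(wordos, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_number, match.group(1).lower()))
    return hits


sample = "Store the polices in pubic folders.\nAsk your manger to covert the file."
for line_number, term in find_wordos(sample):
    print(f"line {line_number}: {term}")
```

Because this sketch scans one line at a time, it would miss a multi-word term that is split across a line break. Every hit still needs a human look, since a word like "covert" can occasionally be correct.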

 

We decided to release all of our Help topics for Exchange Server 2007 Beta 2 so that customers will have the benefit of as much information as possible. As a result, some of our topics are not content complete or have not been edited, and you might just find some fun new terms!

If you happen to find misspelled words (or any other errors) in our documentation, be sure to let us know. We're always looking for more terms to add to the wordo list. To send feedback, just click the Send Feedback button at the bottom of the topic that contains the error, type a description of the problem, and then click Send.

- Lindsay Pyfer

Updated Jul 01, 2019
  • 1. Spell check the entire documentation set.
    2. Break the documentation set into individual words, generating a list of all words and the number of times each word appeared.
    3. Sort by number of occurrences.
    4. Look at the words with the fewest occurrences for words that are correctly spelled but inappropriate for the documentation (e.g., manger instead of manager).
    5. Do the same check for the source code by extracting all strings in the code, resources, etc.
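Steps 2 through 4 above can be sketched roughly as follows. This is an illustration under assumptions, not the commenter's actual tooling: the tokenizing regular expression, the function names `word_counts` and `rarest_words`, and the sample text are all invented here.

```python
import re
from collections import Counter


def word_counts(text):
    """Count case-insensitive occurrences of each word in text."""
    # Assumption: a "word" is a run of ASCII letters, optionally with
    # one internal apostrophe (as in "don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


def rarest_words(counts, limit=20):
    """Return up to `limit` (word, count) pairs, least frequent first."""
    # Sort by count, then alphabetically so the output order is stable.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:limit]


sample = (
    "The manager opens the mailbox. The manager moves the mailbox. "
    "The manger deletes the mailbox."
)
for word, count in rarest_words(word_counts(sample), limit=4):
    print(f"{count:3d}  {word}")
```

The rare end of the list is where a reviewer would look for valid words that do not belong, such as "manger" in the sample.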
  • Chris' method seems likely to catch a lot efficiently.  Further, looking in such a list for high-occurrence words whose neighbors differ by just one character and have very low occurrences should also yield some hits.

    What's next after typos and wordos?  Phrasos?
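The one-character comparison suggested in this comment could look something like the sketch below. It assumes a word-to-count mapping already exists; the thresholds `rare_max` and `common_min`, the function names, and the sample counts are hypothetical. "Differ by one character" is interpreted here as one substitution, insertion, or deletion.

```python
def within_one_edit(a, b):
    """True if a and b differ by exactly one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) word
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the common prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substituted character
    return a[i:] == b[i + 1:]  # one extra character in b


def suspect_pairs(counts, rare_max=2, common_min=50):
    """Pair each rare word with every common word one edit away from it."""
    rare = sorted(w for w, c in counts.items() if c <= rare_max)
    common = sorted(w for w, c in counts.items() if c >= common_min)
    return [(r, c) for r in rare for c in common if within_one_edit(r, c)]


# Made-up counts for illustration only.
counts = {"manager": 120, "manger": 1, "server": 300, "sever": 2,
          "folder": 90, "holder": 1}
for rare_word, common_word in suspect_pairs(counts):
    print(f"{rare_word!r} may be a typo for {common_word!r}")
```

Some pairs will be false alarms ("holder" can be a legitimate word), so the output is a review list rather than a list of confirmed errors. Comparing every rare word against every common word is quadratic, which is fine for a sketch but may need indexing on a large documentation set.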
  • As Joe commented, words differing by one letter are good candidates.

    There is a whole history of editing checks you can do.

    One that comes to mind is detecting adjacent words that are the same.

    There are some very old Unix System V scripts that run a large number of word and content checks against a text file.
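The adjacent-duplicate check mentioned in this comment might be sketched as below. This is not one of the Unix System V scripts; it is a small regular-expression illustration, and the names `DOUBLED` and `doubled_words` are invented for it.

```python
import re

# A word, some whitespace (which may include a line break),
# then the same word again, compared case-insensitively.
DOUBLED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def doubled_words(text):
    """Return (line_number, word) for each immediately repeated word."""
    hits = []
    for match in DOUBLED.finditer(text):
        # Report the line on which the first copy of the word starts.
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(1).lower()))
    return hits


sample = "Click the the Send button.\nThen click Send and\nand close the topic."
for line_number, word in doubled_words(sample):
    print(f"line {line_number}: repeated word {word!r}")
```

Matching across line breaks matters because a repeated word is easiest to overlook when the two copies sit at the end of one line and the start of the next. Legitimate repeats such as "had had" will also be reported, so the results need review.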