The Zen of Spell-Checking

The source: “Who Checks the Spell-Checkers?” by Chris Wilson, in Slate, Dec. 31, 2008.

Even the cockiest grammarian can be intimidated by the wavy red underline that signals a misspelled word in most word processing programs. But when Microsoft Word’s spell-check routinely suggested that future president Barack Obama’s last name be “corrected” to “Boatman” well into 2007, it made the widely used software program seem ridiculous.

Spell-checking doesn’t need to be so backward, writes Chris Wilson, an assistant editor at Slate. All the technology needed to produce a timely spelling database already exists in search engines such as Google and Microsoft’s own Live Search. Part of the reason for the disparity between the nimbleness of Google and the torpor of Microsoft Word’s spell-check (and even that of Google’s online word processor, Google Docs) is that word processors and search engines try to do different things. Search engines tackle inquiries as broad as human curiosity; word processors are conservative, limiting their lexicons to words that are strictly kosher.

The two technologies update their dictionaries differently, Wilson says. Ten years ago, word processor spelling lists were compiled from web pages or old Internet queries and scrutinized by human editors in software companies. Now, Microsoft keeps on top of change by scanning trillions of words in e-mail messages sent through its Hotmail service, gleaning such terms as “Netflix,” “Radiohead,” “Lipitor,” and “all-nighter.” But its spell-checker, still overseen by relatively slow-moving humans, makes surprising errors.

Google automates its word harvesting, trolling the Web to discover new words that show up with “any appreciable frequency.” Wilson found that Google offered alternate spellings for a word after it appeared only a small number of times, and was able to correct several misspellings of the unusual word “theothanatology” (the study of the death of God) when it had appeared online only 829 times.

A word is spelled correctly more often than not, so frequency of its usage is Google’s first cut for correctness. The best algorithms can identify a mistake, and suggest a cure, even when each word is spelled correctly but the context is wrong. Typing “golf war” into a Google search box returns some results for “Gulf war” as well, Wilson notes. The method does have its pitfalls, though. If it were used as a spell-checker, more naughty words might make it through; plus, a few instances of “Dalmation” (coast or dog) might turn up because the incorrect spelling with an o is almost as common as the correct “Dalmatian.”
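The frequency-first idea can be illustrated with a minimal Python sketch. It is not Google’s actual method: the function name `pick_by_frequency` is hypothetical, and the counts are invented for illustration rather than drawn from real web data.

```python
from collections import Counter

# Hypothetical sighting counts, invented for illustration only.
TOY_COUNTS = Counter({
    "receive": 970, "recieve": 30,
    "dalmatian": 520, "dalmation": 480,
})

def pick_by_frequency(variants, counts=TOY_COUNTS):
    """Return (most_common_variant, its share of all sightings).

    A share near 1.0 suggests a clear winner; a share near 0.5
    means frequency alone cannot settle which spelling is right.
    """
    total = sum(counts[v] for v in variants)
    if total == 0:
        return None, 0.0  # none of the variants has ever been seen
    best = max(variants, key=lambda v: counts[v])
    return best, counts[best] / total

if __name__ == "__main__":
    print(pick_by_frequency(["recieve", "receive"]))
    print(pick_by_frequency(["dalmation", "dalmatian"]))
```

With these made-up numbers, “receive” wins by a wide margin, while “Dalmatian” barely edges out “Dalmation,” which is the pitfall Wilson describes: a common misspelling can look almost as legitimate as the correct form.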

But it would produce much better results than the primary “edit distance” method used by most word processors. That method offers corrections by changing the fewest letters needed to reach a word already in the dictionary, which is how “Obama” became “Boatman.”
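For readers curious about what “edit distance” means in practice, here is a rough Python sketch using the standard Levenshtein definition (single-letter insertions, deletions, and substitutions). It is an assumption-laden illustration, not Microsoft Word’s actual implementation: the `suggest` helper and the three-word lexicon are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-letter
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a matching letter
            ))
        prev = curr
    return prev[-1]

def suggest(word, lexicon):
    """Return the lexicon entry with the smallest edit distance to word."""
    return min(lexicon, key=lambda entry: edit_distance(word, entry))

if __name__ == "__main__":
    # Hypothetical lexicon that, like an out-of-date dictionary, lacks "obama".
    toy_lexicon = ["president", "senator", "boatman"]
    print(edit_distance("kitten", "sitting"))
    print(suggest("obama", toy_lexicon))
```

Because the method knows only spelling and nothing about how often or where a word is used, a name missing from the lexicon is simply swapped for whichever listed word is nearest letter by letter.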
