Most English words are short

This chart was inspired by a piece of folk wisdom from an old chain email:

Aoccdrnig to rseearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit plcae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

The form of this email appears at first glance to provide direct evidence of its own “azanmig” claim. But something’s a little fishy: a lot of the words aren’t actually scrambled. Short words aren’t affected by the message’s middle-muddling, and if a paragraph mostly consists of words under six letters, there’s just not that much unscrambling to do!
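The middle-muddling itself is simple to sketch in code. The Python below is only an illustration under stated assumptions, not the method used to produce the original email: the names `muddle_word` and `muddle_text` are hypothetical, and a "word" is assumed to be any run of ASCII letters. It makes the point concrete: any word of three letters or fewer passes through untouched.

```python
import random
import re


def muddle_word(word, rng):
    """Shuffle the interior letters of a word; first and last stay in place."""
    if len(word) <= 3:
        # Zero or one interior letter: there is nothing to rearrange.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)  # may, by chance, leave the order unchanged
    return word[0] + "".join(middle) + word[-1]


def muddle_text(text, seed=0):
    """Apply muddle_word to every run of ASCII letters in the text."""
    rng = random.Random(seed)  # seeded so results are repeatable
    return re.sub(r"[A-Za-z]+", lambda m: muddle_word(m.group(0), rng), text)


print(muddle_text("The rest can be a total mess and you can still read it"))
```

In that sample sentence only "rest", "total", "mess", "still" and "read" have interior letters that can move, so most of the output is identical to the input.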

So is the chain letter’s prose typical, or has it been carefully written to exaggerate our anagramming abilities? The answer depends on how word lengths are distributed in English. Although the median word in the vocabulary has seven letters, shorter words are used more frequently in running text:

  • Over a third (38%) of the words in an average paragraph have three letters or fewer — too short to scramble.
  • Four- and five-letter words, which admit only easy rearrangements (just two or three interior letters to shuffle), account for a further quarter (25%) of written words.
  • The average word length is just 4.95 letters.

By these metrics, the email is on the easy side. Almost half of its words don’t need to be deciphered; 79% of the words have fewer than six letters; and the average word length is 4.10. A random Wikipedia article will take a fair bit more effort to unscramble. Unfortunately, we don’t have amazing anagram abilities — at best, we can claim a form of error correction against small numbers of transpositions.
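Figures like these can be checked for any passage with a few lines of Python. The sketch below is a rough illustration rather than the procedure behind the numbers above: the function name `length_stats` is hypothetical, and the tokenization (runs of ASCII letters and apostrophes, with only the letters counted toward length) is an assumption, so results may differ slightly from counts made another way.

```python
import re


def length_stats(text):
    """Summarize word lengths in a passage of text."""
    tokens = re.findall(r"[A-Za-z'’]+", text)
    # Count letters only, so "doesn't" has length 6; drop stray apostrophes.
    lengths = [n for n in (sum(c.isalpha() for c in t) for t in tokens) if n > 0]
    total = len(lengths)
    if total == 0:
        raise ValueError("text contains no words")
    return {
        "words": total,
        "three_or_fewer": sum(n <= 3 for n in lengths) / total,
        "four_or_five": sum(4 <= n <= 5 for n in lengths) / total,
        "under_six": sum(n < 6 for n in lengths) / total,
        "mean_length": sum(lengths) / total,
    }


sample = "The rest can be a total mess and you can still read it without problem."
for name, value in length_stats(sample).items():
    print(f"{name}: {value:.2f}")
```

Scrambling does not change a word's length, so the function can be run on the muddled email directly, or on the text of any article for comparison.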