Most English words are short

A piece of old forwarding-spam email says:

Aoccdrnig to rseearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit plcae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

The form of this email appears at first glance to provide direct evidence of its own “azanmig” claim. But something’s a little fishy: a lot of the words aren’t actually scrambled. Short words aren’t affected by the message’s middle-muddling, and if a paragraph mostly consists of words under six letters, there’s just not that much unscrambling to do!
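The rule the email describes is easy to sketch: keep the first and last letters fixed and shuffle the rest. This is an illustrative reimplementation, not the code anyone used to produce the email. The function names `scramble_word` and `scramble_text` are hypothetical, and treating a "word" as a run of ASCII letters is an assumption. The sketch shows why short words are immune: a word of three letters or fewer has at most one interior letter, so nothing can move.

```python
import random
import re


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle a word's interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        # Zero or one interior letter: the word cannot change.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)  # may leave the original order, e.g. for 4-letter words
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text: str, seed: int = 0) -> str:
    """Scramble every run of ASCII letters; spaces and punctuation pass through.

    Assumption: contractions split at the apostrophe, so "doesn't" is
    treated as "doesn" plus "t".
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(scramble_text("According to research at Cambridge University"))
```

Because `shuffle` can return the original order, longer words are sometimes left intact too. This happens most often with four-letter words, which have only two interior letters.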

So is the chain letter’s prose typical, or has it been carefully written to exaggerate our anagramming abilities? To find out, I stopped by the Leipzig Corpora Collection to work out the distribution of English words by their length.

The distribution of word lengths in English

Although the median distinct word has seven letters, shorter words are used more frequently.
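A seven-letter median can coexist with mostly short words because the answer depends on what is counted. You can count each distinct word once, or you can count every occurrence in running text. The sketch below computes the median both ways. It assumes word counts are already loaded into a dict; loading a corpus, and any particular file format, is left out. The helper names are hypothetical, and the toy counts are made up for illustration. They are not Leipzig data.

```python
from collections import Counter


def lower_weighted_median(counts: Counter) -> int:
    """Smallest value whose cumulative count reaches half the total (a lower median)."""
    total = sum(counts.values())
    running = 0
    for value in sorted(counts):
        running += counts[value]
        if 2 * running >= total:
            return value
    raise ValueError("counts is empty")


def length_medians(freqs: dict[str, int]) -> tuple[int, int]:
    """Return (median length over distinct words, median length over all tokens)."""
    by_type = Counter(len(word) for word in freqs)  # each word counted once
    by_token = Counter()
    for word, count in freqs.items():
        by_token[len(word)] += count  # each word weighted by how often it occurs
    return lower_weighted_median(by_type), lower_weighted_median(by_token)


# Toy counts, invented for illustration only.
toy = {"the": 50, "of": 30, "and": 25, "corpus": 3,
       "language": 2, "frequency": 1, "distribution": 1}
print(length_medians(toy))  # (6, 3)
```

Working from a length histogram, rather than expanding every token into a list, keeps memory proportional to the number of distinct lengths. This matters once the counts come from a real corpus.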

Compared with this distribution, the email is on the easy side. Almost half of its words have three letters or fewer, so they have no middle to scramble and nothing to decipher. In addition, 79% of its words have fewer than six letters, and its average word length is 4.10 letters. A random Wikipedia article will take a fair bit more effort to unscramble.

Unfortunately, our anagramming abilities are not as amazing as the paragraph claims. At best, we have a form of error correction against small numbers of transpositions. An actual researcher at Cambridge has more to say about the jumbled science behind this early internet meme.
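The email's figures can be rechecked with a few lines of Python. This is a minimal sketch under two assumptions. First, a "word" is a run of letters, optionally joined by a straight or curly apostrophe. Second, length counts letters only, so "deosn’t" has six. The function name `word_stats` is hypothetical. Under these assumptions the sketch finds 68 words, 31 of them three letters or shorter and 54 of them under six letters.

```python
import re

EMAIL = (
    "Aoccdrnig to rseearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht "
    "oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist "
    "and lsat ltteer be at the rghit plcae. The rset can be a toatl mses and you "
    "can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not "
    "raed ervey lteter by istlef, but the wrod as a wlohe."
)


def word_stats(text: str) -> tuple[int, float, float, float]:
    """Return (word count, share with <= 3 letters, share with < 6 letters, mean length)."""
    # Letters, optionally joined by an apostrophe (straight or curly).
    words = re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)
    # Count letters only, ignoring apostrophes.
    lengths = [sum(ch.isalpha() for ch in w) for w in words]
    n = len(lengths)
    at_most_3 = sum(1 for k in lengths if k <= 3) / n  # no interior to scramble
    under_6 = sum(1 for k in lengths if k < 6) / n
    mean = sum(lengths) / n
    return n, at_most_3, under_6, mean


n, at_most_3, under_6, mean = word_stats(EMAIL)
print(f"{n} words, {at_most_3:.0%} with at most 3 letters, "
      f"{under_6:.0%} under 6 letters, mean length {mean:.2f}")
# 68 words, 46% with at most 3 letters, 79% under 6 letters, mean length 4.10
```

The same function can be pointed at any other text, such as a Wikipedia article, for comparison. How a different tokenizer treats hyphens, digits, or non-ASCII letters would shift the numbers somewhat.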