Did you hear the one about the geology costume contest?

The gneiss guise finished last.

One of my favourite video games of all time is the inexplicable Katamari Damacy. Its quirky premise involves, as Wikipedia puts it, “a diminutive prince rolling a magical, highly adhesive ball around various locations, collecting increasingly larger objects until the ball has grown great enough to become a star.” In other words, it’s the most successful game ever made about exponential growth.


Katamari makes you explore a world at many different scales, all in the same level. You might start by dodging mice under a couch; just a few minutes later, you’re rolling up the family cat, the furniture, and everything else in the room. It’s an even better playable version of the Powers of 10 video, made possible by the differential equation:

$$\frac{\text{d}R}{\text{d}t} = s(t)\cdot R \approx k R$$

You make your magically sticky katamari bigger by rolling stuff up; the bigger you are, the bigger the things you can pick up. So we would expect the radius $R$ of the katamari to grow at a rate which is roughly proportional to $R$ itself. The exact rate of change is governed by some function $s(t) \approx k$, which depends on how good you are at finding a route filled with objects of just the right size for you to pick up. The solution to this differential equation

$$R \propto \exp \left( \int_1^t s(u)\ \text{d}u \right) \approx e^{k t}$$

gives a formula for the katamari’s size at a given time $t$.
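For anyone who wants the intermediate step, the solution comes from separating variables. This is only a sketch of the standard argument, assuming $R > 0$ throughout; the lower limit of 1 is kept to match the formula, and a different starting time would only change the constant of proportionality.

```latex
\begin{align*}
\frac{1}{R}\,\frac{\text{d}R}{\text{d}t} &= s(t) \\
\ln R(t) - \ln R(1) &= \int_1^t s(u)\ \text{d}u \\
R(t) &= R(1)\,\exp\left( \int_1^t s(u)\ \text{d}u \right)
\end{align*}
```

If $s(u) \approx k$, the integral is roughly $k(t - 1)$, so $R(t) \approx R(1)\,e^{-k}\,e^{kt}$, which is proportional to $e^{kt}$.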

How justified are we in saying that $s(t)$ is roughly constant? I charted the minute-to-minute progress of four let’s players on YouTube. If the exponential model is a good one, then katamari size should trace out a straight line on a log scale. And so it does.

The runs keep up a remarkably consistent exponential pace, with a couple of visible exceptions: one at the end of the level, when the world starts running out of stuff, and one at roughly the ten-minute mark, when a couple of the players struggled to find items at the right scale to roll up.
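If you wanted to put a number on that pace, one rough approach is to fit a straight line to the logarithm of the size and read off the slope as an estimate of $k$. The sketch below is not the analysis behind the chart: the function name `fit_exponential` is made up for illustration, and the data is synthetic, generated from an exact exponential with invented parameters rather than taken from any of the videos.

```python
import math

def fit_exponential(times, sizes):
    """Least-squares fit of log(size) = log(r0) + k * time; returns (r0, k)."""
    logs = [math.log(s) for s in sizes]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k = sxy / sxx                       # slope of the line on a log scale
    r0 = math.exp(mean_y - k * mean_t)  # size at time zero implied by the fit
    return r0, k

# Synthetic data, NOT measurements from the videos: sizes generated from
# an exact exponential with made-up parameters r0 = 0.1 and k = 0.5.
minutes = list(range(0, 11))
sizes = [0.1 * math.exp(0.5 * t) for t in minutes]

r0, k = fit_exponential(minutes, sizes)
doubling_time = math.log(2) / k  # minutes for the katamari to double in size
```

On real runs the points would scatter around the line, and stretches where $s(t)$ dips (like the ten-minute slump) would show up as segments with a shallower slope.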

I’m not sure if this proves anything other than the fact that I like to do strange things in my spare time. But if you’re a calculus teacher with a bit of time and a PlayStation 2, I suspect this would make a very interesting three-act problem for your class.

Based on corpus data, over half of the words in a typical page of English text have four or fewer letters, with the average word length being slightly less than five.

| Length of…                  | Mean | Median | Mode |
|-----------------------------|------|--------|------|
| Unique words                | 7.52 | 7      | 7    |
| Words weighted by frequency | 4.95 | 4      | 3    |

Source code
import pandas

# Tab-separated word list with no header row: rank, word, frequency.
wd = pandas.read_csv(
    "eng-ca_web_2002_1M-words.txt",
    sep="\t",
    header=None,
    names=["Rank", "Word", "Frequency"],
)
wd.Word = wd.Word.apply(str).str.lower()
# The .str methods inside query() need the Python expression engine.
wd = (
    wd.query("Frequency > 5")                                  # Remove rare words,
    .query("Word.str.match('^[A-Za-z]*$')", engine="python")   # words with non-letters,
    .query("Word.str.contains('[aeiouy]')", engine="python")   # abbrvns w/o vowels,
    .groupby("Word")                                           # and duplicates
    .sum()
)
wd["Length"] = pandas.Series(wd.index, index=wd.index).str.len()
sm = wd.groupby("Length")
# Share of unique words, and share of running text, at each word length.
result = pandas.DataFrame({
    "NormalizedCount": sm.Frequency.count() / sm.Frequency.count().sum(),
    "NormalizedFrequency": sm.Frequency.sum() / sm.Frequency.sum().sum(),
})
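The script stops at the two length distributions, so here is a rough sketch of how the mean, median, and mode could be read off a distribution like that. The helper `weighted_stats` is a hypothetical name, not part of the script, and the toy numbers are invented for illustration rather than taken from the corpus.

```python
def weighted_stats(dist):
    """Mean, median, and mode of a {value: weight} distribution."""
    total = sum(dist.values())
    mean = sum(value * weight for value, weight in dist.items()) / total
    mode = max(dist, key=dist.get)  # value carrying the largest weight
    running = 0
    median = None
    for value in sorted(dist):
        running += dist[value]
        if running >= total / 2:    # first value where half the weight is covered
            median = value
            break
    return mean, median, mode

# Made-up toy distribution of word length -> count (not the corpus figures).
toy = {1: 5, 2: 18, 3: 25, 4: 22, 5: 15, 6: 10, 7: 5}
mean, median, mode = weighted_stats(toy)
```

Assuming `result` has been built as in the script, something like `weighted_stats(result.NormalizedFrequency.to_dict())` should give the frequency-weighted row, and `NormalizedCount` the unique-words row.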

According to an old piece of email forwarding-spam, it’s easy to read text even if you scramble all but the first and last letters in each word. But the truth is a bit more complicated.

The ancient meme reads:

Aoccdrnig to rseearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit plcae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

The form of this paragraph appears at first glance to provide direct evidence of its own “azanmig” claim. But something’s a little fishy: a lot of the words aren’t actually scrambled. Short words aren’t affected much if at all by the message’s middle-muddling, and most English words are short!
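To see how little the scrambling does to short words, here is a minimal sketch of the middle-muddling itself. It is an illustration only, not the code behind the meme or the bookmarklet: the function names and the seed are arbitrary choices, and it treats any run of ASCII letters as a word, so a contraction like "doesn't" gets split at the apostrophe.

```python
import random
import re

def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping the first and last fixed."""
    if len(word) <= 3:
        return word  # zero or one interior letters: nothing to permute
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, seed=0):
    """Scramble every run of ASCII letters; spaces and punctuation are untouched."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(), rng), text)

sample = "The rest can be a total mess and you can still read it"
scrambled = scramble_text(sample)
# Count the words that come out exactly as they went in.
unchanged = sum(a == b for a, b in zip(sample.split(), scrambled.split()))
```

Eight of the thirteen words in that sample have three or fewer letters, so at least eight are guaranteed to survive intact whatever the shuffle does.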


Matt Davis, an actual researcher at the University of Cambridge, wrote an informative response to the meme:

There are elements of truth in this, but also some things which scientists studying the psychology of language (psycholinguists) know to be incorrect.

I’m going to list some of the ways in which I think that the author(s) of this meme might have manipulated the jumbled text to make it relatively easy to read. This will also serve to list the factors that we think might be important in determining the ease or difficulty of reading jumbled text in general.

  1. Short words are easy.
  2. Function words (the, be, and, you etc.) stay the same - mostly because they are short words.
  3. Of the 15 words in this sentence, there are 8 that are still in the correct order. However, as a reader you might not notice this since many of the words that remain intact are function words, which readers don’t tend to notice when reading.
  4. Transpositions of adjacent letters are easier to read than more distant transpositions.
  5. None of the words that have reordered letters create another word.
  6. Transpositions were used that preserve the sound of the original word (e.g. toatl vs ttaol for total).
  7. The text is reasonably predictable.
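Point 4 in that list is easy to play with in code. The sketch below swaps a single pair of neighbouring interior letters, which is the gentlest kind of jumble, and counts how many letter positions end up changed. Both function names are made up for illustration, and the position count is just one crude way to measure how far a jumble strays, not a measure taken from Davis's write-up.

```python
import random

def swap_adjacent(word, rng):
    """Swap one randomly chosen pair of neighbouring interior letters."""
    if len(word) < 4:
        return word  # fewer than two interior letters
    i = rng.randrange(1, len(word) - 2)  # i and i + 1 are both interior
    letters = list(word)
    letters[i], letters[i + 1] = letters[i + 1], letters[i]
    return "".join(letters)

def displacement(original, jumbled):
    """Count positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(original, jumbled))

rng = random.Random(0)
jumbled = swap_adjacent("total", rng)  # either "ttoal" or "toatl"
moved = displacement("total", jumbled)
```

A single adjacent swap never changes more than two positions, whereas a full shuffle of a long word's interior can move nearly every letter.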

If you want to test your own permutation powers against realistic examples, I whipped up a bookmarklet that you can use to scramble the words on any website you want to challenge!

Scramble text!