Language

The word “distribute” is etymologically the opposite of “tribute”.

early 15c., distributen, “to deal out or apportion, bestow in parts or in due proportion,” from Latin distributus, past participle of distribuere “to divide, deal out in portions,” from dis- “individually” and tribuere “to pay, assign, grant,” also “allot among the tribes or to a tribe,” from tribus (see tribe)

In geography, a tributary is a stream or river that feeds into a larger body of water; for example, the Thompson River (Snek’w7étkwe) is a tributary of the Fraser (Sto:lo). When a river bifurcates into multiple downstream branches, such as the North and South Arms of the Fraser, those branches are called distributaries.

“Hydrogen” in German is Wasserstoff, which sounds hilarious except it’s just a literal translation of the Greek hydro-gen!

Most chemical elements are more or less the same in German and English. The fun exceptions are:

  • Wasserstoff (Hydrogen); “water stuff” is a literal translation from Greek.
  • Kohlenstoff (Carbon); “coal stuff” is a literal translation from Latin.
  • Stickstoff (Nitrogen); “suffocation stuff”, apparently because it’s the non-oxygen part of air, is a German original.
  • Sauerstoff (Oxygen); “sour stuff” is a literal translation from Greek.

German also has Natrium (Sodium), Kalium (Potassium), Wolfram (Tungsten), Quecksilber (Mercury), and Blei (Lead).

My favourite etymology fact is that “helicopter” is helico-pter — Greek for “spiral wing”. It’s obvious when pointed out, but I’d never have realized it on my own since in English it’s always broken down as heli-copter!

Relatedly, Magic: The Gathering has a creature type called Thopter, which is a rebracketed abbreviation of the word “ornithopter” (from ornitho- meaning bird, and pter meaning wing).

Birds carry the number 280 in the style of Twitter's old Fail Whale graphic

The length distribution of tweets has shifted in response to raised character limits, but it’s still the case that a disproportionate number of tweets use all the characters they’re given.


A sample of tweets gathered in 2019 still exhibits a telltale spike approaching the character limit, but the spike is smaller than the one in the distribution from a decade earlier. The peak of the curve has also shifted leftwards, to 15 characters, due to a separate change in 2016 that excluded media attachments and certain at-mentions from the character count.

The most interesting feature of the above graph is unfortunately an artifact of the dataset — the massive spike at 105 characters can be blamed on a spambot network broadcasting identical copies of the same tweet when the dataset was collected.

The general particulars chart of the vessel Queen of Surrey

“General particulars” is an excellent phrase that deserves to catch on more widely than its current context of legally mandated notices on boats.

(Boats are required by international law to have a wheelhouse poster listing their “general particulars”, i.e., a list of statistics, properties, and other bits of information necessary to get a basic view of the vessel.)

“May you live in interesting times” is typically claimed to be a Chinese expression, but it actually originated with the British. Joseph Chamberlain (Neville’s dad) used the phrase “interesting times” frequently in speeches:

I think that you will all agree that we are living in most interesting times. I never remember myself a time in which our history was so full, in which day by day brought us new objects of interest, and, let me say also, new objects for anxiety.

Joseph Chamberlain

Joseph’s other son Austen was the first to claim it originated as a Chinese saying. Quote Investigator theorizes that Austen, in conversation with his diplomat colleagues, learned about a Chinese proverb that expresses apprehension about living in what his father would call “interesting times” and assumed that was the source of Joseph’s phrase. But the wording of the real proverb is entirely different:

寧為太平犬,莫作亂離人

Better to be a dog in days of peace, than a human in times of chaos.

Feng Menglong

The word pea was originally pease in the singular and peasen in the plural. Eventually, speakers understandably interpreted the -s in pease as the plural suffix rather than just a sound in the original Latin pisum/pisa and Greek πίσον, and the English singular pea was born.

For example, a 15th-century cookbook has the following recipe for what we would today call pea soup:

Take grene pesyn, an washe hem clene an caste hem on a potte, an boyle hem tyl þey breste, an þanne take hem vppe of þe potte, an put hem with brothe yn a-noþer potte, and lete hem kele; þan draw hem þorw a straynowre in-to a fayre potte, an þan take oynonys…

Harleian manuscript 279

Pease also functioned as a mass noun, like bread or oatmeal.

Yisterday I ete cale and pes, & to-day I eete pes & cale, & to-morn I mon eate pess with cale, & after to-morn I mon eate cale with pease.

Alphabet of Tales

Unfortunately, the latter quote is not a hilariously sarcastic comment by a medieval peasant; it is taken from a religious anecdote promoting a moderate and uniform diet.

A bunch of Twitter logos, a cursor, and the number 140

A disproportionate number of my tweets are exactly 140 characters. I don’t know whether that means I’m really good at Twitter or really bad. Sometimes it’s the result of a too-long idea being meticulously edited down to size; sometimes it’s purely chance. Either way, I find 140-character tweets oddly satisfying — and based on a large dataset of tweets, it looks like I’m not the only one.


The dataset paints a fascinating picture of the distribution of tweet lengths. Extremely short tweets are understandably very rare, but it doesn’t take long for the distribution to reach its first mode at 35 characters. The curve gradually and smoothly trails off to a local minimum around 116 characters, before positively spiking after 135. The average length is a bit more than 68 characters and the median a bit lower at 62.

Based on corpus data, over half of the words in a typical page of English text have four or fewer letters, with the average word length being slightly less than five.

Length of…                   Mean  Median  Mode
Unique words                 7.52  7       7
Words weighted by frequency  4.95  4       3

Source code
import pandas

# Tab-separated word list with three columns: rank, word, frequency
wd = pandas.read_csv(
    "eng-ca_web_2002_1M-words.txt",
    sep="\t",
    header=None,
    names=["Rank", "Word", "Frequency"],
)
wd["Word"] = wd["Word"].apply(str).str.lower()
# engine="python" lets query() evaluate the .str methods on every pandas version
wd = (
    wd.query("Frequency > 5")                                 # Remove rare words,
    .query("Word.str.match('^[A-Za-z]*$')", engine="python")  # words with non-letters,
    .query("Word.str.contains('[aeiouy]')", engine="python")  # abbrvns w/o vowels,
    .groupby("Word")[["Frequency"]]                           # and duplicates
    .sum()
)
# Length of each word (the words are now the index)
wd["Length"] = pandas.Series(wd.index, index=wd.index).str.len()
sm = wd.groupby("Length")
# Share of unique words, and share of all occurrences, at each length
result = pandas.DataFrame({
    "NormalizedCount": sm.Frequency.count() / sm.Frequency.count().sum(),
    "NormalizedFrequency": sm.Frequency.sum() / sm.Frequency.sum().sum(),
})
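
The mean, median, and mode in the table can be derived from a length distribution like the one the source code builds. The following is only a minimal sketch, not necessarily the calculation behind the published figures: `length_summary` is a hypothetical helper, the toy counts are made up for illustration, and the median is assumed to be the smallest length whose cumulative weight reaches half of the total.

```python
def length_summary(counts):
    """Return (mean, median, mode) for a {word_length: weight} mapping."""
    total = sum(counts.values())
    mean = sum(length * weight for length, weight in counts.items()) / total

    # Weighted median (assumed definition): the smallest length at which
    # the running total of weights reaches half of the overall weight.
    running = 0
    median = None
    for length in sorted(counts):
        running += counts[length]
        if running * 2 >= total:
            median = length
            break

    # Mode: the length carrying the largest weight.
    mode = max(counts, key=counts.get)
    return mean, median, mode


# Toy numbers, invented for illustration (not taken from the corpus).
toy = {1: 2, 2: 5, 3: 8, 4: 4, 5: 1}
print(length_summary(toy))  # (2.85, 3, 3)
```

Assuming the `sm` grouping from the source code, `length_summary(sm.Frequency.count().to_dict())` would summarize unique words, while `length_summary(sm.Frequency.sum().to_dict())` would weight each word by how often it occurs.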

According to an old piece of email forwarding-spam, it’s easy to read text even if you scramble all but the first and last letters in each word. But the truth is a bit more complicated.

The ancient meme reads:

Aoccdrnig to rseearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit plcae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

The form of this paragraph appears at first glance to provide direct evidence of its own “azanmig” claim. But something’s a little fishy: a lot of the words aren’t actually scrambled. Short words aren’t affected much if at all by the message’s middle-muddling, and most English words are short!
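
To see how little room short words leave for scrambling, here is a small sketch of the middle-muddling rule. The function names are my own, and treating only runs of ASCII letters as words is a simplifying assumption (so “doesn’t” gets split at the apostrophe).

```python
import random
import re


def scramble_word(word, rng):
    """Shuffle the interior letters, keeping the first and last in place."""
    if len(word) < 4:
        return word  # at most one interior letter, so nothing can move
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=None):
    """Scramble every run of ASCII letters; everything else passes through."""
    rng = random.Random(seed)
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


def share_too_short(text):
    """Fraction of words with fewer than four letters, which can never change."""
    words = re.findall(r"[A-Za-z]+", text)
    return sum(len(w) < 4 for w in words) / len(words)


sample = "The rest can be a total mess and you can still read it"
print(scramble_text(sample, seed=1))
print(share_too_short(sample))  # 8 of the 13 words have fewer than four letters
```

Longer words can also come out unchanged, either because the shuffle happens to land on the original order or because the interior letters are identical (as in “good”), so the share of untouched words in a scrambled text is at least the figure printed here.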


Matt Davis, an actual researcher at the University of Cambridge, wrote an informative response to the meme:

There are elements of truth in this, but also some things which scientists studying the psychology of language (psycholinguists) know to be incorrect.

I’m going to list some of the ways in which I think that the author(s) of this meme might have manipulated the jumbled text to make it relatively easy to read. This will also serve to list the factors that we think might be important in determining the ease or difficulty of reading jumbled text in general.

  1. Short words are easy.
  2. Function words (the, be, and, you etc.) stay the same - mostly because they are short words.
  3. Of the 15 words in this sentence, there are 8 that are still in the correct order. However, as a reader you might not notice this since many of the words that remain intact are function words, which readers don’t tend to notice when reading.
  4. Transpositions of adjacent letters are easier to read than more distant transpositions.
  5. None of the words that have reordered letters create another word.
  6. Transpositions were used that preserve the sound of the original word (e.g. toatl vs ttaol for total).
  7. The text is reasonably predictable.
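
Point 4 in the list above is easy to try out. The sketch below is my own illustration, not something from Davis's response: `swap_adjacent` and `reverse_interior` are hypothetical helpers that contrast a single neighbouring swap with a more distant rearrangement of the same interior letters.

```python
def swap_adjacent(word, i):
    """Swap the interior letters at positions i and i + 1 (0-based)."""
    if not 1 <= i <= len(word) - 3:
        raise ValueError("both swapped letters must be interior letters")
    letters = list(word)
    letters[i], letters[i + 1] = letters[i + 1], letters[i]
    return "".join(letters)


def reverse_interior(word):
    """Reverse everything between the first and last letters."""
    if len(word) < 4:
        return word
    return word[0] + word[1:-1][::-1] + word[-1]


print(swap_adjacent("total", 2))   # toatl (the spelling used in the meme)
print(reverse_interior("total"))   # tatol
```

Both outputs keep the first and last letters fixed, so both satisfy the rule in the meme, but the adjacent swap stays much closer to the original spelling.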

If you want to test your own permutation powers against realistic examples, I whipped up a bookmarklet that you can use to scramble the words on any website you want to challenge!

Scramble text!