What's in a name? Part II

One of my motives for digging into baby names is to better understand how they have been used to assign and embody gender over the years. In my last post, we saw that

In this post, we’ll focus more on gender-nonspecific names, starting with some more generational Top 5 lists. This time around, it wasn’t so clear how to compare two names for the ranking: for example, should Quinn rank higher because it’s more balanced, or should Dylan because it’s more common overall?


I decided to strike a balance between parity and popularity by listing names according to how often they were recorded with their secondary gender. (In the above example, I would place Quinn higher because 24 > 9.)
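As a rough sketch, that ranking rule scores each name by the smaller of its two counts. This assumes per-name totals have already been tallied as `(female_births, male_births)` pairs; that layout and the function name are hypothetical, not the dataset's own format.

```python
from typing import Dict, List, Tuple


def rank_by_secondary_gender(
    counts: Dict[str, Tuple[int, int]],
) -> List[Tuple[str, int]]:
    """Rank names by how often they were recorded with their less common gender.

    counts maps each name to (female_births, male_births) over the period of
    interest; this layout is an assumption, not the dataset's own format.
    """
    # A name's score is its secondary-gender count: the smaller of the two.
    scored = [(name, min(female, male)) for name, (female, male) in counts.items()]
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

With made-up totals in which Quinn's smaller count is 24 and Dylan's is 9, Quinn ranks first no matter how large Dylan's overall total is.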


As it turns out, the ranking method doesn’t make much difference for the first list: only one name appears on both tables for more than one year between 1925 and 1934! Because of the lower birth rate during this period, it’s possible that some less common names are missing from the list, but it’s also the case that gender-nonspecific names have become considerably more common since then.

Percent of newborns given a gender-nonspecific name (at least 10 births/year in both datasets)
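One way the percentage in this chart could be computed, as a sketch: treat a name as gender-nonspecific in a given year if it has at least 10 recorded births in both the female and the male data, then divide the births carrying such names by all recorded births that year. The dictionary inputs, the function name, and the choice to count babies of both sexes are assumptions for illustration.

```python
from typing import Dict


def nonspecific_share(
    female: Dict[str, int], male: Dict[str, int], threshold: int = 10
) -> float:
    """Percent of one year's recorded births whose name reaches `threshold`
    births in both the female and the male counts for that year."""
    # Names present in both datasets with enough births on each side.
    shared = {
        name
        for name in female.keys() & male.keys()
        if female[name] >= threshold and male[name] >= threshold
    }
    total = sum(female.values()) + sum(male.values())
    if total == 0:
        return 0.0
    nonspecific = sum(female[name] + male[name] for name in shared)
    return 100.0 * nonspecific / total
```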

In my grandparents’ day, Francis was in the top half of female names despite being less common than the alternate spelling Frances; it also ranked 39th of 262 among male names.

Proportion of newborns given Francis as a female name (top) or as a male name (bottom, shaded)

As BC’s population boomed, so did the percentage of babies given names spanning multiple genders. This trend was led by Leslie, a previously male-exclusive name that gained popularity as a female name during World War II.

Proportion of newborns given Leslie as a female name (top) or as a male name (bottom, shaded)

In the 1950s and ’60s, the previously unknown name Kelly rode a gender-symmetric wave of popularity.

Proportion of newborns given Kelly as a female name (top) or as a male name (bottom, shaded)

Taylor really caught on in my generation: in addition to topping our charts, it was one of the most common names in the ’90s, period.

Proportion of newborns given Taylor as a female name (top) or as a male name (bottom, shaded)

Over the last decade, Riley has earned the top spot among gender-nonspecific names through a slow but consistent accumulation of namesakes.

Proportion of newborns given Riley as a female name (top) or as a male name (bottom, shaded)

When it comes to gender-neutral names, the above lists only tell part of the story. Vital Statistics only records people’s names at birth, not their eventual preferred name, so the half-dozen people I know of as “Chris” show up in different rows of the dataset.

In practice, I suspect that even more people go by a shared short form of names whose full versions are more heavily gendered. I found no fewer than fourteen names, each exclusive to one sex if you go by birth certificates, that could plausibly be shortened to a homophone of Chris. In aggregate, they make a remarkably gender-balanced family!

I’d love to see someone run with this. It’d be pretty neat to see how clusters of names measure up to one another (are there more Chrises or Alexes in my generation?), or to have a tool that suggests groups of similar-sounding names across multiple genders.
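For anyone who does want to run with it, here's a minimal starting point that sums birth counts by short form and sex. The mapping from full names to short forms is the hard part and is entirely hypothetical here: the fourteen "Chris" names aren't listed above, so the entries below are only illustrative, as are the `(name, sex)` keys.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical, illustrative mapping from birth-certificate names to the
# short form someone might go by. Not the article's list of fourteen names.
SHORT_FORMS = {
    "Christopher": "Chris",
    "Kristopher": "Chris",
    "Christina": "Chris",
    "Kristen": "Chris",
    "Alexander": "Alex",
    "Alexandra": "Alex",
}


def cluster_totals(
    counts: Dict[Tuple[str, str], int], short_forms: Dict[str, str]
) -> Dict[str, Dict[str, int]]:
    """Sum birth counts by short form and sex.

    counts maps (name, sex) to a number of births, e.g. over one generation.
    Names without an entry in short_forms are ignored.
    """
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for (name, sex), births in counts.items():
        short = short_forms.get(name)
        if short is not None:
            totals[short][sex] += births
    # Convert the nested defaultdicts back to plain dicts for easy reading.
    return {short: dict(by_sex) for short, by_sex in totals.items()}
```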

What's in a name?

Ever since I first heard about it, I’ve been itching to play around with the name data released by the BC Vital Statistics Agency: a list of all the names that appear on BC’s birth certificates for each of the last hundred years! I was curious to see what I could find out about the popularity and gender distribution of given names, so I decided to dedicate a couple of blog posts to exploring it. If you’re interested in playing with the data yourself, you can find them here and here.

To get a feel for how the data looks, let’s plot the historical frequency of some name, which I chose completely at random and not at all to satisfy my personal curiosity. ☺

Frequency of the name "Ross" among newborns over time

As you might have guessed before even looking at it, the graph has two peaks centred around the baby boom and millennial generations. Before we draw any conclusions about the relative popularity of the particular name, we’ll have to compare it to the overall birth rate (cf. xkcd 1138).

Total births recorded in the dataset

Although they start off looking pretty similar, the top curve has a much shorter peak during the echo boom. This suggests that my name has gradually been declining in popularity since the 1960s or so, which jibes with my experience: I know exactly one other Ross in my generation, two from my parents’, and none younger than me.
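The comparison boils down to dividing a name's yearly count by that year's total births, so a baby boom doesn't masquerade as a surge in popularity. A small sketch, assuming both series are stored as `{year: count}` dictionaries (a hypothetical layout):

```python
from typing import Dict


def yearly_share(
    name_counts: Dict[int, int], total_births: Dict[int, int]
) -> Dict[int, float]:
    """Fraction of each year's recorded births that carry a given name.

    Years with no recorded births are skipped; years where the name is
    absent count as zero.
    """
    return {
        year: name_counts.get(year, 0) / total
        for year, total in sorted(total_births.items())
        if total > 0
    }
```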

A few other assorted facts about my name:

Next, let’s look at how the most popular names have changed over time. I compiled lists of the most common names in each of four decades: the most recent years in the dataset (2005–2014), my own generation (1985–1994), my parents’ generation (1955–1964), and my grandparents’ generation (1925–1934).
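Lists like these can be compiled by summing each name's births over a range of years and taking the largest totals. A sketch, assuming the data has been loaded as `(year, name, sex, count)` tuples; that record layout and the function name are assumptions, not the agency's format.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_names(
    records: Iterable[Tuple[int, str, str, int]],
    sex: str,
    first_year: int,
    last_year: int,
    n: int = 5,
) -> List[Tuple[str, int]]:
    """Most common names for one sex over an inclusive range of years."""
    totals = Counter()
    for year, name, rec_sex, count in records:
        if rec_sex == sex and first_year <= year <= last_year:
            totals[name] += count
    # Counter.most_common sorts by total, largest first.
    return totals.most_common(n)
```

For example, `top_names(records, "F", 1955, 1964)` would give the parents'-generation female list.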


It’s interesting to note that, while the most popular female names are completely different from generation to generation, the same is not true for male names. John, James, and Robert all appear in the first two lists, while Michael hops from number three in 1955–64 to number one in 1985–94. It’s not until the current generation that we see an entirely new batch of male names in the top five.

The historical frequency of four chart-topping names

That pattern is not the only reason why I kept the top lists separated by assigned gender. As you can see by comparing the above chart with the one below, the top female names account for a smaller share of total births than the top male names. For example, Susan was the most common female name in 1955–64, but would have been in a virtual tie for seventh in a combined list.

The historical frequency of four more chart-topping names

I’m not sure whether the top-heaviness of male names is because there have been fewer of them historically for parents to choose from, or whether there are additional cultural factors at work, like the primarily male “Junior” naming convention. Regardless of the reasons, it seems that the effect is decreasing over time, as names for both assigned genders are becoming more evenly distributed.

Proportion of recorded birth certificates bearing one of the top ten male names or top ten female names (respectively blue and pink, sorry)
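The concentration measure in this chart can be sketched as the share of one sex's births in a year that go to that year's ten most common names. The `{name: count}` input for a single year and sex is an assumed layout:

```python
from typing import Dict


def top_k_share(counts: Dict[str, int], k: int = 10) -> float:
    """Fraction of one sex's recorded births in a year that carry one of
    that year's k most common names for that sex."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Largest k counts, regardless of which names they belong to.
    top = sorted(counts.values(), reverse=True)[:k]
    return sum(top) / total
```

A falling value over the years means names are spreading out more evenly.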

The next thing I’d like to investigate is how individual names are gendered. You can read about that in part 2.

The 140-character spike

I don’t intentionally set out to do so, but I’ve noticed my tweets gravitating towards Twitter’s character limit. Sometimes it’s the result of a too-long idea being meticulously edited down to size; sometimes it’s purely chance. Either way, it’s oddly satisfying to post a tweet with exactly 140 characters.

How often do tweets max out their character limits? What’s the average length of a tweet? To answer these questions, I turned to a set of tweets collected by Chang, Caverlee, and Lee in the fall of 2009. Filtering out retweets, I was left with over four million of them, of which over 2% used their entire 140 characters.

The shape of the length distribution is fascinating. One-word tweets are understandably very rare, but it doesn’t take long for the distribution to reach its first mode at 35 characters. The curve gradually and smoothly trails off to a local minimum around 116 characters, before spiking sharply after 135. The average length is a bit more than 68 characters, and the median is a bit lower, at 62. It looks like a lot of tweetable ideas can be expressed in five or ten words, and there are a lot of people valiantly trying to squeeze in something that’s slightly too big for the text box.
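Summary numbers like these are straightforward to compute once the tweets are loaded as strings. This is a sketch under a couple of assumptions: a tweet is treated as a retweet if it starts with "RT " (the dataset may mark retweets differently), and Python's `len` counts code points, which may not match exactly how Twitter counted characters in 2009.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable


def length_stats(tweets: Iterable[str], limit: int = 140) -> Dict[str, float]:
    """Summarize tweet lengths, skipping retweets.

    Expects at least one non-retweet. The "RT " prefix test is an assumed
    way of spotting retweets, not necessarily the dataset's convention.
    """
    lengths = [len(t) for t in tweets if not t.startswith("RT ")]
    counts = Counter(lengths)
    return {
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        # Global most common length; a local mode like 35 needs the full histogram.
        "most_common_length": counts.most_common(1)[0][0],
        "share_at_limit": counts[limit] / len(lengths),
    }
```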

I’m now curious whether the spike at 140 characters has more to do with where the character limit is set or with the very fact that there is one at all. If Twitter had set the maximum at (say) 160 characters, would the distribution eventually drop off to zero, or would we gobble up the extra four-or-so words and get stuck at the new limit?

Colonialism and the Four-Colour Theorem

The Four-Colour Theorem about planar graphs implies that you only need four colours to properly draw a map, making sure that neighbouring countries are coloured differently. But this application comes with a caveat: the theorem only applies if each country is a single connected region. This is essentially true of modern countries, so current political maps of the Earth only need four colours. But it wasn’t always this way; things were a lot more complicated back in the age of empires.

If we look back to the late 19th century, most of the planet was ruled by only a few countries, each having many satellite colonies scattered across the globe. Each colony was a new opportunity for empires to violate the hypotheses of the Four-Colour Theorem (as well as the basic principles of human decency). In sub-Saharan Africa, for example, the British, French, German, Portuguese, and Belgian empires all bordered one another, a configuration requiring five colours on a map.
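To check how many colours a small map needs, you can treat each empire (colonies included) as a single node and brute-force the smallest number of colours that keeps bordering nodes distinct. A minimal sketch, assuming the borders are given as pairs of names; it tries every colouring, so it is only practical for a handful of regions.

```python
from itertools import product
from typing import Iterable, Tuple


def colours_needed(borders: Iterable[Tuple[str, str]]) -> int:
    """Smallest number of colours so that no two bordering regions match.

    Each empire is one region, however scattered its colonies are, which is
    exactly what breaks the Four-Colour Theorem's hypotheses.
    """
    edges = list(borders)
    regions = sorted({r for edge in edges for r in edge})
    for k in range(1, len(regions) + 1):
        # Try every assignment of k colours to the regions.
        for colouring in product(range(k), repeat=len(regions)):
            colour = dict(zip(regions, colouring))
            if all(colour[a] != colour[b] for a, b in edges):
                return k
    return 0  # no regions at all
```

If every pair of the five empires shares a border, the function returns 5, matching the claim above.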

Can you find a historical map that requires more than five colours?

Who else has an Erdős-Bacon-Sabbath number?

A little while ago, I did some sleuthing to find out the Erdős number of Brian May, astrophysicist and guitarist from Queen. My travels led me to Timeblimp, who threw together three measures of professional collaboration to make a rather fun parlour game. Assuming that the people in your parlour are three kinds of nerds and enjoy long and complicated internet scavenger hunts. Which I am and I do.

The game is to find a well-known person who has published academically, released a song, and been involved in a movie or TV show. Then, you play three versions of Six Degrees of Kevin Bacon: find a series of movies to connect them to prolific actor Kevin Bacon, a series of coauthored papers to connect them to the eccentric mathematician Paul Erdős, and a series of musical collaborations to get to Black Sabbath. Add up all the links and you get the Erdős-Bacon-Sabbath number.
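Each of the three numbers is a shortest-path length in a collaboration graph, so once you've gathered the links by hand, breadth-first search does the counting. A minimal sketch, assuming the links are recorded as pairs of names; `build_graph` and `distance` are hypothetical helpers, not any site's API.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple


def build_graph(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected collaboration graph from (person, person) pairs."""
    graph: Dict[str, Set[str]] = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph: Dict[str, Set[str]], start: str, goal: str) -> Optional[int]:
    """Length of the shortest collaboration chain, via breadth-first search.

    Returns None if the two people are not connected.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, d = queue.popleft()
        for other in graph.get(person, ()):
            if other == goal:
                return d + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, d + 1))
    return None
```

The Erdős-Bacon-Sabbath number is then the sum of three such distances, one per graph (films, papers, and songs).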

To even have an Erdős-Bacon-Sabbath number puts you in quite an exclusive club. Only four people — Richard Feynman, Natalie Portman, Stephen Hawking, and the aforementioned Brian May — are known to be on the list. Until now. In this post, I’m going to throw out a few more potential names and put in the legwork to add two of them to the list of People at the Center of the Universe.


How to catch legendary Pokémon

The year was 2000, a few years after Nintendo made Pokémon Red and Blue. I was in grade 7, and had spent much of the last few years finding, capturing, training, and battling every Pokémon I could. Finally, I had caught all 150 available species and completed my Pokédex, and I desperately wanted Nintendo to make more.

That Christmas, I got my wish. My 12-year-old eyes lit up when I unwrapped Pokémon Silver at my grandparents’ house: Colour graphics! A whole new world to explore! And a hundred new Pokémon! As soon as I could escape from present-opening, I raced downstairs and started playing. By the time we had to leave at the end of the week, I already had four badges. Then, on the car ride home, I encountered a new legendary Pokémon with a unique quality.

Normally, each species of Pokémon can be found in a handful of fixed habitats; for example, Jigglypuff can always be found on Route 46. But this new encounter, the legendary beast Entei, didn’t stay put. It ran away as soon as I stumbled across it, and moved to a different route every time I stepped into a new location. Catching this roaming Pokémon would be an interesting challenge.

Months passed, and though I had long since beaten the rest of the game, I still hadn’t succeeded in catching Entei. I had spent hours chasing it around the world map, only to have it run away each time I threw a Pokéball at it. Exasperated, I wondered: What strategy would catch the roaming Pokémon as quickly as possible?


Brian May has an Erdős number

If you’ve heard of Erdős Numbers, Erdős-Bacon Numbers, and the fact that Queen lead guitarist Brian May has a PhD, you may have wondered whether Brian May has a well-defined Erdős-Bacon number. As a matter of fact, he does. Here’s how the rock legend is connected to the centres of cinema and academia.

Bacon number: 3

Thanks to IMDb and the Oracle of Bacon, Bacon numbers are easy to find. The guitarist’s credited voice role as “Massed Peasant Chorus/Chamberlain” in The Adventures of Pinocchio makes him only three films away from Kevin Bacon.

Brian May | The Adventures of Pinocchio
Martin Landau | Ed Wood
Bill Murray | Wild Things
Kevin Bacon

Erdős number: 7

In mathematics, the equivalent tool to the Oracle of Bacon is MathSciNet’s collaboration distance tool. Unfortunately, it does not catalogue the astrophysics journals Brian May has published in, so his Erdős number has to be found manually. The best previous attempt I found was a path of length eight, through a popular science book cowritten by May. However, I managed to find a shorter path, starting with a letter published in Nature.

Brian May | MgI emission in the night sky spectrum
T.R. Hicks | The structure of NGC 7027
J.P. Phillips | QCD: quantum chromodynamic diffraction
K. Golec-Biernat | Integrable Hamiltonian system in 2N dimensions
Th.W. Ruijgrok | On the dynamics of a continuum spin system
C.J. Thompson | On the mathematical mechanism of phase transition
Mark Kac | The Gaussian law of errors in the theory of additive number theoretic functions
Paul Erdős

This gives Brian May an Erdős-Bacon number of at most 3 + 7 = 10 and, with a Sabbath number of 1, the smallest known Erdős-Bacon-Sabbath number: 11.

Isotopes of Bismuth 176: Elect Ron

Isotopes of Bismuth 172: Platol Bismol

Isotopes of Bismuth 167: Shapes in the Plane

Exponential growth in Katamari Damacy

YouTubers' progress on the 'Make the Moon' level of Katamari Damacy

Katamari Damacy is a wonderful game: simple, fun, delightfully bizarre, and deceptively mathematical. Katamari and its sequels follow the tiny Prince of All Cosmos as he rolls a magical sticky ball (called a katamari) around Japan. As things stick to the katamari, it becomes bigger, enabling it to pick up larger and larger objects. Eventually, the Prince builds up a massive enough katamari to roll up people, cars, buildings, islands, rainbows, and just about everything else in the game.

Katamari’s core game mechanic is the exponential growth model. As long as the stage has plenty of objects to pick up, the katamari grows at a rate roughly proportional to its size. Katamari delivers an aesthetic experience that conveys the essential intuitions behind exponential functions, similar to short films like Powers of Ten.

In the above chart, I explore how closely katamari size tracks an exponential curve. I watched five Let’s Play videos of different YouTubers playing the final level of Katamari Damacy and plotted their progress. Sure enough, each run traces an approximately straight line on the logarithmic scale, indicating exponential growth.
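A straight line on a log scale can be checked numerically by fitting a line to log(size) against time. The slope is the growth rate r, and the doubling time is ln(2) / r. This is a sketch of an ordinary least-squares fit, assuming you've read (time, size) pairs off a video; the function name and inputs are illustrative.

```python
import math
from typing import List, Tuple


def fit_exponential(times: List[float], sizes: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(size) = log(a) + r * t; returns (a, r).

    Needs at least two distinct times and strictly positive sizes.
    """
    n = len(times)
    logs = [math.log(s) for s in sizes]
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    # Slope of the regression line on the log scale.
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    var = sum((t - mean_t) ** 2 for t in times)
    r = cov / var
    a = math.exp(mean_l - r * mean_t)
    return a, r
```

Fitting on the log scale treats relative errors evenly, which suits sizes that span several orders of magnitude.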

Isotopes of Bismuth 55: Tanning

Isotopes of Bismuth 31: Beware of Pi

Most English words are short

Short words are more frequent in written English

This chart was inspired by a piece of folk wisdom from an old email forwarding-spam:

Aoccdrnig to rseearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit plcae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

The form of this email appears at first glance to provide direct evidence of its own “azanmig” claim. But something’s a little fishy: a lot of the words aren’t actually scrambled. Short words aren’t affected by the message’s middle-muddling, and if a paragraph mostly consists of words under six letters, there’s just not that much unscrambling to do!

So is the chain letter’s prose typical, or has it been carefully written to exaggerate our anagramming abilities? This depends on how English words are distributed according to their length. Although the median English word (counting each distinct word once) has seven letters, shorter words are used far more frequently:

By these metrics, the email is on the easy side. Almost half of its words don’t need to be deciphered; 79% of the words have fewer than six letters; and the average word length is 4.10. A random Wikipedia article will take a fair bit more effort to unscramble. Unfortunately, we don’t have amazing anagram abilities — at best, we can claim a form of error correction against small numbers of transpositions.
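Both the email's scrambling rule and these metrics fit in a few lines. A sketch, assuming a "word" is a run of letters (so a contraction like "doesn't" counts as two words) and that words of three letters or fewer are the ones the rule leaves untouched; the exact counting behind the figures above isn't specified, so results may differ slightly.

```python
import random
import re
from typing import Dict

# Assumption: a word is a maximal run of ASCII letters.
WORD = re.compile(r"[A-Za-z]+")


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # at most one interior letter, so nothing to shuffle
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble(text: str, seed: int = 0) -> str:
    """Apply the email's rule to every word, leaving punctuation alone."""
    rng = random.Random(seed)
    return WORD.sub(lambda m: scramble_word(m.group(0), rng), text)


def length_metrics(text: str) -> Dict[str, float]:
    """Share of untouched words, share under six letters, and mean length."""
    lengths = [len(w) for w in WORD.findall(text)]
    n = len(lengths)
    return {
        "unaffected": sum(k <= 3 for k in lengths) / n,
        "under_six": sum(k < 6 for k in lengths) / n,
        "mean_length": sum(lengths) / n,
    }
```

Running `length_metrics` on the email and on a Wikipedia article is a quick way to compare how much unscrambling each one demands.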