One of my motives for digging into baby names is to better understand how they have been used to assign and embody gender over the years. In my last post, we saw that
the most common names have historically been heavily gendered;
the distribution of female names is more even than that of male names; and
the diversity of names is increasing over time.
In this post, we’ll focus more on gender-nonspecific names, starting with some more generational Top 5 lists. This time around, it wasn’t so clear how to compare two names for the ranking: for example, should Quinn rank higher because it’s more balanced, or should Dylan because it’s more common overall?
I decided to strike a balance between parity and popularity by listing names according to how often they were recorded with their secondary gender. (In the above example, I would place Quinn higher because 24 > 9.)
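For the curious, the ranking rule fits in a few lines of Python. This is only a sketch: the (female, male) counts are made-up placeholders, apart from the secondary counts of 24 and 9 from the example above, and `secondary_count` and `rank_by_secondary` are names I invented for illustration.

```python
# Hypothetical (female, male) birth counts. Only the smaller number in each
# pair (24 for Quinn, 9 for Dylan) comes from the example in the text.
counts = {
    "Dylan": (9, 412),
    "Quinn": (24, 31),
}

def secondary_count(pair):
    """Number of births recorded under the name's less common gender."""
    return min(pair)

def rank_by_secondary(counts):
    """Names sorted from most to fewest secondary-gender births."""
    return sorted(counts, key=lambda name: secondary_count(counts[name]), reverse=True)

ranking = rank_by_secondary(counts)
print(ranking)
```

Sorting on the smaller of the two counts rewards names that are both reasonably common and reasonably balanced, which is the compromise described above.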
As it turns out, the ranking method doesn’t make much difference for the first list — only one name appears on both tables for more than one year between 1925 and 1934! Thanks to the lower birthrate during this period, it’s possible that some less common names are missing from the list, but it’s also the case that gender-nonspecific names have become considerably more common since then.
In my grandparents’ day, Francis was in the top half of female names despite being less common than the alternate spelling Frances; it also ranked 39th of 262 among male names.
As BC’s population boomed, so did the percentage of babies given names spanning multiple genders. This trend was led by Leslie, a previously male-exclusive name that gained popularity as a female name during World War II.
In the 1950s and ’60s, the previously unknown name Kelly rode a gender-symmetric wave of popularity.
Taylor really caught on in my generation: in addition to topping our charts, it was one of the most common names in the ’90s, period.
Over the last decade, Riley has earned the top spot among gender-nonspecific names through a slow but consistent accumulation of namesakes.
When it comes to gender-neutral names, the above lists only tell part of the story. Vital Statistics only records people’s names at birth, not their eventual preferred name, so the half-dozen people I know of as “Chris” show up in different rows of the dataset.
In practice, I suspect that even more people use the same shortened form of names with more heavily gendered variants. I found no fewer than fourteen names — each exclusive to one sex if you go by birth certificates — that could plausibly be shortened to a homophone of Chris. In aggregate, they make a remarkably gender-balanced family!
I’d love to see someone run with this. It’d be pretty neat to see how clusters of names measure up to one another — are there more Chrises or Alexes in my generation? — or a tool to suggest groups of similar-sounding names across multiple genders.
Ever since I first heard about it, I’ve been itching to play around with the name data released by the BC Vital Statistics Agency: a list of all the names that appear on BC’s birth certificates for each of the last hundred years! I was curious to see what I could find out about the popularity and gender distribution of given names, so I decided to dedicate a couple of blog posts to exploring it. If you’re interested in playing with the data yourself, you can find them here and here.
To get a feel for how the data looks, let’s plot the historical frequency of some name, which I chose completely at random and not at all to satisfy my personal curiosity. ☺
As you might have guessed even before you looked at it, the graph has two peaks centered around the baby boom and millennial generations. Before we draw any conclusions about the relative popularity of this particular name, we’ll have to compare it to the overall birth rate (cf. xkcd 1138).
Although they start off looking pretty similar, the top curve has a much shorter peak during the echo boom. This suggests that my name has gradually been declining in popularity since the 1960s or so, which jibes with my experience: I know exactly one other Ross in my generation, two from my parents’, and none younger than me.
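The comparison against the birth rate amounts to dividing each year's count for the name by that year's total births. Here is a minimal sketch, assuming the yearly figures have already been loaded into dictionaries. Every number below is a placeholder except the 22 births in 1988, and `share_per_10k` is a hypothetical helper.

```python
# Placeholder yearly figures for illustration; real values would come from
# the Vital Statistics files. Only the 22 births in 1988 is from the post.
name_births = {1960: 40, 1988: 22}
total_births = {1960: 40000, 1988: 44000}

def share_per_10k(name_births, total_births):
    """Births with the name per 10,000 total births, for each year."""
    return {
        year: 10000 * name_births[year] / total_births[year]
        for year in name_births
    }

shares = share_per_10k(name_births, total_births)
print(shares)
```

Expressing the result per 10,000 births keeps the numbers readable for an uncommon name.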
A few other assorted facts about my name:
I was one of twenty-two Rosses born in BC in 1988, which explains why I never had to use a last initial in school.
Ross was the 306th most common of the 980 names listed for my birth year, tied with Felicia, Gabriel, Martin, and George, among others.
Recently, Ross has struggled to reach the five-births-per-year threshold for statistics to be publicly released; it’s only made the list in three years since the TV show Friends ended.
Next, let’s look at how the most popular names have changed over time. I compiled lists of the most common names in each of four decades: the most recent years in the dataset (2005–2014), my own generation (1985–1994), my parents’ generation (1955–1964), and my grandparents’ generation (1925–1934).
It’s interesting to note that, while the most popular female names are completely different from generation to generation, the same is not true for male names. John, James, and Robert all appear in the first two lists, while Michael hops from number three in 1955–64 to number one in 1985–94. It’s not until the current generation that we see an entirely new batch of male names in the top five.
That pattern is not the only reason why I kept the top lists separated by assigned gender. As you can see by comparing the above chart with the one below, the top female names account for a smaller share of total births than the top male names. For example, Susan was the most common female name in 1955–64, but would have been in a virtual tie for seventh in a combined list.
I’m not sure whether the top-heaviness of male names is because there have been fewer of them historically for parents to choose from, or whether there are additional cultural factors at work, like the primarily-male “Junior” naming convention. Regardless of the reasons, it seems that the effect is decreasing over time, as names for both assigned genders are becoming more evenly distributed.
The next thing I’d like to investigate is how individual names are gendered. You can read about that in part 2.
I don’t intentionally set out to do so, but I’ve noticed my tweets gravitating towards Twitter’s character limit. Sometimes it’s the result of a too-long idea being meticulously edited down to size; sometimes it’s purely chance. Either way, it’s oddly satisfying to post a tweet with exactly 140 characters.
How often do tweets max out their character limits? What’s the average length of a tweet? To answer these questions, I turned to a set of tweets collected by Chang, Caverlee, and Lee in the fall of 2009. Filtering out retweets, I was left with over four million of them, of which over 2% used their entire 140 characters.
The shape of the character distribution is fascinating. One-word tweets are understandably very rare, but it doesn’t take long for the distribution to reach its first mode at 35 characters. The curve gradually and smoothly trails off to a local minimum around 116 characters, before positively spiking after 135. The average length is a bit more than 68 characters and the median a bit lower at 62. It looks like a lot of tweetable ideas can be expressed in five or ten words, and there are a lot of people valiantly trying to squeeze in something that’s slightly too big for the text box.
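The summary numbers above are simple to compute once the tweets are in a list. The sketch below runs on a tiny made-up sample rather than the 2009 dataset. It also assumes that Python's `len` is a fair stand-in for how Twitter counted characters, which may not hold for every tweet.

```python
from statistics import mean, median

LIMIT = 140  # Twitter's character limit at the time

def length_summary(tweets, limit=LIMIT):
    """Mean length, median length, and share of tweets exactly at the limit."""
    lengths = [len(t) for t in tweets]
    at_limit = sum(1 for n in lengths if n == limit)
    return mean(lengths), median(lengths), at_limit / len(lengths)

# Tiny made-up sample, not the real data.
sample = ["ok", "a" * 35, "b" * 62, "c" * 140]
avg, med, share = length_summary(sample)
print(avg, med, share)
```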
I’m now curious whether the spike at 140 characters has more to do with where the character limit is set or the very fact that there is one at all. If Twitter had set the maximum at (say) 160 characters, would the distribution eventually drop off to zero, or would we gobble up the extra four-or-so words and get stuck at the new character limit?
The Four-Colour Theorem about planar graphs implies that you only need four colours to properly draw a map, making sure that neighbouring countries are coloured differently. But this application comes with a caveat: the theorem is only guaranteed to work if the countries are all connected. This is essentially true of modern countries, so current political maps of the Earth only need four colours. But it wasn’t always this way; things were a lot more complicated back in the age of empires.
If we look back to the late 19th century, most of the planet was ruled by only a few countries, each having many satellite colonies scattered across the globe. Each colony was a new opportunity for empires to violate the hypotheses of the Four-Colour Theorem (as well as the basic principles of human decency). In sub-Saharan Africa, for example, the British, French, German, Portuguese, and Belgian empires all bordered each other, a configuration requiring five colours on a map.
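To see why five mutually bordering empires force a fifth colour, a brute-force check is enough at this scale. The sketch below takes the claim in the text at face value and assumes every pair of empires shares a border somewhere. `colours_needed` is a name I made up, and the exhaustive search is only practical for a handful of regions.

```python
from itertools import combinations, product

def colours_needed(regions, borders):
    """Smallest number of colours so that no two bordering regions match."""
    for k in range(1, len(regions) + 1):
        for assignment in product(range(k), repeat=len(regions)):
            colour = dict(zip(regions, assignment))
            if all(colour[a] != colour[b] for a, b in borders):
                return k
    return 0  # only reached when there are no regions at all

empires = ["British", "French", "German", "Portuguese", "Belgian"]
# Assumption from the text: every pair of empires shares a border somewhere.
all_pairs = list(combinations(empires, 2))
print(colours_needed(empires, all_pairs))
```

A historical map needing six colours would correspond to six empires that all border one another, or to a subtler arrangement that the same function could test.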
Can you find a historical map that requires more than five colours?
A little while ago, I did some sleuthing to find out the Erdős number of Brian May, astrophysicist and guitarist from Queen. My travels led me to Timeblimp, who threw together three measures of professional collaboration to make a rather fun parlour game. Assuming that the people in your parlour are three kinds of nerds and enjoy long and complicated internet scavenger hunts. Which I am and I do.
To even have an Erdős-Bacon-Sabbath number puts you in quite an exclusive club. Only four people — Richard Feynman, Natalie Portman, Stephen Hawking, and the aforementioned Brian May — are known to be on the list. Until now. In this post, I’m going to throw out a few more potential names and put in the legwork to add two of them to the list of People at the Center of the Universe.
The year was 2000, a few years after Nintendo made Pokémon Red and Blue. I was in grade 7, and had spent much of the last few years finding, capturing, training, and battling every Pokémon I could. Finally, I had caught all 150 available species and completed my Pokédex, and I desperately wanted Nintendo to make more.
That Christmas, I got my wish. My 12-year-old eyes lit up when I unwrapped Pokémon Silver at my grandparents’ house: Colour graphics! A whole new world to explore! And a hundred new Pokémon! As soon as I could escape from present-opening, I raced downstairs and started playing. By the time we left at the end of the week, I already had four badges. Then, on the car ride home, I encountered a new legendary Pokémon with a unique quality.
Normally, each species of Pokémon can be found in a handful of fixed habitats; for example, Jigglypuff can always be found on Route 46. But this new encounter, the legendary beast Entei, didn’t stay put. It ran away as soon as I stumbled across it, and moved to a different route every time I stepped into a new location. Catching this roaming Pokémon would be an interesting challenge.
Months passed, and though I had long since beaten the rest of the game, I still hadn’t succeeded in catching Entei. I had spent hours chasing it around the world map, only to have it run away each time I threw a Pokéball at it. Exasperated, I wondered: What strategy would catch the roaming Pokémon as quickly as possible?
If you’ve heard of Erdős Numbers, Erdős-Bacon Numbers, and the fact that Queen lead guitarist Brian May has a PhD, you may have wondered whether Brian May has a well-defined Erdős-Bacon number. As a matter of fact, he does. Here’s how the rock legend is connected to the centres of cinema and academia.
Bacon number: 3
Thanks to IMDb and the Oracle of Bacon, Bacon numbers are easy to find. The guitarist’s credited voice role as “Massed Peasant Chorus/Chamberlain” in The Adventures of Pinocchio makes him only three films away from Kevin Bacon.
In mathematics, the equivalent tool to the Oracle of Bacon is MathSciNet’s collaboration distance tool. Unfortunately, it does not catalogue the astrophysics journals Brian May has published in, so his Erdős number has to be found manually. The best previous attempt I found was a path of length eight, through a popular science book cowritten by May. However, I managed to find a shorter path, starting with a letter published in Nature.
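Finding a collaboration distance by hand is really a shortest-path search through the co-authorship graph. Here is a minimal breadth-first search sketch. The graph is a placeholder with stand-in names, not a real publication record, and it is not how MathSciNet's tool is implemented as far as I know.

```python
from collections import deque

def collaboration_distance(coauthors, start, target):
    """Fewest co-authorship links between two people, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for other in coauthors.get(person, ()):
            if other == target:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None

# Placeholder graph: P wrote with Q, Q with R, R with S; T has no co-authors.
coauthors = {
    "P": ["Q"],
    "Q": ["P", "R"],
    "R": ["Q", "S"],
    "S": ["R"],
    "T": [],
}
print(collaboration_distance(coauthors, "P", "S"))
```

Breadth-first search visits people in order of distance, so the first path it finds to the target is guaranteed to be a shortest one within whatever graph it has been given.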
Katamari Damacy is a wonderful game: simple, fun, delightfully bizarre, and deceptively mathematical. Katamari and its sequels follow the tiny Prince of All Cosmos as he rolls a magical sticky ball (called a katamari) around Japan. As things stick to the katamari, it becomes bigger, enabling it to pick up larger and larger objects. Eventually, the Prince builds up a massive enough katamari to roll up people, cars, buildings, islands, rainbows, and just about everything else in the game.
Katamari’s core game mechanic is the exponential growth model. As long as the stage has plenty of objects to pick up, the katamari grows at a rate roughly proportional to its size. Katamari delivers an aesthetic experience that conveys the essential intuitions behind exponential functions, similar to short films like Powers of Ten.
In the above chart, I explore how closely katamari size tracks an exponential curve. I watched five Let’s Play videos of different YouTubers playing the final level of Katamari Damacy and plotted their progress. Sure enough, each run traces an approximately straight line on the logarithmic scale, indicating exponential growth.
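A straight line on a logarithmic scale can be checked numerically by fitting a least-squares line to the logarithm of the size. The sketch below uses a synthetic run that doubles every minute, not measurements from the videos, and `fit_exponential` is a hypothetical helper written for this illustration.

```python
import math

def fit_exponential(times, sizes):
    """Least-squares line through (t, ln size); returns (growth_rate, initial_size)."""
    logs = [math.log(s) for s in sizes]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    intercept = y_mean - slope * t_mean
    return slope, math.exp(intercept)

# Synthetic run: size doubles every minute, starting from 1 metre.
times = [0, 1, 2, 3, 4]
sizes = [1, 2, 4, 8, 16]
rate, start = fit_exponential(times, sizes)
doubling_time = math.log(2) / rate
print(rate, start, doubling_time)
```

The slope of the fitted line is the growth rate, and dividing ln 2 by it gives the time the katamari takes to double in size.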
This chart was inspired by a piece of folk wisdom from an old email forwarding-spam:
Aoccdrnig to rseearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit plcae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
The form of this email appears at first glance to provide direct evidence of its own “azanmig” claim. But something’s a little fishy: a lot of the words aren’t actually scrambled. Short words aren’t affected by the message’s middle-muddling, and if a paragraph mostly consists of words under six letters, there’s just not that much unscrambling to do!
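The middle-muddling itself is easy to reproduce. The sketch below shuffles each word's interior letters and leaves words of three letters or fewer alone, since they have at most one interior letter. It naively treats attached punctuation as part of the word, and the function names are my own.

```python
import random

def muddle(word, rng=random):
    """Shuffle a word's interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word  # at most one interior letter: nothing to scramble
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def muddle_text(text, seed=None):
    """Muddle every whitespace-separated word in a piece of text."""
    rng = random.Random(seed)
    return " ".join(muddle(w, rng) for w in text.split())

print(muddle_text("the human mind does not read every letter by itself", seed=1))
```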
So is the chain letter’s prose typical, or has it been carefully written to exaggerate our anagramming abilities? This depends on how English words are distributed according to their length. Although the median word has seven letters, shorter words are used more frequently:
Over a third (38%) of the words in an average paragraph have three letters or fewer — too short to scramble.
Four- and five-letter words, which admit only easy rearrangements, account for a further quarter (25%) of written words.
The average word length is just 4.95 letters.
By these metrics, the email is on the easy side. Almost half of its words don’t need to be deciphered; 79% of the words have fewer than six letters; and the average word length is 4.10. A random Wikipedia article will take a fair bit more effort to unscramble. Unfortunately, we don’t have amazing anagram abilities — at best, we can claim a form of error correction against small numbers of transpositions.
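Metrics like these can be computed for any passage with a short function. This is a sketch under my own tokenisation assumptions: it splits on whitespace and drops everything that isn't a letter, so apostrophes vanish. It may therefore not reproduce the figures above exactly.

```python
def word_length_stats(text):
    """Share of words with <= 3 letters, share with < 6 letters, mean length."""
    words = ["".join(ch for ch in w if ch.isalpha()) for w in text.split()]
    words = [w for w in words if w]  # discard tokens with no letters
    lengths = [len(w) for w in words]
    n = len(lengths)
    short = sum(1 for k in lengths if k <= 3) / n
    under_six = sum(1 for k in lengths if k < 6) / n
    return short, under_six, sum(lengths) / n

short, under_six, avg = word_length_stats("The rest can be a total mess")
print(short, under_six, avg)
```

Running it on both the email and a Wikipedia article is one way to compare how much unscrambling each demands.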