What's in a name?

Ever since I first heard about it, I’ve been itching to play around with the name data released by the BC Vital Statistics Agency: a list of all the names that appear on BC’s birth certificates for each of the last hundred years! I was curious to see what I could find out about the popularity and gender distribution of given names, so I decided to dedicate a couple of blog posts to exploring it. If you’re interested in playing with the data yourself, you can find it here and here.

To get a feel for how the data looks, let’s plot the historical frequency of some name, which I chose completely at random and not at all to satisfy my personal curiosity.

Frequency of the name 'Ross' among newborns over time

As you might have guessed even before you looked at it, the graph has two peaks centered around the baby boom and millennial generations. Before we draw any conclusions about the relative popularity of this particular name, we’ll have to compare it to the overall birth rate (cf. xkcd 1138).

Total births recorded in the dataset

Although they start off looking pretty similar, the top curve has a much shorter peak during the echo boom. This suggests that my name has gradually been declining in popularity since the 1960s or so, which jibes with my experience: I know exactly one other Ross in my generation, two from my parents’, and none younger than me.
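Comparing the two curves by eye boils down to dividing each year's count for a name by that year's total births. Here is a minimal sketch of that normalization. The function name and both dictionaries are my own stand-ins, and the numbers are invented for illustration, not taken from the BC dataset.

```python
def relative_frequency(name_counts, total_births):
    """Return {year: share of that year's recorded births given the name}."""
    return {
        year: name_counts.get(year, 0) / total
        for year, total in total_births.items()
        if total > 0  # skip years with no recorded births
    }

# Invented counts, for illustration only (not from the BC dataset).
name_counts = {1960: 50, 1990: 20}
total_births = {1960: 40000, 1990: 40000}

shares = relative_frequency(name_counts, total_births)
for year in sorted(shares):
    print(year, f"{shares[year]:.4%}")
```

With the totals held equal, as in this toy example, a falling raw count means falling popularity. When the totals differ from year to year, only the ratio tells you which way the name is trending.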

A few other assorted facts about my name:

  • I was one of twenty-two Rosses born in BC in 1988, which explains why I never had to use a last initial in school.
  • Ross was the 306th most common of the 980 names listed for my birth year, tied with names like Felicia, Gabriel, Martin, and George.

Next, let’s look at how the most popular names have changed over time. I compiled lists of the most common names in each of four decades: the most recent years in the dataset (2005–2014), my own generation (1985–1994), my parents’ generation (1955–1964), and my grandparents’ generation (1925–1934).

Top five female names in each decade

Rank  1925–1934  1955–1964  1985–1994  2005–2014
1     Mary       Susan      Jessica    Olivia
2     Margaret   Karen      Amanda     Emma
3     Dorothy    Sandra     Sarah      Emily
4     Shirley    Linda      Ashley     Ava
5     Patricia   Deborah    Jennifer   Sophia

Top five male names in each decade

Rank  1925–1934  1955–1964  1985–1994    2005–2014
1     John       David      Michael      Ethan
2     William    Robert     Matthew      Liam
3     Robert     Michael    Christopher  Jacob
4     James      John       Ryan         Lucas
5     George     James      Kyle         Benjamin
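Lists like these can be compiled by summing each name's counts over the years in a decade and keeping the largest totals. The sketch below assumes the data has been loaded as (year, name, sex, count) tuples. That layout, the function name, and the sample counts are all my own assumptions, not the agency's file format or real figures.

```python
from collections import Counter

def top_names(records, sex, first_year, last_year, n=5):
    """Return the n most common names for one sex over an inclusive year range."""
    totals = Counter()
    for year, name, rec_sex, count in records:
        if rec_sex == sex and first_year <= year <= last_year:
            totals[name] += count
    return [name for name, _ in totals.most_common(n)]

# Invented sample records, for illustration only.
records = [
    (1985, "Jessica", "F", 300),
    (1986, "Jessica", "F", 310),
    (1985, "Amanda", "F", 280),
    (1986, "Amanda", "F", 250),
    (1985, "Michael", "M", 400),
    (1975, "Jennifer", "F", 900),  # outside the range below, so ignored
]

print(top_names(records, "F", 1985, 1994, n=2))
```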

It’s interesting to note that, while the most popular female names are completely different from generation to generation, the same is not true for male names. John, James, and Robert all appear in the first two lists, while Michael hops from number three in 1955–64 to number one in 1985–94. It’s not until the current generation that we see an entirely new batch of male names in the top five.

That pattern is not the only reason why I kept the top lists separated by assigned gender. As you can see by comparing the above chart with the one below, the top female names account for a smaller share of total births than the top male names. For example, Susan was the most common female name in 1955–64, but would have been in a virtual tie for seventh in a combined list.

I’m not sure whether the top-heaviness of male names is because there have been fewer of them historically for parents to choose from, or whether there are additional cultural factors at work, like the primarily male “Junior” naming convention. Regardless of the reasons, it seems that the effect is decreasing over time, as names for both assigned genders are becoming more evenly distributed.

Proportion of recorded birth certificates bearing one of the top ten male names or top ten female names (respectively blue and pink, sorry)
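A proportion like the one plotted above could be computed along these lines. This is a sketch under two assumptions of mine: the data is a list of (year, name, sex, count) tuples with one row per name, sex, and year, and the denominator is all recorded births that year, both sexes combined. Dividing by births of the same sex instead would be an equally reasonable choice.

```python
from collections import defaultdict

def top_n_share(records, sex, n=10):
    """Return {year: fraction of all recorded births that year that
    carry one of the n most common names recorded for `sex`}."""
    counts_for_sex = defaultdict(list)  # year -> per-name counts for `sex`
    totals = defaultdict(int)           # year -> all recorded births
    for year, name, rec_sex, count in records:
        totals[year] += count
        if rec_sex == sex:
            counts_for_sex[year].append(count)
    return {
        year: sum(sorted(counts, reverse=True)[:n]) / totals[year]
        for year, counts in counts_for_sex.items()
    }

# Invented sample records, for illustration only.
sample = [
    (2000, "A", "M", 50), (2000, "B", "M", 30), (2000, "C", "M", 20),
    (2000, "D", "F", 60), (2000, "E", "F", 40),
]

print(top_n_share(sample, "M", n=2))
print(top_n_share(sample, "F", n=2))
```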

So far, although I’ve tried to be careful with my language, this post has largely accepted the dataset’s assumed gender framework. But I’m very curious to learn about names that challenge this framework. In particular, what are the most common gender-nonspecific names?

Before I compile a list, I have to choose how to compare names: for example, between Quinn (24♂ + 47♀ = 71 births) and Dylan (85♂ + 9♀ = 94 births), should Quinn rank higher because it is more balanced between assigned genders? Or should Dylan be higher because it’s more common overall? Let’s strike a balance between parity and popularity by listing names according to how often they were recorded with their secondary gender. (In the above example, Quinn places higher because 24 > 9.)
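In code, ranking by secondary gender just means sorting names by the smaller of their two counts. A minimal sketch, using the Quinn and Dylan figures from above; the function name and the dictionary layout are my own choices for illustration.

```python
def rank_by_secondary(counts):
    """Sort names by their less common assigned gender, largest first.

    counts maps each name to a (male_births, female_births) pair.
    """
    return sorted(counts, key=lambda name: min(counts[name]), reverse=True)

counts = {
    "Dylan": (85, 9),   # secondary count: 9
    "Quinn": (24, 47),  # secondary count: 24
}

print(rank_by_secondary(counts))  # ['Quinn', 'Dylan']
```

A name recorded under only one gender gets a secondary count of zero and sinks to the bottom, which is the point: the score rewards names that are both balanced and reasonably common.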

Top five gender-nonspecific names in each decade, ranked by secondary-gender count

Rank  1925–1934  1955–1964  1985–1994  2005–2014
1     Francis    Kelly      Taylor     Riley
2     n/a        Leslie     Jamie      Taylor
3     n/a        Terry      Jordan     Avery
4     n/a        Kim        Morgan     Quinn
5     n/a        Robin      Devon      Jordan

As it turns out, the ranking method doesn’t make much difference for the first list — only one name appears on both tables for more than one year between 1925 and 1934! Thanks to the lower birthrate during this period, it’s possible that some less common names are missing from the list, but it’s also the case that gender-nonspecific names were just rare back then.

In my grandparents’ day, Francis was in the top half of female names despite being less common than the alternate spelling Frances; it also ranked 39th of 262 among male names.

As BC’s population boomed, so did the percentage of babies given names spanning multiple genders. This trend was led by Leslie, a previously male-exclusive name that gained popularity as a female name during World War II.

In the 1950s and ’60s, the previously unknown name Kelly rode a gender-symmetric wave of popularity.

Proportion of newborns given Kelly as a female name (top) or as a male name (bottom, shaded)

Taylor really caught on in my generation: in addition to topping our charts, it was one of the most common names in the ’90s, period.

Proportion of newborns given Taylor as a female name (top) or as a male name (bottom, shaded)

Over the last decade, Riley has earned the top spot among gender-nonspecific names through a slow but consistent accumulation of namesakes.

Proportion of newborns given Riley as a female name (top) or as a male name (bottom, shaded)

When it comes to gender-neutral names, the above lists only tell part of the story. Vital Statistics only records people’s names and assigned genders at birth, not what they choose to go by once they’re old enough to establish their own identities. Because of this, the half-dozen people I know of as “Chris” actually show up in different (heavily gendered) rows in the dataset.

It would be very neat to see an analysis of name data that ranks clusters of related names against each other!