As regular readers will know, last year I was hired by a film producer to “write” a feature film screenplay entirely using artificial intelligence (more on that here).
One of the challenges that my collaborator Dr Eliel Camargo Molina and I faced along the way was the AI getting stuck in self-created ruts: aspects of the creation process where it would keep giving very similar results.
For a time, about half of the ideas it generated were about characters called Sarah. Sarah feels like a common name but it certainly was over-indexing in our outputs. So I conducted a quick data study to determine just how atypical this kind of ‘Sarah spree’ was.
I studied 440,956 cast credits across 8,415 movies released between 2000 and 2021 (more info in the Notes at the end). This produced a dataset of just over 50k character names, although only around 6k were what most people would think of as first names. The rest were a mix of job roles, descriptions or non-human identifiers.
With the data ready, let’s see which names can be named the most named among named characters.
John In Sixty Seconds
John was the most commonly cited character first name, accounting for a whopping 2.71% of named characters over the past two decades – one in every 37 characters.
The first thing you might note is the sausage fest of male names. This is largely the result of two factors:
- Firstly, there were more named characters with male names – 59% of characters with a name had a traditionally male name.
- Secondly, there was more clustering around male names. The 50 most popular male names accounted for 28% of male roles, whereas the 50 most popular female names accounted for only 22% of female roles.
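For anyone who wants to try the 'clustering' measure on their own data, here is a minimal sketch of how a top-N share could be calculated. The function name and the toy lists are my own illustration, not the study's code or dataset, and the top 2 stands in for the top 50.

```python
from collections import Counter


def top_n_share(names, n):
    """Return the fraction of roles covered by the n most frequent names."""
    counts = Counter(names)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Hypothetical toy data, NOT the study's dataset.
male_roles = ["John", "John", "John", "Jack", "Jack", "David", "Paul", "Sam"]
female_roles = ["Mary", "Sarah", "Anna", "Kate", "Emma", "Lucy", "Mary", "Nina"]

# Share of roles taken by the two most common names in each toy list.
male_share = top_n_share(male_roles, 2)      # John + Jack = 5 of 8 roles
female_share = top_n_share(female_roles, 2)  # Mary + one single-use name = 3 of 8 roles

print(f"Top-2 male share: {male_share:.1%}")
print(f"Top-2 female share: {female_share:.1%}")
```

A higher share means the roles are concentrated in fewer names, which is the pattern described above for male characters.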
This data confirms that the AI calling half of all leads “Sarah” was definitely way off base, given that just 0.38% of movie characters have shared that name over the past two decades.
But we don’t need to end our exploration there…
Mary Pops In
I thought some might appreciate splitting the most common names by gender.
Note: In the two charts below, the percentage is of characters within that gender. For example, while “Mary” only accounted for 0.46% of all roles, it was 1.13% of all characters with a traditionally female name.
Mary, Sarah and Anna take top spots, together accounting for 2.8% of all female-named characters.
John leads the pack of male-named characters, followed by Jack, David and Paul.
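The percentages quoted so far can be sanity-checked with some simple arithmetic. The sketch below uses only figures from this article; the helper names are hypothetical. It converts John's 2.71% into a 'one in N' figure, and uses Mary's two percentages to back out the share of named characters with a traditionally female name, which should land close to the 41% implied by the 59% male figure.

```python
def one_in_n(percent):
    """Convert a percentage into a rounded 'one in every N' figure."""
    return round(100 / percent)


def implied_group_share(share_of_all, share_within_group):
    """If a name is X% of all characters and Y% of its group,
    the group itself must make up X / Y of all characters."""
    return share_of_all / share_within_group


# John: 2.71% of named characters.
john_one_in = one_in_n(2.71)

# Mary: 0.46% of all roles, 1.13% of female-named characters.
female_share = implied_group_share(0.46, 1.13)

print(f"John is one in every {john_one_in} named characters")
print(f"Implied female-named share: {female_share:.1%}")
```

The small gap between the implied figure and 41% is what you would expect from rounding in the published percentages.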
Masters of the Unisex
At this point, you might be wondering about unisex names. When I started this research I thought it would be more of an issue than it turned out to be. As I knew the gender of the performers, I could look at the gender of most characters with a traditionally unisex name. In the vast majority of cases, they are heavily skewed towards one gender or the other.
Within the criteria of at least 100 appearances, the ‘most unisex names’ were Kelly (87% of portrayals were by a woman), Jean (44% by women) and Pat (26% by women).
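As a rough illustration of this kind of check, the sketch below counts how often each name was played by a woman and drops any name below a minimum number of appearances. It is a simplified stand-in rather than my actual pipeline: the function name, the "F"/"M" labels and the toy credits are all assumptions, and the threshold is lowered from 100 to 4 so that the toy data produces a result.

```python
from collections import defaultdict


def female_share_by_name(credits, min_appearances=100):
    """credits: iterable of (first_name, performer_gender) pairs,
    where performer_gender is 'F' or 'M' (an assumed labelling).
    Returns {name: share of portrayals by women} for names that
    appear at least min_appearances times."""
    totals = defaultdict(int)
    women = defaultdict(int)
    for name, gender in credits:
        totals[name] += 1
        if gender == "F":
            women[name] += 1
    return {
        name: women[name] / total
        for name, total in totals.items()
        if total >= min_appearances
    }


# Hypothetical toy credits, NOT the study's data.
toy_credits = (
    [("Kelly", "F")] * 7 + [("Kelly", "M")] * 1
    + [("Pat", "F")] * 1 + [("Pat", "M")] * 3
    + [("Jean", "F")] * 1  # too few appearances, so filtered out
)

shares = female_share_by_name(toy_credits, min_appearances=4)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of portrayals by women")
```

A share near 50% would mark a name as genuinely unisex on screen, while shares near 0% or 100% show the skew described above.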
Jobs for the boys?
Before we finish up, I wanted to share a little off-shoot from the dataset. One of the first tasks I had in this project was to determine what was a first name and what was a character description. For example, “Stephen” is a first name whereas “Researcher” is not (I appreciated the irony of using my own name as an example given the fact that my surname is a verb, but luckily this is only a study of first names).
Given that I know the gender of many of the performers playing each role, we can do a quick study into the gender of key jobs on screen.
Among the most frequently described roles, few will be surprised to see that “Waitresses” were almost always portrayed by women and “Waiters” by men.
However, we see a concerning gender bias among roles which do not imply gender. For example, 93.2% of characters called “Nurse” were portrayed by women, versus just 11.8% of those called “Doctor”. Men also took the vast majority of roles of authority figures, such as “Detective” (91.8% of portrayals were by a man) and “Police” (96.2%).
I would like to note that this topic has been studied to a much greater (and better!) degree by others in the past. I would highly recommend the work of the Geena Davis Institute on Gender in Media (here too) and The Center for the Study of Women in Television and Film at San Diego State University, to name just two.
Notes
Today’s research looks at live-action movies released in domestic cinemas between 2000 and 2019, and those movies released by one of the major studios on any platform in 2020 and 2021. The raw data came from IMDb, The Numbers and Wikipedia.
I used databases of ‘baby names’ to determine what is and is not a name (as opposed to a description of an unnamed character), as well as my own judgment. I tried to identify all names, not just those common in Western cultures. Given the nature of the global film business, and my criteria of requiring a US theatrical release, Western names hugely predominate. I don’t think this is a consequence of my methodology but inherent in the movies.
Along the way, I had to make a few subjective calls about names. For example, I excluded “Guy” as although it is a not-too-uncommon first name for a man, it’s more commonly used in screenwriting as a generic identifier for an unnamed male character. Sorry Guys. This was also true of descriptors or other roles, such as “Angel”, “Hunter” or “Rich”.
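To make the name-versus-description step more concrete, here is a minimal sketch of the general idea: check the first word of a credit against a list of known first names, then apply a manual exclusion list for words that are more often descriptors. The two sets below are tiny hypothetical stand-ins for the baby-name databases and my own judgment calls, and the function name is purely illustrative.

```python
# Hypothetical stand-in for a baby-name database (lowercase).
KNOWN_FIRST_NAMES = {
    "john", "mary", "sarah", "stephen", "guy", "angel", "hunter", "rich",
}

# Valid names that are more often used as descriptors, so excluded by hand.
EXCLUDED_AS_DESCRIPTORS = {"guy", "angel", "hunter", "rich"}


def classify_credit(character):
    """Label a character credit as 'name' or 'description',
    based only on its first word."""
    words = character.strip().split()
    if not words:
        return "description"
    first = words[0].lower()
    if first in KNOWN_FIRST_NAMES and first not in EXCLUDED_AS_DESCRIPTORS:
        return "name"
    return "description"


for credit in ["Stephen", "Researcher", "Guy at Bar", "John Smith"]:
    print(f"{credit!r}: {classify_credit(credit)}")
```

A rule like this will always get some cases wrong in both directions, which is why the exclusion list needs a human eye.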
Gender was determined via a mix of baby name datasets, pronouns in actors’ biographies, online classifications (e.g. some movie databases split ‘actors’ and ‘actresses’) and my past work on gender.
I have split the gender stats into two categories as this reflects the way data has been captured over the period I studied. In the future, there may be more nuanced databases which capture gender fluidity and non-binary identities, but until then we’re left with a rather crude binary approach. I used the gender the person currently publicly identifies as. I am always looking for better and more detailed ways to study and report on gender. If you think I can do better, reach out and we can chat.
The situation I described above in relation to the first name of “Guy” also happened with “Art”. Art is a valid first name but was most frequently used in the context of “art student” or “art gallery patron”.
This reminded me of a situation last year when I was watching a movie with a group of friends, only some of whom work in film. We kept noting how good the matte painting was, and afterwards one of the non-film types asked where they could see more movies starring the hugely talented Matt Painting. I’m sad to say, we were not kind in letting him down. Sorry, Jez.
Marine is the fourth most popular female name?
Ah yes. That’s another ‘more often job not name’. I’ve corrected that. Thanks for pointing that out.
Ah, that explains it.
Interesting that John remains so popular. In the US it fell from 14th most popular baby name in 2000 to 27th most popular in 2021.
Have you controlled for characters named “John Doe”?
Interestingly, I only found one credited role of John Doe! Minority Report.
Phew. I thought that was Kevin Spacey’s character in Se7en as well, but perhaps not.
Yes, but that film was released before my cut-off of 2000 onwards.
Interesting stuff as always Stephen. Coincidentally, I did a very similar study almost exactly 10 years ago, although I was a little lazier than you and didn’t do any filtering and sorting on the names. So the names at the top are people who only go by “Jack”, “Sarah” etc., rather than having their full names in the credits.
Here’s what we got:
Interestingly, “Sarah” is in the top 3 in our raw query, which may explain why the AI was keen on it!