Is a movie’s box office gross connected to its budget?

In the last couple of weeks, I have looked at a number of topics around movie budgets.

To complete this trilogy, I am turning to the connection between a movie’s budget and the amount of money it collects at the box office.  Specifically, I’ll be looking at the “Domestic” gross, i.e. all the money spent on movie tickets in US and Canadian cinemas.

I’ll be using my dataset of 5,713 feature films released domestically for which I could find a public budget figure.  See the Notes section for details and caveats about the budget information. To measure the extent to which the budget and box office gross are correlated, I’ll be using the Pearson correlation coefficient.

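A coefficient of minus one means they’re perfectly negatively correlated (i.e. when one goes up, the other always goes down), a coefficient of zero means there is no linear relationship, and a result of one means they’re perfectly positively correlated (i.e. they both rise/fall together).  As a rule of thumb, I’m only treating figures below -0.2 or above 0.2 as meaningful; whether a given correlation is statistically significant also depends on how many films it is based on.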
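For anyone wanting to reproduce this kind of figure, here is a minimal sketch of the Pearson coefficient in plain Python. It's an illustration rather than the exact tooling behind this research; `pearson` and `is_meaningful` are hypothetical helper names, and the 0.2 cut-off simply encodes the rule of thumb above. (Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.)

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Do the two series sit above/below their means at the same time?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each series around its own mean
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (sx * sy)


def is_meaningful(r, threshold=0.2):
    """Rule of thumb from this post: only |r| above 0.2 counts as a real link."""
    return abs(r) > threshold
```

Budgets that double alongside grosses give +1, budgets that rise while grosses fall give -1, and the overall figure of 0.744 comfortably clears the 0.2 bar.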

Are budgets and box office grosses connected?

The first result is that there is a connection – a strong one.  Across all movies released in the past twenty years, there is a Pearson correlation of 0.744 between budget and domestic gross.

Interestingly, this has been rising over the years, albeit slowly.  In the chart below I have added an average trendline to make the change easier to spot.
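The year-by-year figures and the trendline can be sketched as follows. This assumes a hypothetical list of `(year, budget, gross)` tuples and fits an ordinary least-squares straight line through the yearly coefficients (the kind of linear trendline a spreadsheet draws); the post doesn't specify exactly how its trendline was fitted, so treat this as one reasonable reading.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


def yearly_correlations(films):
    """Budget/gross correlation for each release year, in year order.

    films: iterable of (year, budget, gross) tuples (hypothetical layout).
    """
    by_year = defaultdict(lambda: ([], []))
    for year, budget, gross in films:
        budgets, grosses = by_year[year]
        budgets.append(budget)
        grosses.append(gross)
    return {year: pearson(b, g) for year, (b, g) in sorted(by_year.items())}


def linear_trend(points):
    """Least-squares slope and intercept of a straight line through (x, y) points."""
    points = list(points)
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    slope = num / den
    return slope, mean_y - slope * mean_x
```

Passing `yearly_correlations(films).items()` to `linear_trend` gives the slope of the trendline; a small positive slope is what "rising over the years, albeit slowly" looks like in numbers.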

Which genres have the strongest link between budget and theatrical success?

The previous chart lumps all movies together, so we should dig a little deeper and see how different types of films fare.

It turns out that there are some pretty big differences between genres. The strongest correlations are found among Musicals (0.974), Westerns (0.965) and Music-based films (0.884).  This means that in the vast majority of cases, a bigger budget has also meant a bigger domestic box office gross.

At the other end of the spectrum are films with a weak connection, including Horror (0.282), Sport (0.456) and Romance (0.457).
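A per-genre breakdown might look like the sketch below. Because IMDb allows up to three genres per film, this version counts a film towards every genre it carries; that's an assumption about how films were allocated, not something stated above. The `(genres, budget, gross)` layout and the `min_films` cut-off are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


def genre_correlations(films, min_films=3):
    """Budget/gross correlation per genre, strongest first.

    films: iterable of (genres, budget, gross) where genres holds up to three
    IMDb labels. A film counts towards every genre it carries (an assumption).
    Genres with fewer than min_films titles are skipped, because with only two
    films the coefficient is always +1 or -1.
    """
    budgets = defaultdict(list)
    grosses = defaultdict(list)
    for genres, budget, gross in films:
        for genre in set(genres):  # set() guards against duplicate labels
            budgets[genre].append(budget)
            grosses[genre].append(gross)
    results = {
        genre: pearson(budgets[genre], grosses[genre])
        for genre in budgets
        if len(budgets[genre]) >= min_films
    }
    # Sort so the most tightly linked genres come first
    return dict(sorted(results.items(), key=lambda item: item[1], reverse=True))
```

Counting multi-genre films in every genre keeps the sketch simple, but it also means a single hit can lift several genres at once, which is worth remembering when comparing genres.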

How are genres changing over time?

In the same way that each genre has a different overall figure, the 20-year trends differ between genres.  Two genres which are becoming more correlated over time are Comedy and Action.

The trends for Romance and Fantasy movies are static (although within a fairly volatile year-on-year picture).

And both Horror and Drama movies are seeing a weakening of the connection between their budgets and cinema takings.
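One way to label each genre's 20-year trend is to compute a correlation per genre per year, fit a straight line through those yearly values and read off the slope. The sketch below does that; the `(year, genres, budget, gross)` layout is hypothetical, and the `tolerance` that separates "static" from rising or falling is an arbitrary illustrative choice, not a figure from this research.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


def slope(points):
    """Least-squares slope of a straight line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


def genre_trends(films, min_films=3, tolerance=0.005):
    """Label each genre's yearly correlation trend (hypothetical thresholds)."""
    cells = defaultdict(lambda: ([], []))  # (genre, year) -> (budgets, grosses)
    for year, genres, budget, gross in films:
        for genre in set(genres):
            budgets, grosses = cells[(genre, year)]
            budgets.append(budget)
            grosses.append(gross)

    yearly = defaultdict(list)  # genre -> [(year, r), ...] in year order
    for (genre, year), (budgets, grosses) in sorted(cells.items()):
        if len(budgets) >= min_films:
            yearly[genre].append((year, pearson(budgets, grosses)))

    trends = {}
    for genre, points in yearly.items():
        if len(points) < 2:
            continue  # need at least two years to draw a trend
        s = slope(points)
        if s > tolerance:
            label = "strengthening"
        elif s < -tolerance:
            label = "weakening"
        else:
            label = "static"
        trends[genre] = (s, label)
    return trends
```

A slope-based label hides the year-on-year volatility mentioned for Romance and Fantasy, so in practice it's worth looking at the yearly values as well as the fitted line.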


Notes

The data for today’s research came from OpusData / The Numbers, IMDb, Wikipedia, Box Office Mojo and the film trade press.  I manually fixed any suspect figures I found, such as the Chinese war epic which IMDb claims cost $18.

There are a few notes and caveats about today’s research which are worth bearing in mind:

  • How far can you subdivide and keep meaningful results?  While the film industry may feel that a large number of movies are released each year, on a data science scale it’s extremely slim pickings. The more we subdivide our categories (e.g. ‘comedies costing between $2m and $5m released in 1999 starring a dog named Skip’), the smaller the number of films in each cohort.  Small sample sizes make it more likely that randomness or other factors will pollute the result and produce extreme values.  For example, if only two such films are released each year, then the correlation for each year is going to be either minus one or positive one (the two ends of the spectrum), even if the films have similar budgets and/or box office grosses.  Despite the extreme result, it’s pretty much meaningless.
  • Reliability of budget data. The publicly available figure should be regarded as a rough ballpark rather than a precise number, for a bunch of reasons: we can’t trace the original source of the figure, we don’t know whether it includes soft money/rebates, the filmmakers may not be telling the truth, etc.  A while ago, I gained access to the full, real costs and income of 29 Hollywood movies budgeted over $100m, so I was able to compare their true cost with the figure stated on Wikipedia.  I found that on average these movies cost 12.5% more than their Wikipedia entry stated.  I don’t know whether this pattern holds for lower-budget movies.
  • Correlation is not causation.  All we are seeing today is the fact that two figures are correlated – not what’s causing it and why.  It’s possible that:
    • Budgets lead box office, because the public wants to see big films and so pays more and/or goes more often to more expensive fare; or
    • Box office leads budget, because the industry makes films in response to market demand, thereby avoiding films too costly for their likely returns; or
    • Other factors are leading both budget and box office, such as the level of confidence the industry has in a project, which raises both the budget and the marketing spend when it reaches cinemas.
  • Genre is complicated. The genres come via IMDb, where films are permitted to have up to three genres.  I appreciate that the IMDb genre model leaves a lot to be desired (e.g. over-classifying projects as ‘Drama’, stating ‘Animation’ as a genre rather than a production method, etc.) but I fear that unpicking this will have to wait for a future research project!
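The small-sample point is easy to check with a short simulation. The sketch below draws budgets and grosses as unrelated random numbers (a toy assumption, not real box office data) and counts how often the correlation lands beyond ±0.9 for cohorts of different sizes. With two films it happens every time, and the share falls away quickly as cohorts grow. `extreme_share` and its parameters are hypothetical names for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


def extreme_share(n_films, trials=2000, cutoff=0.9, seed=1):
    """Share of random cohorts of n_films whose |r| exceeds cutoff.

    Budgets and grosses are independent random draws, so any strong
    correlation found here is pure chance.
    """
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    hits = 0
    for _ in range(trials):
        budgets = [rng.gauss(0, 1) for _ in range(n_films)]
        grosses = [rng.gauss(0, 1) for _ in range(n_films)]
        if abs(pearson(budgets, grosses)) > cutoff:
            hits += 1
    return hits / trials


# With only two films, r is +1 or -1 whatever the figures are (ties aside).
print(pearson([10.0, 12.0], [30.0, 31.0]))
print(pearson([10.0, 12.0], [31.0, 30.0]))

for n in (2, 3, 10, 50):
    print(n, extreme_share(n))
```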