Posted by Brian Wong on 10/22/14 9:30 AM
Today, we’ll take a last look at some movie data and apply predictive analytics to predict a big Hollywood actor’s earning power. The top grossing movie for the weekend of September 26th, 2014 was "The Equalizer," whose main star is Denzel Washington. Because he is a prolific actor (i.e. lots of data available), we chose him as our subject to see if this latest release will follow major trends for his releases.
We limited our analysis dataset to movies where Denzel had a starring role (i.e. was the first person listed in the credits on IMDB), and to movies that have been out of theaters long enough to obtain an accurate estimate of box office earnings. In the end, we selected 29 movies released between 1990 and 2013. We also looked at data gathered for the previous two posts, including the genre of the movie, movie budget, earnings, profit percentage, and total profit.
Before we performed any analysis, we first took a look at some summary statistics for our dataset; this step (commonly referred to as data validation) is extremely important, especially when dealing with third-party data. The first table below shows us that Denzel Washington primarily releases his movies from late summer to winter, avoiding the spring and early summer months when larger action blockbusters are typically released. Although somewhat surprising given his star power, this is most likely a byproduct of the genres he tends to star in (see second table): thrillers and dramas that happen to be Oscar-worthy or adult-themed.
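For readers who want to try this kind of summary themselves, here is a minimal sketch in Python. It is not the code we used for the post: the records, the field names, and the rule that a movie is "profitable" when its gross exceeds its budget are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records for illustration only; these are NOT the post's data.
# Budget and gross are in millions of dollars (assumed units).
movies = [
    {"title": "Movie A", "month": 9, "genre": "Thriller", "budget": 50.0, "gross": 90.0},
    {"title": "Movie B", "month": 12, "genre": "Drama", "budget": 40.0, "gross": 35.0},
    {"title": "Movie C", "month": 11, "genre": "Thriller", "budget": 60.0, "gross": 75.0},
]


def summarize(rows):
    """Count releases per month and per genre, and count profitable movies."""
    by_month = Counter(r["month"] for r in rows)
    by_genre = Counter(r["genre"] for r in rows)
    # Assumed definition: a movie is profitable if gross exceeds budget.
    profitable = sum(1 for r in rows if r["gross"] > r["budget"])
    return by_month, by_genre, profitable


by_month, by_genre, profitable = summarize(movies)
print("Releases by month:", dict(by_month))
print("Releases by genre:", dict(by_genre))
print("Profitable movies:", profitable, "of", len(movies))
```

Counts like these are a quick way to spot missing months, misspelled genres, or implausible budgets before any modeling starts.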
Of Denzel’s 29 movies, only 19 were able to turn a profit. This initially seems strange for a big star like him; however, the thrillers and dramas he tends to star in are movies that are low-earning and high-risk. Once we completed the data validation step, we could finally build a predictive model to determine whether Denzel would produce blockbusters or duds in 2013-2014.
We compared how his drama and thriller movies did at the box office relative to movies released in 2013. Comparing all of his movies to 2013 releases would not make sense, because the earlier stages of his career (ascent, peak commercial appeal, etc.) are not representative of his star power in 2013 and today. We therefore restricted our prediction of Denzel’s earning power in 2013-2014 to dramas and thrillers. Because profit percentage varies in a heavily non-linear way across his career stages, we used a locally weighted regression for this problem. Non-linear models of this kind can be useful when there are no clear-cut seasonal trends in the data.
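To give a feel for what a locally weighted regression does, here is a small sketch in plain Python. It is one common variant (a weighted straight-line fit around each query point, using a tricube kernel), not necessarily the exact method or settings behind our chart. The function names, the `frac` parameter, and the sample numbers are all hypothetical.

```python
def tricube(u):
    """Tricube kernel: weight is 1 at u = 0 and falls to 0 once |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_predict(xs, ys, x0, frac=0.6):
    """Fit a weighted straight line around x0 and evaluate it at x0.

    frac is the share of points treated as the local neighborhood
    (an assumed tuning parameter, not a value taken from the post).
    """
    n = len(xs)
    k = min(n, max(2, int(round(frac * n))))  # number of neighbors used
    dists = sorted(abs(x - x0) for x in xs)
    bandwidth = dists[k - 1] or 1.0  # fall back to 1.0 if all distances are 0
    # Widen the bandwidth slightly so the k-th neighbor keeps a nonzero weight.
    ws = [tricube(abs(x - x0) / (bandwidth * 1.0001)) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted mean of x
    my = sum(w * y for w, y in zip(ws, ys)) / sw  # weighted mean of y
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


# Made-up release years and profit percentages, for illustration only.
years = [1990, 1993, 1996, 1999, 2002, 2005, 2008, 2011, 2013]
profit_pct = [40.0, 85.0, 120.0, 60.0, 10.0, -15.0, 20.0, 55.0, 70.0]

print("Smoothed value at 2014:", loess_predict(years, profit_pct, 2014))
```

Because each prediction leans mostly on nearby years, recent releases dominate the estimate for 2014, which is the behavior we want when earlier career stages are not representative. A smaller `frac` follows the data more closely but makes the curve noisier.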
As you can see in the chart, the variance of this regression is high due to the small number of variables we used to try to capture the relationship between time and earning power. However, we can observe something of a resurgence in his career in the later years, and we can use this to predict whether his next film will do well, which is what our model predicts. Comparing our prediction to actual box office outcomes allows us to see how accurate our model is (this is commonly referred to as model validation).
Our model predicts that Denzel's 2014 film releases will do well. According to our model validation and the box office success of "The Equalizer" this fall, that assessment is correct.
For the next series of posts, we’ll switch gears again and venture into the world of college football! We’ll try to predict playoff champions retroactively, given the new playoff format that was implemented this season. Will the previous results hold using the current format? Stay tuned to find out!