Predicting the Oscars with uberVU
On the 27th of February 2011 the 83rd Academy Awards took place. As in all previous years, there was endless speculation about who would win: experts making predictions, search engines like Yahoo and Google pounding their chests that their indexes could predict the winners, and all kinds of polls, from Webtrends to Yahoo. Some used the chatter on Twitter and other social networks, some used their indexes, and most failed miserably.
Google claimed that in the previous 3 years the buzz around the movies, as recorded by Google Trends, had been a good indicator of the winner. They even made a special website, no longer available, where you could compare the most likely winners. If you looked at their graphic, The King’s Speech was dead last, with Black Swan marching towards victory.
So Google was wrong, Yahoo was wrong, pre-Oscar chatter was wrong, but is there something measurable that could predict the winners? Well, it turns out there is.
Before going on, a small warning is needed. The jury that decides who gets which award is made up of humans with subjective opinions and with different qualifications than the masses that generate all this data. So it’s normal, in a way, that what is popular is not always viewed as the most award-worthy. If you go back and look at the stats you will see that Clash of the Titans, not a masterpiece for the ages, had more buzz than some of the excellent movies that were nominated. So take everything with a grain of salt and make your own judgments.
With the help of the tools that uberVU provides I was able to get sentiment data for the 6 movies I thought were most likely to win. I also gathered the number of mentions. The aim was to see whether the sentiment about a movie can predict which one will win. The categories I aimed at were Best Picture, Best Actor in a Leading Role and Best Actress in a Leading Role. Because more or less the same movies were also nominated for categories like Best Sound or Best Costume, and I had no way to differentiate between those criteria in the data, only these 3 categories were picked.
For these movies I calculated a simple index that basically normalizes the data to a range between 0 and 100; the bigger the index, the more positive the chatter. The results are the following:
| Film \ Date | 20-Feb | 21-Feb | 22-Feb | 23-Feb | 24-Feb | 25-Feb | 26-Feb | 27-Feb | AVG (ex. 27) |
| The King’s Speech | 76.50 | 77.55 | 78.10 | 77.05 | 76.65 | 75.35 | 77.15 | 91.95 | 76.91 |
| Black Swan | 67.45 | 68.40 | 68.40 | 69.15 | 69.05 | 71.90 | 72.10 | 81.10 | 69.49 |
| The Social Network | 75.80 | 76.45 | 77.40 | 77.30 | 76.45 | 75.95 | 75.80 | 90.55 | 76.45 |
| 127 Hours | 64.70 | 66.75 | 66.70 | 66.95 | 63.85 | 56.45 | 55.90 | 70.00 | 63.04 |
| Inception | 79.10 | 66.85 | 67.25 | 68.45 | 67.70 | 67.30 | 66.20 | 78.55 | 68.98 |
| True Grit | 76.25 | 76.60 | 75.35 | 73.35 | 76.30 | 77.90 | 76.45 | 88.15 | 76.03 |
The table gives the sentiment index for each date, and the last column is the average without the 27th, when the numbers sky-rocketed. So this data, which you can get and calculate yourself, predicted the winner: The King’s Speech had the highest average, granted by the smallest of margins over The Social Network and True Grit, but it still did.
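The averaging step can be reproduced in a few lines of Python. This is only a minimal sketch using the values from the table above; the names `daily_index` and `average_before_ceremony` are made up for the illustration and are not part of any uberVU tooling.

```python
# Daily sentiment index per film, 20-27 Feb 2011, copied from the table above.
daily_index = {
    "The King's Speech": [76.50, 77.55, 78.10, 77.05, 76.65, 75.35, 77.15, 91.95],
    "Black Swan": [67.45, 68.40, 68.40, 69.15, 69.05, 71.90, 72.10, 81.10],
    "The Social Network": [75.80, 76.45, 77.40, 77.30, 76.45, 75.95, 75.80, 90.55],
    "127 Hours": [64.70, 66.75, 66.70, 66.95, 63.85, 56.45, 55.90, 70.00],
    "Inception": [79.10, 66.85, 67.25, 68.45, 67.70, 67.30, 66.20, 78.55],
    "True Grit": [76.25, 76.60, 75.35, 73.35, 76.30, 77.90, 76.45, 88.15],
}


def average_before_ceremony(values):
    """Mean of every day except the last one (27 Feb, the ceremony day)."""
    pre_ceremony = values[:-1]
    return sum(pre_ceremony) / len(pre_ceremony)


# Average per film, rounded to two decimals like the AVG column.
averages = {
    film: round(average_before_ceremony(values), 2)
    for film, values in daily_index.items()
}

# Films ordered from the most to the least positive chatter.
ranking = sorted(averages, key=averages.get, reverse=True)

for film in ranking:
    print(f"{film}: {averages[film]:.2f}")
```

Run as is, it puts The King’s Speech first at 76.91, ahead of The Social Network at 76.45 and True Grit at 76.03, which matches the AVG column.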
For Best Actor the story is the same. The data is from the period 14-26 February.
| Best Actor Leading | Positive | Neutral | Negative | Index |
| Javier Bardem | 94 | 1.8 | 4.2 | 94.9 |
| Jeff Bridges | 85.3 | 10.8 | 3.9 | 90.7 |
| Jesse Eisenberg | 88.1 | 5.3 | 6.6 | 90.75 |
| Colin Firth | 94.7 | 3.8 | 1.5 | 96.6 |
| James Franco | 81 | 17.3 | 1.7 | 89.65 |
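The index formula is not spelled out in this post, but the Index column above is consistent with counting every positive mention in full and every neutral mention as half, that is index = positive + neutral / 2. The sketch below checks that inferred formula against the table. Treat it as an assumption reverse-engineered from the numbers, not a documented uberVU calculation; `sentiment_index` is a hypothetical helper name.

```python
def sentiment_index(positive, neutral, negative):
    """Inferred index: positive share plus half of the neutral share.

    Assumption: the three inputs are percentages that sum to 100.
    Negative mentions add nothing, so an all-negative mix scores 0
    and an all-positive mix scores 100.
    """
    total = positive + neutral + negative
    if abs(total - 100) > 0.5:
        raise ValueError(f"shares should sum to 100, got {total}")
    return positive + neutral / 2


# (positive %, neutral %, negative %, published index), from the table above.
best_actor = {
    "Javier Bardem": (94.0, 1.8, 4.2, 94.9),
    "Jeff Bridges": (85.3, 10.8, 3.9, 90.7),
    "Jesse Eisenberg": (88.1, 5.3, 6.6, 90.75),
    "Colin Firth": (94.7, 3.8, 1.5, 96.6),
    "James Franco": (81.0, 17.3, 1.7, 89.65),
}

computed = {
    name: sentiment_index(pos, neu, neg)
    for name, (pos, neu, neg, _published) in best_actor.items()
}

for name, (_pos, _neu, _neg, published) in best_actor.items():
    print(f"{name}: computed {computed[name]:.2f}, published {published}")

# Nominee with the highest computed index.
top_nominee = max(computed, key=computed.get)
print("Highest index:", top_nominee)
```

Under this assumed formula the computed values agree with the published Index column for all five nominees, and Colin Firth comes out on top.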
And for Best Actress… not so much: Natalie Portman, who won, came in last. Arguably that is because of her weird role, but still, the data did not predict the winner.
| Best Actress Leading | Positive | Neutral | Negative | Index |
| Annette Bening | 92.8 | 0 | 7.2 | 92.8 |
| Nicole Kidman | 99.5 | 0 | 0.5 | 99.5 |
| Jennifer Lawrence | 96.4 | 0.7 | 2.9 | 96.75 |
| Natalie Portman | 79.1 | 19.2 | 1.7 | 88.7 |
| Michelle Williams | 99.1 | 0.1 | 0.8 | 99.15 |
But still, with a little magic from uberVU and some sentiment analysis you had a better chance of picking the winners using these techniques than going with the Google approach.
In Google’s defense, if you had followed the actual number of mentions on uberVU you would have gotten a result similar to their Google Trends one, no matter how you plotted those trend lines. It just wasn’t the year of mentions.
I hope you found the article interesting. If you have any questions, drop me a line. Before I end, I’ll tell the story of how I came to do my little study.
A huge thanks to uberVU for giving me an account to play with the data for a project I’m working on!
So how did I end up using uberVU for this, and what’s the back story? In an attempt to figure out whether there is any connection between the data you can pick up from social networks and actual economic results, I went back to a piece of research that had stirred up some attention a few months earlier. In a paper called “Twitter mood predicts the stock market”, Johan Bollen and his colleagues used sentiment analysis and Twitter data to improve existing algorithms for predicting the stock market. In Bollen’s own words: “We were pretty astonished that this actually worked … Including this mood information leads to higher accuracy”.
Using the same logic, I decided to gather information about a field that is narrow enough to be easily filtered and popular enough to generate a ton of messages, so movies were chosen. I built my own little app that gathered messages from Twitter about 30 movies and used a service called Tweet Sentiments to figure out whether the tweets were positive, negative or neutral. After running the app for 2 months, I ended up with more than 850,000 messages classified by sentiment. Using a simple formula I computed the “Sentiments Index”, which shows on a scale from 0 to 100 how positive or negative the tweets about a movie are, 100 being completely positive and 0 completely negative.
With this I went to IMDb and got the ratings for these 30 movies, as I figured they are a good indicator of how well received the movies are, and then crawled various sites to find the box-office earnings. Armed with this, it was time for some simple correlations and, surprise-surprise, there is a correlation between the sentiments and the ratings or box office. The number of movies is too small for the result to be very accurate, sentiment analysis is not very accurate and messages from Twitter are not always very meaningful, so the data could be better, but it was a start.
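The kind of correlation used is not specified here. A Pearson correlation coefficient is one common choice, and that is what the sketch below assumes. The function name `pearson` and the sample numbers are placeholders made up for the illustration; they are not the data gathered for the 30 movies.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values, NOT the study's data: one sentiment index and one
# rating per movie, listed in the same order.
example_sentiment = [60.0, 70.0, 80.0, 90.0]
example_rating = [6.0, 7.5, 7.0, 8.5]

print(f"r = {pearson(example_sentiment, example_rating):.2f}")
```

A value near 1 means the two series rise together, a value near -1 means one falls as the other rises, and a value near 0 means no linear relationship. With only 30 movies, any such coefficient should be read with the same caution as the rest of the data.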
My own application gathered data that was somewhat inaccurate, so I turned to uberVU, a company I greatly admire, and gathered the same data there, as they give a very nice breakdown of overall sentiment into positive, negative and neutral.
Their data turned out to be more accurate than what my little app could gather, and I bet my hosting provider was happy I stopped harassing the servers.
So now I had an idea that sentiments about movies have a connection to the ratings and the box office, I had a way to get the data, and the Oscars were coming up. So why not gather data about those movies and see if the winners come out on top? As you saw, my results were more accurate than the more elaborate attempts.

