Movies You'll Love
A Berkeley graduate student takes on the Netflix challenge (view PDF)
By Daniel Gillick

In September 2006, Netflix issued a computer programming challenge: improve on their individualized recommendation system by 10 percent and win one million dollars. Some 25,000 teams from 167 countries downloaded the Netflix data and began work on the problem. One year later, the night before the "progress prize" deadline, UC Berkeley graduate student Lester Mackey and two friends from his undergraduate days at Princeton found their team, Dinosaur Planet, in second place.

Netflix distributes 1.6 million DVDs per day to some seven million customers throughout the United States. To help users select from their collection of 90,000 titles, the website provides personalized recommendations based on the users' ratings of movies they have seen. To comprehend the vastness of the recommendation problem, imagine a giant table where each user is a column and each movie a row. Some of the entries are filled in with values from one to five. These are the star ratings left by users. Most of the entries are empty. Netflix would like to predict these values.

Netflix reports that 60 percent of customers select movies based on automatic recommendations, suggesting that there is significant value in improving the system. Amazon, the online retailing pioneer, also uses the five-star rating system and provides recommendations based on previous purchases. With the wealth of information collected on the Internet regarding all sorts of products, the need for organization is growing, and the related computer science fields—data mining and machine learning—have never been livelier. Mackey says that part of his team's motivation came from the one million dollar prize, but "the biggest draw was the challenge and excitement of tackling an open problem in a field largely unknown to us."

The dataset that Netflix provides for competitors to work with is free to download, and contains only a fraction of the total set of ratings collected since the company began shipping DVDs in 1999. "The three most interesting aspects of the data," says Mackey, "were the size—over 100 million ratings; the sparsity—only 1.1 percent of all possible user-movie combinations were released; and the variability—ten percent of users rated fewer than 16 movies. Meanwhile, two users each rated over 17,000 of the 17,770 possible movies." By way of comparison, the MovieLens data set, which has served as a standard for recommendation research, includes 1,000 times fewer ratings and is around six times less sparse. The Netflix data set therefore represents both significant computational challenges—in dealing with so much data—and algorithmic challenges—in inventing statistical techniques for estimating models with so much missing information.

Before choosing or designing any prediction algorithm, it can be helpful to collect a handful of surface statistics to humanize the massive data table. A group from AT&T Labs reports that the most-rated movie, Miss Congeniality, was rated by almost half of all users. The highest-rated movies, averaging 4.7 stars, were the three installments of The Lord of the Rings, followed by The Shawshank Redemption and The Empire Strikes Back. The most variable included Fahrenheit 9/11, Napoleon Dynamite, Lost in Translation, and most seasons of Friends.

Over the past year, team Dinosaur Planet has implemented a diverse array of algorithms, at times working nearly full-time on the problem. "One of the most intuitive is the K-nearest neighbors algorithm," Mackey explains. "If you want to predict how Alice will rate a movie, you can consider all users who already rated the movie and take the average rating of those users who are most similar to Alice. The similarity between Alice and another user is usually a function of the number of movies rated by each and the ratings given to those movies held in common." Alice's nearest neighbors help predict Alice's ratings. Neighborhood-based approaches are appealing both because of their simplicity and because they make it easy to explain the source of a recommendation to a user.

A second class of algorithms has the more holistic goal of discovering underlying "factors" that explain the ratings. For example, if we assume two factors, comedy and violence, then a movie can be represented as a pair of numbers indicating how comical and how violent it is, respectively. A movie like Rush Hour would have large values for both, whereas Braveheart would have a small value for comedy and large value for violence. Similarly, a user is represented by a pair of numbers indicating his or her preference for comedy and violence, respectively. The model says that the rating for a particular movie by a particular user is the product of their comedy values plus the product of their violence values. A statistical algorithm estimates the values for all movies and all users simultaneously. Then, these learned values can be used to predict unknown ratings. Such purely statistical techniques are powerful and work, at least in theory, by producing a compact representation of movies and users that captures the essence of rating preference.

Successful contest participants implemented many algorithms, developed careful data-normalization techniques, and found ways to merge many results into one, often discussing the intricacies of their research on the Netflix Prize Forum. According to Mackey, "a very supportive and collabo-rative culture evolved out of the Netflix Forum, which allows registered users to openly discuss the contest and any related topics." Participants from all over the world discuss "everything from how best to store and load the contest data to how to implement some of the most popular algorithms." Despite the prize money, "forum frequenters routinely share their progress, troubleshoot their code, and bounce ideas off of one another, all to the benefit of the machine learning community."

To participate in the contest, a team provides estimated ratings for a test-set of 20 million user-movie pairs. Netflix scores submissions by comparing the estimates with the true ratings, which they withhold from the participants, and computing an average error rate. The Netflix "Cinematch" system is off, on average, by 0.951 stars. To win the million dollar prize, you must achieve an average error of 0.856. Mackey's team error rate is currently 0.875. The contest stipulates that a progress prize of $50,000 will be awarded at the end of each year (until 2011) to the team in first place, as long as they have improved on the previous year's mark by at least one percent.

Mackey describes the last-second drama: "A week before the deadline, Dinosaur Planet was in second place on the leaderboard; Gravity, a team from Hungary, was in third; and BellKor, a team from AT&T Labs, was in first place with a comfortable lead. On September 30, the day before the final submission deadline, teams basho and Arek Paterek combined their predictions and shot past both Dinosaur Planet and Gravity into second place. A few hours later, Dinosaur Planet and Gravity made a combined submission as When Gravity and Dinosaurs Unite (WGDU) and jumped into first place. Only seconds later, BellKor submitted and achieved the same score as WGDU, leaving the two teams tied for first with 24 hours to go in the competition. My teammates and I got very little sleep that night as we wracked our brains for new techniques to add into our mix. We managed to code up and run a few new procedures and even to improve our score, but when the dust settled, BellKor came out on top."

So will anyone win the million dollar prize? Most participants are optimistic and submissions remain lively, even though there have been no major breakthroughs since the progress prize announcement. Of the contest as a whole, Mackey says it is "well designed and well executed." By way of recommendation, he adds that "the contest largely introduced me to machine learning and encouraged me to continue exploring the field in graduate school."

Daniel Gillick is a graduate student in electrical engineering and computer sciences.

Want to know more? Check out:
The Netflix Prize website: netflixprize.com
AT&T Research page: research.att.com/~volinsky/netflix/


Comments on this article? Drop us a line at with 'letter to the editor' in the subject!






Home | Read | Blog | Join us | About us
© 2009 Berkeley Science Review