An overfitting tale from the trenches
In 2006, before Netflix (NFLX) unveiled its digital video-streaming catalog as a free add-on feature to its red DVD mailer service, the company launched an ambitious data-mining competition. The Netflix Prize taught the company many lessons over the next three years -- but not exactly the ones it wanted to learn in the first place.
The reason for this unexpected outcome was overfitting.
Netflix wasn't blind to the overfitting risk, of course. Competitors had access to 100 million ratings of 17,000 movies from 480,000 Netflix subscribers. Another 3 million ratings were held back in a separate set that the programmers never saw directly. This held-out test set was where submitted models were scored, so every entry was judged against clean data it had not been trained on. The task at hand was to come up with a movie recommendation algorithm that could beat Netflix's existing system by at least 10%, measured by the root-mean-square error (RMSE) of its rating predictions.
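For readers curious about the mechanics, here is a minimal Python sketch of holdout scoring: hide a slice of the ratings, score predictions only against that hidden slice, and express the result as a percentage improvement in RMSE over a baseline. The function names, the 3% holdout fraction, and the sample ratings are all hypothetical illustrations, not Netflix's actual scoring code; the real contest's data splits were more involved.

```python
import math
import random


def split_holdout(ratings, holdout_fraction=0.03, seed=0):
    """Shuffle ratings and hide a fraction of them (hypothetical split)."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * holdout_fraction)
    # First slice is hidden from model builders; the rest is for training.
    return shuffled[cut:], shuffled[:cut]


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def percent_improvement(baseline_rmse, candidate_rmse):
    """How much lower the candidate's RMSE is, as a share of the baseline's."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse


def meets_prize_bar(baseline_rmse, candidate_rmse, threshold=10.0):
    """True if the candidate beats the baseline by at least `threshold` percent."""
    return percent_improvement(baseline_rmse, candidate_rmse) >= threshold


if __name__ == "__main__":
    # Hypothetical hidden ratings (1-5 stars) and two sets of predictions.
    hidden = [4, 3, 5, 2, 4]
    baseline_preds = [3.5, 3.5, 3.5, 3.5, 3.5]  # predict the average for everyone
    candidate_preds = [4.2, 3.1, 4.6, 2.5, 3.8]

    base = rmse(baseline_preds, hidden)
    cand = rmse(candidate_preds, hidden)
    print(f"baseline RMSE {base:.4f}, candidate RMSE {cand:.4f}")
    print(f"improvement {percent_improvement(base, cand):.1f}%")
    print("clears the 10% bar:", meets_prize_bar(base, cand))
```

The key design point is that the hidden ratings never feed into training. A model tuned too closely to the ratings it can see will look better there than it does on the hidden slice, which is exactly the overfitting gap the contest was built to expose.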
The winning team, BellKor's Pragmatic Chaos, was a coalition of three elite teams that blended more than 50 radically different analytic approaches. In second place, more than 30 teams combined under the self-descriptive name "The Ensemble," with 48 sophisticated machine learning models under their belt.
Entries built on just a few analytic approaches never stood a chance against these diverse giants. The two top teams tied in the final scoring round with a 10.06% improvement over Netflix's own movie recommendation model. BellKor's Pragmatic Chaos won the tie by submitting its final entry 22 minutes before The Ensemble.
However, Netflix never adopted the winning recommendation system. As you might expect at this point, both BellKor's Pragmatic Chaos and The Ensemble performed slightly worse in the validation round than in earlier testing, a sign that even the best machine learning systems couldn't deliver prediction models that generalized perfectly to new data.
Instead, the company was happy to pay the $1 million prize in return for a wealth of technical ideas, along with a demonstration of diversified analytic approaches crushing single-method entries.
"You look at the cumulative hours, and you're getting PhDs for a dollar an hour," then-CEO Reed Hastings told The New York Times. But Netflix didn't exactly get a drop-in upgrade for its movie recommendation system, as it might have hoped.