The NY Times has a good article on the state of recommender systems: “If You Liked This, You’re Sure to Love That”. Here is a description of one of the problems:
But his progress had slowed to a crawl. [...] Bertoni says it’s partly because of “Napoleon Dynamite,” an indie comedy from 2004 that achieved cult status and went on to become extremely popular on Netflix. It is, Bertoni and others have discovered, maddeningly hard to determine how much people will like it. When Bertoni runs his algorithms on regular hits like “Lethal Weapon” or “Miss Congeniality” and tries to predict how any given Netflix user will rate them, he’s usually within eight-tenths of a star. But with films like “Napoleon Dynamite,” he’s off by an average of 1.2 stars.
The reason, Bertoni says, is that “Napoleon Dynamite” is very weird and very polarizing. [...] It’s the type of quirky entertainment that tends to be either loved or despised.
And here is the stunning conclusion by fortunately anonymous computer scientists:
Some computer scientists think the “Napoleon Dynamite” problem exposes a serious weakness of computers. They cannot anticipate the eccentric ways that real people actually decide to take a chance on a movie.
Actually, computers do quite a good job modeling probability distributions even for the more eccentric and unpredictable among us. Yes, the humble probability distribution, that centuries-old staple of statisticians, is enough to model eccentricity! The problem is that Netflix makes it hard to use sophisticated models: the scoring function is the antiquated, not just pre-Bayesian but actually pre-probabilistic, root mean squared error, or RMSE. For all practical purposes, the square root in RMSE is a monotonic transformation that won’t affect the ranking of recommender models, so we can drop it outright.
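A quick sanity check of that last claim, with made-up ratings and two hypothetical models: since the square root is monotone increasing, whichever model has the lower MSE also has the lower RMSE, so the two metrics always rank models identically.

```python
import math

def mse(preds, actuals):
    """Mean squared error between predictions and observed ratings."""
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds)

actuals = [1, 5, 1, 5, 3]   # hypothetical polarized ratings
model_a = [3, 3, 3, 3, 3]   # always predicts the middle
model_b = [2, 4, 2, 4, 3]   # hedges toward the poles

mse_a, mse_b = mse(model_a, actuals), mse(model_b, actuals)
rmse_a, rmse_b = math.sqrt(mse_a), math.sqrt(mse_b)

# sqrt is monotone increasing, so the ordering is identical:
assert (mse_a < mse_b) == (rmse_a < rmse_b)
```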
So, if one looks at the distribution of ratings for Napoleon Dynamite on Amazon, it has high variance:
On the other hand, Lethal Weapon 4 ratings have lower variance:
If we use the average number of stars as the context-ignorant, unpersonalized predictor (which I’ve discussed before), ND will give you a mean squared pain of 3.8, while LW4 will give you only 2.7. Now, your model might choose not to make recommendations with controversial movies – but this won’t help you on the Netflix Prize: you’re forced to make errors even when you know you’re making them. (R)MSE is pre-probabilistic: it gives no advantage to a probabilistic model that’s aware of its own uncertainty.
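The connection between polarization and squared pain is exact: the MSE of the always-predict-the-mean predictor is, by definition, the variance of the rating distribution. A sketch with invented histograms (not the real Amazon counts) makes the point:

```python
from statistics import mean, pvariance

# Hypothetical star-rating samples (values 1..5), purely illustrative:
# a polarized, love-it-or-hate-it film vs. a middle-of-the-road one.
polarized = [5] * 40 + [1] * 35 + [4] * 10 + [2] * 10 + [3] * 5
consensus = [4] * 40 + [3] * 30 + [5] * 15 + [2] * 10 + [1] * 5

results = {}
for name, ratings in [("polarized", polarized), ("consensus", consensus)]:
    mu = mean(ratings)
    # MSE of always predicting the mean star rating...
    err = sum((r - mu) ** 2 for r in ratings) / len(ratings)
    # ...is exactly the population variance of the ratings.
    assert abs(err - pvariance(ratings)) < 1e-9
    results[name] = err
```

Whatever the exact numbers, the polarized distribution is guaranteed the larger error, and no amount of modeling skill can shrink it below the variance for this kind of predictor.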