Many of you buy and rate books and movies on the web; you click on links, bookmark them, and blog about them. By doing this, you leave traces behind, and those traces are of great help to people who later find themselves in the same situation as you. Personalization technology tries to help you navigate the choices using the actions of those who were there before you, combined with implicit knowledge about yourself (clicks or purchases you’ve made) or explicit knowledge (preferences you’ve expressed).
Greg Linden’s blog is an excellent source of insightful posts on personalization technology. A while ago he posted a link to a collection of material from KDD about the Netflix Prize: a challenge where one has to predict how much you will like a particular movie, based on your ratings of the movies you’ve seen and on others’ ratings of the movies they’ve seen.
What’s notable is that some of the current competition leaders have written extensive papers about their approaches. BellKor’s approach is quite simple: it combines nearest-neighbor ideas with a more global factor model. Gravity, on the other hand, employs a diverse collection of tools, including matrix factorization, neural networks, nearest-neighbor models, and clustering. The Gravity team provides an interesting picture of their factor model for the movie Constantine:
They assume adjacent factors to be correlated, they infer the matrix purely from the ratings data, and they named some of the factors manually at the end. Compare their annotations with the hand-designed list of genres (Action / Drama / Fantasy / Horror / Thriller) or keywords (Androgyny / Twin / Black Cat / Christian Horror / Electrocution / …) assigned to the movie by human editors. Many of these keywords are rarely, if ever, relevant to deciding whether the movie is worth seeing.
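To make the factor-model idea concrete, here is a minimal sketch of matrix factorization for rating prediction, fit by stochastic gradient descent. This is not either team’s actual method; the toy ratings and hyperparameters are invented for illustration:

```python
import numpy as np

def factorize(ratings, n_users, n_movies, n_factors=3,
              lr=0.02, reg=0.05, n_epochs=500, seed=0):
    """Fit rating ~ p_u . q_m on observed (user, movie, rating) triples
    by SGD, with L2 regularization on the factor vectors."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, n_factors))   # user factors
    Q = 0.1 * rng.standard_normal((n_movies, n_factors))  # movie factors
    for _ in range(n_epochs):
        for u, m, r in ratings:
            pu = P[u].copy()                 # keep old value for Q's update
            err = r - pu @ Q[m]              # error on this observed rating
            P[u] += lr * (err * Q[m] - reg * pu)
            Q[m] += lr * (err * pu - reg * Q[m])
    return P, Q

# Toy data: (user, movie, rating) triples on a 1-5 scale.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 5), (2, 2, 2)]
P, Q = factorize(ratings, n_users=3, n_movies=3)
print("user 0's predicted rating of movie 2:", P[0] @ Q[2])
```

Each row of Q is a movie’s factor vector, the kind of thing annotated in Gravity’s picture; a serious implementation would at least add per-user and per-movie bias terms.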
In recent weeks, the two currently leading groups (judging from the leaderboard) have consolidated their efforts, as it is becoming clear that several models are better than just one: the AT&T Labs group BellKor fused with another researcher to form KorBell, and the Hungarian BME group fused with the Princeton group to form Gravity-Dinosaurs. I am not sure whether either group used background knowledge that could be obtained by crawling the web or from other databases. At the moment, both groups are at around an 8.4% improvement over the baseline. They need a 10% improvement over the baseline to win the $1,000,000 grand prize; a year ago, the best improvement stood at 4.5%.
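The simplest way to profit from several models is a linear blend: fit weights on held-out ratings so that a weighted combination of the individual predictors minimizes RMSE. A toy sketch, with the held-out ratings and the two “models” invented for illustration:

```python
import numpy as np

# Held-out true ratings and the predictions of two hypothetical models.
y = np.array([5.0, 3.0, 4.0, 1.0, 2.0])
preds = np.column_stack([
    [4.6, 3.4, 3.9, 1.8, 2.2],   # predictions of model A
    [5.2, 2.5, 4.3, 0.9, 2.6],   # predictions of model B
])

# Least-squares blending weights minimize the squared error of the blend.
w, *_ = np.linalg.lstsq(preds, y, rcond=None)
blend = preds @ w

rmse = lambda p: np.sqrt(np.mean((p - y) ** 2))
print("weights:", w)
print("RMSE A:", rmse(preds[:, 0]), "B:", rmse(preds[:, 1]),
      "blend:", rmse(blend))
```

On the set the weights are fit to, the blend can never do worse than the best single model; the point of using a held-out set is that the gain tends to carry over to unseen ratings.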
This is just another piece of evidence that Epicurus’ principle of multiple explanations leads to superior practical results. (“Keep all hypotheses that are consistent with the facts!” is how proponents of algorithmic probability often summarize a longer statement from his letter to Herodotus: “If then we think that an event could happen in one or other particular way out of several, we shall be as tranquil when we recognize that it actually comes about in more ways than one as if we knew that it happens in this particular way.”) The principle corresponds to the Bayesian practice of integrating over all possible parameter settings instead of picking just the best one, weighting each setting in proportion to its predictive power (the likelihood) and in proportion to our a priori trust in it (the prior). To the famous statement “All models are wrong, but some are useful,” we should add: “but use them all!”
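In symbols, the Bayesian prediction for a new quantity y averages over all models M_i, each weighted by its posterior, which is proportional to likelihood times prior:

$$ p(y \mid D) \;=\; \sum_i p(y \mid M_i, D)\, p(M_i \mid D), \qquad p(M_i \mid D) \;\propto\; p(D \mid M_i)\, p(M_i). $$

Picking just the best model amounts to replacing this sum by its single largest term.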
[Correction 10/12:] Chris Volinsky kindly corrected my statement that BellKor combined their predictions with those of other groups (although other groups did offer this). They did not do that, and they argue against the approach. They did, however, combine researchers Bell and Koren with researcher Volinsky to form KorBell.