My major pet peeve is rankings on various web sites. This picture of restaurant rankings should clarify the problem:

Shouldn’t the 4-review 5-star ranking be above the 1-review 5-star ranking? Four excellent reviews should increase our confidence in the quality more than a single review.

The mistake lies in aggregating the reviews with the frequentist mean, which disregards the number of reviews, instead of a Bayesian posterior mean. A Bayesian prior would inform us that rankings tend to be centered at, say, 3; the reviews would then push the estimate away from 3.

How to implement this? We can assume that the prior for restaurant rankings is N(3,1) – a Normal distribution with mean 3 and variance 1. Let us also assume that the rankings for a particular restaurant have a standard deviation of 1; under these assumptions the prior is conjugate. Second, we have the average *y* of the *n* rankings. The posterior mean (see page 49 of the second edition of the BDA book) then takes the following form:

*μ*_{n} = [ 3 + n*y* ] / [ 1 + n ]
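As a minimal sketch (assuming the N(3,1) prior and unit observation variance above), the posterior mean can be computed and used directly as a sort key:

```python
def posterior_mean(y, n, prior_mean=3.0):
    """Posterior mean under a Normal(prior_mean, 1) prior and
    unit-variance observations: shrinks the raw average y of n
    reviews toward the prior mean."""
    return (prior_mean + n * y) / (1 + n)

# (average, count) pairs, e.g. taken from the table below.
ratings = [(5.0, 4), (5.0, 1), (4.5, 9), (4.5, 2)]

# Sort by posterior mean, best first.
ranked = sorted(ratings, key=lambda r: posterior_mean(*r), reverse=True)
```

With these values the four-review 5.0 average (posterior mean 4.6) sorts above the nine-review 4.5 average (4.35), which in turn sorts above the single-review 5.0 average (4.0).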

If we take the averages and the counts from the table above, compute *μ*_{n} and sort by it, we get the following table:

μ_{n} | y | n |
---|---|---|
4.6 | 5.0 | 4 |
4.5 | 5.0 | 3 |
4.3 | 4.5 | 9 |
4.3 | 4.5 | 5 |
4.1 | 4.5 | 3 |
4.0 | 5.0 | 1 |
4.0 | 5.0 | 1 |
4.0 | 5.0 | 1 |
4.0 | 4.5 | 2 |

Now, the 5-star rankings with a single review appear even below the 4.5-star rankings with multiple reviews. In a practical application, the *μ*_{n} column could be converted into a graphical representation.

Some websites do sort first by average rating and then by the number of reviews, but such a solution is problematic: a restaurant rated just twice, with a 4 and a 5 (average 4.5), will appear above a restaurant rated 99 times as 5 and 100 times as 4 (average just under 4.5).

A less questionable conjugate prior would be the Normal-Gamma, but I won’t go into this here.

Check out beeradvocate.com; their best-of rankings are Bayesian:

———

The Best of BeerAdvocate (BA) lists are generated using statistical formulas that pull data from hundreds of thousands of user reviews. They are not hand-picked by any one person. The general formula uses a Bayesian estimate:

weighted rank (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:

R = review average for the beer

v = number of reviews for the beer

m = minimum reviews required to be listed (currently 10)

C = the mean across the list (currently 2.5)

The formula normalizes scores, that is, it pulls (R) toward the mean (C) if the number of reviews is not well above (m). So if a beer has only a few reviews above (m), its (WR) is decreased a little if it is above the mean (C), or increased a little if it is below the mean (C), in accordance with the normal distribution rule of statistics.

Currently, a beer must have 10 or > reviews to be included in any calculations. And (m) is calculated by averaging the number of reviews for beers that have 10 or > reviews within the list being viewed, while (C) is the mean (average) overall score for all beers that have 10 or > reviews within the list.

———
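The quoted weighted-rank formula can be sketched as a small function (a sketch only; the defaults mirror the values quoted above):

```python
def weighted_rank(R, v, m=10, C=2.5):
    """BeerAdvocate-style Bayesian estimate: a weighted average of the
    beer's own review average R (weight v, its review count) and the
    list-wide mean C (weight m, the minimum-reviews threshold)."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# A beer with few reviews is pulled strongly toward C,
# while many reviews let its own average R dominate.
few = weighted_rank(4.5, v=10)      # halfway between 4.5 and 2.5
many = weighted_rank(4.5, v=1000)   # close to 4.5
```

Note that a below-average beer (R < C) is pulled *up* toward C by the same mechanism, which is why few-review outliers land near the middle of the list rather than at either extreme.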

RottenTomatoes uses the same formula as BeerAdvocate to rank their "Worst of the Worst" films. Would be interesting to think about how that formula is derived.

NU, the formula corresponds to using the Gaussian model with known scale, and a Gaussian prior. The full expression is (s1 C + n s2 R)/(s1 + n s2). Here, s1 is the prior precision (1/variance) and s2 is the data precision. Above, it simplified for me because I assumed that s1 = s2 = 1.
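As a quick numerical sketch (the precision values here are made up for illustration), the general Gaussian posterior mean reduces to the earlier (3 + n*y*)/(1 + n) rule when s1 = s2 = 1, and matches the weighted-rank formula when m = s1/s2:

```python
def general_posterior_mean(R, n, C, s1, s2):
    """Gaussian posterior mean with prior mean C, prior precision s1,
    data average R over n observations, data precision s2."""
    return (s1 * C + n * s2 * R) / (s1 + n * s2)

def weighted_rank(R, v, m, C):
    """BeerAdvocate-style weighted rank."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# With s1 = s2 = 1 and C = 3, this is (3 + n*y) / (1 + n).
reduced = general_posterior_mean(5.0, 4, C=3.0, s1=1.0, s2=1.0)

# With m = s1/s2, the two formulas agree (hypothetical precisions).
s1, s2 = 10.0, 1.0
a = general_posterior_mean(4.2, 25, C=2.5, s1=s1, s2=s2)
b = weighted_rank(4.2, 25, m=s1 / s2, C=2.5)
```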

So, the BeerAdvocate/RottenTomatoes formula corresponds to being Bayesian and making some assumptions about the variances.

I would love to see this on more review pages. I often end up sorting by # of reviews, because 'sort by rating' tends to be so useless with all the one-review five-star ratings on top.

Aleks,

Thanks, I'll try working through the derivation. Upon further investigation, it seems that this formula is used fairly often on Internet ranking sites, and many of them credit the Internet Movie Database (IMDB) as inspiring their use of it.

One question I have is, what motivates their choice for the "minimum reviews to be listed" criterion?

NU, after sufficiently many reviews, you have enough data to drown the prior. So, if you're not able to work with the prior, you can just wait until you have enough data.

Additionally, having more data also limits the problem of unfaithful reviews and irrelevant restaurants.

Aleks,

Sorry, I wasn't clear. I can see why you might want to have a minimum-reviews criterion, but how do you choose its value?