Prior information . . . about the likelihood

I read this story by Adrian Chen on Gawker (yeah, yeah, so sue me):

Why That ‘NASA Discovers Alien Life’ Story Is Bullshit

Fox News has a super-exciting article today: “Exclusive: NASA Scientist claims Evidence of Alien Life on Meteorite.” OMG, aliens exist! Except this NASA scientist has been claiming to have evidence of alien life on meteorites for years.

Chen continues with a quote from the Fox News item:

[NASA scientist Richard B. Hoover] gave FoxNews.com early access to the out-of-this-world research, published late Friday evening in the March edition of the Journal of Cosmology. In it, Hoover describes the latest findings in his study of an extremely rare class of meteorites, called CI1 carbonaceous chondrites — only nine such meteorites are known to exist on Earth. . . .

The bad news is that Hoover reported this same sort of finding in various low-rent venues for several years. Replication, huh? Chen also helpfully points us to the website of the Journal of Cosmology, which boasts that it “now receives over 500,000 Hits per month, which makes the Journal of Cosmology among the top online science journals.”

So where’s the statistics?

OK, fine. So far, so boring. Maybe it’s even a bit mean to make fun of this crank scientist guy. The guy is apparently an excellent instrumentation engineer who is subject to wishful thinking. Put this together with tabloid journalism and you can get as many silly headlines as you’d like.

The statistics connection is that Hoover has repeatedly made similar unverified claims in this area, which suggests that this current work is not to be trusted. When someone has a track record of mistakes (or, at best, claims not backed up by additional evidence), this is bad news.

In this case, we have prior information. But not the usual form of prior information on the parameter of interest, theta. Rather, we have prior information on the likelihood, that is, on the model that relates the data to the underlying phenomenon of interest. Any data comes with an implicit model of measurement error: the idea that what you're measuring isn't quite what you want to be measuring. Once we see a researcher or research group with a track record of strong but unsupported claims, this gives us some information about that measurement-error model.

In this particular example, I can’t see a good way of quantifying this prior information or this measurement-error model (or any reason to do so). More generally, though, I think there’s an important point here. People spend so much time arguing about their priors but often there are big problems with the likelihood. (Consider, for example, the recent ESP studies.)
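
Just to illustrate the general structure (not to quantify this particular case), here is a minimal sketch in Python with made-up numbers: the same data and the same prior on the parameter of interest, theta, but two different assumptions about the measurement-error scale in the likelihood.

```python
# Toy sketch: prior information entering through the measurement-error part of
# the likelihood rather than through the prior on theta. All numbers are made up.
from math import sqrt

def posterior_theta(prior_mean, prior_sd, y, meas_sd):
    # Conjugate normal-normal update: posterior for theta given one noisy measurement y.
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / meas_sd**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * y)
    return post_mean, sqrt(post_var)

y = 5.0                          # one dramatic reported measurement
prior_mean, prior_sd = 0.0, 2.0  # same prior on theta in both analyses

# Analysis 1: take the stated measurement error at face value.
print(posterior_theta(prior_mean, prior_sd, y, meas_sd=1.0))   # posterior mean near 4

# Analysis 2: a track record of unreplicated claims leads us to believe the
# effective measurement error is much larger than stated.
print(posterior_theta(prior_mean, prior_sd, y, meas_sd=5.0))   # posterior mean near 0.7
```

Same prior on theta, same data; the conclusions differ only because the prior information changed the assumed error scale in the likelihood.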

7 thoughts on “Prior information . . . about the likelihood”

  1. I love the 500,000 hits a month claim; I wonder if that was worked out before or after the 14 million hits they say they have had in March.

    On your point about prior information on the likelihood: would the same apply to psychological tests or scales whose use has been shown to be inappropriate in certain instances, such as test x being unreliable or biased when used in context y?

  2. Are we talking about Chicken Little/Boy Who Cried Wolf here?

    There is discussion about Bayesian refereeing over at Sabermetric Research.

    At the Sloan Sports Analytics Conference, I asked an NFL ref whether he employs a Bayesian approach on known miscreants or superstars. He said that the actual call was based simply on what he saw, but he did admit to allocating his focus based on the expectation that there might be a hold (he was the ref in charge of watching the offensive linemen).

  3. Isn't it a prior probability on the model as a whole? That is, the whole hypothesis that these things should be treated as evidence of life? I guess that is no different from saying it is a prior probability on the likelihood, but a model or hypothesis is more than just the likelihood. (That could work for or against me, I guess.)

  4. David:

    No, not a prior probability on the model as a whole. See my article with Shalizi on why I don't like that. In short, my prior probability on just about any model is exactly zero.

    I'm not saying that I have a low prior on the likelihood, that my Pr(likelihood) is 0.001 or whatever. What I'm saying is that I have a prior on one of the parameters in the likelihood, and my prior is that there is substantial measurement error somewhere in there.

  5. Parinella:

    You bring up a point which I've raised on occasion (probably somewhere on this blog, in fact): the tension between providing information and providing an assessment.

    Your example of refereeing is mirrored in grading. For example, suppose you have two students in your college calculus class: Al and Ben each get 50 on their midterm, 90 on homeworks, and 80 on the final. So they get the same grade, right?

    But here's one more piece of information: Al got 800 on his math SAT and Ben got 600. Now, what should their grades be? If the goal is to summarize performance in the calc class, no change. If the goal is to estimate who is better at calculus, then just about all reasonable models will say that Al should get a higher grade (see the toy calculation in the P.P.S. below).

    If you really want to complicate things, you can also add that one of these students is white and one is black. But I won't go there.

    Anyway, the point is that ratings have multiple purposes.

    P.S. Regarding boy who cried wolf etc., my point is that any model has (or should have) a measurement error term. In this case, this dude keeps finding things that no one else does. This gives me some information suggesting to me that his measurement error is bigger than he thinks it is. From a statistical perspective, this point is important to me because people don't always realize that prior information can be informative about the "likelihood" (that is, the model that relates underlying truth to data).
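
    P.P.S. Here is a toy version of the Al and Ben calculation (a sketch in Python; the numbers for how the SAT maps onto the class's 0-100 scale are made up), just to show that any model that shrinks toward a prior estimate of ability will give Al the higher estimate:

```python
from math import sqrt

def combine(prior_mean, prior_sd, score, score_sd):
    # Conjugate normal-normal update: estimated ability given a noisy class score.
    prior_prec, data_prec = 1.0 / prior_sd**2, 1.0 / score_sd**2
    post_var = 1.0 / (prior_prec + data_prec)
    return post_var * (prior_prec * prior_mean + data_prec * score), sqrt(post_var)

class_score = 75.0   # identical observed performance for Al and Ben (0-100 scale)
score_sd = 10.0      # how noisy one semester's performance is (made up)

al  = combine(prior_mean=85.0, prior_sd=10.0, score=class_score, score_sd=score_sd)  # SAT 800
ben = combine(prior_mean=70.0, prior_sd=10.0, score=class_score, score_sd=score_sd)  # SAT 600
print(al, ben)   # Al's estimated ability comes out higher despite identical class scores
```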

  6. I think your comments re: prior information about the likelihood are nicely illustrated by Ed Jaynes in his book "Probability Theory: The Logic of Science" (chapter 5). He shows an example where two people (consistent reasoners), Mr. A and Mr. B, each hold different prior opinions about the veracity of Mr. N – essentially, different likelihoods. When Mr. N presents data, Mr. A and Mr. B use their different likelihoods on the common data to reach different conclusions. Moreover, if their likelihoods differ enough, all additional data from Mr. N leads to divergent posterior opinions. Interesting, no?

    I have not done justice to Dr. Jaynes's arguments. However, this seems (to me at least) to be a relevant framework for evaluating Hoover's claims.
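
    Here is a crude numerical sketch of that divergence (Python, with numbers I made up rather than Jaynes's): the same sequence of reports from Mr. N, the same prior on the claim itself, but different likelihoods for Mr. N's reports.

```python
def update(prior, p_report_if_true, p_report_if_false):
    # One Bayes update of P(claim is true) after Mr. N announces supporting evidence.
    num = prior * p_report_if_true
    return num / (num + (1.0 - prior) * p_report_if_false)

a = b = 0.5              # Mr. A and Mr. B start with the same prior on the claim
for i in range(5):       # Mr. N announces supporting evidence five times
    # Mr. A trusts Mr. N: reports mostly track the truth.
    a = update(a, p_report_if_true=0.8, p_report_if_false=0.1)
    # Mr. B thinks Mr. N reports this sort of thing mostly out of wishful thinking.
    b = update(b, p_report_if_true=0.3, p_report_if_false=0.9)
    print(i + 1, round(a, 4), round(b, 4))
# The same data push Mr. A toward belief and Mr. B toward disbelief.
```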
