One of my favorites, from 1995.

Don Rubin and I argue with Adrian Raftery. Here’s how we begin:

Raftery’s paper addresses two important problems in the statistical analysis of social science data: (1) choosing an appropriate model when so much data are available that standard P-values reject all parsimonious models; and (2) making estimates and predictions when there are not enough data available to fit the desired model using standard techniques.

For both problems, we agree with Raftery that classical frequentist methods fail and that Raftery’s suggested methods based on BIC can point in better directions. Nevertheless, we disagree with his solutions because, in principle, they are still directed off-target and only by serendipity manage to hit the target in special circumstances. Our primary criticisms of Raftery’s proposals are that (1) he promises the impossible: the selection of a model that is adequate for specific purposes without consideration of those purposes; and (2) he uses the same limited tool for model averaging as for model selection, thereby depriving himself of the benefits of the broad range of available Bayesian procedures.

Despite our criticisms, we applaud Raftery’s desire to improve practice by providing methods and computer programs for all to use and applying these methods to real problems. We believe that his paper makes a positive contribution to social science, by focusing on hard problems where standard methods can fail and exposing failures of standard methods.

We follow up with sections on:

– “Too much data, model selection, and the example of the 3x3x16 contingency table with 113,556 data points”

– “How can BIC select a model that does not fit the data over one that does?”

– “Not enough data, model averaging, and the example of regression with 15 explanatory variables and 47 data points.”
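To make the “too much data” point concrete, here is a minimal sketch in Python. It assumes the form of BIC commonly used for contingency tables, the deviance minus df times log n, measured against the saturated model, so a negative value favors the simpler model. Only the sample size of 113,556 comes from the example above; the deviance and degrees of freedom are hypothetical numbers chosen for illustration, and the helper names are made up for this sketch.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form, even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def bic_vs_saturated(deviance, df, n):
    """BIC relative to the saturated model (assumed form): deviance - df * log(n)."""
    return deviance - df * math.log(n)


n = 113_556        # sample size quoted in the section title above
deviance = 150.0   # hypothetical deviance of a parsimonious model
df = 40            # hypothetical residual degrees of freedom

p_value = chi2_sf_even_df(deviance, df)
bic = bic_vs_saturated(deviance, df, n)

print(f"p-value of the goodness-of-fit test: {p_value:.2e}")
print(f"BIC relative to the saturated model: {bic:.1f}")
```

With these made-up inputs the goodness-of-fit test rejects the parsimonious model decisively, while the BIC comparison still prefers it to the saturated model, because the penalty of log n per degree of freedom is large when n is large. Whether that preference serves the purpose at hand is exactly what our discussion disputes.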

And here’s something we found on the web with Raftery’s original article, our discussion and other discussions, and Raftery’s reply. Enjoy.

Thanks, Andrew. Good stuff indeed. What was your response to Raftery’s rejoinder to your improper prior criticism that his derivation of BIC was a reasonably good (asymptotic) approximation to a valid prior?

Jonathan:

The error of the approximation is O(1) which makes it useless, not “reasonably good” at all.
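For readers who want the approximation spelled out, the standard Laplace-style expansion behind BIC can be written as below. This is the textbook form, not a quotation from Raftery’s article or from our discussion, and the notation is assumed here: y is the data, M the model, \hat{\theta} the maximum likelihood estimate, d the number of parameters, and n the sample size.

```latex
\log p(y \mid M) = \log p(y \mid \hat{\theta}, M) - \frac{d}{2} \log n + O(1)
```

The remainder does not shrink as n grows, so a Bayes factor computed from two such approximations can be off by a constant multiplicative factor no matter how much data there are.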

That’s what I thought you’d say (and why I added the parenthetical). But if that’s right, doesn’t that explain how “worse” explanations can beat “better” ones? They can’t under a correctly calculated proper prior, but they might when the BIC is a poor approximation. Then it would remain to figure out the circumstances in which the approximation is likely to be poor… or good.

Jonathan:

When you write, “‘worse’ explanations can beat ‘better’ ones,” I think the key question is: What do you mean by “better” or “worse”? It’s not clear. Model A can be better than model B for some purposes and worse for others. That’s one point that Rubin and I made in our article: Raftery was trying to do the impossible, to have one measure that ranks models in terms of (i) posterior probability they are correct, (ii) predictive power, and (iii) applied desirability.

Actually, though, the posterior probability of a model is not always defined and, when it is defined, it can depend on aspects of the model that have not been well specified. A model with higher BIC will not necessarily have better predictive power: Raftery was flat-out wrong to claim that. Nonetheless, a simpler model can have advantages, but much of this depends on computational considerations as well as on how the model is displayed. If you display a complicated model using a long table of coefficients, it can be a mess, but if you use a canny graphical display that focuses on the important aspects, maybe it can work just fine. In any case, the point is that BIC and Bayes factors and predictive errors don’t “know” that a model is expensive to compute or difficult to summarize. So, for all these reasons, there’s no reason that the winners in (i), (ii), and (iii) should be the same. In that sense, Rubin and I are criticizing not just BIC but any method of model choice that claims to do all these three things at once. I’m similarly frustrated by claims of Occam’s razor etc. I think it’s more helpful to recognize that we have multiple goals when selecting models.

I’m thinking we should avoid Bayesian methods altogether. The major philosopher of statistics put it better than I can on her blog today: http://errorstatistics.com/2014/08/23/has-philosophical-superficiality-harmed-science/#comment-91666

“The action today is by the consumer of statistics who is increasingly refusing to fund “trust me” science wherein we are not allowed to know the method of data dredging, data ransacking, cherry picking, and optional stopping, simply because it violates someone’s favorite statistical philosophy (Bayesian or likelihood). Replication and responsibility–spanking new ideas!– turn on holding the Bayesians feet to the fire, and this requires knowing just how often they produce hunky dory models even if they’re wrong. The root of Bayesianism is a “gentleman’s” logic where the untutored masses are in no position to hold the “experts” accountable. It won’t stand.”

If it’s true that science students who were usually taught classical statistics exclusively, and whose papers use classical statistics exclusively, are refusing to divulge the details of their work because of Bayesian ideology, then that’s a travesty of science. Fortunately, there are some who won’t stand for this “gentleman’s” logic.

Anon:

I think you’re trolling but I’m not sure. In any case it is ridiculous to associate Bayesians with secrecy or hidden methods. The whole point of Bayes is that the assumptions are all open. Also, we discuss issues of stopping rules etc in BDA (in chapter 7 of the first two editions and chapter 8 of the third edition).

“Anonymous” sounds like a sock puppet for a certain person to me.

I take offense to that. I’m not a certain person. I’m a general person.

No person is a “general person”. I am a certain person, who has a name and can be looked up on the web.

Who are you and why do you not want your actual name to be known?

Troll.

I take offense to that too. My mother’s family are trolls. Why do you keep insulting my heritage? My name is Rufus P. Quimby for the record, and it’s not trollish to give a fair airing to important and well researched anti-Bayesian ideas.

This ‘Anonymous’ is a pretty consistent commenter around here, likely the new identity of someone who has been around for a while. Don’t think this counts as a ‘sock puppet’, especially as they are clearly mocking the linked author.

PS thanks for posting Raftery’s original article along with your response. Interesting stuff. Can’t comment further without reading in more detail though.

If it is a “new identity” of someone who has been around for a while but who decides to hide the previous identity, then it is a sock puppet.

Sock puppets are people who assume new identities on the web, often associated with people whose identities are known, so as to make comments that won’t be associated with the original identity.

http://en.wikipedia.org/wiki/Sockpuppet_(Internet)

What you describe is the definition of a sock puppet.

The key to sockpuppetry is that there is an attempt to deceive people into believing that the sockpuppet is a separate unrelated person. I don’t think that’s going on here.

I’m not sure why this person took down their website in fairly dramatic fashion, but the website that the original persona linked to IS gone, and so it doesn’t quite make sense to insist that this person should continue to use the associated pseudonym.

Clearly subjective beliefs and priors whose meaning is unclear to everyone caused the crises of reproducibility. It’s not going to stand!

I would bet very very near 100% of p-values published by preclinical researchers (ever and everywhere, but let’s say this year in Science and Nature) are either accompanied by insufficient information on the sample size (e.g., “n=3-6 for all experiments”) and/or are not accompanied by a description of the stopping rule used. Doesn’t this make them uninterpretable?