Avoiding model selection in Bayesian social research

The other day I happened to come across this paper that I wrote with Don Rubin in 1995. I really like it—it’s so judicious and mature, I can’t believe I wrote it over 20 years ago!

Let this be a lesson to all of you that it’s possible to get somewhere by reasoning from first principles based on specific applications. This is not the only way to do research—I have a lot of respect for the “machine learning” approach of evaluating success on a corpus of data, or various more methodology-driven approaches—but I think much can be learned by studying how we kept pushing back and asked what were the goals of the statistical methods in question, and how we brought in theory where relevant (for example, in dismissing the claim that BIC is an approximation to the Bayes factor, and in dismissing the claim that BIC-chosen models have lower predictive error). When trying to understand and evaluate statistical methods, it’s often useful to step back and consider why they’re being used in the first place, partly to see to what extent the underlying goals are being achieved, and partly to understand how a method such as BIC can be useful in practice even if it doesn’t live up to the theoretical claims being made for it. One can be critical of specific claims while being generous in a larger sense.

In this particular paper, I think one breakthrough we made was to recognize that model selection was being used to solve two completely different problems: (1) Giving users permission to use a simple model that didn’t fit the data but explained most of the variance, in a large-sample-size setting, and (2) Giving users a simple way to regularize and do some sort of partial pooling, in a small-sample-size setting. Recognizing that these are two completely different problems gives us a way to think about moving forward on both of them, without overloading a crude tool such as BIC.
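For readers who haven't worked with BIC directly, here's a minimal sketch (not from the paper; the function name `bic_linear` and the simulated data are my own illustration) of how BIC is typically computed for Gaussian linear models and used to pick between nested regressions. This is exactly the sort of crude discrete choice the paper argues against overloading:

```python
import numpy as np

def bic_linear(y, X):
    """BIC for an OLS fit with Gaussian errors: k*ln(n) - 2*lnL at the MLE."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # Profile log-likelihood with the error variance at its MLE, rss/n
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return k * np.log(n) - 2 * loglik

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)  # data simulated with a real slope

X_small = np.column_stack([np.ones(n)])   # intercept-only model
X_big = np.column_stack([np.ones(n), x])  # intercept + slope

# Lower BIC "wins"; here the larger model is correctly preferred
print(bic_linear(y, X_small), bic_linear(y, X_big))
```

Note that the output is a single yes/no model choice: no partial pooling, no accounting for how much the excluded term actually matters. That all-or-nothing character is what makes one tool a poor fit for the two distinct problems described above.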

P.S. In rereading this paper that I absolutely love, I guess I can see how my colleagues at Berkeley didn’t want me around back then. They had no way of thinking about a paper such as this that wasn’t written in the Definition, Theorem, Proof format. It’s just a language thing. Even my colleagues such as David Donoho who, I assume, would have had the ability and maybe even the inclination to understand this work, wouldn’t’ve known what to make of it, because it was written in such a different style than their papers. Also it was published in a sociology journal so I guess it didn’t count. It was easy enough for them to just conclude that my work was no good and not try to figure out exactly what I was doing.

To be fair, I have difficulty reading papers written in the Definition, Theorem, Proof style: I usually have to ask the authors to explain to me what they are doing, or else I have to struggle through on my own in order to then re-interpret things in my own language, as here. Communication is difficult, and I guess it can be awkward to have someone around who speaks a different language. It’s funny that I went to Berkeley in the first place, but at the time I had the naive view that I could compel them to like me. I was able to read their work, after all (with effort), and I think I underrated the difficulty they would have in reading my work, or perhaps I overrated the value they would place on my work: had they thought it important, they would’ve put in the work to figure out what I was doing, but because they were told not to care about it, they didn’t bother. That’s all fine too, I suppose: had I really cared myself about explaining my ideas on model selection and averaging to the Definition, Theorem, Proof crowd, I could’ve written some papers in that format. But it wasn’t really necessary; they’re doing fine on their own path.

15 thoughts on “Avoiding model selection in Bayesian social research”

  1. You might appreciate this anecdote:
    When my (math) Ph.D. advisor handed back my draft of my Ph.D. thesis, he had only one comment: “Put more words between the theorems.”

  2. The piece makes great points that apply more generally to problems besides model selection. In general, statistical problems are decision problems of one sort or another, and for a decision problem to be well-specified, you need to make assumptions about the data generating process, develop a loss function, and either identify a prior distribution or frequentist desiderata. All of these judgments must be context-specific, and different statisticians will reasonably disagree about how to make these choices even in a specific context. The desire of non-statisticians to eliminate any sensitivity to context or judgment has led to a proliferation of conventions, rules of thumb, defaults, and multi-purpose tools that aren’t really optimal in any particular problem. The idea that there would be a single procedure for model selection, when the reasons for performing model selection vary from situation to situation, is not so different from the idea that p-values should always be less than some threshold, regardless of the relative importance of different types of errors or the relative plausibility of different hypotheses. You’ve often written that people need to learn to live with uncertainty, and to think continuously rather than discretely. I’d add that people need to learn to live with judgment calls, and accept that very few statistical problems can be properly addressed without careful consideration of all of the inputs required by the relevant decision problem.

  3. “Too Much Data, Model Selection, and the Example of the 3x3x16 Contingency Table with 113,556 Data Points” is one awesome section title.

    I also learn a lot more from a seminar about a paper than I do from reading the paper. And I think that much of that is because of the formal structure expected of papers – it becomes a kind of excuse to not think and write clearly about your work, because you can just shoe-horn it into a prefabbed structure and rely for support on the epistemic buttressing that structure provides. You don’t have to be convincing rhetorically when you can go through the formal motions of scientific argument. The format looks like Science, and so the paper looks like Truth.

    But mostly here, I admit to being frightened that you consider a lack of ability and desire to adhere to the formal structures and rhetorical standards of the discipline to be part of the explanation for why you didn’t get tenure. Frightened in the sense of what it portends for me – I’ve never been able to do it either. I remember rebelling against the 5 paragraph essay in middle school, and I think I wrote my SAT essay on “Why this prompt is stupid” or some such thing – you know, because it was like “Which is right?” and I was like “Neither, but both can be useful.”

    That said, I think writing essays I’ll be proud of 20 years later is probably a good ambition. Better than writing papers I don’t like just because I think they will advance my career. Then again, I’m honestly not sure I’m capable of the latter – I’d just be too bored to even bother writing it.

    • If it’s any comfort, I’ve heard a bunch of academics rail against the 5 paragraph essay (which I thankfully had never encountered in my schooling). And I’d guess (at least hope!) your SAT essay got you high points in the critical thinking category.

      • Actually I got a perfect score on that essay. But probably because I got lucky and they asked me a question about epistemology. So I guess it must have been the GRE, not the SAT, because I didn’t study philosophy until college and apparently the SAT didn’t add writing until 2005, and that was long after I took it.

        The 5 paragraph essay is sort of like diagramming sentences – it can’t be the most effective way to learn discipline in writing, but at least it is a way, and I guess I’m glad I got that instead of nothing or free-for-all. I wonder what the alternatives are for teaching structured, logical writing flow. I guess I think it is helpful to have a structure, even if it is just a structure against which to rebel (theory: rebelling against something is a necessary condition towards originality, but then you have to overcome your rebelling to get somewhere really new – you know, Hegel).

        Someday soon I’m gonna follow up on the “teaching with a sub-optimal textbook” conversation we had. It went OK. I got pretty good reviews. I’m not super happy with the limitations it put on the course.

  4. Interesting. My only complaint about BDA(1,2,3) is that it has too many words. It takes so long to read each chapter. So I guess I’m more in the D/T/P camp. But I’m willing to put in the extra time it takes to wade through the words!

  5. Seems like you got your intuitive approach from the physics undergrad major, though (by your own statement) you didn’t have good physics intuition. If you’d had an undergraduate math major you may have been more inclined to follow the mathematical statistics route, but maybe you wouldn’t have been as creative. My own feeling is that one needs to think of a problem in an intuitive sense, but then to use the def/theorem/proof process to make certain one is correct.

  6. I think what I enjoyed most from the paper was the (continued?) insight about the pressing questions on why hypothesis testing, why discrete choices, why information criteria, and why evaluations on the parameter space and not the data space. That said, I personally prefer the write-up in GMS (1996) + the discussions!

    Also, funny coincidence: I was just reading your old blog posts on Occam’s razor and David MacKay! (http://statmodeling.stat.columbia.edu/2011/12/04/david-mackay-and-occams-razor/)
