The other day I happened to come across this paper that I wrote with Don Rubin in 1995. I really like it—it’s so judicious and mature, I can’t believe I wrote it over 20 years ago!
Let this be a lesson to all of you that it’s possible to get somewhere by reasoning from first principles based on specific applications. This is not the only way to do research—I have a lot of respect for the “machine learning” approach of evaluating success on a corpus of data, or various more methodology-driven approaches—but I think much can be learned by studying how we kept pushing back and asking what the goals of the statistical methods in question were, and how we brought in theory where relevant (for example, in dismissing the claim that BIC is an approximation to the Bayes factor, and in dismissing the claim that BIC-chosen models have lower predictive error). When trying to understand and evaluate statistical methods, it’s often useful to step back and consider why they’re being used in the first place, partly to see to what extent the underlying goals are being achieved, and partly to understand how a method such as BIC can be useful in practice even if it doesn’t live up to the theoretical claims being made for it. One can be critical of specific claims while being generous in a larger sense.
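For readers who want the claim in question spelled out, here is the standard formulation (this is textbook material, not taken from the paper itself): with maximized likelihood \(\hat{L}\), \(k\) parameters, and sample size \(n\),

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L},
```

and the contested claim is that, via a Laplace approximation to the marginal likelihood,

```latex
\ln p(y \mid M) \approx \ln \hat{L} - \frac{k}{2} \ln n = -\tfrac{1}{2}\,\mathrm{BIC},
```

so that differences in BIC between models would approximate log Bayes factors. Part of our argument was that this approximation need not hold under the priors one would actually want to use.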
In this particular paper, I think one breakthrough we made was to recognize that model selection was being used to solve two completely different problems: (1) Giving users permission to use a simple model that didn’t fit the data but explained most of the variance, in a large-sample-size setting, and (2) Giving users a simple way to regularize and do some sort of partial pooling, in a small-sample-size setting. Recognizing that these are two completely different problems gives us a way to think about moving forward on both of them, without overloading a crude tool such as BIC.
P.S. In rereading this paper that I absolutely love, I guess I can see how my colleagues at Berkeley didn’t want me around back then. They had no way of thinking about a paper such as this that wasn’t written in the Definition, Theorem, Proof format. It’s just a language thing. Even my colleagues such as David Donoho who, I assume, would have had the ability and maybe even the inclination to understand this work, wouldn’t’ve known what to make of it, because it was written in such a different style than their papers. Also it was published in a sociology journal so I guess it didn’t count. It was easy enough for them to just conclude that my work was no good and not try to figure out exactly what I was doing.
To be fair, I have difficulty reading papers written in the Definition, Theorem, Proof style: I usually have to ask the authors to explain to me what they are doing, or else I have to struggle through on my own in order to then re-interpret things in my own language, as here. Communication is difficult, and I guess it can be awkward to have someone around who speaks a different language. It’s funny that I went to Berkeley in the first place, but at the time I had the naive view that I could compel them to like me. I was able to read their work, after all (with effort), and I think I underrated the difficulty they would have in reading my work, or perhaps I overrated the value they would place on my work: had they thought it important, they would’ve put in the work to figure out what I was doing, but because they were told not to care about it, they didn’t bother. That’s all fine too, I suppose: had I really cared myself about explaining my ideas on model selection and averaging to the Definition, Theorem, Proof crowd, I could’ve written some papers in that format. But it wasn’t really necessary; they’re doing fine on their own path.