In an email exchange regarding the difficulty many researchers have in engaging with statistical criticism (see here for a recent example), a colleague of mine opined:
Nowadays, promotion requires more publications, and in an academic environment, researchers are asked to do more than they can. So many researchers just work like workers in a product line without critical thinking. Quality becomes a tradeoff of quantity.
I think that many (maybe not all) researchers are interested in critical thinking, but they don’t always have a good framework for integrating critical thinking into their research. Criticism is, if anything, too easy: once you’ve criticized, what do you do about it (short of “50 shades of gray” self-replication, which really is a lot of work)? One thing I like about hierarchical modeling is that it is not just about criticism. It’s a way to improve inferences, not just a way to adjust p-values.
The point is that in this way criticism can be a step forward.
When we go through the literature (or even all the papers by a particular author) and list all the different data-coding, data-exclusion, and data-analysis choices that were made (see the comment thread from the above link for a long list of examples of data excluded or included, outcomes treated separately or averaged, variables controlled for or not, different p-value thresholds, etc.), it’s not just about listing multiple comparisons and criticizing p-values (which ultimately only gets you so far, because even correct p-values bear only a very indirect relation to any inferences of interest); it’s also about learning more from data, constructing a fuller model that includes all the possibilities corresponding to the different theories. Or even just recognizing that a particular dataset, with a particular small sample and noisy, variable measurements, is too weak to learn what you want to learn. That can be good to know too: if it’s a topic you really care about, you can devote some effort to more careful measurement, or at least know the limitations of your data. All good—the point is to make the link to reality rather than to try to compute some correct p-value, which has little to do with anything.
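To make the “fuller model” idea concrete, here is a minimal sketch of partial pooling, the basic move in hierarchical modeling. Everything here is hypothetical: I simulate 8 noisy study-level estimates with known standard errors, then shrink each raw estimate toward the grand mean using a crude moment-based (empirical Bayes) estimate of the between-study variance. This is a toy illustration, not a full Bayesian fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 "studies," each estimating an effect theta_j,
# with true effects drawn from a common population distribution.
true_mu, true_tau = 0.2, 0.1
J = 8
theta = rng.normal(true_mu, true_tau, size=J)
se = np.full(J, 0.3)               # noisy measurements: large standard errors
y = rng.normal(theta, se)          # observed raw estimates

# Partial pooling: shrink each raw estimate toward the precision-weighted
# grand mean, in proportion to how noisy it is relative to the
# between-study variation.
mu_hat = np.average(y, weights=1 / se**2)
tau2_hat = max(np.var(y) - np.mean(se**2), 0.0)  # crude moment estimate
shrink = tau2_hat / (tau2_hat + se**2)           # in [0, 1]
theta_pooled = mu_hat + shrink * (y - mu_hat)
```

Rather than asking “which of these 8 comparisons survives a corrected threshold?”, the model treats all 8 jointly: the noisier an estimate is relative to the between-study variation, the more it gets pulled toward the common mean, so extreme raw estimates are automatically discounted instead of being singled out and then penalized.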