Subtleties with measurement-error models for the evaluation of wacky claims

A few days ago I discussed the evaluation of somewhat-plausible claims that are somewhat supported by theory and somewhat supported by statistical evidence. One point I raised was that an implausibly large estimate of effect size can be cause for concern:

Uri Simonsohn (the author of the recent rebuttal of the name-choice article by Pelham et al.) argued that the implied effects were too large to be believed (just as I was arguing above regarding the July 4th study), which makes more plausible his claims that the results arise from methodological artifacts.

That calculation is straight Bayes: the distribution of systematic errors has much longer tails than the distribution of random errors, so the larger the estimated effect, the more likely it is to be a mistake. This little theoretical result is a bit annoying, because it is the larger effects that are the most interesting!
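To make that "straight Bayes" calculation concrete, here is a quick simulation sketch. The distributions and scales in it are made up purely for illustration, not taken from any of the studies under discussion: a short-tailed true effect, a long-tailed systematic error, a short-tailed random error, and an estimate that is just their sum. Conditioning on the estimate landing in larger and larger bins, we can ask how often the systematic error is the dominant term:

```python
# Illustrative simulation: short-tailed true effect and random error,
# long-tailed systematic error; all choices are arbitrary for the sketch.
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000
true_effect = rng.normal(0.0, 1.0, n)              # short-tailed
sys_error = rng.standard_t(df=2, size=n)           # long-tailed (t with 2 df)
rand_error = rng.normal(0.0, 1.0, n)               # short-tailed
estimate = true_effect + sys_error + rand_error    # what we actually observe

# Condition on the size of the estimate (by binning) and ask how often the
# systematic error, rather than the true effect, is the dominant contribution.
for lo, hi in [(0, 2), (2, 4), (4, 8), (8, 16)]:
    in_bin = (estimate >= lo) & (estimate < hi)
    p_mostly_error = np.mean(np.abs(sys_error[in_bin]) > np.abs(true_effect[in_bin]))
    print(f"estimate in [{lo:2d}, {hi:2d}):  P(systematic error dominates) ~ {p_mostly_error:.2f}")
```

With these particular (arbitrary) choices, the conditional probability that the systematic error dominates climbs as the bins move outward, which is all the little theoretical result above is saying.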

Larry Bartels notes that my reasoning above is a bit incoherent:

I [Bartels] strongly agree with your bottom line that our main aim should be “understanding effect sizes on a real scale.” However, your paradoxical conclusion (“the larger the estimated effect, the more likely it is to be a mistake”) seems to distract attention from the effect size of primary interest: the magnitude of the “true” (causal) effect.

If the model you have in mind is b=c+d+e, where b is the estimated effect, c is the “true” (causal) effect, d is a “systematic error” (in your language), and e is a “random error,” your point seems to be that your posterior belief regarding the magnitude of the “systematic error,” E(d|b), is increasing in b. But the more important fact would seem to be that your posterior belief regarding the magnitude of the “true” (causal) effect, E(c|b), is also increasing in b (at least for plausible-seeming distributional assumptions).

Your prior uncertainty regarding the distributions of these various components will determine how much of the estimated effect you attribute to c and how much you attribute to d, and in the case of “wacky claims” you may indeed want to attribute most of it to d; nevertheless, it seems hard to see why a larger estimated effect should not increase your posterior estimate of the magnitude of the true causal effect, at least to some extent.

Conversely, your skeptical assessment of the flaws in the design of the July 4th study may very well lead you to believe that d>>0; but wouldn’t that same skepticism have been warranted (though it might not have been elicited) even if the estimated effect had happened to look more plausible (say, half as large or one-tenth as large)?

Focusing on whether a surprising empirical result is “a mistake” (whatever that means) seems to concede too much to the simple-minded is-there-an-effect-or-isn’t-there perspective, while obscuring your more fundamental interest in “understanding [true] effect sizes on a real scale.”

Larry’s got a point. I’ll have to think about this in the context of an example. Maybe a more correct statement would be that, given reasonable models for c, d, and e, if the estimate b gets implausibly large, the posterior estimate of c does not increase proportionally. I actually think there will be some (non-Gaussian) models for which, as b gets larger, E(c|b) can actually go back toward zero. But this will depend on the distributional form.
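Here is a small numerical sketch of what I mean, using Larry's notation b = c + d + e but lumping the two error terms together as u = d + e. The prior and the error scales are my own arbitrary choices for illustration. The true effect c gets a short-tailed (normal) prior, the combined error u is either normal or Cauchy, and E(c|b) is computed on a grid:

```python
# Illustrative calculation of E(c | b) for b = c + u, where c is the true effect
# and u = d + e is the combined error.  Distributions and scales are arbitrary.
import numpy as np
from scipy import stats

c_grid = np.linspace(-40, 40, 8001)          # grid for integrating over the true effect c
prior_c = stats.norm(0, 1).pdf(c_grid)       # short-tailed prior on c

def posterior_mean_c(b, error_dist):
    """E(c | b) on a uniform grid, for b = c + u with u ~ error_dist."""
    post = prior_c * error_dist.pdf(b - c_grid)   # unnormalized posterior density of c
    return np.sum(c_grid * post) / np.sum(post)

normal_u = stats.norm(0, 2)                  # short-tailed combined error
cauchy_u = stats.cauchy(0, 2)                # long-tailed combined error

for b in [1, 2, 4, 8, 16, 32]:
    print(f"b = {b:2d}:  E(c|b) with normal errors = {posterior_mean_c(b, normal_u):5.2f},"
          f"  with Cauchy errors = {posterior_mean_c(b, cauchy_u):5.2f}")
```

With the normal error model, E(c|b) just grows as a fixed fraction of b. With the Cauchy error model, it rises for moderate b and then drifts back toward zero as b gets huge, because at that point the estimate is telling you more about the error tails than about c. That is the kind of distributional dependence I have in mind.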

I agree that “how likely is it to be a mistake” is the wrong way to look at things. For example, in the July 4th study, there are a lot of sources of variation, only some of which are controlled for in the analysis that was presented. No analysis is perfect, so the “mistake” framing is generally not so helpful.
