“Does reducing the heterogeneity of experimental units strengthen causal claims? Or does reducing the heterogeneity without randomizing simply reduce the standard error of a biased estimator?”

and he concludes:

“In observational studies, reducing heterogeneity reduces both sampling variability and sensitivity to unobserved bias …. In contrast, increasing the sample size reduces sampling variability, which is, of course, useful, but it does little to reduce concerns about unobserved bias.”

By increasing heterogeneity one increases BOTH precision and generalizability with respect to estimating the average effect. This is because the standard error of the average effect equals (in a simple design) sqrt( (tau2 + sigma2/n)/K ), with n the sample size of one experiment, K the number of experiments with different settings/manipulations of the same variable, and tau2 the heterogeneity of the effect size due to different settings/manipulations. As you can see from the formula, choosing only ONE setting can never reduce the standard error below tau. Precision, however, is more easily increased by increasing K. So ONE experiment with n = 1,000,000 yields a less precise estimate than three studies with n = 100 each, even if heterogeneity is not that large . . . and with three studies you can also check generalizability, which is of course impossible with a single experiment of n = 1,000,000.
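The comparison in the comment can be sketched numerically. Here is a minimal illustration of the formula above, assuming hypothetical values of tau = 0.1 (modest heterogeneity) and sigma = 1 (unit-level standard deviation), which are my choices for illustration, not from the comment:

```python
import math

def se_avg_effect(tau, sigma, n, K):
    """Standard error of the average effect across K experiments,
    each with n units, under the simple design:
    sqrt((tau^2 + sigma^2 / n) / K)."""
    return math.sqrt((tau**2 + sigma**2 / n) / K)

# One huge experiment in a single setting (K = 1):
one_big = se_avg_effect(tau=0.1, sigma=1.0, n=1_000_000, K=1)

# Three small experiments in different settings (K = 3):
three_small = se_avg_effect(tau=0.1, sigma=1.0, n=100, K=3)

print(round(one_big, 4))     # ~0.1: with K = 1 the se cannot fall below tau
print(round(three_small, 4)) # ~0.0816: three n = 100 studies beat one n = 1,000,000 study
```

With K = 1, sigma2/n shrinks to nearly zero but tau remains, so the standard error is stuck at about tau = 0.1; with K = 3 the whole numerator is divided by 3, giving roughly 0.082.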

From this perspective, the often-heard statement “we need more powerful studies” is misplaced, and might better be replaced by “we need more studies explicitly varying settings/manipulations,” so as to obtain more knowledge about an effect and how it depends on other factors.

This was all Paul’s work. He had a conclusion that made a lot of sense, but now I’m forgetting what his conclusion was, or where he wrote it up! I think it’s in one of his books.
