Erikson Kaszubowski writes:
I have recently read an article by D. R. Cox (from 1977!) on significance testing where he discusses modification of analysis in the light of data. I don’t know if the article is well known, but I didn’t see it in the references of the garden of forking paths article. His argument is that relevant changes call for a separation in exploratory and confirmatory analyses, but in some cases it should not be such a big problem. The first comment, by Prof. Spjotvoll, is not so optimistic: he mentions a social science or genetics researcher actively searching for significant hypotheses and always finding (and publishing) something “interesting” in the statistical sense. Under such setting, no conclusion is possible based on p-values.
I took a look and asked: Does he mention forking paths? I see him mentioning multiple tests on the same dataset, but I think there’s a key concern, not well understood, that even if only a single test is done on the data, the test can be contingent on the data, invalidating the p-value without any apparent p-hacking or multiple testing.
It’s a “multiple potential comparisons” problem.
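To make that point concrete, here is a small Monte Carlo sketch (my own illustration, not from Cox’s paper; all names and parameters are hypothetical choices). A researcher collects three outcome variables, all truly null, eyeballs the data, and runs exactly one test, on whichever variable looks most extreme. Only one test is ever performed, yet the nominal p-value no longer has its advertised error rate:

```python
import random
import statistics
import math

def one_sample_t_pvalue(xs, mu0=0.0):
    """Two-sided one-sample t-test p-value, using a normal
    approximation to the t distribution (adequate for n = 50 here)."""
    n = len(xs)
    t = (statistics.mean(xs) - mu0) / (statistics.stdev(xs) / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

def run(n_sims=2000, n=50, seed=1):
    random.seed(seed)
    rejections = 0
    for _ in range(n_sims):
        # Three outcome variables, all with true mean 0 (the null is true).
        groups = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
        # One single test -- but contingent on the data: the researcher
        # tests only the variable whose sample mean looks most extreme.
        chosen = max(groups, key=lambda g: abs(statistics.mean(g)))
        if one_sample_t_pvalue(chosen) < 0.05:
            rejections += 1
    return rejections / n_sims

print(run())  # rejection rate well above the nominal 0.05
```

The rejection rate comes out near what you would expect from three comparisons, not one, even though only one test was run: the other two comparisons were *potential*, and the potential alone is enough to break the p-value’s interpretation.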
In Section 3, “Modification of analysis in the light of data”, D. R. Cox talks exactly about changing the method of analysis because of the data as something different from multiple tests on the same dataset. After enumerating four possible scenarios, he claims that “[…] in extreme cases, choice of a null hypothesis in the light of the data makes irrelevant the hypothetical physical interpretation of the significance level” (p. 56, emphasis mine). This “hypothetical physical interpretation of the significance level” is a rather roundabout way of saying that “p_obs is the probability that H0 would be ‘rejected’ when true” (p. 50).
In the first scenario, where the whole formulation comes from exploratory work, Cox mentions that when a specific aspect of a “haphazard dataset” is tested for significance after exploration, it is sometimes possible to calculate an “allowance for selection”, that is, to correct the p-value for the fact that the hypothesis was defined after seeing the data. But “often the sequence of exploratory analysis is ill-defined” (p. 57), which again invalidates the interpretation of the obtained p-value.
He doesn’t frame it exactly in terms of “multiple potential comparisons”, but he does make clear that a p-value obtained from a sequence of exploratory analyses cannot be interpreted in its originally intended way.
Indeed. Cox doesn’t emphasize the point but it’s there.