Roahn Wynart asks:
Scenario: I collect a lot of data for a complex psychology experiment. I put all the raw data into a computer. I program the computer to do 100 statistical tests and assign each test to a key on my keyboard, so that each key will trigger the evaluation of a different statistical test. However, I do NOT execute any of the tests yet. I push, say, the “B” key and I get a positive result at 98% confidence. I then stop and publish. I never push any other key.
Is there something wrong with that procedure?
1. Yes, there’s something wrong with this procedure, and the clear “something wrong” is the use of a p-value to decide whether to publish something. Even if your computer only has one key, so that your p-value is unequivocally kosher, it’s a mistake in my opinion to use statistical significance to decide what to publish. The problem is that if your signal-to-noise ratio is low, then any statistically significant estimate will be a big overestimate of the true effect (a type M, or magnitude, error), and it may well be in the wrong direction (a type S, or sign, error). This is discussed by Carlin and me in our recent paper in Perspectives on Psychological Science.
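Here’s a quick simulation (my own sketch, with made-up numbers: a true effect of 0.1 and a standard error of 1, i.e. a low signal-to-noise setting) showing how conditioning on statistical significance produces estimates that are wildly exaggerated and sometimes have the wrong sign:

```python
# Illustration of type M and type S errors: when the signal-to-noise ratio
# is low, the estimates that happen to reach statistical significance are
# large overestimates of the true effect, and a nontrivial share of them
# point in the wrong direction. (Hypothetical numbers, not from the paper.)
import random
import statistics

random.seed(1)

true_effect = 0.1   # small true effect
se = 1.0            # large standard error: low signal-to-noise ratio
n_sims = 100_000

significant = []
for _ in range(n_sims):
    est = random.gauss(true_effect, se)
    if abs(est) / se > 1.96:          # "statistically significant" at 5%
        significant.append(est)

mean_abs = statistics.mean(abs(e) for e in significant)
wrong_sign = sum(e < 0 for e in significant) / len(significant)
print(f"true effect: {true_effect}")
print(f"mean |significant estimate|: {mean_abs:.2f}")
print(f"share of significant estimates with wrong sign: {wrong_sign:.2f}")
```

The average significant estimate comes out more than twenty times larger than the true effect, and over a third of the significant estimates are negative even though the true effect is positive.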
2. Is this a legitimate p-value? That’s a tough one. The easy answer is, if you choose which key to press after seeing the data, then, no, in general this is not a legitimate p-value, for reasons discussed by Loken and me in our recent paper in American Scientist (the garden of forking paths). If you choose the key completely at random, then, sure, I guess it’s an ok p-value, although this is a bit controversial in frequentist statistics, as it depends on what is being conditioned on. Even then, though, I wouldn’t recommend the procedure, because of point 1 above.
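To see why choosing the key after looking at the data wrecks the p-value, consider a simulation (my sketch, assuming all 100 null hypotheses are true, so each test’s p-value is uniform) of a researcher who scans all 100 keys and presses whichever one “worked”:

```python
# Under the null, each of the 100 tests yields a uniform p-value.
# If you effectively get to pick the best of the 100, the chance of
# finding at least one result at "98% confidence" is nowhere near 2%.
import random

random.seed(1)

n_keys = 100
n_sims = 10_000
alpha = 0.02  # 98% confidence

hits = 0
for _ in range(n_sims):
    p_values = [random.random() for _ in range(n_keys)]
    if min(p_values) < alpha:  # "press the key that worked"
        hits += 1

print(f"chance of at least one 'positive' key: {hits / n_sims:.2f}")
# analytically: 1 - (1 - 0.02)**100, about 0.87
```

So the nominal 2% error rate is really closer to 87% once the choice of key depends on the data, which is the garden-of-forking-paths point in miniature.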