My reply is that it reminds me a bit of what I wrote here. Or see here for the quick powerpoint version: The short story is that if you screen for statistical significance when estimating small effects, you will necessarily overestimate the magnitudes of effects, sometimes by a huge amount. I know that Dave Krantz has thought about this issue for awhile; it came up when Francis Tuerlinckx and I wrote our paper on Type S errors, ten years ago.
My current thinking is that most (almost all?) research studies of the sort described by Lehrer should be accompanied by retrospective power analyses, or informative Bayesian inferences. Either of these approaches–whether classical or Bayesian, the key is that they incorporate real prior information, just as is done in a classical prospective power analysis–would, I think, moderate the tendency to overestimate the magnitude of effects.
In answer to the question posed by the title of Lehrer’s article, my answer is Yes, there is something wrong with the scientific method, if this method is defined as running experiments and doing data analysis in a patternless way and then reporting, as true, results that pass a statistical significance threshold.
And corrections for multiple comparisons will not solve the problem: such adjustments merely shift the threshold without resolving the problem of overestimation of small effects.