Here are some examples of good, solid, reasonable statistical advice which can lead people astray.
Good advice: Statistical significance is not the same as practical significance.
How it can mislead: People get the impression that a statistically significant result is more impressive if it’s larger in magnitude.
Why it’s misleading: See this classic example where Carl Morris presents three different hypothetical results, all of which are statistically significant at the 5% level but with very different estimated effect sizes. In this example, the strongest evidence comes from the smallest estimate, while the result with the largest estimate gives the weakest evidence.
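To see how this can happen, here is a minimal sketch. It does not reproduce Morris's numbers: the three estimates, their standard errors, and the N(0, tau^2) prior with tau = 1 are all assumptions chosen for illustration. Each estimate sits exactly 2 standard errors from zero, so all three have the same p-value, yet under this prior the posterior probability that the effect is positive is highest for the smallest estimate.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def posterior_prob_positive(estimate, se, prior_sd):
    """Pr(theta > 0 | estimate), assuming estimate ~ N(theta, se^2)
    and a hypothetical prior theta ~ N(0, prior_sd^2)."""
    shrink = prior_sd**2 / (prior_sd**2 + se**2)  # weight on the data
    post_mean = shrink * estimate
    post_sd = sqrt(shrink) * se
    return normal_cdf(post_mean / post_sd)


# Hypothetical results: same z-score of 2, very different magnitudes.
PRIOR_SD = 1.0  # assumed prior scale, not from the original example
cases = [(0.1, 0.05), (1.0, 0.5), (10.0, 5.0)]  # (estimate, standard error)

results = [
    (est, se, posterior_prob_positive(est, se, PRIOR_SD)) for est, se in cases
]

for est, se, prob in results:
    print(f"estimate={est:5.1f}  se={se:4.2f}  z={est / se:.1f}  "
          f"Pr(theta>0 | data)={prob:.3f}")
```

The ordering depends on the prior: with a much wider prior the three probabilities move closer together, so this should be read as one way the pattern can arise, not as the general rule.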
Good advice: Warnings against p-hacking, cherry-picking, file-drawer effects, etc.
How it can mislead: People get the impression that various forms of cheating represent the main threat to the validity of p-values.
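Why it’s misleading: A researcher who doesn’t cheat can then think that his or her p-values have no problems. Such a researcher may not understand the garden of forking paths: when the choice of analysis depends on the data, p-values can be invalid even if only one analysis was ever run.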
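A rough simulation can make the point concrete. The setup below is an assumption made for illustration, not something from the argument above: there is no true effect anywhere, the researcher has two subgroups, and after looking at the data runs a single test on whichever subgroup shows the larger difference. Only one p-value is ever computed, so nothing feels like cheating, yet the chance of reaching p < 0.05 comes out at roughly double the nominal 5%.

```python
import random

random.seed(1)

N_SIMS = 100_000
Z_CRIT = 1.96  # two-sided 5% threshold for a z-statistic

hits = 0
for _ in range(N_SIMS):
    # z-statistics for two hypothetical subgroups, both pure noise.
    z_a = random.gauss(0.0, 1.0)
    z_b = random.gauss(0.0, 1.0)
    # Data-dependent choice: analyze only the subgroup that looks more promising.
    chosen = z_a if abs(z_a) > abs(z_b) else z_b
    if abs(chosen) > Z_CRIT:
        hits += 1

false_positive_rate = hits / N_SIMS
print(f"share of null datasets reaching p < 0.05: {false_positive_rate:.3f}")
```

Real forking paths are usually subtler than a choice between two subgroups (which covariates to adjust for, which cases to exclude, how to code the outcome), so this sketch probably understates the problem.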
Good advice: Use Bayesian inference and you’ll automatically get probabilistic uncertainty statements.
How it can mislead: Sometimes the associated uncertainty statements can be unreasonable.
Why it’s misleading: Consider my new favorite example, y ~ N(theta, 1), uniform prior on theta, and you observe y=1. The point estimate of theta is 1, which is what it is, and the posterior distribution for theta is N(1,1), which isn’t so unreasonable as a data summary. But then you can also get probability statements such as Pr(theta>0|y) = .84, which seems a bit strong: it’s the idea that you’d be willing to lay down a 5:1 bet based on data that are consistent with pure noise.
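The arithmetic behind the .84 and the 5:1 odds can be checked in a few lines. This is just the normal calculation implied by the N(1,1) posterior; the variable names are mine.

```python
from math import erf, sqrt


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


y = 1.0                      # the single observation
post_mean, post_sd = y, 1.0  # flat prior, so the posterior is N(y, 1)

# Pr(theta > 0 | y) = 1 - Phi((0 - post_mean) / post_sd) = Phi(1)
prob_positive = 1.0 - normal_cdf((0.0 - post_mean) / post_sd)
odds = prob_positive / (1.0 - prob_positive)

print(f"Pr(theta > 0 | y) = {prob_positive:.2f}")  # prints 0.84
print(f"odds in favor of theta > 0: {odds:.1f} to 1")  # prints 5.3 to 1
```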
Good advice: If an estimate is less than 2 standard errors away from zero, treat it as provisional.
How it can mislead: People mistakenly assume the converse: that if an estimate is more than 2 standard errors away from zero, it should essentially be taken as true.
Why it’s misleading: First, because estimates that are 2 standard errors from zero are easily obtained just by chance, especially in a garden-of-forking-paths setting. Second, because even with no forking paths, publication bias leads to the statistical significance filter: if you only report estimates that are statistically significant, you’ll systematically overestimate effect sizes.
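The significance filter is easy to see in simulation. In this sketch the true effect (0.5) and the standard error (1) are assumed values picked to represent a noisy study of a small effect. Every replication is honest, but only the estimates more than 2 standard errors from zero get "reported."

```python
import random
from statistics import mean

random.seed(0)

TRUE_EFFECT = 0.5  # assumed small true effect
SE = 1.0           # assumed standard error of each estimate
N_SIMS = 100_000

estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(N_SIMS)]

# The filter: keep only estimates more than 2 standard errors from zero.
significant = [e for e in estimates if abs(e) > 2 * SE]

share_significant = len(significant) / N_SIMS
# How much the reported magnitudes overstate the truth, on average.
exaggeration = mean(abs(e) for e in significant) / TRUE_EFFECT
# Share of reported estimates with the wrong sign.
wrong_sign = sum(e < 0 for e in significant) / len(significant)

print(f"share reaching significance:    {share_significant:.3f}")
print(f"average exaggeration factor:    {exaggeration:.1f}")
print(f"share of those with wrong sign: {wrong_sign:.3f}")
```

Any estimate that passes the filter here has to be at least 2 in absolute value, which is already four times the assumed true effect, so the overestimate is built into the selection rule and does not require any misbehavior by the researcher.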
Maybe you could supply additional examples of good statistical advice that can get people in trouble? I think this is a big deal.
P.S. I just added example 4 above.