I had a brief email exchange with Jeff Leek regarding our recent discussions of replication, criticism, and the self-correcting process of science.
(1) I can see the problem with serious, evidence-based criticisms not being published in the same journal as, and linked to, the studies that are shown to be incorrect. I have been mostly seeing these sorts of things show up in blogs. But I’m not sure that is a bad thing. I think people read blogs more than they read the literature. I wonder if this means that blogs will eventually be a sort of “shadow literature”?
(2) I think there is a ton of bad literature out there, just like there is a ton of bad stuff on Google. If we focus too much on the bad stuff we will be paralyzed. I still manage to find good papers despite all the bad papers.
(3) I think one positive solution to this problem is to incentivize and publish referee reports, giving people credit for a good referee report just as they get credit for a good paper. Then, hopefully, criticisms would be published directly alongside the paper, and peer review itself would improve.
A key decision point is what to do when we encounter bad research that gets publicity. Should we hype it up (the “Psychological Science” strategy), slam it (which is often what I do), ignore it (Jeff’s suggestion), or do further research to contextualize it (as Dan Kahan sometimes does)?
OK, I’m not planning to take that last option any time soon: research requires work, and I have enough work to do already. And we’re not in the business of hype here (unless the topic is Stan). So let’s talk about the other two options: slamming bad research or ignoring it. Slamming can be fun but it can carry an unpleasant whiff of vigilantism. So maybe ignoring the bad stuff is the better option. As I wrote earlier:
Ultimately, though, I don’t know if the approach of “the critics” (including myself) is the right one. What if, every time someone pointed me to a bad paper, I were to just ignore it and instead post on something good? Maybe that would be better. The good news blog, just like the happy newspaper that only prints stories of firemen who rescue cats stuck in trees and cures for cancer. But . . . the only trouble is that newspapers, even serious newspapers, can have low standards for reporting “cures for cancer” etc. For example, here’s the Washington Post and here’s the New York Times. Unfortunately, these major news organizations seem often to follow the “if it’s published in a top journal, it must be correct” rule.
Still and all, maybe it would be best for me, Ivan Oransky, Uri Simonsohn, and all the rest of us to just turn the other cheek, ignore the bad stuff and just resolutely focus on good news. It would be a reasonable choice, I think, and I would fully respect someone who were to blog just on stuff that he or she likes.
Why, then, do I spend time criticizing research mistakes and misconduct, given that it could even be counterproductive by drawing attention to sorry efforts that otherwise might be more quickly forgotten?
The easiest answer is education. When certain mistakes are made over and over, I can make a contribution by naming, exploring, and understanding the error (as in this famous example or, indeed, many of the items on the lexicon).
Beyond this, exploring errors can be a useful research direction. For example, our criticism in 2007 of the notorious beauty-and-sex-ratio study led in 2009 to a more general exploration of the issue of statistical significance, which in turn led to a currently-in-the-revise-and-resubmit-stage article on a new approach to design analysis.
Similarly, the anti-plagiarism rants of Thomas Basbøll and myself led to a paper on the connection between plagiarism and ideas of statistical evidence, and another paper on storytelling as model checking. So, for me, criticism can open doors to new research.
But it’s not just about research
One more thing, and it’s a biggie. People talk about the self-correcting nature of the scientific process. But this self-correction only happens if people do the correction. And, in the meantime, bad ideas can have consequences.
The most extreme example was the infamous Excel error by Reinhart and Rogoff, which may well have influenced government macroeconomic policy. In a culture of open data and open criticism, the problem might well have been caught much earlier. Recall that the paper was published in 2010, and its errors did not come to light until 2013, yet as early as 2010 Dean Baker was publicly asking for the data.
Scientific errors and misrepresentations can also have indirect influences. Consider …, where Stephen Jay Gould notoriously … And evolutionary psychology continues to be a fertile area for pseudoscience. Just the other day, Tyler Cowen posted on a paper called “Money, Status, and the Ovulatory Cycle,” which he labeled as the “politically incorrect paper of the month.”
The trouble is that the first two authors are Kristina Durante and Vladas Griskevicius, and I can’t really believe anything that comes out of that research team, given that they earlier published the ridiculous claim that, among women in relationships, 40% of those in the ovulation period supported Romney, compared to 23% in the non-fertile part of their cycle. (For more on this issue, see section 5 of this paper.)
Does publication and publicity of ridiculous research cause problems (besides wasting researchers’ time)? Maybe so. Two malign effects that I can certainly imagine coming from this sort of work are (a) a reinforcing of gender stereotypes, and (b) a cynical attitude about voting and political participation. Some stereotypes reflect reality, I’m sure of that, and I’m with Steven Pinker on not wanting to stop people from working in controversial areas. But I don’t think anything is gained from the sort of noise-mining that allows researchers to find whatever they want. At this point we as statisticians can contribute usefully by stepping in and saying: Hey, this stuff is bogus! There ain’t no 24% vote swings. If you think it’s important to demonstrate that people are affected in unexpected ways by hormones, then fine, do it. But do some actual scientific research. Finding “p less than 0.05” patterns in a nonrepresentative between-subjects study doesn’t cut it, if your goal is to estimate within-person effects.
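To make the noise-mining point concrete, here is a minimal simulation sketch of the statistical-significance filter at work. The numbers are my own assumptions for illustration, not figures from the Durante et al. study: a true between-group difference of 2 percentage points, 100 women per cell, and a simple two-proportion z-test.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed-for-illustration numbers, not from the actual study:
# a modest 2-point true difference in candidate support between groups,
# measured in a between-subjects design with 100 women per cell.
true_p_fertile, true_p_nonfertile = 0.32, 0.30
n_per_group = 100
n_sims = 100_000

# Simulate many replications of the between-subjects comparison.
fertile = rng.binomial(n_per_group, true_p_fertile, n_sims) / n_per_group
nonfertile = rng.binomial(n_per_group, true_p_nonfertile, n_sims) / n_per_group

diff = fertile - nonfertile
se = np.sqrt(fertile * (1 - fertile) / n_per_group
             + nonfertile * (1 - nonfertile) / n_per_group)
significant = np.abs(diff / se) > 1.96  # the "p < 0.05" filter

print(f"true difference:                     {true_p_fertile - true_p_nonfertile:+.3f}")
print(f"share of comparisons with p < 0.05:  {significant.mean():.3f}")
print(f"mean |difference| when significant:  {np.abs(diff[significant]).mean():.3f}")
```

In this sketch, only about 6% of replications reach p < 0.05, and among those the observed gap averages roughly 15 percentage points, more than seven times the assumed true effect. That is how a noisy between-subjects comparison, filtered through statistical significance, can appear to reveal a dramatic swing that was never there.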
What about meeeeeeeee?
Should I be spending time on this? That’s another question. All sorts of things are worth doing by somebody but not necessarily by me. Maybe I’d be doing more for humanity by working on Stan, or studying public opinion trends in more detail, or working harder on pharmacokinetic modeling, or figuring out survey weighting, or going into cancer research. Or maybe I should chuck it all and do direct service with poor people, or get a million-dollar job, make a ton of money, and then give it all away. Lots of possibilities. For this, all I can say is that these little investigations can be interesting and fruitful for my general understanding of statistics (see the items under the heading “Why, then” above). But, sure, too much criticism would be too much.
“Bumblers and pointers”
A few months ago, after I published an article criticizing some low-quality published research, I received the following email:
There are two kinds of people in science: bumblers and pointers. Bumblers are the people who get up every morning and make mistakes, trying to find truth but mainly tripping over their own feet, occasionally getting it right but typically getting it wrong. Pointers are the people who stand on the sidelines, point at them, and say “You bumbled, you bumbled.” These are our only choices in life.
The sad thing is, this email came from a psychology professor! It’s depressing to think that he believes those are our only two choices in life. I hope he doesn’t teach this to his students. I like to do both, indeed at the same time: when I do research (“bumbling”), I aim criticism at myself, poking holes in everything I do (“pointing”). And when I criticize (“pointing”), I do so in the spirit of trying to find truth (“bumbling”).
If you’re a researcher and think you can do only one or the other of these two things, you’re really missing out.