Rotten all the way through

In a conversation with a journalist regarding bad research papers, I said, “I think there are journals and years where I would guess more than half the papers have essentially fatal errors.”

The journalist asked me where this estimate came from, and I replied:

I have no systematic statistics on this. My “more than half” estimate was based on my casual glance at some Psychological Science promotional material a few years ago; see slides 14-16 here.

I’m guessing that the N = 17, 57, 42, and 47 studies are nothing but noise mining. And, as I recall, the N = 222,924 study had some serious problems too.

When you hear about 38% of psychology studies replicating, that’s just because the replicators included some serious studies in their replications. Had they just tried to replicate the junk, I expect the replication rate would’ve been something closer to 5%.
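One way to see how those two numbers can coexist is to treat the overall replication rate as a weighted average of the rate for the serious studies and the rate for the junk. Here is a back-of-the-envelope sketch; the shares and per-group rates are made up purely for illustration, not estimates of the actual mix.

```python
# Back-of-the-envelope mixture calculation (illustrative numbers only):
# if some share of the replicated studies were serious work with a high
# replication rate, and the rest were noise mining replicating at ~5%,
# the blended rate lands near the reported 38% even though the junk on
# its own almost never replicates.
serious_share, serious_rate = 0.40, 0.90   # assumed, not measured
junk_share, junk_rate = 0.60, 0.05         # assumed, not measured

overall = serious_share * serious_rate + junk_share * junk_rate
print(f"blended replication rate: {overall:.0%}")   # -> 39%
```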

14 thoughts on “Rotten all the way through”

  1. Don’t forget that repeatability is a necessary but not even close to sufficient condition for avoiding fatally flawed science. The much more difficult task is linking theory to observation in a way that rules out (i.e., renders implausible) the vast majority of explanations.

    The replication crisis indicates that areas of research like psych/medicine are so far off track that it isn’t even worth theorizing about the observations. It is sometimes difficult to remember that there are still people involved who take themselves seriously (as opposed to just becoming resigned to pumping out papers “to survive”).

    • Good point. Repeatability is just the sign that the issue you’re studying is something that actually happens consistently. Like, for example, that the sun rises each day. But as we know from the myriad myths about Apollo’s chariot and other sun gods, having a scientific explanation for a consistent phenomenon is a totally separate thing from whether that thing does in fact regularly occur.

      • I don’t see why explanation/theory is so important. If you know that the sun rises every day, that it occurs regularly, and that you can exploit that fact in day-to-day society, then that’s good enough for most people’s purposes. You wouldn’t know *why* that is the case, and that theoretical knowledge could be useful for making future predictions. But we need to first know whether something is happening in the first place, before we bother to come up with an explanation. This blog has repeatedly complained about researchers engaging in “story-telling” to explain and justify what is just noise in the first place. Before we try to theorize, let’s first confirm our observations.

        Another problem I have with this statement is that even wrong theories can still be useful. The point of science is to make predictions, and if you are able to come up with a myth about Apollo’s chariot that just so happens to make useful predictions that correspond with actual observable data, then that wrong theory is much more useful than trying to find out the “truth” (whatever that is). Epicycles are wrong now, but they served their purpose for hundreds of years.

        • Whether or not explanation/theory is important depends on the object of the research. If prediction is indeed the goal, your argument makes sense. But if the goal is to understand in order to make changes (e.g., health interventions), then theory/explanation is important, since it (at least ideally) can guide what kind of actions to take to change a situation (e.g., prevent a disease).

        • While Martha (Smith), Andrew, and Anoneuiod made good points and convinced me of the value of theory, Andrew probably had the most wit in his answer, so +1 for him.

        • >”But we need to first know whether something is happening in the first place, before we bother to come up with an explanation.”

          Yes, a major problem right now is that most grad students in psych/medicine are pumping out reports of things that are not actually happening. Then they are stuck believing in false things, and the people who would bother theorizing (as opposed to “testing” ideas with NHST pseudoscience) go do something else. It is an astounding scandal.

          >”if you are able to com up with a myriad myth about Apollo’s chariot that just so happens to make useful predictions that corresponds with actual observable data, then that wrong theory is much more useful than trying to find out the “truth” (whatever that is).”

          Think of an experimentum crucis: https://en.wikipedia.org/wiki/Experimentum_crucis. How else would you ascertain the plausibility of a theory other than by comparing predictions to observations? I don’t see any dichotomy there. The key is that these predictions be precise and a priori rather than vague and post hoc. Thankfully there is a little recognition of the latter point (post hoc is worse than a priori) these days; however, the much more important vague-prediction issue is still pretty much ignored.

          >”Epicycles are wrong now, but they served their purpose for hundreds of years.”

          You could still use epicycles (e.g., the Ptolemaic system) to predict orbits if you wanted; it is just more complicated and less accurate than Newtonian mechanics. BTW, try to find anyone doing solar-system modelling who actually uses what is supposed to be the correct theory (General Relativity); no one does. Look at the code and you will find Newton and/or post-Newtonian approximations (which, afaict, are kind of a bizarre logical mashup that isn’t sure whether gravity is instantaneous or not). Similarly, I have heard that no one really uses quantum mechanics when doing molecular dynamics simulations. These theories are just too computationally expensive to implement.
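          To make the “cheap but good enough” point concrete, here is a minimal sketch (my own illustration, not taken from any actual ephemeris code) of plain Newtonian gravity pushing an Earth-like orbit around the Sun with a leapfrog step; real solar-system codes are vastly more elaborate, but this is the kind of classical calculation they build on rather than full General Relativity.

```python
# Minimal Newtonian two-body sketch: Earth-like orbit around a fixed Sun,
# integrated with a leapfrog (velocity Verlet) step. All constants are
# standard textbook values; nothing here comes from a real ephemeris code.
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30     # solar mass, kg
AU = 1.496e11        # astronomical unit, m
DAY = 86400.0        # seconds per day

def accel(x, y):
    """Newtonian acceleration toward the Sun at the origin."""
    r = math.hypot(x, y)
    a = -G * M_SUN / r**3
    return a * x, a * y

# Start at 1 AU with the circular-orbit speed (~29.8 km/s).
x, y = AU, 0.0
vx, vy = 0.0, math.sqrt(G * M_SUN / AU)
dt = DAY

ax, ay = accel(x, y)
for _ in range(365):
    # Leapfrog is symplectic, so the orbit's energy drift stays tiny
    # even with a one-day step.
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
    x += dt * vx
    y += dt * vy
    ax, ay = accel(x, y)
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay

print(f"distance from Sun after one year: {math.hypot(x, y) / AU:.4f} AU")
```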

      • Somewhat related, I happened upon this the other week and it seemed on-point:

        “…both machine learning (ML) and econometrics (E) prominently feature prediction, one distinction being that ML tends to focus on non-causal prediction, whereas a significant part of E focuses on causal prediction. So they’re both focused on prediction, but there’s a non-causal vs. causal distinction. [Alternatively, as Dean Foster notes, you can think of both ML and E as focused on estimation, but with different estimands. ML tends to focus on estimating conditional expectations, whereas the causal part of E focuses on estimating partial derivatives.]”

        Link = http://fxdiebold.blogspot.com/2016/10/machine-learning-vs-econometrics-iii.html

        In particular, the distinction between estimating conditional expectations and partial derivatives is noteworthy.
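        A tiny simulation sketch of that distinction (my own toy example, not Diebold’s): when a confounder is present, the slope of the conditional expectation E[y|x], the natural prediction target, differs from the partial derivative of y with respect to x holding the confounder fixed, which is the causal estimand.

```python
# Toy example (made-up data): two estimands from the same dataset.
# "ML-style" prediction targets the conditional expectation E[y | x];
# the causal question targets dE[y | x, z]/dx with the confounder z held fixed.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)                         # confounder
x = 0.8 * z + rng.normal(size=n)               # x is correlated with z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)     # true partial effect of x is 2

# Slope of E[y | x]: regress y on x alone. Fine for prediction,
# but the coefficient mixes x's own effect with z's.
slope_pred = np.polyfit(x, y, 1)[0]

# Partial derivative holding z fixed: regress y on x and z jointly.
X = np.column_stack([x, z, np.ones(n)])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"slope of E[y|x]:          {slope_pred:.2f}   (about 2 + 3*0.8/1.64 = 3.46)")
print(f"partial derivative dE/dx: {beta[0]:.2f}   (about 2, the causal estimand)")
```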

    • I think you’re being way too harsh. It’s typically not feasible for a given observation to rule out the vast majority of explanations, but that doesn’t render the observation useless. If that observation is part of a larger set, the collective body of data can quite effectively rule out alternatives.

  2. This is a great place for anecdotes. Two. First, many years ago I was approached to “participate” in a diet study which essentially involved adding muffins (!) either containing or not containing some substance. No controls other than “Here are your muffins” and “what did you eat?” I said you might as well examine the turds for all the value you’ll get from what you think you’re tracking. And one of my favorites: a small “study” of about 30 infants in a hospital setting who either did or did not receive some homeopathic diarrhea remedy, and which managed to “observe,” and even find significance in, a minor apparent difference in the babies’ diarrhea symptoms. I remember thinking: did they weigh the poop? Did they code it by how much it reeks of ammonia? Did they stick a meter on mom to see how much milk she was producing? It was mind-boggling, but there it was.

    But more significantly, if you’re talking about publication then you mean two things: career and contribution. Most publication at least seems to be for career reasons, though some may argue with that; and while everyone wants to contribute to the march of science, we’re more likely to oversell what we’ve done, both because that advances the career and because we want to believe we’ve contributed. I see no way to untangle motives from what we do. In fact, I often think that we should take any study and assign it a prior for motive, with that prior standing for, and breaking down into, the believability of the choices made in analysis and the believability of the claims made, especially in relation to design. I think that prior would toss most work to the wrong side of the desired p value. So if you’ve asserted x, then we automatically take it to mean “you’re actually asserting something in this range, and only at this end are you even getting to something with potential real meaning and effect.” But then that’s how we really read things anyway.

  3. For an interesting and vitriolic look at this type of thing, see http://www.thegwpf.org/content/uploads/2016/10/PeerReview.pdf

    The author is a raving anti-climate-change nutter, the report is written for a notorious climate change denial organization, and I think she is out to prove that NASA has been fabricating data for the last 200 years (or am I thinking of Senator Malcolm Roberts on the fake-data front?), but she has gathered a lot of criticisms of the publication and peer review process.

    It is not clear to me if she understands how much of the system actually works but she has assembled some interesting information and links.
