A possible defense of cargo cult science?

Someone writes:

I’ve been a follower of your blog and your continual coverage of “cargo cult science”. Since this type of science tends to be more influential and common than the (idealized) non-“cargo cult” stuff, I’ve been trying to find ways of reassuring myself that this type of science isn’t a bad thing (because if it is a bad thing, then academia itself is entering a Dark Age that it’ll never recover from).

I suppose an alternative is to hope that “cargo cult science” diminishes in size and influence, but I’m not an optimist, so I’ll take the rationalization approach.

On your blog, you previously mentioned the placebo effect that this type of research can cause. If power pose helps people, then it’s a good thing, even if the underlying research is bunk. I’ve recently thought of another way junk science could be useful: cheap decision-making.

There’s an XKCD comic where Strategy A and Strategy B are considered, but the time spent finding the better strategy is far more than the time spent actually implementing either of the two strategies. It would make more sense if, say, we flipped a coin and blindly followed whatever the coin said. It doesn’t really matter in the grand scheme of things.

Could junk science be seen as a way of saving people time as well? Instead of being paralyzed over the most efficient way of encouraging voter turnout, just look up some junk science and follow whatever it recommends.

Yeah, you could probably increase the effectiveness of your voter turnout operation if you instead followed best practices, but coming up with accurate results can be incredibly expensive. Plus you need a community of researchers to replicate the results as well, just to make sure that this is indeed the most efficient approach to increasing voter turnout. The costs keep piling up, while the benefits of picking the optimal strategy are fairly minimal.

The junk science’s recommendations are just cheaper and more scalable than doing things the “right way”…as long as you don’t accidentally pick a strategy that significantly reduces voter turnout.

But the odds of stumbling upon a bad strategy are probably low…and even if you do pick one that reduces turnout, that reduction may be slight and not really worth worrying about. The most important thing is not which decision is made, but that a decision is made at all, so that you can move on to the hard part of actually implementing the strategy and mobilizing voters. (And if you do realize the strategy is bad, you can always throw away that “cargo cult science” paper and find a new “cargo cult science” paper to follow.)

This argument in favor of “cargo cult science” starts to fall apart when you try to apply it to the medical field…but that’s where you use the placebo argument instead.

Is there a flaw in my argument that I’m missing? Has this argument been made before in the comment sections of your blog and then dismissed by others? Or am I missing the point of research (which is to find out facts about the world, not just to make decisions)? I’m honestly curious and would like your feedback.

My correspondent ends with a paradox:

P.P.S: I know I probably could have answered my question by looking for peer-reviewed studies on “cargo cult science”. However, I’m afraid that most of those studies may very well be examples of “cargo cult science”.

I have a couple thoughts on this. First, the discussion of coin-flip decisions reminds me of this recent paper by Steven Levitt, “Heads or Tails: The Impact of a Coin Toss on Major Life Decisions and Subsequent Happiness,” which presented evidence supporting the idea that in many settings people would be better off making decisions using coin flips.

Second, I’ve blogged a bit on various potential benefits from cargo cult science. Here are a few ideas:

– In “cargo cult science,” the researchers’ ideas are being tested in a useless, unscientific way. But maybe some of the ideas are good. That suggests a division of labor in which the people who promote the ideas are separated from the people who test the ideas and from the people who study and present the evidence.

– I’ve also put forth the argument that cargo cult science can be useful in shaking people up, in getting scientists and practitioners to think about alternative explanations. Lots of ideas might be valid in some way without being easy to measure and test, and if we decide only to pursue ideas with strong scientific evidence, we could be missing out. Ultimately I think the right way to resolve this issue is not through misinterpretation of data and subsequent hype, but through decision analysis: just as pharmaceutical companies will pursue some low-probability leads because there is some probability they could make it big, so should science and practice allow for experimentation. (A toy expected-value sketch of this logic appears after this list.)

– That said, junk science has social costs. So, yes, it’s not a bad idea to come up with clever ways in which junk science can be a good thing, but on balance I’d prefer that people would stop doing it and stop enabling it.
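To make the decision-analysis point in the second bullet concrete, here is a minimal expected-value sketch. All of the numbers are invented for illustration; nothing here comes from the Levitt paper or from real pharmaceutical data.

```python
# Toy decision analysis with made-up numbers: a long-shot idea can be
# worth pursuing on expected-value grounds even if it usually fails.

def expected_net_value(prob_success, payoff, cost):
    """Expected payoff of pursuing an idea, minus the cost of trying it."""
    return prob_success * payoff - cost

# A "safe" option: very likely to work, modest payoff.
safe = expected_net_value(prob_success=0.9, payoff=10, cost=5)

# A long shot: rarely works, but pays off big when it does.
long_shot = expected_net_value(prob_success=0.05, payoff=500, cost=5)

print(f"safe option expected net value: {safe:.1f}")      # 4.0
print(f"long shot expected net value:   {long_shot:.1f}")  # 20.0
```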

46 thoughts on “A possible defense of cargo cult science?”

  1. Reminds me of Phase I/II/III clinical trials. Some trials just say “in attempting to cure X with Y drug, you won’t kill yourself.”
    It’s useful, actually, as far as it goes.

  2. Aren’t efficiency of decision making and evidence-based decision making two different things? It may well be true that the decisions are right despite junk research (whether it be power pose, nutritional nudges, etc.). But what is the purpose of the research then? In fact, if we really know what a better decision is, then why pretend to collect evidence unless there is a chance it will show our belief to be wrong? And, how can it do that if it is poor research? If the evidence does not really matter at all (say, being more confident leads to better decisions and happier decision makers and that is known to be true with probability = 1), then why do research at all?

    I think the danger of junk science is, as you say, that it does have social costs. This means we may believe that good decisions can be made even based on junk research, but how would we ever know? Economists have a saying, “anything worth doing is worth not doing well,” meaning that a cost-benefit analysis will show that perfection is not the desired goal (marginal benefits decline and marginal costs increase, so less than perfection is optimal decision making). This would mean that research need not – actually should not – strive for perfection. But that does not justify doing junk science. The key is to ensure that “important” qualities are retained while “unimportant” imperfections in research plans (from measurement through analysis) are tolerated. Isn’t the goal of research to differentiate between what is important and what is not?
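    To put toy numbers on the economists’ saying, here is a minimal sketch; the benefit and cost curves are invented purely for illustration, but they show that the effort level maximizing net benefit falls well short of perfection.

    ```python
    import numpy as np

    # Toy version of "anything worth doing is worth not doing well":
    # diminishing marginal benefits plus increasing marginal costs imply the
    # optimal amount of effort is less than the maximum. Curves are arbitrary.
    effort = np.linspace(0.01, 1.0, 1000)   # 1.0 = "perfect" research
    benefit = np.sqrt(effort)               # diminishing returns to effort
    cost = effort ** 2                      # rising marginal cost of effort

    net = benefit - cost
    print(f"net-benefit-maximizing effort: {effort[np.argmax(net)]:.2f}")
    # prints roughly 0.40 -- well short of "perfection" at 1.00
    ```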

    If we accept that research need not be perfect but must capture the important aspects of a problem, then I don’t think junk science is defensible. Claiming that power pose is beneficial because we believe it is true, and we find evidence based on junk research that supports that belief, would not be justifiable. The best protection we can offer against erroneous claims that the research was not junk – that the imperfections were not important – is to have the data publicly available and to have healthy post-publication review.

    • Another thing here that seems implicit is that basically any choice will do. And that’s true for a lot of really trivial questions, but their triviality is exactly why we shouldn’t bother even doing research on them. What about questions like “what kind of environmental policy will best improve or maintain health of humans, other ecosystem components, and economic growth?” You can’t answer that with a little fake science to make you feel good and a coin flip.

      One of my biggest problems with academic engineering (and a major reason I didn’t continue in academia immediately after getting a PhD in Civil Engineering) is that many of the questions actually being studied are pointless crap. We have serious problems in the world, and people are getting big grants to pretend to figure out from smart-phone location data how to best plan 50 year water supply maintenance projects… or whatever

  3. It is certainly true in many areas that randomized comparative experiments are largely hopeless: too expensive, or taking too long (as the world likely changes faster). However, in such situations decisions need to be based on background knowledge, and cargo cult science pollutes that as well.

    • Anon:

      I clicked through the link and read through. I don’t see much content there. The author seems to be reacting to people who are telling him to use Bayesian methods. He doesn’t want to use Bayesian methods. That’s fine with me: he should use the methods that work for him.

      • > He doesn’t want to use Bayesian methods.
        Agreed, and perhaps even more so in ML/deep learning, a division of labor will advance the field faster.

        On the other hand, I think Geoff Hinton put it best sometime back in the 1990s with his comment “ML researchers have two choices, learn statistics or make friends with a statistician”. I [Keith] do think it is important to grasp what others are doing, how they do it, and what they actually achieve – even if you want to do something else. And Geoff added some humor to his comment: “I won’t say which is easier!”

    • That was a strange read. Cargo cult science is stuff like testing a null hypothesis instead of your hypothesis, or relying on peer review as a substitute for direct replication. You can see clearly how the proper procedure has been replaced by something that sounds kind of similar but can’t actually achieve the original purpose.

      Also, it is clear the inappropriate practices arise because people failed to understand why things were done a certain way. E.g., you can ask them “What is the purpose of a replication study?”, “What is a replication?”, or “Why should we test a hypothesis?” and they will respond with something strange.

      I don’t see what it is about Bayes’ rule he considers “cargo culty”. What is the correct procedure he is proposing it is an analogue of? It should have been possible to summarize this in a single sentence at the beginning; as it is, I was just left confused. I can’t even get to the point of agreeing or disagreeing with what I just read.

        • The author was clearly biased – notice he called Bayes’ theorem a conjecture. Although I think his point was that, just like any other tool, Bayesian approaches often get misapplied or used beyond their realm of applicability. His contention is that Bayesian approaches aren’t really appropriate in Deep Learning. I don’t know enough about Deep Learning to agree or disagree.

          • Isn’t Deep Learning just using neural networks to do computing the way people like Minsky did back in the 1960s, except with computer hardware that actually has a hope of tackling largish problems?

          Maybe there’s more to it, I’d seriously like to know.

        • You can find tons of comments like this if you look:

          Most of the literature suggests that a single layer neural network with a sufficient number of hidden neurons will provide a good approximation for most problems, and that adding a second or third layer yields little benefit.

          https://stats.stackexchange.com/questions/99828/when-is-a-second-hidden-layer-needed-in-feed-forward-neural-networks

          AFAICT (I was not partaking at the time), this is what happened:
          It was proven that one layer could approximate pretty much any function, so not much research went into checking many-layered (deep) networks until it became computationally cheap to play around with them. Once that happened, it was quickly figured out that while a single layer may be fine if you have infinite time on your hands for tuning and training, multi-layered networks could learn many patterns much faster and more easily.

          It looks like another case where the thing that has been mathematically proven is similar-sounding to, but not the same as, what people actually want to know. Confusion between the two concepts then becomes a big impediment to progress.
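          A quick way to poke at this is a rough scikit-learn sketch; the architectures, data, and training budget below are arbitrary choices, not a careful benchmark.

          ```python
          import numpy as np
          from sklearn.neural_network import MLPRegressor

          # Rough sketch: fit a wiggly 1-D function with a wide single-hidden-layer
          # network vs. a narrower three-layer network, same training budget.
          rng = np.random.default_rng(0)
          X = rng.uniform(-3, 3, size=(2000, 1))
          y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=2000)

          shallow = MLPRegressor(hidden_layer_sizes=(256,), max_iter=500, random_state=0)
          deep = MLPRegressor(hidden_layer_sizes=(32, 32, 32), max_iter=500, random_state=0)

          for name, model in [("shallow (1 x 256)", shallow), ("deep (3 x 32)", deep)]:
              model.fit(X, y)
              print(name, "training R^2:", round(model.score(X, y), 3))

          # Both can represent the function in principle (the universal-approximation
          # result); the point above is about which one gets there faster and more
          # easily in practice, which a toy run like this can only hint at.
          ```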

        • Sure, I guess it depends on what you mean by “more to it”.

          I guess modern text is just “a collection of symbols that represent sounds that transmit a message”, just like it was 3000 years ago. Since then there have only been advances in transmitting the message that are many orders of magnitude more efficient – e.g., technologies like vowels, spaces, punctuation, paragraphs, word processors, bibliographies, etc. The basic idea is unchanged, though.

        • Well for example I’m talking about the fact that there is little difference between the Latin alphabet and the Cyrillic alphabet when compared with say Chinese pictograms as a different technology entirely

        • > Well for example I’m talking about the fact that there is little difference between the Latin alphabet and the Cyrillic alphabet when compared with say Chinese pictograms as a different technology entirely

          I’d say single layer networks are like “scriptio continua”, while the newer architectures are analogous to modern text with all the punctuation, etc.

        • Right, so this confirms my basic belief that there aren’t any brand-new ideas here, just some improved methods of using old ideas in a more efficient or more practical manner. A totally different technology would be something like, say, evolving Java bytecode sequences to learn how to approximately compute things… or whatever.

    • I actually found that a somewhat interesting article, and one worth statisticians spending some time on. It may help stats readers to realize that it’s not clear the author is really going after *Bayesian* statistical practices rather than general statistical practices; some translation by the stats reader is required.

      In particular, I think one idea that the author dances around is the fact that the majority of statistical theorems start with “Suppose Y follows some specific distribution. Then some estimator f(Y) is optimal because…”. The issue is that Y *never* follows that given distribution, so what is the point of being extremely precise about your certainty conditional on an assumption that is false with probability 1?
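      A toy simulation of that gap between the assumed model and reality (an illustration with arbitrary settings, not something from the linked article): the sample mean is the optimal location estimator if the data really are normal, but under a heavy-tailed distribution the humble median beats it.

      ```python
      import numpy as np

      # Sample mean is "optimal" under the assumed normal model, but here the data
      # are heavy-tailed (t with 2 df, true center 0) and the median does better.
      rng = np.random.default_rng(1)
      n, reps = 50, 10_000
      samples = rng.standard_t(df=2, size=(reps, n))

      mse_mean = np.mean(np.mean(samples, axis=1) ** 2)
      mse_median = np.mean(np.median(samples, axis=1) ** 2)
      print(f"MSE of sample mean:   {mse_mean:.4f}")
      print(f"MSE of sample median: {mse_median:.4f}")
      ```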

        • Note that this is exactly my main complaint about frequentist statistics. Bayesian statistics, on the other hand, isn’t claiming a frequency of occurrence; it’s claiming a credibility based on knowledge. So frequentist results have the flavor of “if the frequency of A is D then the frequency of B will be Dprime (but, by the way, the frequency of A is never D and so… yeah, not helpful),” while Bayesian results have the flavor of “if I think A will be in the high-probability range of distribution D then I should also think that B will be in the high-probability range of Dprime,” and the latter can clearly be a true statement about what you should think about B.

        • Two issues:

          (1) I think you’re confusing MLE-based methods with Frequentist statistics as a whole. MLE-based methods are a subset of Frequentist statistics, and I would argue that the non-MLE-based methods tend to take a much different approach toward unknown distributional functions. At the very least, these types of methods are usually thought about in terms of “how would these data qualities (e.g., skew) affect these estimators?” (think LOESS smoother with robustness weights, for example), which I think in some cases is more realistic than “if my data is from this model (which I know it isn’t), these estimates are optimal (or near optimal)”. The argument is of course that the data should be close to the model, so the estimates are close to optimal, but in general I don’t think I’ve seen any work on how to quantify “close”, because it’s really hard.

          I recognize that these approach are entirely disjoint: I’m sure there’s *some* way to justify a LOESS smoother with robustness weights as some form of MLE. But I see it as a difference in how you start building the estimator to address observed data issues in the first place. I would say the non-MLE-based approach is to recognize common issues in the data and then think post hoc about how you could make your estimator more robust to those issues. The second is to build a full generative model that describes these issues. I understand that these two approaches each have their pluses and minuses.

          (2) In comparing MLE-based methods with Bayesian, I have a lot of trouble agreeing with the idea that you’re doing something better by modeling mistaken belief. In essence, you’re saying that a Bayesian approach is better because you’re modeling “what’s rational to believe given an improper view of how the world works”. Why is that better? I mean, theoretically, this would be really helpful to know from a game theory perspective: I really want to know what my adversary is incorrectly over-certain of…but in reality, it’s not actually going to be useful, i.e. it’s not like you know the exact Stan code they used to make their decision.

          I like Bayesian statistics, but not for these reasons. Personally, I think Bayesian statistics are great *because* of their frequentist properties. Just as an example, almost all machine learning methods use penalization, and a lot of the time it’s helpful to think of these penalties as priors. But as Gelman says, you can just take any Bayesian estimator, show that it has good frequentist properties, and now you’ve justified it as a Frequentist estimator. So there’s no point in getting up in arms that Frequentist methods in general are no good. I guess one way to illustrate things: if you create a Bayesian estimator but after a while you recognize that it has very poor frequentist properties, would you not abandon it? There’s just no reason you need to see the world as “Bayesian vs Frequentist”. It still all makes sense as “Bayesian and Frequentist”.
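          As a minimal illustration of the “penalties as priors” remark (a standard textbook identity, not anything specific to this thread): in linear regression with known noise variance, the ridge (L2-penalized) estimate coincides with the posterior mode under independent Gaussian priors on the coefficients.

          ```python
          import numpy as np

          # Ridge regression == MAP estimate under a Gaussian prior (known noise
          # variance). Data and settings are arbitrary toy choices.
          rng = np.random.default_rng(2)
          n, p = 100, 5
          X = rng.normal(size=(n, p))
          y = X @ rng.normal(size=p) + rng.normal(size=n)

          sigma2 = 1.0                 # assumed-known noise variance
          lam = 3.0                    # ridge penalty
          tau2 = sigma2 / lam          # implied prior variance: beta_j ~ N(0, tau2)

          # Ridge: argmin ||y - Xb||^2 + lam * ||b||^2
          ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

          # Posterior mode with prior N(0, tau2 * I) and noise variance sigma2:
          map_est = np.linalg.solve(X.T @ X / sigma2 + np.eye(p) / tau2,
                                    X.T @ y / sigma2)

          print(np.allclose(ridge, map_est))   # True: same estimator, two framings
          ```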

        • “I recognize that these approach are entirely disjoint”

          I recognize that these approaches are *not* entirely disjoint.

          When’s WordPress gonna include an “edit comment” button?

    • When I got to the first reference to Bayesian inference, where he calls Bayes rule a “conjecture”, I scrolled up to see if this piece had been published on April 1st.

      > However Bayesian inference […] has no evolution mechanism of how knowledge changes given an initial prior.

      I would say that this is precisely the only thing that Bayesian inference provides: a mechanism to update your knowledge when you obtain new information. But it’s true that Bayesian inference will not tell you by itself what is the right “likelihood” or what is the right “prior” (or what is the “data”, for that matter).
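      As a minimal illustration of that update mechanism (a textbook conjugate example, not anything from the article under discussion): a Beta prior on a success probability combined with binomial data gives a Beta posterior.

      ```python
      from scipy import stats

      # Beta prior + binomial data -> Beta posterior. Numbers are arbitrary.
      a, b = 2, 2                 # prior: Beta(2, 2), weakly centered at 0.5
      successes, failures = 7, 3  # new data: 7 successes in 10 trials

      posterior = stats.beta(a + successes, b + failures)
      print("posterior mean:", round(posterior.mean(), 3))
      print("95% interval:", [round(q, 3) for q in posterior.interval(0.95)])
      ```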

      It’s not clear to me what his opinion is on MacKay’s “Information Theory, Inference, and Learning Algorithms”, which he says should be required reading for every Deep Learning practitioner. The “Inference” in the title is Bayesian inference, though. Maybe he considers MacKay one of those Bayesian cargo cultists; I’m not sure.

      While I don’t fully understand what he is arguing against, I have to say that as an avid reader of Jaynes I find some of his comments extremely funny:

      > Imagine if you tried to explain Thermodynamics using Bayes rule instead of doing what the Stat. Mech folks have done.

      > Bayesian inference isn’t a method that’s acceptable for a hard science like physics. Perhaps the hand wavy stuff is acceptable for soft sciences like Psychology.

      • While Jaynes makes a good and interesting attempt, his approach is not the ‘widely accepted standard’ in statistical mechanics, as far as I understand. E.g., people like van Kampen and others have criticised his approach.

        • The Bayesian approach is not the widely accepted standard in statistics and that doesn’t make it wrong. I will look at van Kampen’s objections (do you have a more precise reference?).

          In any case, a Bayesian approach to statistical mechanics is easy to imagine (it was done many decades ago). And Bayesian inference is an acceptable method in physics (at least people are using it in nuclear physics, astronomy and other fields).

        • (I also quite like Joel Keizer’s book on statistical thermodynamics and he makes similar comments to van Kampen, from memory)

          From what I can see using Google Books (but maybe I’m missing something), Keizer doesn’t make any mention of Jaynes, and van Kampen has just a couple of references:

          (p. 22) The twentieth century saw a revival of Laplace’s idea, in particular in the work of Jaynes. He gave it the more modern formulation of the “principle of maximum entropy”, but it is again an attempt at replacing physics with a philosophical principle, and suffers from the same shortcomings.

          (p. 56) Or one resorts to anthropomorphic explanations such as the observer’s ignorance of the precise microscopic state. This last argument is particularly insidious because it’s half true. It is true that in many cases the observer is unable to see small rapid details, such as the motion of the individual molecules. On the other hand, he knows the experimental fact that there exists a macroscopic aspect, for which one does not need to know these details. Knowing that the details are irrelevant, one may as well replace them with a suitable average. However, having said this, one has not even begun to explain that experimental fact. The fundamental question is: How is it possible that such a macroscopic behavior exists, governed by its own equations of motion, regardless of the details of the microscopic motion?

          Essentially the same critiques appear in “Views of a Physicist”:

          (p. 20) No-one would think of applying Laplace’s formula to the survival chance of Reagan. A recent revival of this old misconception can be found in the work of Jaynes and his school.

          (p. 72) Another well-known argumentation is based on the same kind of consideration that also serves as a foundation for the usual application of statistics: If something is unknown, replace it with a probability distribution. It is natural to say: “I don’t know whether or not it will rain tomorrow, hence the chances are fifty-fifty”. An extreme example is the quotation from Jaynes, one of the protagonists of this view.
          “We now have reached a state where statistical mechanics is no longer dependent on physical hypothesis, but may become merely an example of statistical inference.”
          This anthropomorphic statement confuses once again subjective and objective probability. This criticism was well put by Reichenbach: “Why should a physical system follow the direction of human ignorance?”

          For what it’s worth, other authors have a more positive view of Jaynes’ contributions to the field:
          https://physicstoday.scitation.org/doi/10.1063/1.3265239

          From the preface:

          A major reason why entropy has been conceptually controversial is that its underlying meaning transcends its usual application to thermal physics, although that is our interest here. The concept has a deeper origin in probability theory, traces of which are evident in the work of the founders of statistical mechanics, Ludwig Boltzmann and J. Willard Gibbs. The variational algorithm of Gibbs is recognized as central to most of thermal physics, and the later work of Claude Shannon and Edwin Jaynes uncovered its roots in the theory of probability itself as a Principle of Maximum Entropy (PME). There has always been some degree of controversy surrounding the fundamental nature of this principle in physics, as far back as early in the 20th century with the work of Paul and Tatyana Ehrenfest. Much of this “cloud” is due to a lack of appreciation by many theorists of the dominant role of rational inference in science, and a goal of the present work is to overcome this resistance.

        • Yeah it’s not always a direct citation of Jaynes – eg van Kampen and Keizer also discuss the Gibbs approach to non-equilibrium situations and its difficulties, and this discussion applies to Jaynes’ stuff as well.

  4. The correspondent’s hypothesis that junk science could break inertia in decision-making, leading to a better outcome, feels weird. I’m not a scientist. I run a business operation. I am always concerned with optimization. I see empirically that inaction is usually more costly than moving quickly with a sub-optimal course of action, as long as I measure what’s going on and retain the flexibility to alter course.

    Sometimes being decisive requires convincing my superiors that we should do something. Fortunately I work with smart enough people that presenting the idea as a balance of risks, including the risks of inaction, suffices. If I had to resort to using junk science to win support, I’d probably quit since I would lose respect for my leaders if such an approach worked.

    • Right. Amazon has a leadership principle called “Bias for Action” that’s meant to encompass this, yet it strives to be one of the most data-driven organizations on the planet. The cost of inaction often outweighs the cost of using suboptimal methods or making suboptimal decisions. To add to what you’re saying, part of the key to doing this right is to understand which decisions or projects require extensive study, and which ones don’t.

      Adding junk science to give decisionmakers false confidence in their choices won’t help those who understand the above. It’ll only help those who don’t, and hopefully the market economy works well enough that those who don’t will quickly be driven out of business.

      It’s a much more worrisome problem from a public policy or allocation of resources perspective though.

  5. “I’ve been trying to find ways of reassuring myself that this type of science isn’t a bad thing”

    You could always reason that scientists engaging in all that is bad in science do it because they want to improve matters. They might simply be sacrificing their scientific souls, should such a thing exist, in order to improve things once they are in the position to do so.

    In the meantime, they will have to make sacrifices by displaying things on their CV that can possibly even be seen as anti-scientific, like:

    1) a probably impossible/non-representative ratio of “positive” vs. “negative” published results
    2) a possible long list of low-quality papers with their name on them
    3) an indication of how much tax-payer money they might have wasted via listed received grants
    4) (the possible shame of) listing individual awards received for simply doing their job (which is probably made possible by the tax-payer, and almost definitely by other researchers)
    5) a publication + reviewer history which can be interpreted as having played an active part in what can be considered to be the giant academic publication scam

    They may all have done this just to keep their job so they will one day be in the position to change all the bad incentives that made them do all the things they’ve done.

    In a way, they may be sacrificing their scientific souls for the common good. If that’s the case, I thank them, because I don’t think I could ever do that…

    • My experience has been that those who play the game rarely challenge it once they “succeed.” I’ve seen far too many academics who, after being abused (metaphorically or otherwise), seem determined to force others to go through exactly what they had to put up with. It is the rare academic who, after receiving tenure (or other rewards), then endeavors to improve the system they found objectionable at first. I don’t know if this means they have been changed during the process, whether it is merely a form of rent-protection (my most likely explanation), or whether there is some other psychological explanation.

      • Dale:

        Indeed. We discussed an example of this sort of Stockholm syndrome a couple years ago. This case was particularly sad because the academic in question had to leave his job because he wasn’t granted promotion, but then he was praising their high standards! I guess he’d internalized the criticism.

      • I think it’s a combination of the people changing and “survivorship bias”. Basically people who are “amenable” to molding themselves into these roles will be the ones who make it. People who find abusive and wasteful behavior abhorrent will get themselves out of the situation. Even if everyone starts out finding it problematic, some will learn to internalize the system, and others will find another thing to do. It’s a rare few who make it through the system and then work on changing the system.

        • “Even if everyone starts out finding it problematic, some will learn to internalize the system, and others will find another thing to do. It’s a rare few who make it through the system and then work on changing the system.”

          Yes, and perhaps even those that made it through the system, and possibly want to change it, may still have internalized it too much, and/or trust it too much, and/or still want to work with it too much.

          Should it be useful and/or amusing: in light of that, I possibly made a fool of myself trying to get this point across for a second time here (see my comment dated March 18th): https://www.psychologicalscience.org/observer/preregistration-becoming-the-norm-in-psychological-science

        • Thanks for the link. I agree that “peer review only registered reports” miss the point of registered reports.

  6. The canonical method (at least it used to be…) for using a coin-flip to make a decision—I have, rarely, used it myself—between two comparably attractive courses of action is to toss the coin in the air, and immediately observe closely and carefully which face you are hoping will come up.

