“Furthermore, there are forms of research that have reached such a degree of complexity in their experimental methodology that replicative repetition can be difficult.”


Shravan Vasishth writes:

The German NSF (DFG) has recently published a position paper on replicability, which contains the following explosive statement (emphasis mine in the quote below).

The first part of their defence against replicability is reasonable: some experiments can never be repeated under the same conditions (e.g., volcanic eruptions, etc.). But if that is so, why do researchers use frequentist logic for their analyses? This is the one situation where one cannot even imagine repeating the experiment hypothetically (cause the volcano to erupt 10,000 times and calculate the mean emission or whatever and its standard error).

The second part of their defence (in boldface) gives a free pass to the social psychologists. Now one can always claim that the experiment is “difficult” to redo. That is exactly the Fiske defence.

DFG quote:

Scientific results can be replicable, but they need not be. Replicability is not a universal criterion for scientific knowledge. The expectation that all scientific findings must be replicable cannot be satisfied, if only because numerous research areas investigate unique events such as climate change, supernovas, volcanic eruptions or past events. Other research areas focus on the observation and analysis of contingent phenomena (e.g. in the earth system sciences or in astrophysics) or investigate phenomena that cannot be observed repeatedly for other reasons (e.g., ethical, financial or technical reasons). **Furthermore, there are forms of research that have reached such a degree of complexity in their experimental methodology that replicative repetition can be difficult.**

Wow. I guess they’ll have to specify exactly which of these forms of research are too complex to replicate. And why, if it is too complex to replicate, we should care about such claims. As is often the case in such discussions, I feel that their meaning would be much clearer if they’d give some examples.
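To make the volcano thought experiment above concrete, here is a minimal sketch in Python of the hypothetical-replication logic that frequentist standard errors rest on. The lognormal emission model and every number below are invented purely for illustration.

```python
import numpy as np

# Imagined frequentist replication: "cause" the volcano to erupt 10,000
# times, record the mean emission each time, and look at the spread of
# those means across repetitions.
rng = np.random.default_rng(42)
n_reps, n_per_rep = 10_000, 30

# Each hypothetical replication yields 30 emission measurements.
emissions = rng.lognormal(mean=2.0, sigma=0.5, size=(n_reps, n_per_rep))
rep_means = emissions.mean(axis=1)

# The analytic standard error (sd / sqrt(n)) is supposed to describe
# the spread of means across exactly this imagined ensemble.
analytic_se = emissions.std(ddof=1) / np.sqrt(n_per_rep)
print(f"sd of means across replications: {rep_means.std(ddof=1):.3f}")
print(f"analytic standard error:         {analytic_se:.3f}")
```

The point is only that the standard error is defined by this imagined ensemble of repetitions – exactly the ensemble that is unavailable for a one-off event.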

38 thoughts on ““Furthermore, there are forms of research that have reached such a degree of complexity in their experimental methodology that replicative repetition can be difficult.””

  1. Sure, there are forms of research that cannot be replicated due to so-called “complexity”, but one cannot then call these research endeavours **science**. Even the [definition of science](https://en.m.wikipedia.org/wiki/Science) implies that replication is necessary, because you cannot make testable predictions if a design/experiment cannot be replicated.

    And if the methods are *so complex* that they cannot be replicated, how can we trust the original results? This sort of statement just gives shoddy research an excuse to continue on when it shouldn’t.

    • ^I am coming at this from the experimental science side of things. Of course, some phenomena actually cannot be replicated, but do give us a better understanding of the world (e.g. studying volcanic eruptions). These phenomena are not experimental by nature, though. Experimental science (e.g. our favorite PNAS papers) is certainly open to scrutiny when it fails replication – if a controlled experiment cannot be replicated, it is likely that we do not fully understand the phenomenon being studied and that the original findings were some sort of fluke. If we do not interpret failed (good) replications like this, it is hard to make the argument that what we are doing is science.

  2. Well, as Heraclitus said a long time ago, you can’t step in the same river twice.

    So, what is replication? Strictly speaking, there is no such thing. Something is always different. But now consider an investigation that is so “complex” that it cannot be replicated. If every inevitable change to the design and implementation undermines the results, then the findings have no robustness of any kind and are of no use. All attempted replications will fail to fully replicate in some detail. If a finding changes every time a different investigator tries to replicate it as closely as possible, then it would appear that the finding is a property of the investigator (or just a fluke entirely) and not the matter being investigated. If the finding holds up only when a particular way of measuring a certain variable is used, then it is an attribute of that measurement technique. Etc.

    So perhaps the concept of “replicability” is really the concept of robustness: does the finding hold up under a sufficiently broad range of conditions and approaches that it can be considered useful and, in some practical and meaningful sense, enduring? We should not just seek exact replications of studies. Apart from being impossible, it wouldn’t be really useful. We should seek instead to understand the range of variation in the investigative approach that the finding is robust to. (And understanding, here, may well include building a model of the relationship between the sources of variation in approach and the results; a toy version of such a model is sketched at the end of this thread.)

    • “So perhaps the concept of “replicability” is really the concept of robustness…”

      GS: Otherwise known as “generality.” Reliability and generality (the hallmarks of “good data”) are really on a continuum. There probably is no such thing as a “real” direct replication – there are always differences that could result in a failure to replicate even if the original finding was “true.” If that is the case, then the original finding is of such limited generality as to be mostly useless. Occasionally, hard-to-reproduce, rare phenomena can be important, keeping in mind that science advances far more by finding similarities than differences. The latter are a dime a dozen.

    • Exactly what I was thinking. If you publish something and claim that it generalizes and is important, then surely it should be something you expect to replicate under slightly different circumstances. I doubt PNAS and others would get excited about “An effect in study center XXX when studied by Dr. YYY” with the conclusion that we do not expect this finding to generalize to any other location, setting or person making the observations. Okay, so maybe sometimes that is not entirely true (e.g. when someone tries to figure out what exact peculiarity of a particular historical event and the setting it occurred in led to a specific outcome), but most people seem to want to claim that their findings are generalizable.

    • Isn’t it often the case that, although a particular experiment may not be replicable in its entirety, parts of it can?

      The reason conclusions can be considered scientific is that components of the experiment can be replicated. We may not be able to replicate a particular volcanic explosion, but we can observe multiple instances of lava or pyroclastic flows or ash ejections etc. from a variety of eruptions. Further, we may be able to replicate in labs some aspects of lava or pyroclastic flows, and use the conclusions from such (replicable) experiments to draw conclusions about the flows – this is the modeling part.

      So while many experiments cannot be replicated, our scientifically valid conclusions might come from (replicable components of the experiment) + (a model tying these components together). This is the way I usually think of it and wonder if there’s a fallacy.

      • I don’t think there’s a fallacy. My opinion is that science is (or should be) about causal modeling (i.e., stylized fact x about the world at time t makes stylized fact y inevitably occur later in time, essentially independent of all but the most unusual complications). We want understanding and generalizability. It’s a good thing if you use a laboratory experiment and some observations on small-scale volcanic eruptions to predict what will happen in a particular large-scale eruption, and the prediction is accurate… even if you can never replicate that large-scale eruption. This *is* science. Whereas the perfectly replicable experiment on, say, priming (in the sense of being able to be set up and performed multiple times in a lab) which nevertheless gives a different answer each time, and requires invoking moderators and mediators like the weather, or whether the subject has recently visited their grandmother, etc., is NOT science, precisely because it simply doesn’t predict anything reliably. But it has all the trappings of science: a clear, controlled environment, p values backing up differences from null hypotheses, white lab coats, publications in PNAS, etc.
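    A toy version of the model suggested earlier in this thread – regressing results on sources of variation in the investigative approach – might look like the following Python sketch. All data are simulated and every number is invented; this is an illustration, not a claim about any real study.

    ```python
    import numpy as np

    # Toy robustness model: regress effect estimates from simulated
    # "replications" on a moderator describing how far each one deviates
    # from the original design. All data are fabricated for illustration.
    rng = np.random.default_rng(0)
    n_studies = 40
    deviation = rng.uniform(0.0, 1.0, n_studies)              # design deviation
    true_effect = 0.4 - 0.3 * deviation                       # effect decays with deviation
    estimate = true_effect + rng.normal(0.0, 0.1, n_studies)  # sampling noise

    # Least-squares fit of estimate ~ 1 + deviation.
    X = np.column_stack([np.ones(n_studies), deviation])
    coef, *_ = np.linalg.lstsq(X, estimate, rcond=None)
    print(f"effect at zero deviation:  {coef[0]:.2f}")
    print(f"change per unit deviation: {coef[1]:.2f}")
    # A slope near zero would say the finding is robust to this source of
    # variation; a large slope would say the "effect" is a property of the
    # procedure rather than of the phenomenon.
    ```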

  3. Good comments above, especially about robustness.

    To put the robustness point another way: if a result cannot be replicated under careful laboratory conditions, how do we know it has any relevance to the real world, where conditions are far less controlled? For instance, why should we think that priming occurs in the real world if we cannot even get it to occur in a laboratory replication?

    • Scientific positions such as those can gain relevance in the real world through administrative politics. Promoting the use of vegetable trans-fats over butter, for example, was very useful for a small segment of people, many of whom were scientists. To narrow Science to results that can be replicated limits the impact of Science’s institutions on society.

      • Maybe I misunderstand you, or it is my lack of knowledge about this specific example, but I think that is a good thing. No scientist should promote anything that s/he cannot provide sufficient evidence for (and replicability is in my opinion part of sufficiency). Of course you can have any opinion you like as a person, but as a scientist you should only follow the evidence. So when you start to sell power-pose training or some scientifically doubtful priming to increase sales, then you are not a scientist, but just a sleazy businessman seeking victims to exploit. And I do not want such “scientists” to have any influence on politics or any relevance in the world. Not even talking about the damage to science from discrediting it in the public eye, and the wasted resources.

        I wish science had a larger impact on society. I am (maybe naively) convinced that it would increase health, life expectancy, education, productivity and simply life satisfaction. But this is neither achieved nor advanced by promoting claims based upon insufficient evidence.

        (I am not saying that opinions should not be heard and should not have an impact or that every policy must always be tested by scientists or that scientists will always agree on everything… but if a scientist makes a claim as a scientist, it should be based on science.)

          • I can’t tell whether Dzhaughn is being tongue-in-cheek, or has some other point that I don’t understand. Certainly the situation where food chemists got to make money by selling margarine as if it were better for you than butter, until decades later it turns out it’s actually worse for you (maybe, I don’t know), isn’t “good for science” (in the sense of improving the health knowledge of the world), though of course it was “financially beneficial to some chemists.” But then “selling arms to support African bush wars” wasn’t good for the economy in the sense of “improving the lives of people throughout the world,” though it was “good for the economy” in the sense of “certain arms dealers got rich.”

          I think the moral of the story is “science isn’t just what happens to scientists”. Just like “successful pharma development” isn’t the same as “we sold a crap-ton of this drug and got super rich!”

  4. Is it difficult to grasp that each step adds complex confounding factors? And that these factors need to reduce to some value each time you go down the path? It’s not just that you have a complicated path from this side of the hill to the other side, but that one trip may be extremely different from the next: you may have great weather one trip and be snowed in the next, or you may be attacked by natives or not, etc. If you know the conditions – like whether it’s winter – then you can sometimes plan, but in most of these studies the point is that you don’t know the conditions, kind of like old maps that said “here be dragons” to indicate ignorance. You may be mapping a path, or you may be mapping a one-time or short-lived thing. Real-world example: lots of people moved to the Dakotas and West Texas/OK expecting the rainfall of recent years to continue… and it didn’t, and those places are now gone.

  5. You can’t replicate reality. But you can replicate a model of reality. That’s what frequentists do, and it’s unfair to pillory them for something else.

    p.s. I ain’t a frequentist.

    • Disagree here. This is the idea behind frequentist methods, but it does not say anything about how they are used in practice. Bayesians can evaluate the frequentist calibration of their models, and this is what I do regularly. The key is to remember that frequentist calibration is statistical and not substantive. Give me a sample size of A, variance at level one of B, variance at level two of C, and a true value of D, and this is how my model performs, irrespective of the research question.

      This is much different from frequentist NHST, in which sampling procedures in simulation are thought to provide question-specific evidence; they do not. (A toy version of such a calibration check is sketched at the end of this thread.)

    • Dear Bill,

      I do not get the point you are making.

      Since when is replicability about frequentist or Bayesian?
      Bayesians can also use bad measurement, hacked/forking-paths Bayes factors, or calculate posteriors based upon small N. Bayesians surely have also presented results that were not an adequate model of the data/“truth”/“world”, or claimed effects that were just not substantiated by the data or could never be found again in larger and more rigorous samples.

      Also, the studies in question are not case studies. Sure, case studies cannot really be replicated (although the analyses and conclusions should be replicable). But the studies we are talking about make claims like: doing X will have effect Y. Or: taking pill A will reduce symptoms B. It can of course be that the researchers did not know that the effect of X on Y only holds under Z, and that pill A only works for people who are C. But then the claim needs to be adjusted, and we are better off knowing this.
      Scientists are usually not making claims about the sample or about a very specific moment in time, but try to generalize from the sample to a larger population. Otherwise, why would you do any type of modelling (frequentist or Bayesian)?

      • I was commenting on this line: “But if that is so, why do researchers use frequentist logic for their analyses? This is the one situation where one cannot even imagine repeating the experiment hypothetically (cause the volcano to erupt 10,000 times and calculate the mean emission or whatever and its standard error).”
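    A toy version of the calibration check described in this thread, assuming a conjugate normal model with known noise so the posterior is available in closed form; every number below is arbitrary.

    ```python
    import numpy as np

    # Toy frequentist calibration of a Bayesian model: simulate many data
    # sets with a known truth drawn from the prior, fit the model, and count
    # how often the 95% posterior interval covers that truth. A conjugate
    # normal model with known noise sd keeps the posterior closed-form.
    rng = np.random.default_rng(1)
    n_sims, n_obs = 2_000, 25
    prior_mu, prior_sd, noise_sd = 0.0, 1.0, 1.0

    covered = 0
    for _ in range(n_sims):
        theta = rng.normal(prior_mu, prior_sd)      # "true" effect this round
        y = rng.normal(theta, noise_sd, n_obs)      # simulated experiment
        # Closed-form posterior for a normal mean with known noise sd.
        post_prec = 1 / prior_sd**2 + n_obs / noise_sd**2
        post_sd = post_prec ** -0.5
        post_mean = (prior_mu / prior_sd**2 + y.sum() / noise_sd**2) / post_prec
        lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
        covered += lo <= theta <= hi

    # Under the data-generating prior, coverage should be close to 0.95,
    # irrespective of the research question the model is attached to.
    print(f"95% interval coverage: {covered / n_sims:.3f}")
    ```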

  6. See the journal Cell for nonhuman-animal neuroscience studies that are so complex that replication is not likely. While replication is possible in principle, if things did not replicate there are so many things that could be offered as the reason why. It’s an unfalsifiable field with no replication and samples consistently under 10. Really a waste of taxpayer money.

  7. There’s a telling (and common) conflation in the paper of “replicable” and “replicated”, which is connected to the (also common) confusion of “falsifiable” and “falsified”. A failed replication does not show that a study is not replicable; in fact, the replication shows precisely that the study was replicable – it’s just that the result wasn’t replicated. Likewise, a failed falsification does not show that a study isn’t falsifiable.

    You can (i.e., should be able to) decide whether a result is replicable and/or falsifiable simply by looking at how the study was done. It’s a methodological question, not an empirical one. Indeed, it should never be an empirical question, since it decides whether or not the claim can properly be said to even be “empirical”.

    What happens when you drop a drinking glass on a tile floor is a replicable experiment anyone can do. So is what happens when you try to shatter it with the power of your mind. The effect of war on refugee populations is a replicable observation. The effect of 6,000 yogic flyers on world peace is also replicable. It’s just a matter of agreeing on a dependent variable. In all cases, we can easily imagine replicable and unreplicable studies, falsifiable and unfalsifiable ones, scientific and unscientific approaches to the problem.

  8. One interesting thing I am noticing when rerunning published studies is that the original credible intervals are so wide that virtually any result is consistent with the original one. The original result only got published because p was less than 0.05, not because it was inconsistent with practically no effect. No matter how I present the argument to psych type people, they look at me like I am nuts. (A toy illustration is sketched after the replies below.)

    • When a large, well-defined group of people consistently looks at you like you are nuts, it is reasonable to conjecture that maybe they are all nuts.

    • “No matter how I present the argument to psych type people, they look at me like I am nuts.”

      GS: Mainstream “psych type people.” Just a reminder, since I hardly ever mention it, that there is a natural science of behavior that is considered “psychology.” And though the numbers doing the basic science have dropped (in psychology, the creationists – psychological creationists – won…unlike what is the case in biology-proper where the selectionists won), they are still found in psychology depts. Anyway, those that are part of the natural science of behavior do not use NHST, or even do group-designs – they directly control the behavior of individual subjects. So…when you say “psych type people” you are including 1/100th of 1.0% of psychology that doesn’t fit there, thank you very much! The natural science of behavior is “psychology” in name only.
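    A toy illustration of the wide-interval point above, with invented numbers: an estimate can clear p < 0.05 and still have an interval wide enough to be “consistent with” practically any replication result.

    ```python
    import numpy as np

    # Invented numbers: an original estimate that is "significant"
    # (z = 0.50 / 0.24 ≈ 2.08 > 1.96) yet has a very wide 95% interval.
    orig_est, orig_se = 0.50, 0.24
    lo, hi = orig_est - 1.96 * orig_se, orig_est + 1.96 * orig_se
    print(f"original 95% interval: ({lo:.2f}, {hi:.2f})")  # about (0.03, 0.97)

    # Replication estimates from near zero to nearly double the original
    # all fall inside that interval, so "consistent with the original"
    # carries almost no information.
    for rep in np.arange(0.05, 1.0, 0.15):
        print(f"replication estimate {rep:.2f}: consistent = {lo <= rep <= hi}")
    ```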

  9. It reminds me of the first mathematical proof of the Four Color Theorem, which, because it used a computer algorithm to enumerate a host of cases, was deemed controversial for quite some time. The challenge in verifying the proof, then, was to independently write another computer program to check the first program. Issues of infinite regress then began to arise, because mathematical proofs have no place for statistical uncertainties (the probability that n programs to calculate the same thing reach different conclusions).

    No experiment is perfectly replicable, so it is fortunate that science requires no such thing.

  10. “Wow. I guess they’ll have to specify exactly which of these forms of research are too complex to replicate. And why, if it is too complex to replicate, we should care about such claims.”

    There’s a very simple example of this: case studies in medicine. It’s my understanding that this is the foundation of much medical science: a doctor treats a particular patient with potentially odd symptoms, records it, and then at some point someone reviews the case studies and says “hmm, there seems to be a common trend…”. Further investigations begin from there.

    As such, these case studies are the very basis of where the scientific inquiry begins, but are by no means reproducible; you’re never going to get another patient with the exact same background, symptoms, etc. But throwing out that information is a terrible idea!

    As the article indicates, this is not just in the medical field: if some sort of geological event happens, it is very helpful to gather as much data as possible to learn about the phenomena. But you can’t replicate volcanoes, earthquakes, etc.! And if you could, I hope an IRB would step in and say no.

    It’s my understanding this has become a huge political issue. The “Secret Science Reform Act” requires the EPA to only make policy based on reproducible research, among other requirements. However, note that something like a longitudinal study of individuals exposed to higher levels of lead, etc., is not reproducible! So one potential ramification of this act would be to force the EPA to stop regulating the amount of lead in our water…at least until we can reproduce various catastrophes.

    • > if some sort of geological event happens, it is very helpful to gather as much data as possible to learn about the phenomena.

      What else does this mean other than that there are generalisable features in geological events, i.e., features that replicate in other events?

      With replicability, we don’t mean we need to have the same sample with the same DNA and the same experiences. Just that we figure out the important, generalisable parts.

      Note that “reproducibility” refers to the analysis and should always be the case. “Replicability” refers to the experiment, and there are different facets of similarity (see taxonomy here: https://osf.io/preprints/psyarxiv/uwmr8)

      > However, note that something like a longitudinal study of individuals exposed to higher levels of lead, etc., is not reproducible!

      So, assuming you mean replicable, why not? If the same amount of lead exposure observed in a different country or village does not lead to deteriorating health, surely that informs us about the effects of lead exposure? If one study finds out that vaccines cause autism, surely we want to know if the effect replicates or not?

      There’s a related debate with links here: https://mattiheino.com/2017/06/02/replication-is-impossible/

      • “So, assuming you mean replicable, why not? If the same amount of lead exposure observed in a different country or village does not lead to deteriorating health, surely that informs us about the effects of lead exposure?”

        It’s not really worth having a discussion about whether it’s a good idea to allow lead into the water system to double check that it’s bad for people.

        • “It’s not really worth having a discussion about whether it’s a good idea to allow lead into the water system to double check that it’s bad for people.”

          Agreed, let’s not then. I don’t think this is what was suggested anyway.

          How do we come to know that something like lead in the water system is bad for people? (It seems logical to me that replicability has something to do with coming to such a conclusion in some way or form)

        • “It seems logical to me that replicability has something to do with coming to such a conclusion in some way or form…”

          GS: More like natural selection – not “logic.” Indeed, any kind of “verificationism” relies on affirming the consequent and, of course, that’s a logical fallacy. But affirming the consequent, where the person is in control of the event in question, is closely related to the basic processes involving the consequences of behavior (reinforcement and punishment). Why does “the rat believe that its lever-presses cause food-delivery”? Well, because when it presses the lever, food shows up! Of course, in the lead-poisoning thing you are talking about observing correlations but the same principle holds, this time the analogy is to Pavlovian conditioning, rather than operant conditioning.

          What this means is somewhat obvious, but worth stating: science is (and how could it be otherwise) an “outgrowth” of basic behavioral processes. That is another way of saying, as Skinner did, “Science is the behavior of scientists.” Unfortunately, putting together complex behavior (in the sense of interpreting it, and eventually, manipulating it) using a limited number of basic behavioral processes was replaced by invoking a seemingly limitless number of “processes” to explain behavior. Even where cognitive “science” seems to be using a few basic processes, the subtypes multiply with astonishing speed. How many kinds of “memory” are there? Each “kind” being added whenever needed to “explain” behavior post hoc. But I digress…

      • “What else does this mean other than that there are generalisable features in geological events, i.e., features that replicate in other events?”

        GS: Geology, like astrophysics and much of evolutionary biology, is not an experimental science. Phenomena like this must often be interpreted in terms of facts obtained through experimental sciences, and it is those facts that are general. Not to say that studying individual events does not inform as to similar events. Interpretation is a sort of overlooked aspect of science – I often talk about prediction and control, but how an experimental science lends itself to interpreting events in the non-laboratory world is also quite important.
