A model for scientific research programmes that include both “exploratory phenomenon-driven research” and “theory-testing science”

John Christie points us to an article by Klaus Fiedler, What Constitutes Strong Psychological Science? The (Neglected) Role of Diagnosticity and A Priori Theorizing, which begins:

A Bayesian perspective on Ioannidis’s (2005) memorable statement that “Most Published Research Findings Are False” suggests a seemingly inescapable trade-off: It appears as if research hypotheses are based either on safe ground (high prior odds), yielding valid but unsurprising results, or on unexpected and novel ideas (low prior odds), inspiring risky and surprising findings that are inevitably often wrong. Indeed, research of two prominent types, sexy hypothesis testing and model testing, is often characterized by low priors (due to astounding hypotheses and conjunctive models) as well as low-likelihood ratios (due to nondiagnostic predictions of the yin-or-yang type). However, the trade-off is not inescapable: An alternative research approach, theory-driven cumulative science, aims at maximizing both prior odds and diagnostic hypothesis testing. The final discussion emphasizes the value of pluralistic science, within which exploratory phenomenon-driven research can play a similarly strong part as strict theory-testing science.

I like a lot of this paper. I think Fiedler’s making a mistake working in the false-positive, false-negative framework—I know that’s how lots of people have been trained to think about science, but I think it’s an awkward framework that can lead to serious mistakes. That said, I like what Fiedler’s saying. I think it would be a great idea for someone to translate it into my language, in which effects are nonzero and variable.

And the ideas apply far beyond psychology, I think to social and biological sciences more generally.

28 Comments

  1. Andrew,

    May have read it. I would also recommend Paul Rozin’s cogent article, which I think pertains. Hope the link works.

    https://pdfs.semanticscholar.org/f4af/3299663cd5989a325fdcee25e8b84e5a71d7.pdf

  2. Anoneuoid says:

    I really don’t see what is so complicated about it.

    You explore some data to come up with guesses at what process may explain any patterns you see (ie abduction), then from those guesses you deduce some predictions, then you collect new data and compare it to your predictions.

    It is really a quite simple concept, so there is something else going on if so many people are having trouble with it.
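    The explore → predict → test loop described above can be sketched in a few lines. This is a toy Python illustration on made-up, pure-noise data (nothing from Fiedler’s paper): exploration picks whichever variable looks most correlated with the outcome, and a held-out half of the data then tests that specific prediction.

```python
import random

random.seed(0)

# Toy, made-up data: 50 measured variables per subject, none of which is
# actually related to the outcome, so any "pattern" found below is noise.
n, k = 100, 50
data = [[random.gauss(0, 1) for _ in range(k)] for _ in range(n)]
outcome = [random.gauss(0, 1) for _ in range(n)]

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

half = n // 2

# 1. Explore (abduction): in the first half of the data, guess that the
#    variable most correlated with the outcome reflects a real process.
explore_cors = [corr([row[j] for row in data[:half]], outcome[:half])
                for j in range(k)]
best = max(range(k), key=lambda j: abs(explore_cors[j]))

# 2. Deduce a prediction: that one variable should correlate with the
#    outcome, with the same sign, in data not used for exploration.
# 3. Test the prediction on the held-out half.
confirm = corr([row[best] for row in data[half:]], outcome[half:])

print("exploratory r = %.3f, confirmatory r = %.3f"
      % (explore_cors[best], confirm))
```

    With null data like this, the confirmatory correlation will typically be much closer to zero than the exploratory one, which is why keeping the two stages separate matters.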

    • Anonymous says:

      “I really don’t see what is so complicated about it. “

      Yes, i also don’t get what’s so difficult!

      But don’t tell that to certain psychologists. They make it (seem that it’s) all very, very hard and complicated.

      Even what they at first say is easy, turns out to be hard in the end. E.g. see here:

      http://andrewgelman.com/2018/04/15/fixing-reproducibility-crisis-openness-increasing-sample-size-preregistration-not-enuf/#comment-707143

    • Anoneuoid,

      I agree that it is a quite simple concept, and it is already in play at different stages of research. How else can one hope to advance theory & practice? I think it’s also about a particular analyst’s conceptual breadth and depth. Only a subset has ever had such.

    • Keith O'Rourke says:

      > explore some data to come up with guesses at what process may explain any patterns you see (ie abduction)
      If you are just blindly using the patterns suggested in the data, you’re just staging the “unexpected and novel ideas (low prior odds)” early on rather than avoiding them.

      Profitable abduction _should_ do better. Peirce tried really hard to justify that, but admittedly only came up with weak arguments: that we evolved to make abductions more profitably than by following apparent patterns in observations, or that we were built that way for our universe.

      On the other hand, there may not be a big downside in just hoping it’s true.

      • Anoneuoid says:

        “If you are just blindly using the patterns suggested in the data, you’re just staging the ‘unexpected and novel ideas (low prior odds)’ early on rather than avoiding them.”

        I’m not sure what you mean by “blindly”, but I would for sure say that most of these statistically significant correlations don’t represent any kind of meaningful pattern.

      • Anoneuoid says:

        As an example for how simple I would consider a meaningful pattern: The incidence of many cancers peaks at a certain age particular to that cancer, and those that seem to keep increasing could very well just be people dying from something else before the peak.

        It is something crucial that any theory of carcinogenesis must explain, perhaps as a measurement artifact.

        • Keith O'Rourke says:

          > crucial that any theory of carcinogenesis must explain
          That is not just blindly using the patterns suggested in the data, but rather an informed choice of pattern to focus on and _likely_ a more profitable abduction. Justifying the claim of likely more profitable is the challenge.

          • Anoneuoid says:

            Hmm, I’m still not sure I see what issue you are getting at with these “blind patterns”. For example, I wouldn’t make anything of a weak linear relationship like this: https://image.ibb.co/dp6xPS/avb.png

            Code for plot:

            set.seed(0)
            n = 30
            a = rnorm(n)
            b = a + rnorm(n)  # b is a plus independent noise

            test = cor.test(a, b)
            r = round(test$estimate, 3)

            plot(a, b, main = paste("r =", r))

            I mean I would accept that some real correlation does exist there, and obviously you can use “a” to predict “b” and vice versa… but there isn’t much to do with this type of pattern in an abductive reasoning sense.

  3. Anonymous says:

    I always have a hard time reading and understanding Fiedler. I’ve read 3 papers by him, and with every one of these i am never quite sure if he is pulling a Sokal-like hoax and/or what his main points are.

    The paper linked to in this blogpost is a nice example of this. How am i supposed to interpret the following sentences on page 53:

    “Creativity is facilitated by positive mood, not negative mood (Rowe, Hirsh, Anderson, & Smith, 2007). To be sure, not every study leads to the discovery of a stable law. However, to view the fruits of creative science, one has to separate the wheat from the chaff and to focus on the best findings rather than counting the noisy findings arising as byproducts of the discovery process. Such a reframed perspective alone results in a much more optimistic appraisal.”

    To me that reads a lot like: “cherry picking” is totally fine because you are just leaving out noisy findings that are a byproduct of discoveries, and if you view things like that everything works nicely and all in science is just peachy.

    What is he saying here?

    • Anonymous,

      LOL

      I can appreciate your view. Clarity is not the most distinguishing feature of much writing in these fields. What wheat? Which chaff?

      • Anonymous says:

        “Clarity is not the most distinguishing feature of much writing in these fields”

        Here is another example of, at least to me, unclear writing in his paper. Please note, however, that i may simply not be smart enough to follow his writing/reasoning. In the abstract he states the following:

        “Indeed, research of two prominent types, sexy hypothesis testing and model testing (…)”

        I wondered what he meant by “model testing” and found the section titled “model testing approach”, where he states:

        “In an attempt to overcome the simplicity and the crude qualitative level of sexy-hypothesis testing, the goal of another research approach is to attain quantitative precision and commitment to clearly specified models that can be tested strictly.”

        So, am i understanding him correctly that he is saying that “to attain quantitative precision” is a “prominent type” of research? I thought research that actually predicted a certain specific effect size or some other possible version of “quantitative precision” was super rare.

        Again, i am having a hard time understanding what he is saying here.

        • Martha (Smith) says:

          Sorry if this sounds like snark, but: It sounds like he writes in a “creative” style, expressing something very subjective inside him, rather than attempting to communicate clearly something that has been well-thought-out and is clearly connected with reality outside his own mind.

        • Anonymous

          Fiedler is drawing on a relatively common contrast between ‘qualitative’ and ‘quantitative’ findings. Yet by describing the qualitative (sexy hypothesis testing) as ‘crude’ and the quantitative (modeling) as ‘precise’ or ‘precision-guided’, Fiedler produces a false dichotomy. Not sure how he conceives of modeling.

        • Guido Biele says:

          I think knowing a bit about Fiedler’s background helps to understand what he is alluding to. I think what he is saying makes sense, but I agree that it is not argued clearly enough.

          About separating the wheat from the chaff: For many research topics there is good research (precise measurements, sufficient sample size, to give example criteria) and bad research (theories with low empirical content, competing theories with largely overlapping predictions). I think what he is trying to say is that even if there is lots of sub-optimal research out there, we can still learn from a body of research if we focus on the studies that were well done.

          About research that aims to “attain quantitative precision” and is even a “prominent type”: Here it helps to know that Fiedler has interacted a lot with people doing decision-making research. And in that field quantitative models with precise predictions are indeed prominent (certainly also in some other sub-fields of psychology, but those I know less well).

  4. jrc says:

    “Creativity is facilitated by positive mood, not negative mood”

    https://en.wikipedia.org/wiki/List_of_people_with_major_depressive_disorder

    Don’t worry, he can revise the hypothesis by interacting the main effect with a separate dummy variable for “Is Isaac Newton, Friedrich Nietzsche or Bill Murray”. Then the main effect should pop back up… assuming he has properly cleaned his data and trimmed outliers from his sample (for instance, Kid Cudi and Kool Keith shouldn’t count because rhyming is a different type of creativity that isn’t affected by mood).

  5. Markus says:

    – Fiedler is both eccentric and very smart. Sometimes he veers off-topic, sometimes he has trouble getting his ideas across. He’s too old to change; the smart strategy is to grab the useful stuff and ignore the rest.
    – Without having read the article: I imagine he refers to the informal process of finding worthwhile lines of research. Lots of the sexy-hypothesis stuff ends up neglected because it doesn’t connect to anything else theoretically, the theory itself is weak and non-generative, other researchers running pilots can’t replicate it, and even the original author may have trouble replicating the effect and thus can’t extend and modify it. After a few years, everyone simply moves on to something else. Sometimes it doesn’t work out that way and bad science is considered established fact, but mostly it does. And while wasteful, the process ‘solves’ the coordination problem.

    • Anonymous says:

      “Lots of the sexy-hypothesis stuff ends up neglected because it doesn’t connect to anything else theoretically, theory itself weak and non-generative (…)”

      When have you ever seen a paper about a “sexy hypothesis” that was without reference to prior findings and/or theories? If i am not mistaken, usually the introduction of papers concerning these “sexy findings” is full of them.

      Just to be sure, i checked the following example of research concerning what Fiedler himself seems to call a “sexy hypothesis” when he wrote: “Typical of such research is the focus on elementary hypotheses about the impact of a single causal factor on a single dependent measure: Guilt serves to increase risky decisions (Kouchaki, Oveis, & Gino, 2014) (…)”

      From the paper by Kouchaki et al. :

      “Here, we test the idea that guilt instead increases optimism and preferences for risk as well as the likelihood of risk-taking behavior. This should occur, we reason, because guilt enhances perceived control over one’s environment. Individuals appraise guilt-inducing events with attributions of self-responsibility for failure and enhanced appraisals that events are controllable (Tracy & Robins, 2006), a feature shared by positive emotions such as joy and happiness and by the negative emotion of anger (Frijda, Kuipers, & ter Schure, 1989; Lerner & Keltner, 2001; Smith & Ellsworth, 1985). Although guilt belongs to the class of emotions featuring a negative self-evaluation (Ortony, Clore, & Collins, 1988), such that one’s actions are appraised negatively, the appraisal process signals that one was in control of doing something wrong (i.e., responsible for it); thus, the experience of guilt results in action tendencies to correct the wrongdoing and make reparations to those harmed (Frijda et al., 1989; H. B. Lewis, 1971). Guilt implies that the self is able to act to restore moral order (H. B. Lewis, 1971). A causal effect of control beliefs on guilt has been documented previously (Berndsen & Manstead, 2007), but no study has yet demonstrated that guilt increases sense of control.”

      Now, it seems to me that there is lots of mentioning of prior research and/or associated theories that helped support and/or made possible the birth of this “sexy hypothesis”.

      (As a side note: i appreciate your, and other’s, comments here about Fiedler’s writing. It is helpful in possibly explaining my experience reading some of his work)

      • Anonymous says:

        After reading the quote above from Kouchaki et al., it occurred to me that what Fiedler seems to view as sexy-hypothesis testing can also be seen as what he seems to view as model testing: guilt —> (perceived control) —> risk-taking behavior.

        I then read the parts of Fiedler again where he talks about model testing (which he appeared to view as a different type of research from sexy-hypothesis testing). To my surprise, the following is written there:

        “Thus, Kouchaki et al. (2014) assumed that an enhanced sense of control (Z) mediates the impact of guilt (X) on increased risk taking (Y).”

        So, am i understanding things correctly that Fiedler uses the same paper as an example of both of the two different research types he proposes exist?

        As i stated above, i am never quite sure if he is pulling a Sokal-like hoax and/or what his main points are.

  6. Markus says:

    ‘When have you ever seen a paper about a “sexy hypothesis” that was without reference to prior findings and/or theories? If i am not mistaken, usually the introduction of papers concerning these “sexy findings” is full of them. ‘
    You’re right. I maintain that this is just kabuki.

    Taking your quote from Kouchaki et al. as an example:
    – Lots of citations for the obvious (‘guilt belongs to the class of emotions featuring a negative self-evaluation’, ‘the experience of guilt results in action tendencies to correct the wrongdoing’) that don’t really need a citation, but adding one creates an illusion of theory.
    – Lots of very old citations. Often evidence that the theory is free-floating and that the best ‘related works’ are either very general ancient texts on the phenomenon and/or related on a very abstract level only. The clearest example here is probably the H. B. Lewis cite, which structurally should be part of building a case from the description of guilt to its effect on decision making, especially the willingness to make risky decisions, but instead veers off into ‘restoring moral order’.
    – Lack of a clear cause-effect structure. Ideally one would want them to build towards A -> B; B -> C; C -> D and then test A -> D, or something like that. Instead we get verbiage.

    Ironically, the next paragraph from Kouchaki et al. admits it’s all just forking paths:
    ‘In this article, we argue that guilt promotes risk taking. Intuitively, however, the opposite proposition—that guilt should lead to less risky behavior—may seem more likely. Indeed, some research shows that guilt proneness is inversely related to risky behavior.’

    • Anonymous says:

      I agree that the “theory” might just well be a lot of BS and/or based on false positives, but all this talk about there being no “theory” in psychological science has always sounded strange to me. I gather the overwhelming majority (if not 100%) of psychological papers have some “theory” or previous findings mentioned that lead up to the hypothesis.

      Now, the real issue to me might be that the “theories” are in bad shape/are incapable of predicting things/etc. If this is what you, or Fiedler, mean, it would help me if this were clearly written.

      Reasoning from that point onwards, i reason that things like lack of replications, low-powered studies, p-hacking, publication bias, etc. mean that the “theories”, and their testing and (re-)formulation, are in a very poor state, because they have probably been based on bad-quality research and/or only a selection of the total available evidence (thanks to publication bias/the file-drawer effect).

      In my reasoning this can only be solved by exactly the kind of things Fiedler (or possibly the folks who cite this paper of him) seems to want to skip over when he writes:

      “The point here is not to argue that replication, reliability, and statistical analyses are worthless but that these technical issues are subordinate to more fundamental issues of research design and logic of science.”

      I reason these “technical issues” might be a necessary but not sufficient part of good science. I don’t see how they are subordinate.

      What did Fiedler bring to the table with this paper? Or to use your words, what useful stuff could i grab from it? I seriously don’t know…
