Stranger than fiction

Someone pointed me to a long discussion, which he preferred not to share publicly, of his perspective on a scientific controversy in his field of research. He characterized a particular claim as “impossible to be true, i.e., false, and therefore, by definition, fiction.”

But my impression of a lot of research misconduct is that the researchers in question believe they are acting in the service of a larger truth, and that when they misrepresent data or exaggerate conclusions, they feel they’re just anticipating the findings that they already know are correct. This is inappropriate from a scientific perspective, but it doesn’t quite feel like lying either. Again, having not read any of the details, I am not saying that any aspects of this apply to this person’s particular story; I’m just speaking in general.

It would be fair to characterize the typical unjustified claim in a scientific paper (pick your favorite example here) as fiction (defined as “literature in the form of prose, especially short stories and novels, that describes imaginary events and people”), in that any evidence for such a claim is imaginary. But that doesn’t sound quite right to me. I’d characterize it more as “misleading exposition,” if such a literary classification could be said to exist.

My correspondent replied as follows to my comments regarding the story in his own field:

“Fiction” doesn’t quite fit, but it’s close in this case. Fiction assumes that the authors know it is untrue, I suppose, and unless you can have an honest conversation (or any conversation at all) with the authors, one can never know what is in the other guy’s head.

It’s possible that this person believes what he is writing, but as I think you have written, when it is defend defend defend in spite of all of the problems, at some point the authors have to know that none of it is true, then it crosses the line to unethical behavior, or beyond.

That could be. Again, I can see that someone can violate scientific ethics without thinking they are writing fiction. A related issue is that people will cheat when they think they’re in a fight. Once you feel that the other guy is hitting below the belt, it’s natural to feel that anything goes from your side. It’s all horrible and I wish we never had to think about any of this stuff. But we do need to look at this stuff, given the influence and reach of outlets such as NPR, Gladwell, Ted, PNAS, Freakonomics, etc., all of which are generally well-meaning (I assume) but are susceptible to the stories spun by exaggerators, fakers, and just plain confused people in the science biz.

33 thoughts on “Stranger than fiction”

  1. This might be relevant:

    https://www.stoa.org.uk/topics/bullshit/pdf/on-bullshit.pdf

    “It is impossible for someone to lie unless he thinks he knows the truth. Producing bullshit requires no such conviction. A person who lies is thereby responding to the truth, and he is to that extent respectful of it. When an honest man speaks, he says only what he believes to be true; and for the liar, it is correspondingly indispensable that he considers his statements to be false. For the bullshitter, however, all these bets are off: he is neither on the side of the true nor on the side of the false. His eye is not on the facts at all, as the eyes of the honest man and of the liar are, except insofar as they may be pertinent to his interest in getting away with what he says. He does not care whether the things he says describe reality correctly. He just picks them out, or makes them up, to suit his purpose.”

    I can point to several possible recent examples of psychological scientists possibly acting along these lines.

    Whether it matters that the bullshitter is doing so on purpose (for the purpose of claiming that someone is bullshitting or has produced bullshit) is not clear to me (should this possible distinction matter?).

    • Anon:

      Interesting. I think what’s going on has some elements of bullshit (as defined at your link) but that’s not the whole story.

      Let’s take a particular case, say Marc Hauser. I assume he really does believe his underlying theories about how monkeys think. So “the truth” in that sense is important to him. The trouble is that he thinks he already knows the truth: he’s in the (unfortunately) common position of being a scientist who does a study to show, or prove, something that he thinks he already knows the answer to. Then, when his data contradict his understanding, he hides the data. (I’ve never met Marc Hauser and have no idea what he thinks about anything; the above is just my guess.)

      For another example, or set of examples, consider the people who work in those areas of psychology where the theories are so flexible that they can explain any possible pattern in data. The plan: run an experiment, find some statistically significant comparisons, tell some stories, and publish in a top journal. In some sense the researchers who do this work could be said to be “bullshitting” but I don’t think what they’re doing fits the above description. They do, I think, care about the truth; they just don’t have a good grip on what exactly they believe (beyond vague concepts such as “mind over matter” or “evolution causes men to be different from women” or whatever).

      So I don’t think the “bullshit” label quite fits.

      • “For another example, or set of examples, consider the people who work in those areas of psychology where the theories are so flexible that they can explain any possible pattern in data. The plan: run an experiment, find some statistically significant comparisons, tell some stories, and publish in a top journal.”

        I would like someone smart to write some more papers about the role of theories in psychological research. Perhaps it’s the next logical step to pay more attention to, now that several other possibly crucial aspects of psychological research have been tackled (e.g. replication).

        In the meantime, i am not very smart, but that did not stop me from trying to come up with a possible solution to “force” researchers to do more with “flexible” theories, reasoning, “optimally gathered” data, etc.

        I hope someone smart can have a look at it, and see if it makes any sense, and possibly do something useful with (parts of) it. With that intention, I have already posted the following on this blog in a different thread, on other blogs of “big names”/”leaders concerning improving psychological science”, and sent it to some of them via mail as well, trying to get some feedback. I really didn’t get any feedback, but perhaps that’s because it doesn’t make any sense whatsoever.

        See the idea/format below here:

        http://statmodeling.stat.columbia.edu/2017/12/17/stranger-than-fiction/#comment-628652

        • several other possibly crucial aspects of psychological research have been tackled (e.g. replication).

          Don’t they still use p < 0.05 in both studies to determine whether a result “replicated”? Amongst other things, this still shows a deep misunderstanding of the p-value as a random variable (I know Andrew likes to harp on the noisiness of p-values, not sure if he has a go-to post about it).
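          (A quick illustration of that noisiness, as a minimal sketch: the true effect size, per-group sample size, and number of replications below are made up, not taken from any particular study.)

          ```python
          # Minimal sketch: the same true effect, studied the same way over and over,
          # produces wildly different p-values, so "p < 0.05 in both studies" is a
          # crude replication criterion. All numbers here are hypothetical.
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(1)
          true_effect, n = 0.3, 50   # standardized effect and per-group sample size

          for i in range(10):
              treatment = rng.normal(true_effect, 1, n)
              control = rng.normal(0, 1, n)
              t, p = stats.ttest_ind(treatment, control)
              print(f"replication {i + 1}: p = {p:.3f}")

          # With d = 0.3 and n = 50 per group (roughly 30% power), most exact
          # replications of a real effect will not themselves reach p < 0.05.
          ```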

          As for your post: I think it won’t solve everything but sure, it sounds like it would lead to a great improvement in the quality of information generated. However, I think you will marvel at the resistance you find just trying to set things up to systematically have a single direct replication, let alone five. In many fields, the majority of researchers seem to despise replications.

        • “Don’t they still use p < 0.05 in both studies to determine whether a result “replicated”? Amongst other things, this still shows a deep misunderstanding of the p-value as a random variable (I know Andrew likes to harp on the noisiness of p-values, not sure if he has a go-to post about it)."

          I s#uck at statistics, but this might also be improved via the format I described. This is because you would have 5 p-values per study, which could be meta-analyzed, and/or the p-values would become less important due to a focus on effect sizes, for instance.

          "As for your post: I think it won’t solve everything but sure, it sounds like it would lead to a great improvement in the quality of information generated."

          Thank you for your comment! I think you might be only the 2nd person who gave (some) feedback on it. I appreciate that a lot!

          "However, I think you will marvel at the resistance you find just trying to set things up to systematically have a single direct replication, let alone five. In many fields, the majority of researchers seem to despise replications."

          But why do they despise replications? In my reasoning this might be because they are afraid of the outcome. But in the format I described you wouldn’t be, because you would actually *want* to replicate things: you would want to gather the most “optimal” information, since you are dependent on it for the “next round”.

        • But why do they despise replications? In my reasoning this might be because they are afraid of the outcome.

          The only time I’ve ever seen it in print, the claim was that replications are “unattractive”:

          Apart from rigorous replication of published studies, which is often perceived as unattractive and therefore rarely done

          http://statmodeling.stat.columbia.edu/2015/11/09/using-prediction-markets-to-estimate-the-reproducibility-of-scientific-research/#comment-251396

          To me that sounds like “science is unattractive” to such people. That is fine, there is plenty of stuff other people enjoy that I don’t, but they should get a different job…

        • My guess is that replications are largely unattractive to journal editors and hiring/promotion committees, and since these are the major drivers of academic careers, we don’t get many of them.

        • “My guess is that replications are largely unattractive to journal editors and hiring/promotion committees, and since these are the major drivers of academic careers, we don’t get many of them.”

          1) If that’s the case, and that interferes with science being science, and scientists being scientists, I reason scientists should quit publishing in journals.

          2) In the format I described, you would have 4 replications per study (+ 1 original version), and 5 different studies in total.

          I reason journals would have no problem publishing them even though they are replications:

          1) they could just view it as a single large-sample “new” study if they want to, and

          2) I reason that the paper coming from the format I described would contain 5 different, relatively highly informative studies. Any journal would be foolish not to want to publish that, I reason, as such a paper would possibly collect lots of citations (which is good for the journal’s IF)

        • replications are largely unattractive to journal editors and hiring/promotion committees

          Sure, but who are these “journal editors”, and who is it that gets to be on the “hiring/promotion committees”? I don’t personally see any particular difference between these people and some random sample of researchers in whatever field. If you do, what is it?

        • “For another example, or set of examples, consider the people who work in those areas of psychology where the theories are so flexible that they can explain any possible pattern in data. The plan: run an experiment, find some statistically significant comparisons, tell some stories, and publish in a top journal.”

          &

          “In the meantime, i am not very smart, but that did not stop me from trying to come up with a possible solution to “force” researchers to do more with “flexible” theories, reasoning, “optimally gathered” data, etc. ”

          The above, and the format proposed, might also be relevant concerning the following:

          “Another Look at Meehl, Lakatos, and the Scientific Practices of Psychologists” – Reuven Dar

          “I believe that scientific psychology may be paying a dear price for this bias in training: When passing null hypothesis tests becomes the criterion for successful predictions, as well as for journal publications, there is no pressure on the psychology researcher to build a solid, accurate theory; all he or she is required to do, it seems, is produce “statistically significant” results”

          http://psych.colorado.edu/~willcutt/pdfs/Dar_1987.pdf

      • As i wrote earlier, “I would like someone smart to write some more papers about the role of theories in psychological research. In the meantime, i am not very smart, but that did not stop me from trying to come up with a possible solution to “force” researchers to do more with “flexible” theories, reasoning, “optimally gathered” data, etc. I hope someone smart can have a look at it, and see if it makes any sense, and possibly do something useful with (parts of) it.”

        Two other things you could tie the described format to, to possibly show its usefulness:

        http://psych.colorado.edu/~willcutt/pdfs/Dar_1987.pdf

        “Several writers have offered solutions to the problem of weak theory testing in psychology (e.g., Nunnally, 1960; Swoyer & Monson, 1975). The basic idea in most of these is increased emphasis on effect size in prediction and testing. Serlin and Lapsley’s (1985) article offered the most complete and sophisticated of these solutions to date”

        “Lakatos himself was careful to emphasize that although his philosophy was indeed more liberal than Popper’s, it was also more strict in that it demands not only that a research programme should successfully predict novel facts, but also that the protective belt of auxiliary hypotheses should be largely built according to a preconceived unifying idea, laid down in advance in the positive heuristic of the research programme. (Lakatos, 1978b, p. 149)”

        “Lakatos did in fact acknowledge the problem himself and suggested a solution that has been recommended by many writers (e.g., Lykken, 1968; Stevens, 1971) but is rarely practiced in psychological research: replication (see Lakatos, 1978a, p. 107)”

        1) the format described in this thread contains replications, and will probably result in a focus on effect sizes

        2) the format described in this thread can possibly be tied to Lakatos’ research programmes, for instance to a “consistently progressive theoretical problem shift”, and I reason it may in itself already present a more accurate, and possibly more useful, version and execution of what one could even call a “research program” (cf. business-as-usual in psychological science).

  2. I just left a comment on Mayo’s blog to the effect that some of this is essentially wishful thinking. That’s different again from lying or bullshitting.

    I also remarked that if you follow the garden of forking paths to come up with a “significant” hypothesis, then you have over-fit the data. You need to expose your hypothesis to new data to test it. It’s no different than ordinary curve fitting in this way.
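    (To make that concrete, here is a minimal simulation sketch of the over-fitting point; the sample size and the number of outcomes scanned are made-up numbers, not anything from a real study.)

    ```python
    # Forking paths as over-fitting: scan many noise-only comparisons, keep the
    # most "significant" one, and the selected estimate looks impressive but
    # evaporates in new data. All numbers are hypothetical.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n, n_outcomes = 40, 20          # per-group sample size, outcomes scanned

    treat = rng.normal(0, 1, (n, n_outcomes))   # no true effect anywhere
    ctrl = rng.normal(0, 1, (n, n_outcomes))
    pvals = [stats.ttest_ind(treat[:, j], ctrl[:, j]).pvalue for j in range(n_outcomes)]
    best = int(np.argmin(pvals))
    print("selected outcome:", best,
          "p =", round(pvals[best], 3),
          "estimate =", round(treat[:, best].mean() - ctrl[:, best].mean(), 2))

    # Fresh data on the selected outcome: the apparent effect does not hold up.
    new_t, new_c = rng.normal(0, 1, n), rng.normal(0, 1, n)
    print("new data: p =", round(stats.ttest_ind(new_t, new_c).pvalue, 3),
          "estimate =", round(new_t.mean() - new_c.mean(), 2))
    ```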

    There is a well-known and award-winning paper (cited nearly 2,000 times in less than a decade) in my discipline that presents a series of results that can easily be shown to be mathematically impossible. These impossible results represent the core of the paper’s arguments and the basis for the impact it has had on the discipline. The editors know about these impossible results but have refused to act, possibly because the paper has had such an influence on the journal’s impact factor (it is now ranked first in its area).

        • https://en.wikipedia.org/wiki/Critical_positivity_ratio ?

          https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3126111/

          “Positive Affect and the Complex Dynamics of Human Flourishing” – Barbara L. Fredrickson and Marcial F. Losada

          Cited 2250 times via Google/Google Scholar.

          What I find interesting is that psychology seems to be in such a bad state that it doesn’t seem to matter whether something is true or not, judging by the number of citations of studies that later turned out to (possibly) be wrong/not true (e.g. see all the “failed” large-scale replications of “seminal” findings of the field).

          To me, this might be indicative of several bad research practices which make it so that the “input” via reasoning, evidence, hypotheses, theories, etc. doesn’t matter much for the “output”.

          A possible solution for this could be the following format, where the “input” might actually matter for the next stages of research (the “output”).

          I call it:

          “science is dependent on scientists (old flawed model?) vs. scientists are dependent on science (new improved model?)”.

          I reason that with this model, many of the recent proposals (e.g. pre-registration, publishing of null-results, open data, performing replications, etc.) will follow *automatically* because of the “switch” (scientists are now actually dependent on the science for the next stages).

          1) Small groups of, let’s say, 5 researchers all working on the same theory/topic/construct each perform a pilot/exploratory study, and at some point make clear to themselves and the other members of the group that they are ready to have their work rigorously tested.

          2) These 5 studies will then all be pre-registered and prospectively replicated in a round-robin fashion (see the sketch after this list).

          3) You would hereby end up with 5 studies (which can perhaps often be seen as “conceptual” replications of one another, depending on how far you want to go in considering something a “conceptual” replication), each of which will have been “directly” replicated 4 times (+ 1 version by the original researcher, which makes a total of 5).

          4) All results will be published in a single paper, no matter the outcome: for instance “Ego-depletion: Round 1”. This paper then includes 5 different “conceptual” studies (probably varying in degree of how “conceptual” they are, e.g. see LeBel et al.’s “falsifiability is not optional” paper), which will all have been “directly” replicated.

          5) All members of the team of 5 researchers would then come up with their own follow-up study, possibly (partly) related to the results of the “first round”. The process repeats itself as long as deemed fruitful.
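          (A minimal sketch of the round-robin schedule implied by steps 1-5; the researcher labels are placeholders, not part of the proposal.)

          ```python
          # Each of 5 researchers proposes one study; every researcher then runs all 5,
          # so each study gets its original run plus 4 direct replications (25 runs in
          # total), all pre-registered and reported together in one "Round 1" paper.
          researchers = ["A", "B", "C", "D", "E"]
          studies = {r: f"study proposed by {r}" for r in researchers}

          for runner in researchers:
              for proposer, study in studies.items():
                  role = "original" if runner == proposer else "direct replication"
                  print(f"{runner} runs {study} ({role})")
          ```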

          Additional thoughts related to this format which might be interesting regarding recent discussions and events in psychological science:

          1) Possibly think how this format could influence the discussions about “creativity”, “science being messy” and the acceptance of “null-results”.

          Researchers using this format could each come up with their own ideas for each “round” (creativity), there would be a clear demarcation between pilot/exploratory studies and confirmatory testing (“science is messy”), and this could also contribute to publishing, and “doing something” with, possible null-results when drawing inferences and conclusions (acceptance of “null-results”).

          2) Possibly think about how this format could influence the discussion about how there may be too much information (i.e. Simonsohn’s “let’s publish fewer papers”).

          Let’s say it’s reasonable that researchers can try and run 5 studies a year (2 years?) given time and resources (50-100 pp per study per individual researcher). That would mean that a group of researchers using this format could publish a single paper every 1 or 2 years (“let’s publish fewer papers”), but this paper would be highly informational given that it would be relatively highly-powered (5 x 50-100 pp = 250-500 pp per study), and would contain both “conceptual” and “direct” replications.

          3) Possibly think about how this format could influence the discussion about “expertise” and about “reverse p-hacking”/deliberately wanting to find a “null-result” in replications.

          Perhaps every member of these small groups would be inclined to a) “put forward” the “best” experiment they want to rigorously test using this format, and b) execute the replication part of the format (i.e. the replications of the other members’ studies) with great attention and effort, because they would be incentivized to do so. This is because “optimally” gathered information coming from this format (e.g. both significant and non-significant findings) would be directly helpful to them for coming up with study proposals for the next round (e.g. see LeBel et al.’s “falsifiability is not optional” paper).

          4) Possibly think about how this format could influence the discussion about how “a single study almost never provides definitive evidence for or against an effect”, and the problems of interpreting “single p-values”. Also see Fisher, 1926, p. 83: “A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.”

          5) Possibly think about how this format could influence the discussion about the problematic grant culture in academia. Small groups of collaborating researchers could write grant proposals together, and funding agencies would give their money to multiple researchers who each contribute their own ideas. Both things contribute to psychological science becoming less competitive and more collaborative.

          6) The overall process of this format would entail a clear distinction between post-hoc theorizing and theory testing (cf. Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012), “rounds” of theory building, testing, and reformulation (cf. Wallander, 1992), and could be viewed as a systematic manner of data collection (cf. Chow, 2002).

          7) Finally, it might also be interesting to note that this format could lead to interesting meta-scientific information as well. For instance, perhaps the findings of a later “round” turn out to be more replicable due to enhanced accurate knowledge about a specific theory or phenomenon. Or perhaps it will show that the devastating typical process of research into psychological phenomena and theories described by Meehl (1978) will be cut off sooner, or will follow a different path.

        • Thanks!

          A non-linear chaotic model, complete with a discussion of the famous butterfly wings. Small sample size (as best I can tell). It even has an Alan Sokal cameo. Great stuff.

          Best of all, I know the ending, so I can tell myself I would have seen through it instantly.

          I eagerly await the miniseries.

        • Oh, yeah — I remember this one. I think Nick Brown and James Coyne gave critiques of it. I recall really rolling my eyes when I read the paper. It just seemed like so much — I can’t really say BSing, because that seems to imply more awareness on the part of the authors than there seems to be; I guess what I’m really thinking is that they were very naive.

          There’s some discussion on Fredrickson’s papers in Andrew’s blog in 2016 and 2015:
          http://statmodeling.stat.columbia.edu/2016/09/21/what-has-happened-down-here-is-the-winds-have-changed/

          http://statmodeling.stat.columbia.edu/2015/10/17/in-answer-to-coynes-question-no-i-cant-make-sense-of-this-diagram/

        • Hi Martha,

          The critique of the Fredrickson and Losada article (which prompted the subsequent reply from Fredrickson) was by me, Alan Sokal, and Harris Friedman. Harris and I also have a more global commentary in press for a special edition of the Journal of Humanistic Psychology.

          With Jim Coyne and others, I was involved in critiques of Fredrickson’s work with her graduate student, Beverly Kok, on vagal tone and emotions (Carol Nickerson has a commentary in press about this as well), and also of Fredrickson’s claims that different forms of positive emotions predicted different patterns of gene expression in the immune system.

          Nick

        • Any idea how much of the paper’s popularity was due to its use of a non-linear, chaotic model (as opposed to the paper’s result)?

          I’ve never seen a successful non-linear chaotic model.

        • Consider the belief that prayers have the capacity to heal (i.e., spiritual healing). Such beliefs are taken to result from conflation of mental phenomenon, which are subjective and immaterial, and physical phenomenon, which are objective and material (Lindeman, Svedholm-Hakkinen & Lipsanen, 2015).

          […]

          Indeed, the mean profoundness rating for each item was significantly greater than 2 (“somewhat profound”), all t’s > 5.7, all p’s < .001, indicating that our items successfully elicited a sense of profoundness on the aggregate.

          If the praying people said it was only “indicating” a capacity to heal, then their argument would be fine? I don’t know how else to interpret this.

        • I wonder what would happen if people like Benjamin et al. (2017) (https://psyarxiv.com/mky9j) would write stuff like this up and publish it in Nature or some other fancy journal.

          Apparently, they have the connections and influence to publish certain ideas which gather great attention, and useful replies by other scientists (e.g. Crane, 2017 https://psyarxiv.com/bp2z4; Lakens et al., 2017 https://psyarxiv.com/9s3y6).

          My guess is that the above idea might be at least as scientifically useful as their “redefine statistical significance”.

          Should any of them read this, and think the above presented idea might be worth further thinking/writing about, please write something about it and publish it in Nature or some other fancy journal.

          That is all, thank you!

        • Here’s some of the more obvious stuff:

          1) The authors state (p. 99, Table 1) that the fit of a higher-order model with four first-order factors is better than the fit of an alternative model with four correlated first-order factors and no higher-order factor. This is mathematically impossible because the higher-order model is nested within the oblique first-order factor model. For at least one of these two models the chi-square statistics and therefore all associated fit indexes must be wrong.

          2) In Table 1 the RMSEA value for the higher-order model (US sample) is incorrect: it should be .08, not .05. This is easily checked using the computational formula for RMSEA (RMSEA is a function of sample size, chi-square, and degrees of freedom); a computational sketch follows this list.

          3) In Table 1, the CFI value reported for the higher-order model (US sample) is inconsistent with the CFI values reported for both the single-factor model and the first-order factor model. That is, the CFI value for the higher-order model appears to be incorrectly reported. This too can easily be checked using the computational formula for CFI and the fact that the models share the same null model.

          4) The SEM results reported for the predictive validity section on page 110 are almost certainly also incorrect. For example, for sample 1 the authors first report in their correlation table that ethical leadership is related to all three of the outcome variables (two of the relationships being quite strong) and also show in Figure 1 that each of these relationships is still quite strong (and significant) even when authentic leadership is included as a predictor. However, the authors then later (also on p. 110) claim that fixing the three paths from ethical leadership to the three outcomes to zero results in no significant change to the model chi-square value. That is, they are in effect claiming that there is no difference between the observed relationships and zero, despite earlier showing that the relationships were significantly different from zero.

          5) Similarly strange claims as in point 4 are made for sample 2. That is, Table 4 shows that the transformational leadership facets are all quite strongly related to the three criteria, Figure 1 shows that these relationships remain quite strong and significant when controlling for authentic leadership but the authors then claim (p. 110) that the paths from transformational leadership to the three criteria are all effectively zero. This too appears to be impossible.

          6) On page 114 the authors report the fit of the higher-order model for ALQ scores. The ALQ has 16 items and as such the higher-order model should have 100 degrees of freedom. The authors however report only 95 degrees of freedom. In the same sentence the authors also report 41 degrees of freedom for the same model.

          7) Also on page 114 the authors report the fit of a single-factor model for job performance based on scores from a 10-item inventory. However, a single-factor CFA model with 10 indicators should have 35 degrees of freedom, but the authors report 31 degrees of freedom (or is it 24?).
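          (For anyone who wants to check points 1-3 and 6-7 themselves, here is a minimal sketch of the standard formulas involved. The chi-square, sample size, and null-model values below are made-up placeholders, not numbers from the paper, and note that some software uses N rather than N - 1 in the RMSEA denominator.)

          ```python
          from math import sqrt

          # RMSEA and CFI from chi-square, degrees of freedom, and sample size (points 2-3)
          def rmsea(chi2, df, n):
              return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

          def cfi(chi2_model, df_model, chi2_null, df_null):
              d_model = max(chi2_model - df_model, 0.0)
              d_null = max(chi2_null - df_null, d_model, 0.0)
              return 1.0 - d_model / d_null

          print(round(rmsea(250, 100, 220), 3))       # ~0.083 with these placeholder values
          print(round(cfi(250, 100, 2000, 120), 3))   # ~0.920; both models share one null model

          # Degrees-of-freedom bookkeeping (points 1, 6, 7)
          def observed_moments(p):
              return p * (p + 1) // 2        # unique variances + covariances for p items

          # 10-item single-factor CFA: 10 loadings + 10 residual variances
          print(observed_moments(10) - 20)   # 35, not the reported 31 (or 24)

          # 16-item higher-order model with 4 first-order factors:
          # 16 loadings + 16 residuals + 4 second-order loadings (latent variances fixed)
          print(observed_moments(16) - 36)   # 100, not the reported 95 (or 41)

          # Correlated four-factor model for the same 16 items:
          # 16 loadings + 16 residuals + 6 factor correlations
          print(observed_moments(16) - 38)   # 98: the higher-order model is nested in
                                             # this one, so its chi-square cannot be smaller
          ```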

  4. It is orders of magnitude easier to speculate about motivations than to assess arguments. And worse, since “you can never know what is in the other guy’s head,” speculation is actually the best you can do when assessing arguments based on their motivation.

    The relative difficulty of actually assessing arguments versus dismissing them based on motivation creates a Gresham’s Law of argument. And we see the success of this assumed-motivation-pseudo-counterargument every day. As long as it works, its lower cost will drive out reasoned assessment forever.

  5. “Fiction assumes that the authors know it is untrue, I suppose.”

    I suppose otherwise. A lot of historical fiction is not known to be not true.

    In Verona, there’s a stupid tourist attraction called “Juliet’s Balcony” per Shakespeare. Maybe the people who run it believe that it really was. There is no evidence that it is not Juliet’s balcony; no better evidence for any other balcony. The people who run it may say they believe it; I say it is a fictional claim, irrespective of any belief of theirs or mine.

    P.S. Borges wrote a story of charlatans who go around with what they claim are the corpses of the Perons, collecting donations from those who wish to venerate them. The story includes the suggestion that since the Perons were so much a product of illusion, the fake corpses are actually better representations of them than the real ones would be.

  6. It certainly is all horrible. Homeopathy and ESP researchers always seem to genuinely believe the unjustified claims they make in pursuit of their ‘larger truths’ and the corrupting power of such ‘larger truths’ can be utterly astonishing. Here’s “psi-ontology” turning a prominent philosopher into a brazen ‘liar’.
