Fair Warning

[cat picture]

A few months ago we shared Rolf Zwaan’s satirical advice on how to conduct a research project in social psychology, write it up, and publicize it, under the principle of minimal effort in the research, maximal claims in the writeup, and maximal publicity in the aftermath. I called it, “From zero to Ted talk in 18 simple steps.”

I thought this was all a joke, but at least one psychology research team seems to have taken it more seriously, following Zwaan’s 18 steps almost to the letter in this paper, “Caught Red-Minded: Evidence-Induced Denial of Mental Transgressions.” I’ll leave it to the reader to count researcher degrees of freedom in this paper.

At this point, you might say: Sure, fine, so the work is speculative. Even so, interesting speculation is important. A new speculative idea is a lot more valuable than a carping blog post.

And I’d agree with you. But I’d add three things:

1. It took me 15 minutes to write this post, and it took the authors of this paper and the participants of this experiment . . . ummm, I dunno, 1000 person-hours in total? to produce that paper. I can accept that their paper is more valuable than this post, but I don’t think it’s 4000 times more valuable.

2. Here are the first and last sentences of the abstract of that paper: “We suggest that when confronted with evidence of their socially inappropriate thoughts and feelings, people are sometimes less likely—and not more likely—to acknowledge them because evidence can elicit psychological responses that inhibit candid self-reflection. . . . These results suggest that under some circumstances, confronting people with public evidence of their private shortcomings can be counterproductive.” Is this actually interesting? I don’t know but let’s just say it’s not a slam-dunk case.

3. To the extent that the idea in that paper is interesting—and I’m willing to give the authors the benefit of the doubt on this one—it’d be just as legitimately interesting if presented as theory + speculation + qualitative observation, without the random numbers that are the quantitative results, the meaningless p-values and all the rest. As I often say about this sort of work: if you think it’s important, go for it! Publish it! Promote it! Just recognize it for the qualitative work that it is, and don’t fool yourself with numbers, or you might end up making claims that don’t replicate, and making lots of other people look like fools too.

Remember, despite what some people might say, it’s not actually the case that “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%.”

P.S. Hey, these Westlake titles really work!

28 thoughts on “Fair Warning”

  1. “if you think it’s important, go for it! Publish it! Promote it! Just recognize it for the qualitative work that it is, and don’t fool yourself with numbers, or you might end up making claims that don’t replicate, and making lots of other people look like fools too.”

    But what is the scientific content of a paper in which you in essence say, “I had this idea about some stuff that I think is probably true about X”?

    I’d argue that what is needed here is proper theoretical work. Make a quantitative predictive model of how things might work, provide priors for the parameters, and suggest what kinds of experiments would produce data that would inform the model. In other words, go well beyond “gee we think probably people who are confronted with their shortcomings enter into a state where they are less capable or receptive to self-reflection and more defensive” and instead start thinking about predicting what people will do based on maybe some Markov model of being in one or another state, and then having some significant changes to that state occur when faced with some experience, and then gradually transitioning back to some baseline state. You can then talk about time-scales for these transitions, or degree of stimulus required to transition, or whatever… come up with some quantitative description of the theory that is at least capable of making some quantitative predictions given some parameter values whose specific values are unknown…
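
    To make that concrete, here is a minimal sketch of the kind of model I have in mind (a two-state toy; every state name, probability, and time-scale below is hypothetical, and the point is only that the theory becomes something you could actually fit to data):

    ```python
    # Toy two-state model: "baseline" vs. "defensive" (all numbers made up).
    # A confrontation at `stimulus_time` can push a person into the defensive
    # state with probability p_enter; they then drift back to baseline at
    # rate `relax` per time step. The observable is a state-dependent chance
    # of acknowledging the bias, so the theory predicts how acknowledgment
    # should change with time since the confrontation.
    import numpy as np

    def simulate(n_steps, stimulus_time, p_enter=0.8, relax=0.2, seed=None):
        rng = np.random.default_rng(seed)
        p_acknowledge = {"baseline": 0.6, "defensive": 0.2}  # hypothetical
        state = "baseline"
        acknowledged = []
        for t in range(n_steps):
            if t == stimulus_time and rng.random() < p_enter:
                state = "defensive"            # evidence induces defensiveness
            elif state == "defensive" and rng.random() < relax:
                state = "baseline"             # gradual return to baseline
            acknowledged.append(rng.random() < p_acknowledge[state])
        return np.array(acknowledged)

    # Acknowledgment rate just after the confrontation vs. much later:
    runs = np.stack([simulate(60, stimulus_time=10, seed=i) for i in range(2000)])
    print(runs[:, 11:16].mean(), runs[:, 45:].mean())
    ```

    Once it is written down like that, you can start asking what data would pin down the entry probability, the relaxation rate, or the acknowledgment probabilities, which is the part that is missing from “people will be less candid.”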

    • “I’d argue that what is needed here is proper theoretical work. Make a quantitative predictive model of how things might work, provide priors for the parameters, and suggest what kinds of experiments would produce data that would inform the model.”

      GS: What if the phenomenon is directly manipulable? Why is there necessarily a need to “make a quantitative predictive model of how things might work”? Why make theories about “how things might work” when you can just investigate how they work directly?

      • I’m not against descriptive experimentation (“When we did X we saw Y”); I just think that if you’re going to theorize (what Andrew is calling “speculate”) you should do it in a scientific way and put some effort into it.

        I actually think descriptive experimentation is really important. In talking with my wife about things (she’s a molecular biologist, works with mice) I often ask her something like “well, what does wild type do when you do X” and she often answers something like “no one has ever looked at that” and when I say something like “Geez, someone should really just harvest 50 wild type mice at different time points and measure X” she says things like “No one could ever get funding for that, you have to propose grants that are hypothesis driven”

        And so I say, yes, absolutely people should be able to propose that knowing more about what normally happens in situation FOO is worthwhile, and we’re just going to collect a bunch of data on situation FOO and publish a description of what happens.

        • Daniel:

          There is the air of condescension in your “praise” of mere description and “descriptive experimentation.” No? When one analyzes the effects of imposed variables on a subject matter one is often in a position to predict phenomena and, more importantly, one is in a position to control the subject matter. But hey…it’s merely descriptive, right? I always find this amusing. One of the criticisms leveled at behavior analysis is that “it’s merely descriptive” – yet its ability to control its subject matter – behavior – is unparalleled. So mainstream psychologists are in the position of acknowledging the control of the subject matter of which behavior analysis is capable while arguing that it isn’t even “real” science. It’s all just description…no real depth.

        • I never said “mere description”. I explicitly said how frustrating it is that there isn’t enough description and you can’t get money to do it!

          My point was: if you’re going to hypothesize, take some time to do a good job, think things through and figure out what variables you think are important, how they affect the system, etc. If you’re going to describe, then go out and do a good job of it, take multiple measurements through time, quantify a lot of relevant things, and don’t pretend that by quantifying things you’ve “explained” them and “discovered” things at the p < 0.05 significance level.

          I agree with Andrew, you seem to be arguing with other people. Yes, I happen to think that model building is a critical part of science. That’s why I write a whole blog about issues in building models.

          http://models.street-artists.org/

          But I would never say that a person who thinks they’ve identified control variables that can cause outcomes shouldn’t perform experiments and describe the results. Quite the opposite. Yes I believe that our ultimate goal should be understanding mechanisms, but it certainly helps to know what happens in a wide variety of situations.

        • I actually think descriptive experimentation is really important. In talking with my wife about things (she’s a molecular biologist, works with mice) I often ask her something like “well, what does wild type do when you do X” and she often answers something like “no one has ever looked at that” and when I say something like “Geez, someone should really just harvest 50 wild type mice at different time points and measure X” she says things like “No one could ever get funding for that, you have to propose grants that are hypothesis driven”

          Yep, there is no interest if you can’t plug it into NHST. How many cells of type x per tissue, how many receptors of type y per cell, what is the range of ligand concentration, what is the turnover rate of these cells/receptors/ligands? This type of basic info (totally necessary for a quantitative model) will be found in random places if at all. It wasn’t always like this…

        • Having a hypothesis before you start collecting data and being committed to using NHST as an analytical framework are two totally different things.

    • Daniel:

      I think it’s fine to present speculations, if presented as such. I speculate a lot on this blog and I feel this can be a useful contribution.

      Glen:

      You ask why we need theory if the phenomenon is directly manipulable. The usual reason is that you need the theory to tell you what to manipulate. Just flipping switches at random won’t get you much. OK, it can get you published in Psychological Science and PPNAS, but it’s not likely to advance your scientific understanding.

      • I have no problem with speculations either… I just don’t think they’re particularly “scientific” unless they actually have some scientific theoretical content. I’d rather have people say “if I do X it should cause people to alter their state of mind into a state Y which then makes them less capable of doing a host of related things Z,P,D,Q, for a period of time, but eventually after another period of time they will be back in original state A and respond to ZPDQ in a different way… and the level of stimulus required to transition an individual to state Y should be variable and depend potentially on some factors we’ve identified L,M,N”

        This is useful scientific speculation. As compared to “If I do X people are likely to be really bad at Q,” which is more or less at the predictive level of gossip (i.e., not zero, but not really all that scientific).

      • AG: Daniel:

        I think it’s fine to present speculations, if presented as such. I speculate a lot on this blog and I feel this can be a useful contribution.

        AG: Glen:

        You ask why we need theory if the phenomenon is directly manipulable. The usual reason is that you need the theory to tell you what to manipulate.

        GS: Yes, but the implication is often that what is measured is not really what is of interest. This is especially true in much of psychology, where what is measured is merely an “operational definition” of the “real” independent- or dependent-variables (e.g., some actual manipulation causes various levels of “fear,” which is then responsible for alterations in some behavioral measure or measures). The often unobservable “stuff” is that about which theory is concerned. But phenomenological endeavors, as it were, generate theories that directly concern the measured variables – not some alleged (and they might be real – like atoms or receptors) things and events at some other level of analysis. Traditional thermodynamics and chemistry are sciences that don’t involve assumptions about, you know, atoms. So, taking an example from behavior analysis, once you know that consequences can modify the probability of behavior, and that rate of responding becomes stable under various arrangements (say, different schedules of food reinforcement), one can ask what happens when a short delay is interposed between a response and its consequence. To say that one must have had some kind of theory – and especially one concerning what’s “behind” the measurements – in order to ask experimental questions is… well, let’s just say its generality is suspect vis-à-vis science as a whole. A great many questions may spring from earlier manipulations of the subject matter without having a theory in the sense you mean it – in the sense that is implied by the term “model.”

        AG: Just flipping switches at random won’t get you much.

        GS: Ahh…needless to say, I am suggesting your view perhaps reflects a somewhat simplistic and mechanical view of science. And here “mechanical” is a double-entendre. Your view is mechanistic in the clockwork universe sense and “mechanistic” in the sense that it follows automatically from your assumptions about how science is really supposed to work. So anything outside your circumscribed view of science is merely “switch flipping.” Perhaps you should have borrowed from Rutherford and said that stamp-collecting won’t get you much.

        AG: OK, it can get you published in Psychological Science and PPNAS, but it’s not likely to advance your scientific understanding.

        GS: Well…I have already commented on the possibility that your understanding of science is in need of a slight modification.

        • Glen:

          I have the impression that Daniel and I stumbled into the middle of an argument you’re having with someone else. I write, ” Just flipping switches at random won’t get you much,” and you write, “needless to say, I am suggesting your view perhaps reflects a somewhat simplistic and mechanical view of science.”

          This is just ridiculous. I’m saying that “flipping switches” won’t work, and you’re saying that science is not merely “switch flipping.” We’re in agreement! Daniel and I really aren’t the people you should be arguing with.

        • AG: Glen:

          I have the impression that Daniel and I stumbled into the middle of an argument you’re having with someone else. I write, ” Just flipping switches at random won’t get you much,” and you write, “needless to say, I am suggesting your view perhaps reflects a somewhat simplistic and mechanical view of science.”

          This is just ridiculous. I’m saying that “flipping switches” won’t work, and you’re saying that science is not merely “switch flipping.”

          GS: No. That is not what is being said. You are saying that anything other than the hypothetico-deductive model of science, replete with testing models, is useless “switch flipping.” I am saying that that is a misrepresentation of science. There is no question that the hypothetico-deductive method has been useful – often dramatically so – but it is only one way to do science, and sometimes it is employed prematurely and does more harm than good, as it has in mainstream psychology and, no doubt, elsewhere.

          AG: We’re in agreement! Daniel and I really aren’t the people you should be arguing with.

          GS: No…we are not in agreement. Whether or not I should be arguing with you is, I suppose, another issue.

        • Glen:

          You write, “You are saying that anything other than the hypothetico-deductive model of science, replete with testing models, is useless ‘switch flipping.'”

          No, that’s not what I’m saying! I never said anything like this!

          Again, you’re not arguing with me or Daniel, you’re arguing with someone else. I have no idea who this “someone else” is, but it seems to be someone who is “saying that anything other than the hypothetico-deductive model of science, replete with testing models, is useless ‘switch flipping.'”

          In all seriousness, I recommend you find these people who hold these views with which you disagree, and argue with them directly. I never said that which you attribute to me, nor do I believe it. This whole thing is ridiculous, and as a start I suggest you re-examine your strategy of attributing views to me which I do not hold and have never stated.

        • AG: Glen:

          You write, “You are saying that anything other than the hypothetico-deductive model of science, replete with testing models, is useless ‘switch flipping.’”

          No, that’s not what I’m saying! I never said anything like this!

          GS: OK then…how is “…you need the theory to tell you what to manipulate. Just flipping switches at random won’t get you much” inconsistent with my assessment couched in terms of adherence to the hypothetico-deductive method? Tell me, why do you assign such power to theory? So much so that you hold that the poor researcher who just wants to directly investigate the properties of a system through manipulation of the functionally-relevant variables would be helpless to come up with experiments. He or she just wouldn’t know what to manipulate! You are right about one thing, I have had this argument before and the ol’ “you have to have a theory in order to tell you what to manipulate” is a classic line – as you would expect if someone held that science is always a matter of testing hypotheses and, thus, the associated theories are assigned monumental importance. I would add that this view of theories is a narrow one. I am not arguing against “theory,” I am arguing against the necessity of certain kinds of theories – those that involve “models” – as a ubiquitous feature of science. I think in the context of my original post to Daniel, it is clear that the type of theorizing being called for was of the “model” variety. Here is what Daniel wrote:

          “I’d argue that what is needed here is proper theoretical work. Make a quantitative predictive model of how things might work, provide priors for the parameters, and suggest what kinds of experiments would produce data that would inform the model. In other words, go well beyond “gee we think probably people who are confronted with their shortcomings enter into a state where they are less capable or receptive to self-reflection and more defensive” and instead start thinking about predicting what people will do based on maybe some Markov model of being in one or another state, and then having some significant changes to that state occur when faced with some experience, and then gradually transitioning back to some baseline state. You can then talk about time-scales for these transitions, or degree of stimulus required to transition, or whatever… come up with some quantitative description of the theory that is at least capable of making some quantitative predictions given some parameter values whose specific values are unknown…”

          GS continues: My point was that maybe what is needed is a direct experimental investigation of relevant variables instead of an elaborate theory. Why guess when you can just determine something directly through experiment?

        • Glen:

          This discussion is all in the context of a paper published in Psychological Science called, “Caught Red-Minded: Evidence-Induced Denial of Mental Transgressions.” In my opinion that paper is a bunch of unfounded extrapolations taken from the predictably noisy data that resulted from a mindless set of “switch-flipping” experiments. The whole paper seems to me to have been based on a shot-in-the-dark approach to science, where the investigators combined weak theory, sloppy experimentation, and noisy data in the vague hope that something interesting would come up. I don’t think this is a good way of doing science—even if, if you spin this sort of thing well, it can get accepted in Psychological Science or PPNAS.

          The above paragraph expresses no allegiance to the hypothetico-deductive method or whatever. It just expresses a distrust for a certain approach to scientific inquiry which unfortunately has become all too common in psychology research.

        • Hmm…I guess I’ll post what follows here since the post by Andrew to which I wish to respond has no “reply to this comment” thingy to click. The actual post to which I am responding is contained in my post:

          AG: Glen:

          This discussion is all in the context of a paper published in Psychological Science called, “Caught Red-Minded: Evidence-Induced Denial of Mental Transgressions.”

          GS: Right. But my post was a response to what Daniel said about how to proceed in order to investigate the phenomenon (or “phenomenon” as the case may be) described in the paper. His immediate response was to delve into creating a complex-sounding model and testing it. The description would have been at home – except for the technical language – in the early chapters of a Psych 101 text… you know, the parts where they tell the student what science is and why psychology is a science. My point was merely that the phenomena described (however rich or faulty), if real, might be investigated directly by manipulation of relevant independent-variables. My post certainly wasn’t a defense of the paper.

          If I had to take their side as I see it after translating (but traduttore traditore) from ordinary language into the language of known behavioral processes, I would say that what they say is reasonable, ordinary-language-wise. Words of criticism, while potentially able to exert discriminative control over the listener’s behavior (like any rule… or a grocery list, etc.), also function as punishers or negative reinforcers. What is very likely, in that circumstance, is escape in possibly subtle forms (e.g., the person doesn’t “pay attention” to the “message” – they spend no time in the covert behavior that is called “thinking,” etc.). So… perhaps one could directly manipulate the negatively-reinforcing aspects of the criticism as well as the “informational content” – or something like that – in individual subjects where, maybe, some kind of behavior change is the DV.

          Anyway… what I am proposing now (very loosely) stands in contrast to Daniel’s description of how to proceed (i.e., elaborate model creation, etc.) if scientific pursuit of the phenomenon (or “phenomenon” as the case may be) is desired. Now, you could say that my statements about the role of negative reinforcement and (discriminative) control by the “information content” constitute a theory, and that is true, but it is a theory and an approach of a far different type than Daniel’s. The whole theory is directly about quantities that enter into the description of the IVs and DVs, arrived at largely through induction – not about some alleged world of hidden processes and states behind the data (which are, after all, only important insofar as they are symptoms of the “real phenomena”).

          AG: In my opinion that paper is a bunch of unfounded extrapolations taken from the predictably noisy data that resulted from a mindless set of “switch-flipping” experiments. The whole paper seems to me to have been based on a shot-in-the-dark approach to science, where the investigators combined weak theory, sloppy experimentation, and noisy data in the vague hope that something interesting would come up. I don’t think this is a good way of doing science—even if, if you spin this sort of thing well, it can get accepted in Psychological Science or PPNAS.

          The above paragraph expresses no allegiance to the hypothetico-deductive method or whatever. It just expresses a distrust for a certain approach to scientific inquiry which unfortunately has become all too common in psychology research.

          GS: But I wasn’t responding to the “above paragraph” in previous posts. BTW, I think you will have a hard time answering my question about how it is that your brief statements about the role of theory aren’t consistent with the stock hypothetico-deductive model of science. Things that you have said elsewhere on this blog also point in that direction. In any event, I am not even attacking deductive hypothesis testing and its importance in science (of course, in physics what is tested is the scientist’s theory…not the straw man null) where it is appropriate. The whole series of posts started with my contrasting one way of doing science with another. To return to the main point: I do not think that “theory,” in the sense (I believe) you mean it, is always necessary to suggest what variables should be manipulated, and that is where I started out. Now…I admit that I bristled a bit when you described having no theory (presumably of the type suggested by Daniel) with “flipping switches” – a clear reference to some mindless rote activity.

  2. Do faculty at Harvard University actually let their students run studies with such (supposedly planned) small sample sizes?
    Also, if you are going to hang your hat on the NHST framework, you should probably use a one-tailed t-test for the replication study (Study 3) that is based on the results from Study 2. Of course, then your supposed replication would really show a failure to replicate.

    • Marcus:

      You ask, “Do faculty at Harvard University actually let their students run studies with such (supposedly planned) small sample sizes?”

      Remember: Amy Cuddy taught at Harvard! And it was a couple of Harvard professors who released the statement that “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%.”

      So, yeah, Harvard has some clueless people with fancy titles.

    • Well, we should be thankful for small mercies. They could have used a post hoc justification for a one-tailed t-test to cut their p-values of .053 (Study 2) and .055 (Study 3) in half to get them below .05. Instead, they just continued as if those were statistically significant. Presumably I missed the point where they stated they were using an alpha level of .10.
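
      To make the arithmetic concrete (toy numbers only, nothing from the actual study): when the sample means land in the predicted direction, the one-sided p-value of a t-test is exactly half the two-sided one, so .053 and .055 would indeed have dropped to roughly .027 and .028.

      ```python
      # Toy illustration of the one-tailed/two-tailed relationship; the data
      # below are made up and have nothing to do with the paper's measures.
      import numpy as np
      from scipy import stats

      group_a = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3])  # hypothetical scores
      group_b = np.array([4.6, 4.9, 4.4, 5.0, 4.7, 4.5])  # hypothetical scores

      two_sided = stats.ttest_ind(group_a, group_b).pvalue
      one_sided = stats.ttest_ind(group_a, group_b, alternative="greater").pvalue
      print(two_sided, one_sided, np.isclose(one_sided, two_sided / 2))  # ..., True
      ```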

    • To be fair and critical at the same time: it’s often not feasible for students to run a proper number of subjects in a study as a pedagogical exercise, due to time constraints. So this type of “research” happens all the time in that context. Once in a while something interesting pops up at p < .05 and voila… PS!

  3. This study has a serious logical problem. (I am focusing on Study 1; I haven’t had a chance to read Study 2 yet.)

    Subjects were shown mug shots and asked to indicate each one’s estimated age range (under 30 or over 30) with a joystick. They were then read a statement that “One of the things we are interested in is whether people are influenced by race when perceiving threat” and some subsequent explanation.

    From there, they were divided into an “evidence” group and a “no evidence” group. The subjects in the “evidence” group were each read a statement that, during the first part of the experiment, the researchers had measured their galvanic skin response (via the joystick) and had used this information to assess whether they were more threatened by African American faces than by Caucasian faces. The “no evidence” subjects were not read this statement. (The statement was false; the joystick did not actually measure galvanic skin response.)

    Next, they were given an activity that supposedly measured their “aesthetic preferences”–but was really meant to serve as reflection time. The researchers assumed that the subjects would take this time to reflect on their bias (or not).

    They were then asked, “To what degree did you feel more threatened by the African-American mug shots than by the Caucasian mug shots?” plus 13 additional questions.

    The researchers found that those in the “evidence” group were less likely to report feeling threatened than those in the “no evidence” group. The researchers conclude that “Telling participants that we had evidence of their racial bias decreased rather than increased the likelihood that they would acknowledge that bias themselves.”

    But, but, but…. There is a big discrepancy here between experiment and conclusion!

    If I were participating in a study and then, halfway through, was informed that the researchers had been measuring something I didn’t realize they were measuring–such as galvanic skin response–I would be angry. I would feel that they hadn’t dealt with me honestly. This would probably distract me for the remainder of the study and influence my responses.

    The researchers did not in fact *present* the subjects with any evidence; they just said, in that second statement, that they had been taking measurements. There is a big difference between the two. The subjects may have been reacting to the idea that their galvanic skin response had been measured without their knowledge. They certainly weren’t reacting to “evidence” of their bias, since no evidence was offered up.

    So the researchers’ experiment did not even match up with their hypothesis and conclusions. The last sentence of the abstract, “These results suggest that under some circumstances, confronting people with public evidence of their private shortcomings can be counterproductive,” is not in keeping with the substance of the experiment.

    • Even the data don’t support their conclusions. There are several questions in their questionnaire that measure the same or similar outcomes, and almost none of them show a significant difference. The authors did a handful of uncorrected t-tests, picked the single (barely) significant result, and then claimed that they had predicted it all along. Later on, the third study actually fails to replicate Study 2, but the authors claim that it does anyway. A terrible paper all around.
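
      A quick sketch of why that matters (the p-values below are invented, not taken from the paper): with fourteen related outcome questions and uncorrected t-tests, one value barely under .05 is roughly what noise alone could hand you, and it disappears under any standard multiple-comparisons correction.

      ```python
      # Hypothetical p-values for 14 related outcome questions; only one is
      # below .05 before correction, and none survive Bonferroni or Holm.
      from statsmodels.stats.multitest import multipletests

      pvals = [0.046, 0.31, 0.18, 0.64, 0.09, 0.52, 0.27, 0.71,
               0.12, 0.44, 0.83, 0.23, 0.39, 0.58]

      for method in ("bonferroni", "holm"):
          reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method=method)
          print(method, reject.any(), round(p_adj.min(), 3))  # False, 0.644
      ```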

  4. Would have been funnier if I could have found the link?

    “Obviously this Dan is one of the greats (these are the Dans I know I know, these are the Dans I know #kidsinthehall #davesiknow )”

  5. Pre-registered hypothesis:

    Most experiments described in a paper that starts with an excerpt of a poem, or novel, will fail to replicate.

    What’s the point of doing this, by the way? I always wondered about that. It seems to me to be mostly social-psychology researchers/papers that feel the need to include some excerpt of something. Does it make the paper look all fancy and intellectual or something? I don’t get it.

  6. Harvard may be the university of 100% replication, but they are not 100% accurate.

    You will be happy to know that this paper contains several typos/mathematical errors.

    Look at this quote:
    “This left 74 participants in the data set (38 who identified as White, 20 who identified as Asian American, 8 who identified as Hispanic, 6 who identified as Other, and 1 who preferred not to indicate a race”

    Umm, those numbers only add up to 73.

    Also, all three tables contain a couple of granularity errors, Table 1 is straight-up just missing a couple of statistics (so much for the value of copyediting), and the degrees of freedom in Table 2 should be F(3, 73) instead of F(1, 73).

    I’m too busy with pizzagate (I got invited to give a talk) to look into more of Gilbert’s work, but it would be interesting to see if the statistical inconsistencies I noticed in this paper are just an anomaly or are part of a larger pattern.
