Powerpose update

I contacted Anna Dreber, one of the authors of the paper that failed to replicate power pose, and asked her about a particular question that came up regarding their replication study. One of the authors of the original power pose study wrote that the replication “varied methodologically in about a dozen ways — some of which were enormous, such as having people hold the poses for 6 instead of 2 minutes, which is very uncomfortable.” As commenter Phil put it, “It does seem kind of ridiculous to have people hold any pose other than ‘lounging on the couch’ for six minutes.”

In response, Dreber wrote:

We discuss this in the paper and this is what we say in the supplementary material:

A referee also pointed out that the prolonged posing time could cause participants to be uncomfortable, and this may counteract the effect of power posing. We therefore reanalyzed our data using responses to a post-experiment questionnaire completed by 159 participants. The questionnaire asked participants to rate the degree of comfort they experienced while holding the positions on a four-point scale from “not at all” (1) to “very” (4) comfortable. The responses tended toward the middle of the scale and did not differ by High- or Low-power condition (average responses were 2.38 for the participants in the Low-power condition and 2.35 for the participants in the High-power condition; mean difference = -0.025, CI(-0.272, 0.221); t(159) = -0.204, p = 0.839; Cohen’s d = -0.032). We reran our main analysis, excluding those participants who were “not at all” comfortable (1) and also excluding those who were “not at all” (1) or “somewhat” comfortable (2). Neither sample restriction changes the results in a substantive way (excluding participants who reported a score of 1 gives Risk (Gain): mean difference = -0.033, CI(-0.100, 0.034); t(136) = -0.973, p = 0.333; Cohen’s d = -0.166; Testosterone Change: mean difference = -4.728, CI(-11.229, 1.773); t(134) = -1.438, p = 0.153; Cohen’s d = -0.247; Cortisol: mean difference = -0.024, CI(-0.088, 0.040); t(134) = -0.737, p = 0.463; Cohen’s d = -0.126. Excluding participants who reported a score of 1 or 2 gives Risk (Gain): mean difference = -0.105, CI(-0.332, 0.122); t(68) = -0.922, p = 0.360; Cohen’s d = -0.222; Testosterone Change: mean difference = -5.503, CI(-16.536, 5.530); t(66) = -0.996, p = 0.323; Cohen’s d = -0.243; Cortisol: mean difference = -0.045, CI(-0.144, 0.053); t(66) = -0.921, p = 0.360; Cohen’s d = -0.225). Thus, including only those participants who report having been “quite comfortable” (3) or “very comfortable” (4) does not change our results.

Also, each of the two positions was held for 3 min each (so not one for 6 min).

So, yes, the two studies differed, but there’s no particular reason to believe that the 1-minute intervention would have a larger effect than the 3-minute intervention. Indeed, we’d typically think a longer treatment would have a larger effect.

Again, remember the time-reversal heuristic: Ranehill et al. did a large controlled study and found no effect of pose on hormones. Carney et al. did a small uncontrolled study and found a statistically significant comparison. This is not evidence in favor of the hypothesis that Carney et al. found something real; rather, it’s evidence consistent with zero effects.
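
For readers who want to see concretely what the sensitivity reanalysis quoted above amounts to, here is a minimal sketch in Python. The numbers are simulated placeholders rather than the Ranehill et al. data; the point is just the shape of the computation (Welch t-tests and Cohen’s d on comfort-restricted subsamples):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-participant records (placeholders, not the real study data)
n = 160
condition = rng.integers(0, 2, size=n)   # 0 = low-power pose, 1 = high-power pose
comfort = rng.integers(1, 5, size=n)     # 1 = "not at all" ... 4 = "very" comfortable
outcome = rng.normal(0, 1, size=n)       # e.g., risk taking or hormone change

def compare(mask):
    # Mean difference, Welch t-test, and Cohen's d (high minus low) on a subsample
    hi = outcome[mask & (condition == 1)]
    lo = outcome[mask & (condition == 0)]
    t, p = stats.ttest_ind(hi, lo, equal_var=False)
    d = (hi.mean() - lo.mean()) / np.sqrt((hi.var(ddof=1) + lo.var(ddof=1)) / 2)
    return hi.mean() - lo.mean(), t, p, d

for label, mask in [("all participants", comfort >= 1),
                    ("excluding comfort = 1", comfort >= 2),
                    ("excluding comfort <= 2", comfort >= 3)]:
    diff, t, p, d = compare(mask)
    print(f"{label}: diff = {diff:+.3f}, t = {t:+.2f}, p = {p:.3f}, d = {d:+.2f}")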

Dreber added:

In our study, we actually wanted to see whether power posing “worked” – we thought that if we find effects, we can figure out some other fun studies related to this, so in that sense we were not out “to get” Carney et al. That is, we did not do any modifications in the setup that we thought would kill the original result.

Indeed, lots of people seem to miss this point, that if you really care about a topic, you’d want to replicate it and remove all doubt. When a researcher expresses the idea that replication, data sharing, etc., is some sort of attack, I think that betrays an attitude or a fear that the underlying effect really isn’t there. If it were there, you’d want to see it replicated over and over. A strong anvil need not fear the hammer. And it’s the insecure researchers who feel the need for bravado such as “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%.”

P.S. I wrote the above post close to a year ago, well before the recent fuss over replication trolls or whatever it was that we were called. In the meantime, Tom Bartlett wrote a long news article about the whole power pose story, so you can go there for background if all this stuff is new to you.

92 thoughts on “Powerpose update”

  1. Actually what I would do is recruit Amy Cuddy to rerun the replication using her expert knowledge on how best to do the study. In fact, it is odd that she has not done this yet. If someone told me that my career-making result (I don’t have one, but suppose I did) did not replicate, the absolute first thing I would do is try to replicate it.

  2. Just for clarification: did you really mean “remove all doubt”? I doubt that a successful replication, or even a series of replications, removes all doubt. Maybe it removes a particular kind of doubt, in providing evidence that the original results were not coincidental or contingent on particular circumstances.

    In any case, yes, it’s important to view replications as supportive rather than antagonistic. As you said, if you care about a topic, you want to know whether the findings are real.

  3. Sadly, many people continue to assume that power pose effects are real and grounded in science. There is even a “Victory Alarm Clock” based on the power pose:

    https://www.fastcodesign.com/3048784/can-appliances-be-designed-to-make-you-happy

    “Wiles came up with the designs by researching the work of psychologists who study how physical actions chemically alter the body, like that of Harvard social psychologist Amy Cuddy. Cuddy’s assertion that a technique called “power posing” can boost confidence by raising testosterone levels and decreasing cortisol (the stress hormone) provided the basis for the Victory Alarm Clock. The clock only stops beeping once you’re out of bed and raising your hands above your head like an Olympic gymnast who just stuck a landing.”

    • P.S. The article about the “Victory Alarm Clock” appeared in July 2015, only a few months after the failed replication by Ranehill et al. (It’s possible that the news had not yet spread.) But the inventor still displays the product on his website (http://www.tedwiles.com/Involuntary-Pleasures) with the following description:

      “To turn off the alarm the user must hold the two actuators and adopt a victory pose for two minutes. Enclosed within the hand-held actuators, accelerometers send their position via bluetooth back to the base, causing the alarm to stop after two minutes of use. This posture increases levels of Testosterone and reduce levels of Cortisol, making the user feel more confident and less stressed.”

      I imagine that this is just one of hundreds of products and services based on the power pose. Oh, wait, here’s another example: suggestions for the classroom, posted on the Scholastic website!

      http://choices.scholastic.com/blog/power-posing-fake-it-til-you-make-it

      • Yes, very sad, given this claim of theirs: “Parents subscribe to our magazine to make sure that their kids are getting high-quality, deeply-researched information about the most important topics teens face.”

        • That blog says:

          “My son has really taken to power posing. It’s part of his life skills toolkit now. Power posing harnesses your body to change your mindset.

          Students can power pose for a few minutes before any occasion and find the strength to rise to any challenge: an exam, a track meet, or the college interview. Teachers and parents can use it too. Who doesn’t need a little boost of can-do? It can be as easy as taking a few minutes to stand tall.”

          Instead, he could download the data and show his son how the linear model coefficient estimates flip around wildly if you remove one extreme value. Now that would be a life skills toolkit to acquire. Why didn’t anyone teach this to me when I was 10?
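
          Something like the following toy example (simulated data, not the actual power-pose dataset) makes the point: refit the regression leaving out one observation at a time and watch how much the slope estimate moves once the extreme point is dropped.

          import numpy as np

          rng = np.random.default_rng(1)
          n = 20
          x = rng.normal(0, 1, n)
          y = 0.2 * x + rng.normal(0, 1, n)
          x[0], y[0] = 3.0, 5.0   # plant one extreme value

          def slope(x, y):
              # ordinary least squares slope from a simple linear model
              X = np.column_stack([np.ones_like(x), x])
              return np.linalg.lstsq(X, y, rcond=None)[0][1]

          full = slope(x, y)
          loo = [slope(np.delete(x, i), np.delete(y, i)) for i in range(n)]
          print(f"full-data slope: {full:.2f}")
          print(f"leave-one-out slopes range from {min(loo):.2f} to {max(loo):.2f}")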

        • You guys are confusing average treatment effects with unit treatment effects. Are you willing to claim power poses never work for anyone? They work for me.

        • Jack:

          Nobody is claiming power poses never work for anyone. The point is that nobody’s offered good evidence that power pose has the consistent effects claimed by Cuddy. Cuddy and her collaborators have published papers claiming to provide such evidence, but then it’s been pointed out that these papers have statistical problems, and when others have tried to replicate the results, they’ve failed in various ways. No consistent effects.

          When Cuddy’s former collaborator, Dana Carney, writes, “I do not believe that ‘power pose’ effects are real,” I don’t think she’s saying that power pose can never work for anyone. I think she’s saying that she doesn’t believe in the consistent effects that had been claimed.

          To say that something could work for some people in some settings . . . that’s just not enough. I’m sure that the simple expedient of taking a deep breath and counting backward from 100 by sevens will be helpful in many settings too, to calm people down and get them focused. One thing that made power pose special is the claim that it had large and consistent effects, and that it was backed by scientific evidence.

        • My correspondent doesn’t want to be quoted, unfortunately. I wrote a bit in my email about the question of “is there any harm?” (in advocating for power poses). That’s discussed in this comment thread, but I think there are more (and more important) things to be said. I’ll elaborate on that, rather than these specific emails, at some point in my blog, but it will have to wait until grant writing, class prep, and a bunch of other things get pushed out of the queue!

        • Blog post here, though I ended up writing less about the topic of the harm caused by power pose claims, and more about science communicators, even good ones, latching onto hype, with a particular example related to my own connections with the science communication world.

      • Jack – I don’t mean to be dismissive of your experiences, but this is the sort of anecdotal evidence that leads me to ask: what do you mean, “[power poses] work for me even when i thin[k] they shouldn’t work”? I’d be very interested in hearing that.

        It does seem that the subjective “feelings of power” effect works fairly consistently (well, I say “fairly” and “consistently”), but that none of the more physiological impacts replicate reliably. Do you “feel” better when you do power posing? How does it affect your behavior?
        If you’re committed to the idea that power posing “works”, then what does it mean to say “even when i thin[k] they shouldn’t”?

        Again, I agree with Andrew – no one is saying that striking a power pose never, ever works for anyone, under any conditions. We’re simply disputing that the effect is “there” in the sense that Cuddy (and, to be fair, most psychologists, and I am a member of that guild) claims – that it is large, consistent, and reliable. My questions above are, even if they seem critical or dismissive, real, in the sense that I am legitimately curious about your experience of power posing. Like, I just struck one for the sake of it, and I felt far too ridiculous and self-conscious (even alone in my office!) to feel any more powerful than usual.

        Further (and this bit is definitely snark): I know power poses are real – I’ve seen pictures of them!

  4. 1) Here is a link to an article in which the original authors (Cuddy et al.) actually use a 6-minute power pose condition themselves:

    http://faculty.haas.berkeley.edu/dana_carney/pp_performance.pdf

    “Participants maintained the poses for a total of five to six minutes while preparing for the job interview speech”

    Apparently not a problem there.

    2) And here is another very recent published failed replication (pre-registered!):

    http://www.tandfonline.com/doi/full/10.1080/23743603.2016.1248081

  5. Andrew said, “Indeed, lots of people seem to miss this point, that if you really care about a topic, you’d want to replicate it and remove all doubt. When a researcher expresses the idea that replication, data sharing, etc., is some sort of attack, I think that betrays an attitude or a fear that the underlying effect really isn’t there. If it were there, you’d want to see it replicated over and over.”

    and Shravan said, “In fact, it is odd that she [Amy Cuddy] has not done this [run a replication study] yet. If someone told me that my career-making result (I don’t have one, but suppose I did) did not replicate, the absolute first thing I would do is try to replicate it.”

    Sometimes I wonder if I’m from a different planet. Saying things like, “If you really care about X, then you’d want to do Y,” or “It is odd that she didn’t do what I’d do,” seems to ignore what to me is a fact of life: that different people think, want, care, do, etc. differently. In fact, these are just instances of between-subject variability.

    If I may offer a couple of other possibilities for why Cuddy has not done a replication: One is, “This is the way we’ve always done it, so why should I go against tradition?”. Another (based on what I’ve heard reported) is “I don’t care about doing more research; I love giving TED and other talks promoting power pose.”

    • Martha:

      It seems to me that the actions of Bem, Cuddy, etc., are consistent with these researchers having a sort of bifurcated view of the phenomena they’re studying: On one hand, they’re true believers, willing to bet their reputations on their controversial claims. On the other hand, I suspect that at some level they “know” that their findings are fragile or unreplicable or whatever you want to call it. So they don’t want to put their theories to the test.

      As for “loving to give TED talks” etc.: Sure, I love giving talks too. But I’d feel a bit weird giving a talk promoting an idea that doesn’t really help people!

      • “It seems to me that the actions of Bem, Cuddy, etc., are consistent with these researchers having a sort of bifurcated view of the phenomena they’re studying:”

        Actions can be consistent with a variety of views of them.

        “I love giving talks too. But I’d feel a bit weird giving a talk promoting an idea that doesn’t really help people!”

        Just because one person would feel weird doing something doesn’t mean that another person would feel the same way. People are all different. Cuddy might believe that power pose does help people; as I recall, you’ve said it might help some but not others — so possibly she agrees with that and consequently sees no problem if it does help some.

        • Martha:

          Sure, but now we’re going in circles. You write, “Cuddy might believe that power pose does help people”—but presumably she’d want it to help people on net: it’s not such good advice if it helps 100 people a little and hurts 200 people a lot! So this returns us to my earlier point that if she wants to help people, and if she thinks it really does help, I’d think she’d want to do some definitive replications. On the other hand, if she wants to help people, and she wants to keep giving these talks, and she kinda thinks power pose should help on net, then it makes sense that she won’t want to examine the evidence too closely.

        • Jack:

          Power pose is being compared to other poses. If you compare power pose to various alternatives (for example, the “crouching panther” pose or simply sitting still and meditating for a couple of minutes), I’d expect different poses to be more effective for different people in different settings, if they have any effect at all.

        • (The belief in) power poses can hurt someone, I would reason.

          Perhaps a person doing power poses could spend their time more wisely in order to achieve whatever it is they want to achieve via power posing.

          Perhaps the people who bought Cuddy’s book could have spent their money more wisely as well.

          I think psychologists in general should pay much more attention and thought to the possible negative consequences of the research they put out there.

        • They could be seen power-posing by someone who has followed the failed replication saga and be judged negatively for it…

        • I’d say the most serious harmful effect of power pose is if it does work to make someone feel powerful, and that feeling prompts them to act in ill-thought-out ways that cause harm — e.g., an overconfident surgeon or military commander can do harm.

        • Isn’t there a wisecrack to the effect that it is less important for a naval commander to actually be confident about the right course of action in a tricky situation than to *appear to be confident* about the right course of action?

        • Jack: Placebo effect, huh. The last time I stood with arms akimbo and feet spread (just like Wonder Woman), someone told me that I was blocking the aisle. :-)

        • “presumably she’d want it to help people on net” … “I’d think she’d want”

          You are making assumptions about her, without giving any evidence to back up these assumptions. That seems unreasonable to me.

        • Martha:

          Fair enough. But you wrote, “Cuddy might believe that power pose does help people . . . so possibly she agrees with that and consequently sees no problem if it does help some.” Both of us are speculating here.

        • Yes, we have both been speculating, but there are, to me, important differences in the type of speculation. Consider your statements that I have objected to:

          “Indeed, lots of people seem to miss this point, that if you really care about a topic, you’d want to replicate it and remove all doubt. When a researcher expresses the idea that replication, data sharing, etc., is some sort of attack, I think that betrays an attitude or a fear that the underlying effect really isn’t there. If it were there, you’d want to see it replicated over and over.”

          “presumably she’d want it to help people on net”

          “I’d think she’d want …”

          Compare and contrast with what I have said:

          “If I may offer a couple of other possibilities …”

          “Cuddy might believe that power pose does help people…”

          “…so possibly she … ”

          The things I objected to in what you said had no “could”s or “might”s indicating that you were just considering possibilities. They sound like they are expressing beliefs or statements of fact, not just possibilities. I have not said anything that indicates that I believe any of the possibilities that I have put forth. That is what I was objecting to.

        • P.S. I will say that I think it can be helpful to speculate in this way. But you’re right that we should be aware of our ignorance in such settings. All we’re doing is exploring how the behavior of Cuddy, say, looks from our own perspectives.

          The flip side is that someone like Amy Cuddy or David Brooks might be genuinely puzzled as to why I am so ready to admit my own mistakes. Perhaps they would see this as a sign of weakness or a lack of commitment to my beliefs or an underlying frivolity on my part!

          I once had a colleague who didn’t seem to believe that I didn’t want to be department chair. I’d told him many times that I didn’t want to be chair, I’d more than once declined the opportunity to be chair, but he still seemed to think that I was scheming for the position. We can make all sorts of mistakes when trying to draw inferences from people’s behaviors.

        • Your PS is interesting — I was thinking of the problem as more, “You seem to have a prior with all probability at one point, and I don’t think that’s good science, so I’m suggesting other points that are plausible and so need to have some non-zero probability in your prior.”

          But your examples of others doing what you were doing are good ones that help make the point that we need to be careful not to project our thinking, values, assumptions, etc. onto other people.

      • >’I see a lot of “since I’d respond this way, I’m astonished others don’t respond the SAME way”’

        Are there any machine learning strategies that work by making a series of A/B comparisons, then performing “statistical significance” evaluations? To make it more realistic, the algorithm would also drop the results that are not “significant” from its history.

        I would really like to quantify how well this approach can predict something or lead to insight under various idealized conditions/assumptions. I can be convinced by evidence, but it doesn’t seem to be forthcoming from the NHST camp.
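
        To be concrete, the kind of idealized setup I have in mind is a toy simulation along these lines (all numbers made up): run many small A/B comparisons, keep only the "significant" ones, and see how distorted the surviving estimates are.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        true_effect, sigma, n_per_group, n_experiments = 0.1, 1.0, 20, 10_000

        kept = []
        for _ in range(n_experiments):
            a = rng.normal(0.0, sigma, n_per_group)
            b = rng.normal(true_effect, sigma, n_per_group)
            t, p = stats.ttest_ind(b, a)
            if p < 0.05:   # drop "non-significant" results from the history
                kept.append(b.mean() - a.mean())

        print(f"true effect: {true_effect}")
        print(f"experiments surviving the filter: {len(kept) / n_experiments:.1%}")
        print(f"average surviving estimate: {np.mean(kept):.2f}")   # inflated relative to 0.1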

      • Perhaps it’s a bit more than that. It’s not like we’re talking about how a researcher reacts to “subjective” things like their favorite colour, or candy, or something like that. In those cases, I think it’s very logical, and valid, to *not* be astonished at people’s answers, (re-)actions, or perspectives.

        However, since we’re talking about science here, which I assume has certain shared “objective” values, rules, whatever, perhaps one *can* be logically and validly astonished by people’s answers, (re-)actions, or perspectives concerning those “objective” values, rules, etc.

        For instance, and based on the above reasoning, I think it’s fair to assume that it is important in science that findings are replicated. From this assumed shared “value” one can speculate that a scientist values replications of their findings. Consequently, it can be seen as astonishing if a scientist does not feel this way.

    • Martha, I think you are really hitting on a kind of fallacy in argumentation which I see all the time: a kind of “this is what I’d like to think I’d feel” projection, along the lines of “no true Scotsman” but not exactly the same. Or really, it is a way of trapping someone with the logical consequences of an unsupported hypothetical.
      It (I’m certain unintentionally) comes across as somewhat patronizing, which is why we’re probably both responding to it in the same way. I think it is very easy to criticize this work without resorting to this kind of telling another researcher how she should be thinking. Or just say what you think directly: “I believe that if a researcher has results they believe in, they should favor there being many attempts to replicate it.”

      • Nor does Cuddy set a good example of discourse here; she responded not to what he said, but to her own interpretation: that he’s “worried” about what SNL thinks of him. (Her interpretation may be correct–who knows–but it’s heavy on the inferences.)

    • I hereby vow to never again poke fun at my colleague and compatriot in Yellowism, Amy Cuddy. If I had known that her research was driven by Yellow, I would never have allowed you people to ridicule her for so long. Amy Cuddy is a Yellowist icon.

      “Live life with my arms reached out. Eye to eye when speaking. Bright as yellow. Warm as yellow.” – Amy Cuddy’s Twitter Bio*

      “In Yellowism, all the possible interpretations are reduced to one – are equalized, flattened to yellow. Interpreting Yellowism as art and being about something other than just yellow deprives Yellowism of its only purpose.” – This is Yellowism**

      Yellow.

      * https://twitter.com/amyjccuddy
      ** http://www.thisisyellowism.com

      • Alex:

        Ooooh! A Mobius strip. I love it! Trump thanks Cuddy, then Cuddy has to spend the rest of her life torn between the two options of (a) accepting responsibility for Trump becoming president, or (b) agreeing with her former collaborator Carney that power pose doesn’t work.

        Yours is the comment of the year.

        • How the hell would she be responsible? Even if power poses work, she would never be responsible for that. The only people responsible for Trump’s election are the voters who did not vote for Hillary.

        • Wow, the COTY award – and it’s not even February! Now I’m glad I did this 6-minute double-dose-powerpose session before posting!

    • Keith:

      Interesting. The author of the article you linked to is currently working at the National Science Foundation. I just gave a talk on this stuff last month at the NSF; I wonder if he was in the audience.

      • Possibly – very likely if you mentioned Gregor Mendel’s suspicious data in your talk (as Nusbaun referenced your blog as the source for that.)

        I recall exchanging some emails with Gary King in 2005/6 discussing how almost no one was being successful at getting funding for reproducible/replicable research projects. Now there are “Dear Colleague” letters earnestly requesting such.

        Here Nusbaun is outlining a strategy/opportunity to march in behind an army that has been marching in on them – “hey, guys, we’re the experts on the Science of Humans, and that will be crucial in managing the communal activity that science must be to actually be science.”

        I think he has a good point, and the added bonus might be that the replication deniers in the field will be seen as blocking opportunity rather than preserving it.

    • What I find much more interesting is that I can’t see a comment section on any of the new APS Observer articles.

      Could it be that the APS has decided to not include a comment section for their new Observer articles after two recent “remarkable” Observer pieces with very critical comments posted in the comment sections?

      http://www.psychologicalscience.org/observer/preregistration-replication-and-nonexperimental-studies

      http://www.psychologicalscience.org/observer/a-call-to-change-sciences-culture-of-shaming

      • Anon:

        At your second link, Susan Fiske writes:

        One development in parallel with this column is an independent online statement that people can sign to express concern: “Promoting open, critical, civil, and inclusive scientific discourse in Psychology,” which can be found here. Thanks to those who express support of mutually respectful discussions of our science.

        Kind of amazing to see such a statement coming from someone who never apologized for characterizing people who disagreed with her as “terrorists.” That’s about as disrespectful as you can get!

      • The comments section has this sentence from yet another social psychologist (Schimmack):

        “We need to learn that a single original study with a small sample and a significant result of p < .05 only tells us that it is likely that the population effect size is not zero and in the direction of the effect in the small sample of the original study."

        As Frank Harrell says in his online book, "No!"

        The fumbling and floundering of social psychologists is pretty strange to see. What's stopping them from taking some courses in statistics? It's not that hard.

        • Schimmack is coming at this from a fairly savvy, albeit frequentist-null-hypothesis-p-values-loving, position. And even then, I wouldn’t say his original statement is wrong on statistical grounds, given that he’s being rather guarded: he says we only know that the population effect size is unlikely to be zero. With a single small study we don’t know much about the direction or the magnitude, which I think is captured, although not directly expressed, in his statement. (It could be wrong on garden-of-forking-paths meta-statistical grounds.)

          His site https://replicationindex.wordpress.com/about/
          His R-index site http://www.r-index.org/
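
          As a rough illustration of how little a single small significant result pins down even the sign of an effect, here is a toy simulation (made-up numbers, nothing to do with the actual power-pose design):

          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(3)
          true_effect, sigma, n_per_group, n_sims = 0.1, 1.0, 20, 50_000

          sig = wrong_sign = 0
          for _ in range(n_sims):
              a = rng.normal(0.0, sigma, n_per_group)
              b = rng.normal(true_effect, sigma, n_per_group)
              t, p = stats.ttest_ind(b, a)
              if p < 0.05:
                  sig += 1
                  if b.mean() - a.mean() < 0:   # significant, but in the wrong direction
                      wrong_sign += 1

          print(f"studies reaching p < .05: {sig / n_sims:.1%}")
          print(f"of those, wrong sign: {wrong_sign / sig:.1%}")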

        • I think that the problem here is that many of Schimmack’s definitions and statements (like the one above) are simply wrong. You say that “as he says we only know that the population effect size is unlikely to be zero”. We don’t *know* this; it’s this false sense of certainty that leads to a lot of confusion when results come out some other way in subsequent studies. You also say “With a single small study we don’t know much about the direction or the magnitude, which I think is captured although not directly expressed in his statement.” How is it captured in the sentence “it is likely that the population effect size is … in the direction of the effect in the small sample of the original study”?

          Anyway, I don’t want to quibble about what he says; that’s not my point. What these self-styled experts are doing is disseminating incorrect explanations (again), and it has a fake-news quality to it in that it somehow ends up becoming a fact. For example, someone credits R-Index for the new truths they learnt in a textbook they are writing:

          Paragraph in my upcoming introductory #statistics textbook that I wouldn't have written if I hadn't followed @R__INDEX on Twitter. pic.twitter.com/Bup9qcFzJg – Russell Warne (@Russwarne), November 22, 2016

          You can read my exchange with R-Index about the above quote here:
          http://vasishth-statistics.blogspot.com/2016/11/statistics-textbooks-written-by-non.html

          So in my opinion, one consequence of these incorrect statements is that they lead to the very problems that brought on the replicability crisis.

          The other interesting trend I observe is that whenever a social psychologist with a half-baked understanding of statistics stands up and says “I’m an expert and I will tell you how all this works” they usually take an aggressively anti-Bayesian stance.

          Most of the statisticians I studied with had a very sane and sensible attitude: they used both methods judiciously as needed. Never once in my MSc did I see a professor going into frequentist- or Bayesian-bashing mode when teaching the methods (maybe privately they have strong opinions, I don’t know).

          It’s only these self-anointed experts with a sometimes comical understanding of the math (e.g., not knowing that the area under a curve at a point in a continuous distribution is 0) that get into aggressive fights bashing Bayes. So in this respect their contribution to the issues facing soc psych is negative. I must admit that they also make positive contributions by demanding things like higher power and replicability; I’m with them there. But I wish they’d just stop at that point and not go beyond their ability and knowledge level and disseminate incorrect definitions and explanations.

          BTW, I have made these same mistakes in the past both as a user of statistics and when trying to teach it, and am trying to improve. Wrong definitions, incomplete understanding and incorrect inferences. I am working on doing better and learning from my mistakes. But I don’t see these guys ever try to correct course. Once they’ve taken a position, that’s it. The typical professor/scientist mindset: I got it right the first time and I will never reconsider my position.

        • And see this paper, which suggests that soc psych*ists are not that much into replication:

          https://share.osf.io/preprint/4603C-EB2-872

          “Results show that tenured social psychologists are more likely to defend the status quo, and to perceive more costs (relative to benefits) to proposed initiatives such as direct replications and pre-registration, than are those who are untenured.”

        • So that raises the question of which possibility is most likely:

          a. The non-tenured ones who favor replications don’t get tenure — not good

          b. Gradually the situation will get better, as younger people get promoted and influence their students in favor of replications and possibly other reforms.

        • Shravan said: “What’s stopping them from taking some courses in statistics? It’s not that hard.”

          Hard is relative. I only once had a social psychology student in a graduate statistics course I taught numerous times — he was the worst student I ever had in the course (which usually had half or more of its enrollment from biology, engineering, business, and education). There are exceptions, but my impression is that the scientific standards in most social psychology departments are very low.

        • As someone who has a PhD in Psychology (Social, to be precise), I’m sympathetic to the reasons for the poor reputation that psychology, and social psychology especially, enjoys on this blog and on the blogosphere generally. However, I would like to point out that some of us are fortunate enough to be trained very, very well, and that the field is trying to take its deficiencies seriously. See for example: https://www.ncbi.nlm.nih.gov/pubmed/18193979

          In a survey of all PhD programs in psychology in the United States and Canada, the authors documented the quantitative methodology curriculum (statistics, measurement, and research design) to examine the extent to which innovations in quantitative methodology have diffused into the training of PhDs in psychology. In all, 201 psychology PhD programs (86%) participated. This survey replicated and extended a previous survey (L. S. Aiken, S. G. West, L. B. Sechrest, & R. R. Reno, 1990), permitting examination of curriculum development. Most training supported laboratory and not field research. The median of 1.6 years of training in statistics and measurement was mainly devoted to the modally 1-year introductory statistics course, leaving little room for advanced study. Curricular enhancements were noted in statistics and to a minor degree in measurement. Additional coverage of both fundamental and innovative quantitative methodology is needed. The research design curriculum has largely stagnated, a cause for great concern. Elite programs showed no overall advantage in quantitative training. Forces that support curricular innovation are characterized. Human capital challenges to quantitative training, including recruiting and supporting young quantitative faculty, are discussed. Steps must be taken to bring innovations in quantitative methodology into the curriculum of PhD programs in psychology.

          Aiken, L. S., West, S. G., & Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: Replication and extension of Aiken, West, Sechrest, and Reno’s (1990) survey of PhD programs in North America. American Psychologist, 63(1), 32.

        • >”The median of 1.6 years of training in statistics and measurement was mainly devoted to the modally 1-year introductory statistics course, leaving little room for advanced study.”

          I still doubt things are being understood from that. It isn’t that the intro material isn’t advanced enough; it is just plain wrong and acts as a stepping stone in the opposite direction. It is really similar to how they use a zero-dimensional model to explain the greenhouse effect, then go in the totally wrong direction by adding layers to that until a dead end is reached. Next, you need to make an orthogonal jump to looking at thousands of lines of code taking latitude/rotation/etc. into account to understand what is *actually* used.

          After writing the above, I quickly checked the paper and sure enough no mention of Meehl:
          http://www.fisme.science.uu.nl/staff/christianb/downloads/meehl1967.pdf

        • I agree that the modal sequence of ANOVA and then regression is insufficient for all of the areas that psychologists want to apply their theories to. Indeed, the big thing about the PNAS hurricane paper was the first study, which was an archival one. The remaining studies (5 or 6, I can’t remember) focused on things that are readily analyzed with ANOVA. All of the action on that paper was in the psychologists badly overextending their argument in an area outside of their typical focus on experimental design and research.

          It is more reasonable now, I think, to have a graduate stats education for psychologists that integrates the p-hacking/HARKing/etc. literature along with an emphasis on measurement, diagnostics, and criticism. A training that introduces them to things like the garden of forking paths and how to reason about statistical evidence in a more sophisticated and nuanced way would still reasonably fit into a graduate sequence.

          As for checking the paper for Meehl, I’m not certain why that is relevant. Don’t get me wrong, I’m a fan of Meehl, but I’m not sure of the heuristic value of your search. A quick search of Andrew’s garden of forking paths paper and sure enough Meehl isn’t cited there. A quick search of Andrew’s blog and it appears that Meehl isn’t mentioned on the blog prior to “All Meehl, all the time” in December 2009, which is easily a year after the review article’s publication date.

        • AnonAnon,

          Thanks for the evidence that (as I assumed) there are indeed exceptions and that some in the field are trying to improve the situation. However, some concerns I have on reading your post:

          1. The survey you link to and quote the abstract from is dated 2008. I wonder if the impetus to improve has continued in the nine years since then.

          2. The abstract mentions “innovations in quantitative methodology” and “curricular innovation”. Unfortunately, not all innovations are improvements, so the focus on “innovation” could result in change that is not an improvement.

          3. I also wonder if there is a need for attention to quantitative reasoning readiness/willingness when admitting students to Ph.D. programs that prepare students for research in psychology.

        • Martha,

          1. I’m not certain if Aiken or West have any plans to follow up with the survey. I think Steve was recently (and begrudgingly) the head of the APA’s Quantitative Methods area, and so he’s had some influence there. Leona will probably be focused more on the quantitative students’ training at Arizona State since Roger Millsap’s sudden and tragic death. Beyond that, they’ll probably be focusing on revisions of the Cohen, Cohen, West and Aiken textbook and their mythical retirement.

          2. I suspect what they mean by innovations are things like full semester courses dedicated to psychometrics, missing data, and multilevel modeling. And barring that, the integration of that into the basic graduate course work.

          3. Preparing undergraduates who are interested in psychology is a hard problem. Arizona State (largely because of the quantitative methods area there) had a two-course statistics requirement for undergraduates seeking a BS, focusing on basic descriptive and inferential statistics and extending to ANOVA and regression.

          That would probably be a hard sell at other psychology departments that don’t have 1) sufficient numbers of dedicated quantitative faculty and 2) sufficient numbers of quantitatively minded graduate students to TA those courses.

        • AnonAnon,

          Thanks for the reply.

          Re item (3): There is no reason that an undergraduate quantitative requirement for entering psychology Ph.D. students needs to be courses in a psychology department — in many cases, courses from a statistics department would be good or even better. In fact, courses from a statistics department would in some (perhaps many, perhaps most) cases be better for weeding out students with an aversion to quantitative work. But I’m not thinking about just standard intro-stats-for-non-majors — an upper-division course or two would be better.

          BTW, my university’s BS in psychology requires (or did — I’m retired and haven’t checked lately) the intro stats from the math department, rather than the one taught in psychology. (Although my understanding is that at least partly, this was to make a psych BS a possible major for pre-med students.)

        • @Martha:

          A more rigorous, non-optional math requirement for psych undergrads is an interesting idea.

          Anecdotally, quite a few undergrads choose a Psych major expressly *because* they dislike math & math-ey subjects.

          Alienating that “customer segment” would be a brave choice for Psych Departments. I would be curious to see any try it out.

        • Martha,

          Re #3: As a member of the “lost generation” who skipped academia in large part because of the methodological turmoil, I have pretty strong feelings about how stats and methods should be taught. While I think there are good pragmatic reasons for your suggestion, I think ultimately stats and research methods should be taught together to psychology students. Doing so would enable students to learn about measurement, design, and analysis in concert with each other.

        • >”As for checking the paper for Meehl, I’m not certain why that is relevant.”

          Because the intro courses teach you to use p-values to reject a default strawman nil-null hypothesis, then build on that idea. It can only lead to a dead end of affirming the consequent and transposing the conditional, no matter how “advanced” your statistical methods.

        • First, I had a modal introductory stats sequence in graduate school and off the top of my head I remember learning about multiple imputation, Full Information Maximum Likelihood, and Full Bayesian methods for dealing with missing data and nary a mention of p-values.

          Second, I think the particle physicist I knew in grad school (and others like Mayo) would disagree with your view that advanced statistical models are baseless when built upon p-values. (As a rather ardent frequentist he was critical of p < .05; he thought it should be p < .001 or even < .0001.)

          Third, Meehl certainly wasn't the only one who was critical of p-values, so using that as a filter is going to miss plenty of folks who cited him and were perhaps cited in the article. Jacob Cohen comes pretty quickly to mind.

        • AnonAnon,

          It is impossible for me to communicate on this with you until you accept the possibility that rejecting a default null hypothesis of zero makes no sense (unless, of course, such a hypothesis that a parameter = 0 is actually predicted by theory).

          The problem really has nothing to do with p-values; that is a huge red herring. If you had actually read Meehl’s arguments you wouldn’t be arguing this the way you do…he is very explicit.

        • @Anoneuoid:

          Particle Physics often comes up in the context of Mayo. Here’s the thing about that.

          1) Particle physics experiments involve repeatedly smashing high-energy particles into each other, millions of times. In the context of doing something millions of times that is designed to be as close to exactly the same each time as possible, a model where these repeated experiments are treated as the collective output of a random number generator isn’t so wacky. It is much wackier when you, say, recruit patients into an asthma study by asking all your pulmonologist friends to get you whoever they can find to apply to be a part of the study, then screen them with some survey and randomize 75 people into two groups.

          2) Large numbers of these collisions are glancing or otherwise occur at lower energy levels than is required to see what is of interest. So a system needs to be built for *filtering out all the usual junk*. This is actually exactly what p values are good for. They’re filters that allow you to identify when a particular thing is not that different from a large collection of things you already know you aren’t interested in.

          3) Particle physics theories, at least compared to say power poses or voting pattern theories, produce relatively precise quantitative predictions. For example, if you’re looking for conservation of a certain quantity, then over a very large N you are looking for the sum of all quantities Q to be exactly zero. Of course your detectors don’t see everything, so it’s only the underlying quantity Q that is *exactly* zero. In these kinds of contexts the exact null is meaningful.

          4) Particle physics may look a lot like the analysis of a random number generator, along the lines of:

          “Generate a random event; if it’s a lot like these ‘null’ events, throw it out. From the remaining ones, if it’s a lot like the theoretical prediction for what should occur when some particular event occurs, collect it together. Analyze all of them to see if there is any theoretical prediction which would be compatible with this collection of events. If all the other possibilities produce a very low p value, then declare that you’ve observed the predicted event.”

          That doesn’t mean that p values are a generally useful tool in science.

        • Anoneuoid,

          The bulk of my issue is with your Meehl-citation-index heuristic. It’s sloppy and unnecessary, as it ignores the simple fact that the authors could have cited someone else who made the same points (again, Cohen), and it glibly dismisses the substance of the article (like its goal of addressing the issues around mentorship and support for young quantitative faculty raised in the article).

          You’ve mistaken my position as being either unfamiliar with Meehl or unfamiliar with the issues of null-nil hypothesis testing. I’m not. On the null-nil hypothesis testing framework, I agree with you that it doesn’t make sense. I have felt that way ever since I took my first Bayesian analysis and modeling course, where we read, surprisingly, not Meehl but instead Cohen’s article “The Earth Is Round (p < .05)”, which cites Meehl extensively. That is where I learned about Meehl and what caused me to find and read his work, well before I ever encountered it on this blog.

          Lastly, correcting the mechanisms of inference and moving away from null-nil hypothesis testing is not a panacea for psychology’s woes. It isn’t even really a start. The most important starting place, in my opinion, is measurement and understanding meta-statistical issues like the garden of forking paths, two areas which don’t require bringing up null-nil hypothesis testing.

  6. Incidentally, I saw a quote in Philip Glass’s book (mentioned on Frank Harrell’s blog by a commenter there) that would resonate with Fiske (or whoever it was) when they wrote about failed replications not really being informative:

    Kuhn to Popper: “In point of fact, no conclusive disproof of a theory can ever be produced; for it is always possible to say that the experimental results are not reliable or that the discrepancies which are asserted to exist between the experimental results and the theory are only apparent and that they will disappear with the advance of our understanding.”

    Maybe these guys are taking this Kuhnian position. Failure to replicate may indeed not be conclusive. It’s a fair point.

  7. >”Lastly, correcting the mechanisms of inference and moving away from null-nil hypothesis testing is not a panacea for psychology’s woes. It isn’t even really a start. The most important starting place, in my opinion, is measurement and understanding meta-statistical issues like the garden of forking paths.”
    http://statmodeling.stat.columbia.edu/2017/01/16/powerposeupdate/#comment-397775

    It really makes no difference at all what equations are used to analyze data as long as the goal is going to be testing a strawman hypothesis. Unless you get rid of that, the list and magnitude of the problems will keep growing as psychologists, medical researchers, etc. keep coming up with new myths to justify the behavior they have been taught. This is precisely what we have seen happen.
