Mindset interventions are a scalable treatment for academic underachievement — or not?

Someone points me to this post by Scott Alexander, criticizing the work of psychology researcher Carol Dweck. Alexander looks carefully at an article, “Mindset Interventions Are A Scalable Treatment For Academic Underachievement,” by David Paunesku, Gregory Walton, Carissa Romero, Eric Smith, David Yeager, and Carol Dweck, and he finds the following:

Among ordinary students, the effect on the growth mindset group was completely indistinguishable from zero, and in fact they did nonsignificantly worse than the control group. This was the most basic test they performed, and it should have been the headline of the study. The study should have been titled “Growth Mindset Intervention Totally Fails To Affect GPA In Any Way”.

Ouch.

As Alexander reports, the authors “went to subgroup analysis.” But we all know the problems there. Garden of forking paths, anyone? Again, let me emphasize that I don’t think that preregistration is the best solution to the garden of forking paths; rather, I recommend multilevel modeling, looking at all interactions that might be of interest, not just pulling out a few dramatic comparisons.
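
To be concrete about what I mean by that: rather than pulling out a subgroup and testing it, you fit one model to all the students, let the intercepts vary by school, and estimate the treatment-by-subgroup interaction as a coefficient in that model. Here's a minimal sketch in Python with made-up data and made-up column names; it's just an illustration of the general approach, not the authors' analysis:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data for illustration only: school ids, a binary treatment,
# a binary risk flag, and a noisy pre-intervention GPA.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "school": rng.integers(0, 13, n),
    "treatment": rng.integers(0, 2, n),   # 1 = mindset intervention, 0 = control
    "at_risk": rng.integers(0, 2, n),     # 1 = high dropout risk
    "gpa_pre": rng.normal(2.8, 0.6, n),   # fall (pre-intervention) GPA
})
# Simulated spring GPA with a small treatment-by-risk interaction and lots of noise
df["gpa_post"] = (df["gpa_pre"]
                  + 0.10 * df["treatment"] * df["at_risk"]
                  + rng.normal(0, 0.5, n))

# One model for all students: intercepts vary by school, and the
# treatment-by-risk interaction is a coefficient in the model
# rather than a separate subgroup test.
model = smf.mixedlm("gpa_post ~ gpa_pre + treatment * at_risk",
                    data=df, groups=df["school"])
print(model.fit().summary())

In a serious analysis you'd also let the treatment effects vary by school and look at all the interactions of potential interest together, partially pooling them, rather than fishing one comparison out of a subgroup.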

At this point I’m in a cloud of cognitive dissonance. On one hand, I met Dweck once and she seemed very reasonable. On the other hand, Alexander’s criticisms do seem reasonable, and it doesn’t help that the article in question was published in . . . yup, you guessed it, Psychological Science.

So really I don’t know what to think.

But what really amazed me were two things:

1. I’d never heard of this guy and his blog has about a zillion comments. There clearly are large corners of the internet that I didn’t know about.

2. It was also striking that 100% of the commenters thought the study in question was B.S. I have no idea, but Dweck is a respected researcher. I don’t think she’s in Daryl Bem or Ellen Langer territory.

The person who sent the original message replied to me:

There have definitely been complaints from some corners about Dweck’s work not replicating, but there are also lots of followers doing other mindset experiments in her tradition.

Re that, these are the two posts preceding the analysis of that study:

http://slatestarcodex.com/2015/04/08/no-clarity-around-growth-mindset-yet/
http://slatestarcodex.com/2015/04/10/i-will-never-have-the-ability-to-clearly-explain-my-beliefs-about-growth-mindset/

He posts moderately often about bad stats, and Vox and the Atlantic link to him occasionally. Some other random stats-related posts:

Alcoholics anonymous:

http://slatestarcodex.com/2014/10/26/alcoholics-anonymous-much-more-than-you-wanted-to-know/

More alcoholism treatment and the false positive psychology paper:

http://slatestarcodex.com/2014/01/02/two-dark-side-statistics-papers/

The claim that false rape accusations are less than 1/30th as common as being struck by lightning:

http://slatestarcodex.com/2014/02/17/lies-damned-lies-and-social-media-part-5-of-%E2%88%9E/

Euthanasia:

http://slatestarcodex.com/2013/08/29/fake-euthanasia-statistics/

Psychotherapy:

http://slatestarcodex.com/2013/09/19/scientific-freud/

The “perceptions of required ability by field” study:

http://slatestarcodex.com/2015/01/24/perceptions-of-required-ability-act-as-a-proxy-for-actual-required-ability-in-explaining-the-gender-gap/

Parapsychology:

http://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/

Hmmm, I guess I should look into this in more detail. Maybe I’ll talk with some of my psychology colleagues. In any case, I’m still impressed by Alexander getting hundreds of comments on that post—he must be doing something right to be getting this sort of attention and careful reading!

P.S. More here.

P.P.S. The person who sent the above message informs me that an author of the paper said that they have had another successful replication since, and will be preregistering their next one on Open Science Framework. If their effect is real and works in the preregistered many-school replications then it will generate a huge amount of social value by helping millions of kids in school.

P.P.P.S. David Yeager responds in some comments below. I appreciate his active engagement in post-publication review, which should be a model for researchers everywhere.

89 thoughts on “Mindset interventions are a scalable treatment for academic underachievement — or not?”

  1. Scott Alexander built a sort of following over many years but it wasn’t by talking sensibly about everyday topics.

    He got big in the so-called “rationalist community”, writing about the things they like to read about (e.g. how an AI is going to kill us all). Now a lot of people read his blog — people, I gather, who know each other in real life from rationalist meetups — so there’s a sense of community like in the old days of blogging.

    More recently he started to become known outside these circles, I think in part due to his posts being easy to read (at the cost of being overly long).

    • Jake:

      I’m envious that Alexander gets thousands of comments on individual posts. On the other hand, if I got thousands of comments I wouldn’t be able to read and respond to them—or, if I did, I wouldn’t be able to do anything else!

    • I don’t think that’s a particularly accurate description of his blogging history. He was part of Less Wrong in some capacity, and he had a Live Journal blog before Slate Star Codex, so I think you’re right to say that his involvement in the rationalist community helped get him a sizable audience fairly quickly. But he clearly stated his motivation in the first SSC post:

      This blog does not have a subject, but it has an ethos. That ethos might be summed up as: charity over absurdity.

      Absurdity is the natural human tendency to dismiss anything you disagree with as so stupid it doesn’t even deserve consideration. In fact, you are virtuous for not considering it, maybe even heroic! You’re refusing to dignify the evil peddlers of bunkum by acknowledging them as legitimate debate partners.

      Charity is the ability to override that response. To assume that if you don’t understand how someone could possibly believe something as stupid as they do, that this is more likely a failure of understanding on your part than a failure of reason on theirs.

      Which you could maybe loosely paraphrase as “talking sensibly about everyday topics.” Okay, maybe not everyday topics, since the second post is about whether or not Abraham Lincoln was a necromancer, but the “talking sensibly” thing is kind of his whole schtick. And not just about things rationalists like to read about, as far as I can tell.

    • I think that most of the people who know each other know each other from interacting online – I have been to some meetups, where I knew people from online discussion and met some new ones, but I didn’t talk online with the people I knew already – and that makes a lot of sense, given the size of the community and the lack of geographic proximity. I note this because the main reason that Scott has so much discussion on the blog is that the discussion doesn’t happen via journal papers, academic conferences, and other fora; the community is newer and doesn’t have those institutions. If Andrew (and the rest of the Bayesian statistical analysis world) didn’t publish in journals or go to conferences, and the primary way to interact with him was via this blog, I think posts here would have much larger discussions too.

      Also, much of the community is not focused on the “AI is scary” stuff, but is instead focused on how people can apply insights from the academic research on cognitive biases, and applications of decision theory, to their personal decisions, including areas like effective altruism, which is largely just applied statistical analysis for charitable giving.

    • My take on the “rationalist community”: Unless smart people are optimizing for status (as they do in academia, where academics try hard to seem as prestigious as possible), they are going to be treated a bit like nerds typically get treated in high school. They think hard about things, come to unusual conclusions, act differently from everyone else, and ignore prevailing social norms. Everyone who’s less intelligent views them with an air of suspicion & derision.

      The “rationalist community” is a group of very smart people who aren’t optimizing very hard for prestige. None of the criticisms that have been written about it are very serious. And that’s because none of their critics are serious people. They’re like the less intelligent bullies who poke fun at nerds in high school. Most intelligent people who take a serious look into the rationalist community become members.

      That’s not to say that the community is infallible, far from it–it’s full of flaws, just like your typical high school nerd social group (or your typical university even). I’m just saying they’re yards beyond your typical internet social club, and they’re a good read if you want a weird but smart perspective that you won’t find in academia.

  2. I read Carol Dweck’s book a few years ago and I think it is wonderful. It made a difference to how I approached some parts of my life (sample size = 1). I can’t say the same for most other books I read. I’m glad that I didn’t wait for some “sub-group” analysis to prove her message.
    Yes, what she says is common sense and reasonable, but sometimes we just need to remind ourselves to think clearly and with the right motivation. There is nothing counter-intuitive in her message. That said, there are many reasons why the experiment might not hold – timing/lags, maturity, implementation problems (and yes, heterogeneous treatment effects).

    • You can’t estimate heterogeneous treatment effects (across most dimensions) if the sample consists of high dropout-risk high school students. Also, I think the message would be better received by parents/guardians that may then use these strategies as part of their child rearing. But in any case, this particular sample might have problems more serious than those involving a mindset switch.

    • > there are many reasons why the experiment might not hold
      Andrew forgot to mention that it’s not that the theory must be wrong, but rather that the experimental results of this particular study are being misconstrued or supporting details omitted (the subgroup was predefined rather than discovered).

    • According to Alexander’s account, the intervention is a 45-minute online “course”, and the response variable is measured at the end of the semester. It seems dubious that such an intervention should be able to produce an effect that much later.

        • A protocol that includes periodic ‘Mindset’ interventions throughout a semester or school year would be more likely to be effective. Though in addition to Martha’s point about the possibility of heterogeneous treatment effects, you have the unmeasured and possibly uncontrolled causes of GPA that are unrelated to effort, persistence, and even subject mastery.

        • It’s an open question whether more exposure to growth mindset messages would be more effective than a limited exposure. Adolescents have a way of believing things less the more that grown-ups tell them to think something.

          Indeed, two meta-analyses by Eric Stice found that shorter behavior change interventions were more effective than longer ones, which contradicts public health intuition but matches the intuition that teenagers don’t like being told what to think or do.

          Stice, E., Shaw, H., & Marti, C.N. (2006). A meta-analytic review of obesity prevention programs for children and adolescents: The skinny on interventions that work. Psychological Bulletin, 132, 667-691.

          Stice, E., Shaw, H., Bohon, C., Marti, C.N., & Rohde, P. (2009). A meta-analytic review of depression prevention programs for children and adolescents: Factors that predict magnitude of intervention effects. Journal of Consulting and Clinical Psychology, 77, 486-503.

        • I can see your point. I suppose the information, when presented well, might be the ‘missing piece of the puzzle’ that generates the ‘Aha’ moment and the “I get it… it all makes sense now.” That said, I wonder what role ‘attentional issues’ play in the target population, issues that repeated exposure might help to overcome if one were to use different delivery methods and couching of messages — perhaps follow-up assessments of information recall at later time points that can be retaken until all the information is understood and answered correctly.

        • I’ve read the first few pages of the paper. I don’t think the analogy with Gawande’s surgical intervention is a good one, since Gawande’s introduces transparency and openness as a routine thing. Its effectiveness did not surprise me at all when I first heard of it. So his intervention is very different from the educational interventions, in two ways: first, the latter involve no transparency/openness/accountability, and second, they do not involve a routinely repeated practice.

      • >"Participating schools were asked to select a study coordinator who would recruit teachers to participate and follow up with teachers if classrooms lagged. The coordinator asked teachers to create accounts on the study Web site (http://www.perts.net) and to schedule two 45-min sessions about 2 weeks apart (mean = 13 days). Both sessions were administered in the school computer lab during the spring semester, between January and May 2012."

        Some of the interventions did not occur until May, which could affect final exams. They essentially measured academic performance for the spring semester. The overall effect looks small; possibly these differences amount to turning in one extra homework assignment the next day, etc.

    • I’ve read some of her work, and it did get me to be more thoughtful and deliberate about how I respond to students who say “I’m bad at math.” I don’t think it’s harmful even if it might not be the be all and end all of things. Again n=1 but it certainly doesn’t do any harm.

  3. I’m curious as to why Dweck is singled out in your post when she is the sixth author. I understand that she’s the theorist here, but sixth author seems to imply that she’s not primarily responsible for carrying out the study.

  4. “This isn’t comparing apples to oranges. This isn’t even comparing apples to orangutans. This is comparing apples to the supermassive black hole at the center of the galaxy.”

    Thanks. Made my evening!

  5. Was the author’s response not up when you read the post, or did you think it wasn’t worth mentioning? I don’t much care or know about this area of research, though Alexander seems to have missed that the values were standardized (and has all but admitted as much in the post as it stands by now):

    http://slatestarcodex.com/2015/05/07/growth-mindset-4-growth-of-office/

    Specifically, part of what you cite here seems to be exactly what Alexander has (in his own words) “completely bungled” and “screwed up.”

    Also, the author counters the “forking path” allegation, and while I cannot judge if it’s true, it seems to me that knowing a thing or two about the field would be an advantage in understanding his argument about why they chose to concentrate on the subgroups they did.

      • Ah, OK. I just thought it’s worth mentioning cos missing/misinterpreting standardized values in this way seems really bad. The author in his response calls the concept ‘esoteric’ regardless; e.g., de Veaux in the textbook you recommend uses z-scores almost from the get-go and very frequently uses them for the sake of clarification. So this appears almost the contrary of esoteric to me. If really, as Alexander posits, only one commenter caught this error in his argument, I have my doubts that this is because the concept is so complicated or esoteric. I mean, even I know what this is about, and I know nothing of statistics and I am broadly stupid.

        • This prompts me to write a comment I was holding back :) I wouldn’t be too envious of the hundreds of comments received by Alexander, as all but one of the first two hundred and fifty commenters followed him blindly in his misunderstanding of the z-scores graph. The regular commenters here may be fewer in number, but definitely not in attention – you people would never have let such a blunder go unnoticed!

        • I think there is a bit more to it. Andrew sometimes (once?) said there is a problem that everybody wants to be Jared Diamond. On the other side, everybody wants to be Andrew Gelman! You know, debunking applied scientists who use stats mechanically, which is why they fall into every conceivable trap because they don’t know what they are doing. But neither do lots of the debunkers! Just as everyone who ever haggled with a taxi driver thinks they have insights on international trade, everyone who ever produced a pie chart in Excel thinks they have insights on stats. And so you get a lot of people just jumping on the “[Researchers in a certain field] are sooooo stupid” bandwagon, and not looking especially good doing so.

          To be fair, Alexander seems reasonable and tries to get it right. He also fully acknowledged his screw-up. Still, he got a concept wrong that intro stats books treat from the early chapters on. Which makes me wonder why he thought he could make a devastating critique in the first place. I mean, at some level he must know that he is just not competent, no?

          More generally, I think this is a problem of a half-assed, ill-conceived, and frequently self-defeating notion of “reasoning.” Most scientific areas are hard to learn and complicated. I am sure professional statisticians like Andrew have spent a lot of time obtaining their technical competence. “Reasoning” yourself through whole areas just won’t do it.

          Which leads me to my speculative idea why some blogs have that many readers and commenters: Everybody is thinking about the latest hot shxt on the Internet. But after a long work day and spending too little time with family and friends, most people don’t have the time to write up whatever thoughts come up in their late-evening readings, much less to do serious research or understand technical stuff. But they still want to read about it! And so they spend their time reading their own thoughts conveniently spelt out on their favorite blogs, as if they could overcome their lack of knowledge about whole areas by “thinking hard.”

          If he wants more readers, I suggest Andrew spend much more time outside his fields of expertise, and replace thoughtfulness moderated by competence with “reasoning” moderated by snippets of Internet wisdom.

        • Yes, this and also: Apart from wanting to be Diamond and Gelman, everybody thinks they are Feynman or Grothendieck. There is this extreme admiration for people who, at least according to hagiographic accounts, gained their insights ‘from scratch’, more or less by just thinking really hard. Needless to say, this will be self-defeating for most people. But apparently thinking like this about oneself is more satisfying than sitting down and learning from textbooks at the risk of exposing one’s own weaknesses and intellectual limits. No wonder they dislike the idea of a growth mindset from the get-go…

        • @Martin

          What about all the people who go the “textbook learning” route and write crappy papers in academic Journals that fail even a common sense test?

          I think you are being too harsh on him. It was merely a blog post. Haven’t we all seen career academics commit worse blunders in peer reviewed papers?

          No one says textbooks are bad. But not all of us can read textbooks that cover areas outside our primary domains of expertise.

          I don’t think that ought to stop people like Alexander from commenting on areas they are not experts in. If a blogger makes sophomoric errors all the time, intelligent readers will dump him quite soon.

          And frankly, I’d give a lot to be lectured by a Feynman on almost any topic rather than by a run-of-the-mill tenured prof whose primary credentials are having read the prescribed textbooks and jumped through all the required hoops.

        • Rahul,

          I did not suggest a ‘textbook approach’ as opposed to a ‘thinking hard’ approach. I criticized a ‘thinking hard’ approach that seems pretty unconcerned with actually having competence in the hard thought-about topic at hand. Alexander can do whatever he wants. But if he puts a devastating and (as he himself admits) snarky critique about the use of stats in a paper online for everybody to read, and then screws up so badly and w/r/t such a basic concept in stats, well… where is your defence of the authors of the paper against Alexander being harsh on them?

          Read again what I wrote re Feynman. It says that his approach won’t work out for most people, not that it didn’t work out for him (though I have my doubts concerning the most hagiographic (self-)accounts).

        • @Martin

          Alexander made a critique. He happened to be wrong. Others pointed it out. He realized his mistake. Perhaps the next time he critiques a stats method, our priors about him will have changed slightly.

          Overall, the process seems to have worked well.

          What would you rather have? Him not having commented at all about this? Andrew sticking to posts strictly within his professional domain?

        • Rahul,

          I do not wish for Alexander to have done anything differently: he screwed up, and I think it points to a broader problem with the ‘think piece’ blogosphere. So I said so.

          Andrew can and very probably will do whatever he wants to, and I have no plans to seriously suggest otherwise, nor am I in a position to do so. And if I disagree with him and I feel like it, I will vent my frustrations, as long as he lets me right here.

          If you feel I am wrong, I have a concrete feeling you will let me know. And I hope nobody will read implications into your disagreement that are nowhere to be found.

        • Alexander may indeed have “screwed up”. But in my opinion, his screw up is minor compared to how crappy the original paper is.

        • There might be a slight misunderstanding concerning my allegedly being too harsh here with this critique of the paper you (non-harshly, of course) identify as crappy and a much worse screw-up: The screw(ed)-up you scare-quote is Alexander’s self-assessment of his… screw-up, a ‘huge and unforgivable’ one at that.

        • I feel like there is some meta (or recursive?) irony here.

          Scott Alexander has tried to bootstrap a lot of his statistics knowledge (not literally, but in the sense that a bit of it is self-taught on the fly). Scott seems not to like the paper — and, based on many different postings, he doesn’t like the Growth Mindset idea. Yet Scott (and possibly his readers?) seems to exhibit a Growth Mindset, struggling with new ideas and trying to grab things that are just out of reach — and crucially admitting mistakes along the way, so as to learn.

          To the extent I read the comments right: Martin does not like this, but does not dislike the Growth Mindset.

        • I think the friction is between the broad idea versus the specifics.

    I don’t think anyone disagrees that an attitude of developing one’s skills and improving on one’s baseline talent reaps rich rewards.

          The real question is whether you can give a random, unmotivated fellow a 1 hour online lecture on this topic and then measure a significant performance improvement.

  6. From the paper:
    >”Both interventions were intended to help students persist when they experienced academic difficulty; thus, both were predicted to be most beneficial for poorly performing students. This was the case.”

    I don’t see how they can make that claim (it is the usual “the null hypothesis is false, so my theory is true”). One alternative explanation is that the control intervention was harmful. They didn’t have a “no-intervention” condition.

    The control was to have the students read about functional localization and encourage them to “put forward economic self-interest”. So the results could just as well be explained by the control intervention discouraging the low-performing students. They had to read something more complicated-sounding; I bet understanding it required remembering weird names of brain regions, etc. Then the students were reminded they need to be out making money rather than wasting time in class.

    Figure 2A shows scores in all subjects except math decreased pre-post intervention for the control group. The other measure was GPA. They do have pre-intervention GPA but choose not to share whether that decreased for the controls: “We calculated each student’s end-of-semester GPA in core academic courses (i.e., math, English, science, and social studies) in the fall (preintervention) and in the spring (postintervention).”

    Another possibility is that the intervention affected the teachers rather than the students. It does not say whether the teachers were blinded to condition while grading, but I doubt it. I imagine this would be hard to do because they would want to discuss this activity with the students.

    Also, a difference of 0.1 GPA with 67% confidence intervals +/-0.4 is not very large. The people involved in the study should be able to think of many explanations for an effect of that size and uncertainty, I don’t see where they consider any.

    • Typo, this should be +/- 0.04: “Also, a difference of 0.1 GPA with 67% confidence intervals +/-0.4 is not very large.”

      >”P.P.S. If their effect is real and works in the preregistered many-school replications then it will generate a huge amount of social value by helping millions of kids in school.”

      Preregistration and successful replications are good, but do not address the primary problems here. If it is a straight up replication, it will not be able to give a valid answer to the question they want to ask.

      In order to determine if the intervention is actually helping kids in school they will have to rule out at least the two explanations offered above. That means the next study should blind the teachers to condition (I imagine this will require some cleverness…), and include a “no intervention” comparison group. There are surely other issues as well, I’d really recommend they run the experimental design past some skeptical outsiders with experience in this area before performing the next study.

    • I’ve had plenty of students concerned about the difference between a B- and a B, plus you won’t get into a lot of graduate programs with a 2.9 but you will with a 3.0, so I don’t think a .1 difference is small at all. In fact I’d be way more worried about seeing a large effect size from such a small-scale intervention. People are not atoms or even barley; doing anything to systematically change them, especially for the long term, is not easy.

        • A very interesting question, and probably not much, but in general everyone who teaches knows that subject mastery is just one component of students’ achieved grades (so is whether they write well, whether they actually answer all the questions on the test, whether they understood the instructions, whether they partied the night before the test, etc.). I don’t think here they are interested in “subject mastery” as much as “success in courses.”

        • James:

          Of course education is about learning. Elin is addressing the different goals of the learning. One goal is to achieve mastery of the material, another goal is to be able to use the material in future settings.

      • It’s not so much whether .1 is of practical significance, it is whether that difference can be attributed to many things unrelated to the theory of interest. Maybe one of the tasks required less time than the other, so students were able to finish homework/studying/cheatsheets for the next class. Could that improve GPA .1? There are many reasons an intervention can cause small differences other than the explanation that motivated the study.

        • Why is practical significance unimportant? When we use criteria like GPA which do not have direct meaning, it makes the task you describe exceptionally difficult as the possible alternative explanations are large in number and ruling them out requires quite a bit of data and exceptionally well designed studies.

        • >”Why is practical significance unimportant?”
          It is important to the cost-benefit analysis, but not to the theory testing (insofar as A>B can be considered a prediction from a “theory”). Small effects cause problems in both areas.

          >”When we use criteria like GPA which do not have direct meaning, it makes the task you describe exceptionally difficult as the possible alternative explanations are large in number and ruling them out requires quite a bit of data and exceptionally well designed studies”

          I would make the stronger argument that it is impossible (in practice) to get useful results using this type of experimental design in this context. It is pointless to continue doing group A vs group B at a single timepoint studies. They need to collect longitudinal data, come up with a theoretical curve that explains it, and then deduce precise predictions from that theory.

        • That’s all about design, and it’s fine to critique and raise questions about threats to internal validity. That’s not the point of the argument being discussed here though. That’s also why you do process evaluation to make sure you identify issues like differences in time, though really you should pick that up in the design phase when you are pretesting the interventions.

        • I think it is not only ‘fine’ to do so, but essential. And it is not all about design. It is fundamental to the claims being made. How can one make an argument using evidence of ‘prediction’ that relies on the conceptual overlap between the predictor variables and the criterion without carefully considering the potential criterion contamination with important and impactful confounders?

        • To elaborate just a bit: If the goal of the study was “Predictors of GPA”, I would concede your argument. But if the study is a test of the Mindset Theory, then identifying appropriate criteria by which to judge it that are directly meaningful and consistent with predicted theory is necessary to achieve this goal.

        • The study is not designed as a test of mindset theory. They assume mindset theory is valid. The study is a test of whether this particular low cost intervention influences achievement.

    • I agree that choosing a good control group is hard. In school studies a major concern is “do no harm,” so researchers often err on the side of modestly helpful control groups. It seems reasonable that high school students who might be taking biology and learning about the brain could even benefit from a control group that taught interesting scientific information about the brain. Of course that’s not shown in the Paunesku paper one way or another.

      Showing a decrease in the GPA for controls would be uninformative because there is almost always a decline in grades year over year in high school (especially freshman year), because coursework gets harder as students approach college.

      Benner, A.D. (2011). The transition to high school: Current knowledge, future directions. Educational Psychology Review.

      Commenting on this:

      “It does not say whether the teachers were blinded to condition while grading, but I doubt it.”

      Students were randomly assigned by the computer and no school staff person was given information about students’ condition assignments. Also students typically did the exercise in their electives (like PE or Health) and so the content area teachers (English, Math, Science, Social Studies) wouldn’t have known that students were even in a study. Which is another way to say that it was a double-blind design.

      • >”I agree that choosing a good control group is hard.”

        Not just one control group, but many will be required. Here, the most obvious control of “no-intervention” was missing. It is not possible to make claims about the intervention benefiting students from the current data. Even interpreted in the most generous manner possible (to the favored theory), it is still only possible to say intervention A was better than intervention B. Nowhere was a comparison to the status quo made.

        >”Showing a decrease in the GPA for controls would be uninformative because there is almost always a decline in grades year over year in high school (especially freshman year), because coursework gets harder as students approach college.”

        This was fall to spring semester, not year over year. Also, the vast majority of students were freshmen; it would not surprise me to learn the first semester of high school is an “adjustment period” characterized by poorer performance. Either way, this needs to be discussed in the paper.

        >”Students were randomly assigned by the computer and no school staff person was given information about students’ condition assignments. Also students typically did the exercise in their electives (like PE or Health) and so the content area teachers (English, Math, Science, Social Studies) wouldn’t have known that students were even in a study.”

        The students may or may not have discussed the activity with each other and teachers. Who knows to what extent this matters. This is also crucial information that belongs in the paper, otherwise the audience cannot follow the reasoning. Even a handwave showing this stuff was considered is better than silence.

        >”Which is another way to say that it was a double-blind design.”
        In a blinded design, efforts are taken to ensure the researchers/participants are unaware of their treatment group. Any indication these efforts failed is considered to mar the study. In a double-blind design the participants are unaware of what treatment they received. Is that what happened here?

        • Actually the important thing is that the participants are unaware whether they received the treatment or the placebo. I don’t see anything that indicates that the students would have known which videos were the treatment, unless you think there was something in the consent form or that they or their parents or the teachers were Googling about the content of the videos (or the names of the investigators on the consent form). Or the teachers could have heard about it from the kids and may even have read some of this work before (I know I’ve read Dweck in faculty development contexts and I think in some popular articles, I assume many teachers have too). So it’s certainly possible that happened. I totally agree that it’s important to have process researchers on the ground to try to capture this to the extent possible. Research on schools is hard, they are little social systems where people talk to each other and react to things in unexpected ways. On the other hand, all this would happen just as much if teachers just started using the curriculum without trying to test it first. I don’t think that’s a preferable alternative.

  7. It seems like the subgroup analysis still yields a test for an interaction effect of mind-set-intervention with risk group with p = .048. As far as I can tell, the “simple effect test” (the treatment v. control contrast in the high-risk subgroup) is not reported.

    • That was the first of three treatment conditions. Two showed a significant interaction, one was p = .07. All three combined (compared to control) was significant at p = .01. Here is the original text:

      This interaction was significant for the growth-mind-set intervention, b = 0.13, 95% CI = [0.00, 0.26], t(1568) = 1.97, p = .048, and the sense-of-purpose intervention, b = 0.17, 95% CI = [0.03, 0.32], t(1568) = 2.31, p = .021, and it was marginally significant for the combined interventions, b = 0.14, 95% CI = [−0.01, 0.28], t(1568) = 1.81, p = .071 (Fig. 1). … [COMBINED] The regression analysis revealed a significant At Risk × Intervention interaction, b = 0.14, 95% CI = [0.03, 0.25], t(1572) = 2.56, p = .011.

      Here is the simple effect test:

      The intervention effect was significant among at-risk students, b = 0.13, 95% CI = [0.02, 0.25], t(499) = 2.30, p = .022, but not among other students, t < 1.3
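
      In equation form, the interaction being tested is the coefficient \(\beta_3\) in a regression of roughly this shape (a sketch only; the paper’s exact covariates and standardization are omitted here):

      \[
      \mathrm{GPA}^{\mathrm{post}}_i = \beta_0 + \beta_1\,\mathrm{AtRisk}_i + \beta_2\,\mathrm{Intervention}_i + \beta_3\,(\mathrm{AtRisk}_i \times \mathrm{Intervention}_i) + \varepsilon_i
      \]

      The reported b = 0.13 (growth mindset) and b = 0.17 (sense of purpose) are estimates of that interaction coefficient in the corresponding intervention-versus-control regressions, and the “simple effect” is the treatment contrast estimated within the at-risk subgroup.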

      • This just seems to be evidence that the garden-of-forking-paths needs to be taken into account — e.g., by complete multilevel modeling, as Andrew suggests.

        • I thought Andrew suggested this “looking at all interactions that **might be of interest.**” — and then putting them in a multilevel model.

          A question for this field and any field is how to define “of interest.” It seems reasonable to use previous literature; not all interactions are equally interesting or meaningful (I actually don’t think the 2013 garden draft sufficiently deals with this).

          Some previous studies have found interactions of brief psychological interventions with prior performance, such that previously lower-achievers benefitted more:

          Hulleman, C. S., & Harackiewicz, J. M. (2009). Making education relevant: Increasing interest and performance in high school science classes. Science, 326, 1410–1412.

          Cohen, G. L., Garcia, J., Purdie-Vaughns, V., Apfel, N., & Brzustoski, P. (2009). Recursive processes in self-affirmation: Intervening to close the minority achievement gap. Science, 324, 400–403.

          Yeager, D.S., Henderson, M., Paunesku, D., Walton, G., Spitzer, B.,* D’Mello, S., & Duckworth, A.L. (2014). Boring but important: A self-transcendent purpose for learning fosters academic self-regulation. Journal of Personality and Social Psychology, 107, 559-580.

          The same interaction is true it turns out for much lengthier social programs like early childhood programs:

          Tucker-Drob, E. (2012). Preschools reduce early academic achievement gaps: A longitudinal twin approach. Psychological Science, 23, 310-319.

          Miller, E.B., Farkas, G., Vandell, D.L., & Duncan, G.J. (2014). Do the effects of Head Start vary by parental preacademic stimulation? Child Development, 85, 1385-1400.

          So while I agree in general that unspecified post-hoc moderators can lead to over-claiming data, I don’t follow the logic for why a theoretically predicted interaction should be treated the same as others that were not – or am I misunderstanding here?

          Either way, I do agree it would be great to have a replication of the interaction in larger samples and more fully know what appeared due to chance and what was a reliable result.

        • David,

          Is the P.P.S. right that you are “preregistering the[] next one on Open Science Framework” to avoid forking paths and ex post defenses of the choices?

        • David,

          You are correct that in this post Andrew said “looking at all interactions that might be of interest” — I was recalling (possibly incorrectly) previous comments he has made.

          I agree that the definition of “of interest” needs discussion. But my take is that “of interest” needs to be decided primarily without looking at previous studies — that is, by considering the purpose of the intervention and what might go wrong, or support other hypotheses, or whatever. I’m tempted to say that it may be as important to give a reason for excluding an interaction as to give a reason for including one.

          Also, I winced a little at “The same interaction is true it turns out for …”. That’s the kind of thing I caution my students not to say. We can’t conclude that something is true from a statistical analysis — just that the analysis (if sound) shows that the data are consistent with it.

          “So while I agree in general that unspecified post-hoc moderators can lead to over-claiming data, I don’t follow the logic for why a theoretically predicted interaction should be treated the same as others that were not – or am I misunderstanding here?”:

          I’m a little confused here. I would consider “theoretically predicted” to be different from “found in a previous study.” Or am I mistaken in assuming that this sentence is related to the previous paragraph discussing related studies? (Also, I don’t know what you mean by “over-claiming data.”)

  8. I wonder how research psychologists usually think about conflicts of interest / competing interests.

    The paper notes: “The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.”

    Carol Dweck is presumably a substantial stakeholder in Mindsetworks / Brainology, which markets a product based on the idea that these interventions work.

    If these interventions were thought about the way medical science thinks about drugs or therapies, this would be a case of COI / competing interests.

      • That disclaimer about Dweck’s lack of financial interest was added after her potential COI was pointed out, around the time Scott Alexander’s post was originally published. The disclaimer wasn’t there in March 2015, which is the temporally closest Wayback Machine version. The idea that someone who is described as a company’s Co-Founder & Board Director would have no COI seems dubious.

      • Maybe there is some legal thinking behind the “no _current_ _financial_ interest,” but it certainly is a potential COI, which by definition is a COI, and it should have been declared.

        For instance, if I have advised friends and/or family to invest in, say, United Fluff and Dust, I have a COI in matters concerning them.

        To me, others should be replicating this work, and preregistration is no cure for COI without an audit.

      • Well, I would read that as: Mindset Works licensed Carol Dweck’s interventions for a one-time price, hence “no current”. Does that mean she has no conflict of interest? I dk.

  9. Glad to see my two favourite blogs link up. Scott Alexander actually admits being weak on stats, but I think (through extensive rationalist training or personality) he’s a bit better at giving everything a fair shake (charity > absurdity). So, AG + SSC compensates well. And both AG and SSC have shown me the value of applying my priors somewhat explicitly numerically to topics that I’m wondering about.

    Regarding mindset I saw this talk at a conference. An independent replication by this author counts way more to me than most of the rest of the literature.
    http://drjamesthompson.blogspot.de/2015/09/willpower-makes-you-brighter.html
    “Moreover we find no support for incremental beliefs about will-power on measured cognitive test scores.”
    “We found no support for incremental versus fixed mindset on grades. We further found no significant effect of mindset priming on IQ scores post a performance setback challenge in either of two replications. Across multiple studies, we were unable to support empathizing, will-power conservation, or mindset (typical or primed) as factors affecting IQ or cognitive control.”

    But good that they are pre-registering. It’s the thing to do when things are overly contentious.

    • Is this replication a test of incremental theories of intelligence (which have been independently replicated in many studies; see meta-analysis by Burnette et al. 2013), or non-limited theories of willpower?

      It seems like they’re testing the latter, the willpower studies, which are much more recent and not the same variable (or theory or effect).

  10. “1. I’d never heard of this guy and his blog has about a zillion comments. There clearly are large corners of the internet that I didn’t know about.”

    I wonder what was going on in your head to be surprised about this. Your prior should be that you do not know most of what happens on the internet, not the other way around!

  11. Dr. Dweck’s Growth Mindset is kind of similar to motivational speaking oldies but goodies like Norman Vincent Peale’s “Power of Positive Thinking.” It’s worth noting that one of Rev. Peale’s congregants was the very young Donald Trump, who has followed the Power of Positive Thinking (about himself, at least) ever since, and it still seems to be working for him.

    I think I can say with some confidence that the evidence is that some motivational speakers (e.g., Norman Vincent Peale) are better than other motivational speakers (e.g., myself), and that some listeners (e.g., Donald Trump) are more motivated by motivational speakers than other listeners (e.g., myself). And, there are also sort of complex interaction effects between individual speakers and individual listeners.

    This suggests that while motivational speaking is a big business, it’s often hard to replicate results that one individual motivational speaker got in a particular time and place.

    • I haven’t seen Norman Peale randomize people to positive thinking or not, but mindset researchers have randomized people to receive one kind of mindset or another and seen how they cope with challenges.

      Here is a meta-analysis (not conducted by Dweck or Dweck students) of N=28,217 participants and k=113 studies showing effects of mindsets on attitudes, behaviors, and coping:

      http://faculty.wcas.northwestern.edu/eli-finkel/documents/2013_BurnetteOBoyleVanEppsPollackFinkel_PsychBull.pdf

      • I’m sure that some famous motivational speakers would succeed in randomized trials, just as Dweck’s brand of motivational speaking succeeds in some trials.

        If you think of what Dweck is doing as motivational speaking rather than as Theory of Relativity-style Science!, it’s also easier to imagine why it might fail to replicate in other studies. Norman Vincent Peale, for example, appears to have been highly successful at motivating some people in his time and place. But his sermons tend to strike most people today who are younger than Donald Trump as old-fashioned.

        With human beings, effects tend to wear off. What was hugely motivating a generation ago seems corny today. As you can see from this article by Dweck:

        http://www.edweek.org/ew/articles/2015/09/23/carol-dweck-revisits-the-growth-mindset.html

        She has come up with a bunch of new slogans for motivating children to think they’ll be good at math. But eventually, our society will catch on to what she’s up to and then her tactics will be as outmoded as Norman Vincent Peale’s. But, no doubt, some new charismatic figure will eventually emerge with a new set of sayings that will strike the next generation as brilliant leaps forward, rather than as obvious old-fashioned Growth Mindset tricks.

        The motivation cycle will spin on, forever.

  12. I don’t know if the intervention works; I’m always really skeptical about things like this. But I think it was a good move to think about what scaling up means and to try to measure its effectiveness. As we’ve discussed before, there are ways that early adopters may be systematically different from later adopters that make success in the later-adopter group less likely. And there are a lot of ways that a “high touch” intervention is never going to be scalable even if it is successful. So I think it’s great that they tried to figure out what a less expensive, scalable intervention might look like and whether it would be effective. And so what if the effect size is small? Now they can think about dose-response and whether a larger intervention would yield a greater effect, as David mentions above.

    I’m also a bit puzzled by the complaints about their p values, given the ways in which we all know that p values are a problem. I agree with Andrew that multilevel modeling is the way to go with data like these.

  13. I thought the author’s comments on this at http://slatestarcodex.com/2015/05/07/growth-mindset-4-growth-of-office/ were important, as is the work done since then. I found this by searching the blog after the recent talk about the NYT article, because I feel these ideas are really important as we seek to improve science.

    See http://blogs.edweek.org/edweek/finding_common_ground/2017/06/misinterpreting_the_growth_mindset_why_were_doing_students_a_disservice.html for more.

    I think other evidence has come up about this work but I do think Carol Dweck is top notch and works to take criticism well and use it to improve her work.
