Checkmate

Sandro Ambuehl writes:

As an avid reader of your blog, I thought you might like (to hate) the attached PNAS paper with the following findings: (i) sending two flyers about the importance of STEM fields to the parents of 181 kids improves ACT scores by 12 percentile points (intent-to-treat effect… a bit large, perhaps?) and (ii) ACT scores predict college science-course taking. The paper concludes that “these findings demonstrate that a motivational intervention with parents can have … downstream effects on STEM career pursuit 5 y later” (as if correlation were transitive…). To their credit, they do state, buried in the text, that “There were no significant direct intervention effects on post-high school STEM career pursuit variables”.

You’ll correctly guess the editor.

My quick answer is that I neither love nor hate this paper. I get the impression that the authors are doing their best and are presenting their work clearly, and they just happen to be working within a framework in which effects are overestimated and overgeneralized.

The highlighting at the above link is from Ambuehl, I think. I agree that it’s a bit much to think that handing out two brochures and a link to a website would be enough to cause an increase of mathematics and science ACT scores by about 12 percentile points. But we know that published point estimates are biased. We can take our Edlin factor and scale that down to, ummm, 3 percentile points? Even 3 percentile points isn’t nothing. The mechanism suggested in the paper is that when the parents got the brochures and website link, the kids were more likely to take STEM (science, technology, engineering, and math) courses in high school, and it makes sense that they could learn something in these classes and then do better on the exam.
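
To make the adjustment concrete, here’s the arithmetic as a minimal sketch; the Edlin factor of 1/4 is just an illustrative guess, not something estimated from these data:

```python
# Back-of-the-envelope shrinkage of a published estimate. The Edlin factor
# of 1/4 is an assumed value for illustration, not a fitted quantity.
published_effect = 12.0   # percentile points, as reported in the paper
edlin_factor = 0.25       # assumed shrinkage for a small, noisy study
adjusted_effect = published_effect * edlin_factor
print(f"adjusted effect: about {adjusted_effect:.0f} percentile points")
```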

I agree with Ambuehl that the logic of transitive correlations is in error. In particular, in the concluding sentence of the abstract, “Overall, these findings demonstrate that a motivational intervention with parents can have important effects on STEM preparation in high school, as well as downstream effects on STEM career pursuit 5 y later,” that last phrase does not seem supported by their data.
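
To see why that last step fails, here’s a toy simulation (the data-generating process is invented, nothing from the paper): the treatment raises a mediator, and the mediator correlates with the outcome, yet the treatment and the outcome are uncorrelated.

```python
# Toy counterexample: correlation is not transitive. T raises M, and M
# correlates with Y, yet T and Y are uncorrelated, because M's association
# with Y runs entirely through a separate background factor U.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
t = rng.integers(0, 2, n)   # randomized intervention
u = rng.normal(size=n)      # background factor, e.g., prior STEM interest
m = t + u                   # mediator driven by both T and U
y = u                       # outcome driven only by U
print("corr(T, M):", round(np.corrcoef(t, m)[0, 1], 2))  # clearly positive
print("corr(M, Y):", round(np.corrcoef(m, y)[0, 1], 2))  # clearly positive
print("corr(T, Y):", round(np.corrcoef(t, y)[0, 1], 2))  # about zero
```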

And, by the way, Ambuehl was right: I did guess the editor! Actually I think this paper is much much much better than the air-rage paper, the himmicanes paper, and the ages-ending-in-9 paper. This paper is just fine, almost. It’s hard to get kids to take math and science classes, and for this group of students whose families were already enrolled in a longitudinal study, it turns out that sending brochures to their parents seems to have been an effective intervention. It wouldn’t be so hard to mail such brochures to every parent of a high-school kid in the country, and I guess if they all take one more math or science class and one less class in, ummm, driver’s ed? U.S. history? Spanish? whatever? I guess that would be good, I dunno. I think they’re drawing a lot of conclusions from just 181 kids in this study, and for the usual reasons I’m suspicious of a claim such as, “a modest intervention aimed at parents can produce significant changes in their children’s academic choices.” The problem is that there are so many modest interventions happening all the time, and they can’t all have big effects.

Fundamentally I see this kind of thing as “engineering” rather than “science.” I don’t mean “engineering” in a bad way, not at all! These researchers are trying to figure out ways of getting kids to take more STEM classes, and here’s something they tried in this small group, and it seemed to work, so it’s good to share the information. It seems odd to me for this work to have appeared in a psychology journal (Psychological Science) and then a general science journal (PPNAS) rather than in some sort of education policy journal, but that’s just an artifact of our current decentralized system of research communication. I assume the results made their way into the What Works Clearinghouse so the relevant policymakers will know about it.

Overall I have the impression that many of the mistakes we see in statistical inference are created by the framing of research in terms of scientific discovery, in that researchers are pushed to make deterministic and overstated claims. But you can’t really say PPNAS did anything particularly wrong in this particular case: they popularized a bit of workaday policy research which seems, at least at first glance, to be a solid piece of work, flawed more in its presentation than its execution. It’s a little study, not a big deal. But not everything has to be a big deal.

P.S. Upon reflection, I think I was too generous in my above assessment. Or, to put it another way, no, I don’t believe the published estimates. I think they’re biased, and I’d expect that if someone were to try a controlled replication, the results would probably be smaller. Perhaps I was giving the paper a soft reception because it was in PPNAS: the soft bigotry of low expectations and all that. Also, just about every published paper in policy analysis uses estimates that are positively biased, so we shouldn’t single out this particular article. It really is much better than the PPNAS classics on himmicanes, air rage, etc.

15 thoughts on “Checkmate”

  1. As a new editor of a new journal, I’ve been pushing hard on your closing idea that not every study has to be a big deal. A single study can’t do that much, so editors need to encourage authors to acknowledge the compromises they’ve made, rather than pressuring them to ignore them and pretend their paper is a big deal.

    At our last conference, we aired a couple of short videos to emphasize the point to potential authors and reviewers:

    Substance, Commentary and Compromise: JFR’s New Approach to Academic Publishing

    The Journal of Financial Reporting Playbook

    Here’s a snippet from the transcript of the first video:

    First, we recognize that every paper must compromise on some goals to achieve others. It’s hard to explore new ideas and test old ones; it’s hard to identify causality and generalize across samples and settings. It’s hard to address deep issues and provide useful applications.

    We’re fine with compromises; in fact, we encourage them. We’ll let you pursue a limited set of goals, as long as you do it well and report honestly about what you did and didn’t achieve.

    Second, we’ll base publication decisions on the substance of your work—what you actually do and find. How do you gather data? How do you analyze it? What narrow conclusions do you draw? We’ll hold substance to the highest standards.

    But most papers also include a good deal of commentary. What makes the study interesting? How does it address big questions? How does it test or extend existing theory? How might it affect practice or policy?

    These issues are important, but they’re so hard to evaluate that a lot of good papers get rejected on the basis of reasonable disagreements about highly subjective matters. We think that’s wrong, and that this is one of the big reasons peer review is so tough on new work and unknown scholars. We won’t use reasonable disagreements as a basis for rejection. Instead, if need be, we’ll hash them out in published discussions.

    In this case, I’d see the substance of the study as including the brochure intervention (what they did) and the association between treatment and test scores (what they found). Commentary would include claims about how big, plausible, and generalizable the association actually is, and about whether there is a causal effect and how it would operate. Maybe there’s something there, maybe there isn’t, but this study wasn’t designed to assess those questions, so as long as the authors don’t say anything unreasonable about these matters, and clearly indicate their remarks are just commentary, what they say shouldn’t matter for the accept/reject decision.

    This is a new approach and we’ve only started using it in our editorial process, so I’d appreciate any thoughts you all might have.

  2. I do like their theoretical model in Figure 1: “Theoretical model of how the intervention should have long-term effects on STEM career pursuit through increased STEM preparation …”. In psycholinguistics I have heard such models called boxese.

  3. I’m gonna guess that this is a dumb question and I just don’t really understand something about the methods – but what’s up with the standard errors in the Appendix tables? There are like two values of standard errors in each table, and they just repeat across all the variables. Some sort of normalization that I would know about if I did this kind of thing?

  4. I believe a prior study did the brochures thing and this one did the 5-year follow-up to make claims about how much the extra semester boosted all sorts of measurables. The one about brochures sounded interesting and I’d love to see that done in different schools with different socio-economic markers. I agree that’s engineering in the “find a lever” sense. The one about the 5-year follow-up is blech.

  5. For quite a few of those variables there is a lot (up to 35% of the sample) of missing data. Not sure how well FIML (full-information maximum likelihood) estimation deals with such high levels of missing data.
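
    As a quick check on that, here’s a purely hypothetical bivariate-normal simulation (numbers invented, nothing from the paper) comparing listwise deletion with a from-scratch FIML fit at n = 181 with roughly 35% of one variable missing at random:

    ```python
    # With ~35% of x2 missing at random (missingness depends on the observed
    # x1), listwise deletion biases the estimated correlation, while FIML
    # recovers it: each case contributes the likelihood of what was observed.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import multivariate_normal, norm

    rng = np.random.default_rng(0)
    n, rho_true = 181, 0.5
    x = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)
    missing = rng.random(n) < norm.cdf(-x[:, 0] - 0.55)  # ~35% missing overall
    x2 = np.where(missing, np.nan, x[:, 1])
    obs = ~np.isnan(x2)

    def neg_loglik(theta):
        m1, m2, ls1, ls2, z = theta
        s1, s2, r = np.exp(ls1), np.exp(ls2), np.tanh(z)  # keep params valid
        cov = [[s1**2, r * s1 * s2], [r * s1 * s2, s2**2]]
        ll = multivariate_normal([m1, m2], cov).logpdf(
            np.column_stack([x[obs, 0], x2[obs]])).sum()  # complete cases
        ll += norm(m1, s1).logpdf(x[~obs, 0]).sum()       # partial cases count too
        return -ll

    fit = minimize(neg_loglik, np.zeros(5))
    print("true rho:    ", rho_true)
    print("listwise rho:", round(np.corrcoef(x[obs, 0], x2[obs])[0, 1], 2))
    print("FIML rho:    ", round(np.tanh(fit.x[4]), 2))
    ```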

  6. I have no problem believing that reaching out to parents to inform them of the benefits of STEM education and to encourage them to talk with their kids about its importance could do some good. That said, some of these numbers simply look insane. Based on the description in the paper, we are talking about a tiny isolated intervention, but the alleged effects are huge. In addition to the previously mentioned 12 percentile points…

    “Harackiewicz et al. (32) found that this intervention increased students’ STEM course-taking in the 11th and 12th grades of high school by approximately one semester of additional STEM course-taking on average”

    “A previous report showed that 86% of the parents (either mother, father, or both) reported remembering and using the intervention materials, and 75% of the adolescents confirmed exposure to the intervention materials from their parents, demonstrating a high degree of overall engagement with the intervention.”

    I love the comic understatement of that last sentence.

  7. I guess I disagree with Andrew and agree with Sandro here: the main result of the paper is quite simply incorrectly stated, as there is no evidence here that the intervention affects science outcomes later in life (aside from the improper correlation-1-plus-correlation-2 result). Worse, it is frankly impossible to learn anything from the intervention “give 181 kids a brochure”. If this were a paper I was handling, I would have rejected it flat out, as it is utterly implausible that any generalizable knowledge can be generated by this particular research design. I think this is more what Sandro was getting at.

    And of course “not everything needs to be a big result”; but this is PNAS, which supposedly publishes cross-discipline summaries of the most important work being done in academia. The standard should be “equivalent to a very top journal in your field”, not “a minor result”.

    • Anon:

      That goes in my “flawed more in its presentation” comment. But, fair enough: If the presentation is bad enough, that can be said to flow back into one’s evaluation of the execution of the project. I added a P.S.

      • Fair enough. One other point, related to how psych folks do their studies: I defy you to figure out, on the basis of reading the (10-page, so not tiny!) paper, the functional form being used here, the justification for what (I think) is a linear SEM, or the sample size and features. The small sample size never appears in either the study design or the results, tables included (it is mentioned offhand in the conclusion). The sloppiness with which psychology treats statistical analysis, beyond simply “stars or no stars”, is mind-boggling.

  8. How do you write a paper describing an experiment in which you never actually present the means and standard deviations for all of your variables across both conditions? It is very hard to make sense of this paper without this basic information.
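
    For what it’s worth, the table I’m asking for is trivial to produce; here’s a sketch with invented column names and data (nothing from the paper):

    ```python
    # Hypothetical descriptive table: means and standard deviations for every
    # variable, broken out by experimental condition. Data are made up.
    import pandas as pd

    df = pd.DataFrame({
        "condition": ["control", "treatment"] * 4,
        "act_math": [19, 21, 22, 24, 18, 20, 23, 25],
        "stem_semesters": [4, 5, 3, 6, 4, 5, 5, 7],
    })
    print(df.groupby("condition").agg(["mean", "std"]))
    ```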

    • Marcus:

      I dunno, but I don’t think the editor of this PPNAS article, herself a member of the esteemed National Academy of Sciences, would’ve accepted the paper had it not met the highest standards of research quality.

    • As is standard for this kind of journal (Science and Nature do the same thing), the detailed methods are buried in a supplemental section online – there’s a link to it at the bottom right of the first page of the article. It is, perhaps, the most obvious sign that these journals promote style over substance.

  9. – A sample of 181 participants is generally considered too small for SEM, especially considering that some variables have data for only around 120 participants (around 55 in the treatment group).
    – I find it somewhat odd that they include up to three-way interactions in the model, use that to argue they can’t show model fit because the model is saturated, and then turn around and show not the full model but just the significant paths. Each is fine on its own, but taken together it’s odd.
    – Similarly, I would expect the intervention effect to be addressed via multigroup SEM, and I miss an explanation of why it wasn’t.

    Generally, as the indirect effect (i.e. the SEM) is crucial to the paper, I think the modelling needs to be described in far greater detail.
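
    To illustrate the small-sample worry, here’s a hypothetical sketch (effect sizes made up, and a far simpler model than the paper’s) of how noisy a product-of-coefficients indirect effect is at n = 181:

    ```python
    # Monte Carlo: sampling variability of a simple a*b indirect effect
    # estimated by two regressions at n = 181. All numbers are invented.
    import numpy as np

    rng = np.random.default_rng(1)
    n, a, b, sims = 181, 0.3, 0.3, 5000
    est = np.empty(sims)
    for i in range(sims):
        t = rng.integers(0, 2, n)        # randomized treatment
        m = a * t + rng.normal(size=n)   # mediator, e.g., STEM course-taking
        y = b * m + rng.normal(size=n)   # outcome, e.g., career pursuit
        a_hat = np.polyfit(t, m, 1)[0]   # slope of m on t
        b_hat = np.polyfit(m, y, 1)[0]   # slope of y on m
        est[i] = a_hat * b_hat
    print("true indirect effect:", a * b)
    print("mean estimate:       ", round(est.mean(), 3))
    print("middle 95% of draws: ", np.percentile(est, [2.5, 97.5]).round(3))
    ```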

  10. Andrew,

    Regarding your sentence: “It wouldn’t be so hard to mail such brochures to every parent of a high-school kid in the country, and I guess if they all take one more math or science class and one less class in, ummm, driver’s ed? U.S. history? Spanish? whatever? I guess that would be good, I dunno.”

    That depends on your goal. If your goal is to improve math/science, then yes, it is beneficial to include an additional math/science class in the curriculum. But not everybody’s goal is math/science-related, and there are valid arguments against taking this approach. If Student X is trying to be the next great writer of literary fiction, is it going to serve his purposes to take calculus? Almost certainly not. Or if Student Y wants to be a translator to ease the difficulties of international commerce, Physics II is probably going to be useless in the pursuit of this career goal.

    If these students WANT to take these courses, they may do so, but that has to do with individual motivation/ability/desire for challenge/valuation of a well-rounded liberal arts education. The assertion that students _in general_ would be better served by taking more math/science and less history or Spanish is, on its face, a rather silly and field-centric one. Many people would be better served by a deeper understanding of math or science. But many others would not.

    I’m afraid this goes down a particularly thorny path at the intersection of philosophy, economics, and pragmatic educational practices. How should we allocate limited resources (i.e. class time in high school) to maximum benefit, especially considering that different students have different strengths and different aspirations? The fact that many (all?) high school students are given some leeway in which elective courses they can pick (after meeting certain requirements) is a sensible solution–and perhaps the only sensible solution.

    If you’ve driven lately, I think you’d be as adamant as I am that many people need _more_ (and/or more rigorous) driver’s ed, NOT less!

    I think your idea that this particular study is more along the lines of engineering rather than ‘pure’ science is an insightful one. I’ve faced this very issue with my dissertation; both the ‘applied/engineering’ approach and the ‘intellectual discovery’ approach are useful–but it seems that one is more highly valued (at least by some people and journals) than the other.

    Perhaps there should be more philosophical conversations about this distinction, along with a sharper dividing line between publications that emphasize the ‘applied’ approach and those that emphasize an ‘intellectual’ approach. I agree with you that the two are complementary; however, some seem to value one of these at the expense of the other.
