PPNAS again: If it hadn’t been for the jet lag, would Junior have banged out 756 HRs in his career?

In an email with the subject line “Difference between ‘significant’ and ‘not significant’: baseball edition?”, Greg Distelhorst writes:

I think it’s important to improve statistical practice in the social sciences. I also care about baseball.

In this PNAS article, Table 1 and the discussion of differences between east vs. west and home vs. away effects do not report tests for differences in the estimated effects. Back of the envelope, it looks to me like we can’t reject the null for many of these comparisons. It would have been easy and informative to add a column to each table testing for differences between the estimated coefs.

Overall, I think the evidence in Table 1 is more consistent with a small negative effect of jet lag (no matter whether home/away or east/west).

Or perhaps I am missing something. In any case, I know you collect examples of questionable statistical practice so I wanted to share this.
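
Distelhorst’s suggestion to test the differences directly is easy to act on once you have each coefficient’s estimate and standard error. Here is a minimal sketch in Python using made-up numbers, not anything from the paper: one estimate clears p < .05, the other doesn’t, and yet the difference between them is nowhere near significant. The zero default for `cov` is an assumption that only holds for independent estimates; for two coefficients from the same regression you would plug in their covariance from the fitted model.

```python
import math


def normal_two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic: P(|Z| > |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))


def compare_estimates(b1, se1, b2, se2, cov=0.0):
    """z-test for the difference between two estimated coefficients.

    cov is the covariance between the two estimates. It is 0 for independent
    samples; coefficients from one regression generally need the covariance
    from the fitted model's variance-covariance matrix.
    """
    diff = b1 - b2
    se_diff = math.sqrt(se1**2 + se2**2 - 2 * cov)
    return diff, se_diff, normal_two_sided_p(diff / se_diff)


# Hypothetical estimates and standard errors, chosen for illustration only.
b_a, se_a = 25.0, 10.0
b_b, se_b = 10.0, 10.0

p_a = normal_two_sided_p(b_a / se_a)  # z = 2.5: "significant"
p_b = normal_two_sided_p(b_b / se_b)  # z = 1.0: "not significant"
diff, se_diff, p_diff = compare_estimates(b_a, se_a, b_b, se_b)

print(f"estimate A: p = {p_a:.3f}")
print(f"estimate B: p = {p_b:.3f}")
print(f"A - B = {diff:.1f} (SE {se_diff:.1f}), p = {p_diff:.3f}")
```

The standard error of the difference is larger than either individual standard error, which is why a “significant” and a “non-significant” estimate can easily be statistically indistinguishable from each other.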

My reply: I’m starting to get tired of incompetent statistical analyses appearing in PPNAS. The whole thing is an embarrassment. The National Academy of Sciences is supposed to stand for something, no?

Just to be clear: this is far from the worst PPNAS paper out there, and much of what the authors claim could well be true; jet lag could be a big deal. But many of the specifics seem like noise mining, and we shouldn’t be doing that. It would be better to wrap the comparisons into a multilevel model and partially pool, rather than just grab whatever comes out at “p less than .05” in the raw data and then treat the so-called non-significant comparisons as if they were zero.
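
Partial pooling can be sketched with a normal-normal model: each raw subgroup estimate is noisy, the true effects are assumed to come from a common distribution, and each estimate gets pulled toward the overall mean in proportion to its noise. The Python below is a plug-in empirical-Bayes shortcut using the DerSimonian-Laird moment estimator for the between-group variance, not a full Bayesian multilevel model (which would also propagate uncertainty in that variance). The four estimates and standard errors are hypothetical.

```python
def partial_pool(estimates, ses):
    """Partial pooling of noisy estimates under a normal-normal model.

    Model: y_j ~ N(theta_j, se_j^2) and theta_j ~ N(mu, tau^2).
    tau^2 is estimated by the DerSimonian-Laird method of moments and plugged in,
    so this is an empirical-Bayes shortcut: a full Bayesian multilevel model would
    also carry the uncertainty in tau through to the pooled estimates.
    """
    k = len(estimates)
    if k < 2 or len(ses) != k:
        raise ValueError("need at least two estimates, each with a standard error")
    w = [1 / s**2 for s in ses]
    sum_w = sum(w)
    mu_fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum_w
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, estimates))
    tau2 = max(0.0, (q - (k - 1)) / (sum_w - sum(wi**2 for wi in w) / sum_w))
    w_star = [1 / (s**2 + tau2) for s in ses]
    mu = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    if tau2 == 0.0:
        # No detectable between-group variation: complete pooling.
        return mu, tau2, [mu] * k
    # Precision-weighted compromise between each raw estimate and the overall mean.
    pooled = [(y / s**2 + mu / tau2) / (1 / s**2 + 1 / tau2)
              for y, s in zip(estimates, ses)]
    return mu, tau2, pooled


# Hypothetical raw subgroup estimates with equal standard errors.
raw = [-4.0, -1.5, -0.5, 1.0]
ses = [1.5, 1.5, 1.5, 1.5]
mu, tau2, pooled = partial_pool(raw, ses)
print(f"mu = {mu:.2f}, tau^2 = {tau2:.2f}")
for y, theta in zip(raw, pooled):
    print(f"raw {y:+.2f} -> partially pooled {theta:+.2f}")
```

Instead of sorting comparisons into “significant” and “zero,” every comparison keeps a shrunken estimate, and the extreme ones shrink the most in absolute terms.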

Again, no big deal: there’s nothing wrong with this sort of thing being published, along with all the raw data, so that others can investigate further. But maybe not in a journal that says it “publishes only the highest quality scientific research.”

Look, I do lots of quick little analyses. Nothing wrong with that. Not everything we do is “the highest quality scientific research.” To me, the problem is not with the researchers doing a quick-and-dirty, skim-out-the-statistically-significant-comparisons job, but with PPNAS for declaring it “the highest quality scientific research.” What’s the point of that? Can’t they just say something like, “PPNAS publishes a mix of the highest quality of scientific research, some bad research that slips through the review process or is promoted by an editor who is intellectually invested in the subject, and some fun stuff which might be kinda ok and can get us some publicity”? What would be wrong with that? Or, if they don’t want to be so brutally honest, they could say nothing at all.

I teach at Columbia University, one of the country’s top educational institutions. We have great students who do great work. But do we say that we admit “only the highest quality students”? Do we say we hire “only the highest quality faculty”? Do we say we approve “only the highest quality doctoral theses”? Of course not. Some duds get through. Better, to my mind, to accept this, and to work on improving the process to reduce the dud rate, than to take the brittle position that everything’s just perfect.

On the specific example above, Distelhorst followed up:

There may also be some forking paths issues around the choice of what flight lengths are coded as potentially jet-lag-inducing.

Ironically, I am a Seattle Mariners fan, and they often have one of the worst flight schedules in MLB. I would like everyone to believe that their 15-year playoff drought could be blamed on jet lag…

If it hadn’t been for the jet lag, Junior certainly would’ve banged out 756 HRs in his career!
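
The forking-paths worry about how flights get coded as jet-lag-inducing can be illustrated with a small simulation. In a hypothetical null world where flight length has no effect on winning, it compares the false-positive rate of one pre-chosen cutoff for a “long” flight with that of an analysis that tries several cutoffs and keeps whichever reaches p < .05. The flight-length distribution, the cutoffs, and the sample sizes are assumptions for illustration, not the paper’s coding.

```python
import math
import random


def two_prop_p(wins1, n1, wins2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_pool = (wins1 + wins2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (wins1 / n1 - wins2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=300, n_games=800, cutoffs=(2, 3, 4, 5, 6), fixed_cutoff=4, seed=1):
    """Null world: every game is a coin flip, whatever the flight length.

    Returns the share of simulated datasets in which (a) the single pre-chosen
    cutoff gives p < .05 and (b) at least one of the candidate cutoffs does.
    """
    if fixed_cutoff not in cutoffs:
        raise ValueError("fixed_cutoff must be one of the candidate cutoffs")
    rng = random.Random(seed)
    fixed_hits = forked_hits = 0
    for _ in range(n_sims):
        # Hypothetical flight lengths (hours) and outcomes, independent of each other.
        games = [(rng.uniform(0.5, 6.5), rng.random() < 0.5) for _ in range(n_games)]
        p_values = {}
        for c in cutoffs:
            long_flights = [won for hours, won in games if hours >= c]
            short_flights = [won for hours, won in games if hours < c]
            if not long_flights or not short_flights:
                p_values[c] = 1.0
                continue
            p_values[c] = two_prop_p(sum(long_flights), len(long_flights),
                                     sum(short_flights), len(short_flights))
        fixed_hits += p_values[fixed_cutoff] < 0.05
        forked_hits += min(p_values.values()) < 0.05
    return fixed_hits / n_sims, forked_hits / n_sims


fixed_rate, forked_rate = simulate()
print(f"false-positive rate, one pre-chosen cutoff: {fixed_rate:.3f}")
print(f"false-positive rate, best of five cutoffs:  {forked_rate:.3f}")
```

Because the forked analysis counts a hit whenever any cutoff, including the pre-chosen one, reaches p < .05, its rate can never fall below the single-cutoff rate. The cutoffs overlap, so the inflation is smaller than five independent tests would give, but it typically pushes the false-positive rate well above the nominal 5%.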

14 thoughts on “PPNAS again: If it hadn’t been for the jet lag, would Junior have banged out 756 HRs in his career?”

  1. Good timing with this post, as I was recently sent a paper that involves a lot of familiar players: your university, Columbia; your favorite journal, PPNAS; and your favorite person, Susan Fiske (editor).

    The irony is strong with this one. The paper is titled “Perceived social presence reduces fact-checking” and is available here: http://www.pnas.org/content/114/23/5976.full

    A fellow terrorist I met at SIPS said he ran the paper through statcheck (http://statcheck.io/) and found 9 errors. I confirmed those 9 statcheck errors, plus 1 more if you include the supplement.

    As always with anomaly detection software, the problems identified are a lower bound and should be viewed as the tip of the iceberg. So it wasn’t surprising that, looking through this paper, I found the reported degrees of freedom to be a complete train wreck, with Experiment 4 being particularly egregious.

    The text didn’t contain enough information to reproduce the values, but to the authors’ credit they made the raw data publicly available at https://osf.io/x4k6w/. I recalculated some values from this data to see whether we are dealing with a Wansink situation here, and found the values either correct, within .01 of being correct (maybe rounding error), or off by a couple hundredths (not sure what happened there). I didn’t think it was worth recalculating every number: I’m not planning to contact the journal, since I’m sure they’ll say the errors don’t affect the conclusions, and besides, the raw data is publicly available if anyone wants to confirm or extend the findings.

    The study may be well designed (the sample sizes seem large); I don’t know. But I just do not understand how someone can publish a paper with this many independent elementary errors. I lose sleep when I have a single typo in a preprint, and yet here we have a paper published by a journal that publishes only “the highest quality scientific research,” edited by the great Susan Fiske, and it may contain several dozen typos/miscalculations.

    The irony is that all the journal had to do was run statcheck on the paper, and maybe they would have realized that all the numbers needed to be checked. And Susan Fiske knows about statcheck, as she has fearmongered about the software in interviews.

    Maybe the errors slipped into the paper because the authors, reviewers, and editors were on social media at the time, which reduced their fact-checking.

    • With minor errors like that, you need to remember that this data has been emailed around as a spreadsheet, copy-pasted in various ways, opened in different versions of Excel, etc. I have seen Excel do some pretty ridiculous stuff to my data just from opening the file for a second to take a look and then making the mistake of saving afterwards. I have also seen it save different values than it displays…

        • Degrees of freedom be damned! These psychologists are amazing. They conducted 56 hypothesis tests, and the results of every single one were exactly as their theory predicted. Every test that should have been significant was (and the effect was in the predicted direction), and every test that should not have been significant wasn’t. These folks are so good they never even make a Type I or Type II error. Clearly, this work deserves to be in PPNAS!

  2. “In fact, the home-team eastward travel effect (−3.5%, P < 0.05) was comparable in magnitude to this home-field advantage (+3.9%). Thus, if the home team traveled two time zones east, and the away team was visiting from the same time zone, the home-field advantage was essentially nullified. On the other hand, the effect of traveling west was smaller and did not reach statistical significance (−2.0%, P = 0.11), suggesting direction selectivity.”

    No, that suggests post hoc subgroup analysis rather than science. It’s impossible to disentangle this from a franchise effect, since East Coast and West Coast teams have different spending patterns.

  3. But do we way that we admit “only the highest quality students”?

    Should say “But do we say* …” (Just pointing this out, because it threw me off when I read it, not trying to be a pedant).

  4. LOL at all the careful distinctions between what might be implied by p = 0.03 vs p = 0.04 vs p = 0.07 vs p = 0.11 from a sample size of 46,535.

  5. > But do we say that we admit “only the highest quality students”?

    Well, Columbia says:

    “This year’s 2,193 admitted students, selected from the largest applicant pool in Columbia’s history, amazed and humbled us with their exceptional accomplishments in and out of the classroom, their adventurous intellectualism and their commitment to a better society.

    But we are confident that the Class of 2020 brings that unique combination of academic ability, leadership skills and personal characteristics that have distinguished Columbians over the years, and it makes today truly one of the most rewarding days for us in the Offices of Undergraduate Admissions and Financial Aid and Educational Financing.”

    This seems functionally indistinguishable from “only the highest quality students.”

    • D,

      To the extent that Columbia engages in hyperbole, I don’t like it either!

      But I don’t think the above quote is as hyperbolic as PPNAS. The Columbia quote is a statement about the population of admitted students. It does not say that every student admitted is of the highest quality, only that the class as a whole is exceptional.

      If PPNAS were to say that the ensemble of its papers represented the highest quality research, I wouldn’t be so bothered. Given the low quality of many highly publicized PPNAS papers in recent years, I might dispute the claim, but at least the claim would be arguable: after all, an ensemble of papers will have a mix of qualities, and to make any counter-claim about the ensemble I’d have to gather some systematic evidence. But PPNAS says that they publish only “the highest quality scientific research.” That claim is obviously false.
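
The statcheck check mentioned in the first comment works, roughly, by recomputing each p-value from the reported test statistic and degrees of freedom and flagging mismatches. Below is a minimal Python sketch of that idea for two-sided t-tests only. It is not statcheck itself (an R package that parses APA-formatted results and covers several other test types), and it ignores rounding of the reported test statistic, which statcheck tries to account for. The Student-t tail area is computed with Simpson’s rule so that only the standard library is needed; `scipy.stats.t.sf` would do the same job.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_t(t, df, steps=2000):
    """P(|T| > |t|), via Simpson's rule for P(0 < T < |t|); steps must be even."""
    t = abs(t)
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central = total * h / 3  # P(0 < T < t)
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def check_reported_t(t, df, reported_p, decimals=2):
    """Recompute p from a reported t(df) and compare at the reported precision."""
    p = two_sided_p_t(t, df)
    return p, round(p, decimals) == round(reported_p, decimals)


# Hypothetical reported result: "t(10) = 2.23, p = .03".
p, ok = check_reported_t(2.23, 10, 0.03)
print(f"recomputed p = {p:.3f}; consistent with reported p: {ok}")
```

A mismatch like this one only says the reported numbers don’t fit together; as the comment notes, it is a lower bound on the problems and a prompt to go back to the raw data.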
