Air rage update

So. Marcus Crede, Carol Nickerson, and I published a letter in PPNAS criticizing the notorious “air rage” article. (Due to space limitations, our letter contained only a small subset of the many possible criticisms of that paper.) Our letter was called “Questionable association between front boarding and air rage.”

The authors of the original paper, Katherine DeCelles and Michael Norton, published a response in which they concede nothing. They state that their hypotheses “are predicated on decades of theoretical and empirical support across the social sciences” and they characterize their results as “consistent with theory.” I have no reason to dispute either of these claims, but at the same time these theories are so flexible that they could predict just about anything, including, I suspect, the very opposite of the claims made in the paper. As usual, there’s a confusion between a general scientific theory and some very specific claims regarding regression coefficients in some particular fitted model.

Considering the DeCelles and Norton reply in a context-free sense, it reads as reasonable: yes, it is possible for the signs and magnitudes of estimates to change when adding controls to a regression. The trouble is that their actual data seem to be of low quality, and, due to the observational nature of their study, there are lots of interactions not included in the model that are possibly larger than their main effects (for example, interactions of plane configuration with type of flight, interactions with alcohol consumption, nonlinearities in the continuous predictors such as number of seats and flight distance).
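To see how this can happen, here’s a minimal simulation sketch (invented numbers in the spirit of the widget/boredom example that comes up in the comments below, not the actual air-rage data): a predictor with essentially zero marginal association with the outcome picks up a clearly negative coefficient once a correlated control is added.

```python
import numpy as np

# Invented numbers, not the air-rage data: "iq" has ~zero marginal association
# with "errors", but a clearly negative coefficient once "boredom" is controlled for.
rng = np.random.default_rng(0)
n = 100_000
boredom = rng.normal(size=n)
iq = 0.707 * boredom + np.sqrt(1 - 0.707**2) * rng.normal(size=n)
errors = 0.707 * boredom - 0.5 * iq + rng.normal(size=n)

slope_alone = np.polyfit(iq, errors, 1)[0]         # ~0.0: no marginal association
X = np.column_stack([np.ones(n), iq, boredom])
coefs = np.linalg.lstsq(X, errors, rcond=None)[0]  # iq ~ -0.5, boredom ~ 0.707
print(slope_alone, coefs[1], coefs[2])
```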

The whole thing is interesting in that it reveals the challenge of interpreting this sort of exchange from the outside: how it is possible for researchers to string together paragraphs that have the form of logical argument, in support of whatever claim they’d like to make. Of course someone could say the same about us. . . .

One good thing about slogans such as “correlation does not imply causation” is that they get right to the point.

41 thoughts on “Air rage update”

  1. Just to be clear, though, it looks like you are agreeing with their general point about suppression, as illustrated by the intelligence/boredom/performance example? The thing that’s confusing is that your published criticism really emphasizes suppression in general, whereas this blog post suggests that the real issue is about the low quality of the data and the presence of other potential interactions?

    • Dmitri:

      I don’t agree with the authors’ point about suppression. The paper has many problems including data quality but there was not space to go into all these problems in the letter. My coauthors wanted to focus on the suppression issue and that was fine with me. To put it another way: If a project starts with data problems, then there will typically be lots of statistical analysis problems too.

    • Hi Dmitri,

      There are lots of problems with this air rage study. But we had only 500 words, so we had to choose just one, and we didn’t have enough words to do a great job on even just that one. I pushed for suppression because that is one of my interests.

      Carol

      • For what it’s worth I am an outsider who generally likes this blog and is sympathetic to its perspective. The trouble I had is that I thought the response gave a pretty good example of plausible suppression — for equally motivated people, intelligence helps with widget production; but intelligence leads to boredom which leads to a decline in widget production. OK, sounds plausible.

        But then the question becomes “is this a plausible example of suppression or not?” And here it is hard to see what the argument is other than “suppression is generally considered an artifact.” As written, this looks a bit like an argument from communal authority. I have no doubt there are good reasons for statisticians to be suspicious of suppression but from an abstract rhetorical/debating perspective the argument reads as being a little weak.

        Maybe what I think is that if your real view is “this is crap science,” then it’s sort of dangerous to make a very focused and technical critique? Because the very habits of mind that lead people to do crap science are going to lead them to make crap responses to criticisms — responses that might be plausible in the abstract but are misguided in this particular case, etc.

        • Hi Dmitri,

          Here is what DeCelles and Norton wrote:

          “Consider one of many reasonable instances of interpretable suppression effects, the iconic example of the relationship between workers’ intelligence and performance on widget assembly (10) (Fig. 1A): The positive association between intelligence and performance is suppressed by intelligence’s association with boredom (which relates to lower performance). Without taking the suppression effect into account, researchers might draw erroneous conclusions regarding the relationship between intelligence and performance.”

          DeCelles and Norton are referring to McFatter’s (1979) example of predicting the “number of errors” made by assembly-line workers as a function of “IQ” and “intolerance of boredom.” This is not an “iconic” example of suppression, if “iconic” is taken to mean “widely recognized and well-established.” Moreover, it is not an empirical example but a hypothetical one, and not a particularly believable or reasonable one, given that it was constructed so that the beta for “intolerance of boredom” (in the regression predicting “number of errors” from “IQ” and “intolerance of boredom”) would equal the correlation between “IQ” and “intolerance of boredom.” The correlation between “number of errors” and “intolerance of boredom” was given as .3535; the correlation between “number of errors” and “IQ” was given as zero; and the correlation between “IQ” and “intolerance of boredom” was given as .707. If one were to compute the betas, they would equal .707 for “intolerance of boredom” and -.500 for “IQ.” I’d be a lot more convinced by a real example.
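          (For anyone who wants to check that arithmetic: with two standardized predictors, the betas follow directly from the three correlations. A minimal sketch of the computation, using only the numbers quoted above:)

          ```python
          # Betas for two standardized predictors, from the correlations quoted above.
          r_yb = 0.3535   # corr(number of errors, intolerance of boredom)
          r_yi = 0.0      # corr(number of errors, IQ)
          r_ib = 0.707    # corr(IQ, intolerance of boredom)

          beta_boredom = (r_yb - r_yi * r_ib) / (1 - r_ib**2)   # ~ 0.707
          beta_iq = (r_yi - r_yb * r_ib) / (1 - r_ib**2)        # ~ -0.500
          print(round(beta_boredom, 3), round(beta_iq, 3))
          ```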

          Note that one problem is that this is an example of classical/traditional suppression (because the correlation between the criterion and one of the predictors is zero), but our comment focused on negative/net suppression.

          This may be an example of an “interpretable suppression” effect; we never said that suppression is never interpretable. We said that suppression effects are *generally* considered statistical artifacts *unless* there is a strong theoretical explanation for them. DeCelles and Norton did not provide any such explanation for why negative suppression would be expected in their air-rage study. I think that they did not even notice the suppression until we commented on it, and then they scurried around the literature to see what they could find to refute our letter. It is clear to me that they really don’t understand suppression.

          That having been said, let me say also that we could have made a stronger case if we could have gotten the data.

          Carol

  2. Is it just me, or do other people see opening sentences like “We appreciate Crede et al.’s attention to our research” as a veiled taunt? It’s like “Ha! You think our research sucks but you still read it! So who’s the real winner here?”

    • Jordan:

      No, I don’t see it that way. My guess is that authors of this sort of weak paper—especially those who respond without admitting any errors—find criticism to be upsetting and possibly threatening, not an occasion for fun at all, and that they use a phrase such as “We appreciate . . .” out of courtesy and politeness.

      • I guess the statement reminds me of how talk show hosts sometimes respond to someone criticizing them. They’ll say things like “Glad to know you’re a fan of the show” or something along those lines.

        There’s a quote from Howard Stern’s movie to the effect that listeners who hated him listened even longer than listeners who liked him. I don’t know how true that is, but it is applicable to the Wansink case: people who love his work have probably never read a single one of his papers, while his critics have carefully gone through dozens.

        So when I read that statement I see it as an acknowledgment that the authors know you only read their work because you thought it was BS.

  3. I always comment as just plain “Carol” but now I’ve been outed!

    Some time ago I went through the DeCelles and Norton reply to our letter and made a list of all the errors in it. I also did some additional analyses showing that their results were anomalous. Shortly I will take this stuff off the back burner and write it up as an official response to their reply to our letter. I don’t think that PNAS would publish it, though. I believe that PNAS ends the exchange with the reply from the authors of the critiqued article (i.e., article, letter, reply, END). Also, the “retort” is considerably longer than the 500 words allowed for letters and replies. Does anyone have an idea where we could send this or post this? Andrew suggested this blog (as a guest post?), so that’s one possibility. Others?

    Aside: Susan Fiske was the editor for the original article, our letter, and the reply to our letter. Fiske argues strongly against post-publication review on social media, insisting that letters (comments) published in journals (after peer review) are preferable. One reason to prefer social media is that length is not restricted. We were allowed only 500 words for our letter — and this was strictly enforced — but the original article was considerably longer and had supporting information as well.

    Another aside: One of the two authors — Michael Norton — had Susan Fiske as a member of his dissertation committee (PhD Princeton, 2002). Should not Fiske have recused herself as editor because of the strong personal connection? I note that Fiske is also the editor on a more recent PNAS article also co-authored by Norton.

    http://www.pnas.org/content/114/32/8523.full.pdf?sid=dc8d05a2-c4f8-439c-a5b8-12de1e42d0d2

    Thanks!

    Carol

    • Thanks for all of your work on this, Carol & Andrew! I think posting as a blog post in this case makes a lot of sense. Or, depending on your level of concern/interest, publishing in another outlet for a different reach/audience (e.g., The Conversation; or trying to write something more accessible for an op-ed) could also work. I definitely agree with you that there should have been a different editor on this; it seems to dance a very treacherous ethical line. However, I would note that, at least on a superficial peek at the methodology, the Whillans ‘buying time’ article that you linked seems to be a bit better quality (e.g., larger, more diverse samples; pre-registered studies, etc.). Though that says nothing of the rigor of the work/treatment of data, of course. So it might not be Norton so much as the other authors who are really driving the actual work here… Nevertheless, it seems very questionable to have the same editor handle multiple papers when they have a connection to the senior author.

    • Carol said,

      “One of the two authors — Michael Norton — had Susan Fiske as a member of his dissertation committee (PhD Princeton, 2002). Should not Fiske have recused herself as editor because of the strong personal connection? I note that Fiske is also the editor on a more recent PNAS article also co-authored by Norton.”

      My impression is that PNAS is often used as a means for NAS members to get their students’ work published quickly — part of the old boys’ (and a few girls’) network. That’s why I think it’s a good idea to read things in it particularly carefully.

    • Carol: They might publish this kind of thing at EconJournal Watch. I believe their mission statement says that they are interested in discussions from outside of economics, if of interest to economists, but you’ll have to look that up yourself.

    • Hi Shravan,

      I requested the data. DeCelles told me: “We are in legal agreements not to provide data or additional details for non research team members.”

      The article itself states that the data are “private” and “proprietary.”

      Carol

    • The PPNAS instructions for authors include the statement:
      To allow others to replicate and build on work published in PNAS, authors must make materials, data, and associated protocols, including code and scripts, available to readers. Authors must disclose upon submission of the manuscript any restrictions on the availability of materials or information. Authors must include a data availability statement in the methods section describing how readers will be able to access the data, associated protocols, code, and materials in the paper. Authors are encouraged to deposit laboratory protocols and include their DOI or URL in the methods section of their paper. Data not shown and personal communications cannot be used to support claims in the work. (emphasis added)

      I could not find such a data availability statement in the methods section. I did see a reference to “private database”. But, even if they could not release the data, they could release their code—review of which might provide further insights.

      Bob

    • Doesn’t the Canadian funding agency have requirements about data release?

      Yes it does. See http://www.sshrc-crsh.gc.ca/about-au_sujet/policies-politiques/statements-enonces/edata-donnees_electroniques-eng.aspx (Modified 2016-12-09)

      Social Sciences and Humanities Research Council
      Research Data Archiving Policy

      All research data collected with the use of SSHRC funds must be preserved and made available for use by others within a reasonable period of time. SSHRC considers “a reasonable period” to be within two years of the completion of the research project for which the data was collected.

      There are various caveats, as one would expect. It may be that the data release is covered under one of these exceptions. A query to SSHRC might clarify this.

        • Hi jkrideau and Shravan,

          Very interesting but perhaps this requirement doesn’t apply here. DeCelles and Norton did not collect the data for the air-rage study themselves. They obtained the data on air-rage incidents and boarding patterns from the airlines and purchased the flight information from OAG Worldwide.

          Carol

        • In this context, Amy Cuddy deserves a lot of credit for not only releasing her data, but hiring an independent analyst, who showed that her data’s main claim didn’t hold up.

        • And the text says: “We obtained the data on air rage incidents and boarding patterns for flights directly from the airline after entering into a confidentiality agreement. We matched these data to a population of flights from a proprietary dataset purchased from OAG Worldwide.”

          This explains why they cannot release the data, and SSHRC may have agreed to the conditions when approving the funding. In some circumstances this seems reasonable, and it may well be here.

          However, why did PNAS accept it?

        • Hi jkrideau,

          I don’t know. PNAS states that the data must be made available. Perhaps there is some sort of editorial discretion, but this is not explicitly stated. It would not be the first time that a journal required data sharing and researchers got around the requirement. That’s part of the battle over the PACE trial: PLOS ONE requires that the data be made available, but the authors refuse to furnish them.

          Carol

  4. “due to the observational nature of their study, there are lots of interactions not included in the model that are possibly larger than their main effects”

    Did you mean to write confounders instead of interactions?

    • To Z, I think he did mean interactions, but IMHO it comes down to the same thing. In most observational studies, the chances are high that interactions will produce “impossible” signs and magnitudes, which strongly suggests that there is confounding going on. (Alternatively, if even “impossible” signs and magnitudes are “consistent with theory”, then the theory is unfalsifiable and thus it is nonsense to claim that it is being tested.)

    • Kinda the same thing: an uncontrolled interaction can confound, and when interactions are not accounted for, main effects can be exaggerated (IIRC, happy to be corrected; see the sketch below).
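      Here is a minimal simulation sketch of that point (an entirely made-up setup, not the paper’s data): the true model has no main effect of the “treatment,” only an interaction with flight type, yet the regression that omits the interaction reports a substantial main effect.

      ```python
      import numpy as np

      # Made-up setup, not the paper's data: the outcome depends only on the
      # interaction of "front boarding" with flight type, not on either main effect.
      rng = np.random.default_rng(1)
      n = 100_000
      front_boarding = rng.integers(0, 2, size=n).astype(float)
      long_haul = rng.integers(0, 2, size=n).astype(float)
      y = front_boarding * long_haul + rng.normal(size=n)

      # Omit the interaction: half of it is misattributed to the "main effect" (~0.5).
      X_no_int = np.column_stack([np.ones(n), front_boarding, long_haul])
      print(np.linalg.lstsq(X_no_int, y, rcond=None)[0])

      # Include the interaction: main effects ~0, interaction ~1, as in the true model.
      X_int = np.column_stack([X_no_int, front_boarding * long_haul])
      print(np.linalg.lstsq(X_int, y, rcond=None)[0])
      ```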

  5. Out of curiosity, what would the ideal response read like? Let’s assume the response isn’t an author retraction of the original piece, but that the authors do agree that maybe the suppression issue and other things cast doubt on the conclusions.

    • Jacob:

      I think an ideal response would be something like this:

      We thank the authors of this discussion along with others [cite, for example, this post by John Walton] for pointing out serious errors in our paper. We continue to feel that much of value can be learned about social interactions from quantitative study of real-world environments such as airline cabins, but we recognize that our data and analysis were insufficient to support the claims made in our paper. We also appreciate the efforts of various critics given that, for business reasons, we were not able to release the raw data used in our paper.

  6. “We also appreciate the efforts of various critics given that, for business reasons, we were not able to release the raw data used in our paper.”

    If the data are so sensitive that they cannot be released, why aren’t the “results” too sensitive to be publicly available?
