The problems are everywhere, once you know to look

Josh Miller writes:

My friend and colleague Joachim Vosgerau (at Bocconi) sent me some papers from PNAS and they are right in your wheelhouse. Higher social class people behave more unethically.

I can certainly vouch for the jerky behavior of people who drive BMWs and Mercedes in Italy (similar to Studies 1 and 2 in Piff et al., though their graph doesn’t include error bars). This seems to be true all around the world, which is strange, because personally I feel more comfortable asserting myself if I am driving a clunker.

The fact that they write P<.05 in studies 1 and 2, and then write P<.04 in study 3, as if that were stronger evidence, shows that somebody didn’t get the memo. Is this just a bad habit, or does it signal unreported studies, forking paths, and other shenanigans?

The two papers are High economic inequality leads higher-income individuals to be less generous, PPNAS 2015, by Stéphane Côté, Julian House, and Robb Willer, and Higher social class predicts increased unethical behavior, PPNAS 2012, by Paul Piff, Daniel Stancato, Stéphane Côté, Rodolfo Mendoza-Denton, and Dacher Keltner.

Without looking at the papers in detail, I am indeed suspicious of all the evidence presented there that is based on p-values. You also have to watch out for comparison-between-significant-and-non-significant statements such as, “Higher-income participants were less generous than lower-income participants when inequality was portrayed as relatively high, but there was no association between income and generosity when inequality was portrayed as relatively low.” All in all, these papers follow the standard paradigm of grabbing some data and looking for statistically significant comparisons, with all the problems that entails. Again, without commenting on the specific claims in these publications, I think they’re using methods that set them up to find and promulgate spurious results, that is, patterns that occur in their particular datasets but which don’t reflect the general population.
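
To see why such comparisons are treacherous, here is a small simulation, with all numbers invented for illustration: two subgroups with identical true effects will routinely hand you one “significant” and one “non-significant” result, even though a direct test of the difference between them shows nothing.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, true_effect, sims = 50, 0.3, 10_000
split = 0  # runs where one subgroup is "significant" and the other is not

for _ in range(sims):
    a = rng.normal(true_effect, 1.0, n)  # subgroup A, true effect 0.3
    b = rng.normal(true_effect, 1.0, n)  # subgroup B, same true effect
    sig_a = stats.ttest_1samp(a, 0.0).pvalue < 0.05
    sig_b = stats.ttest_1samp(b, 0.0).pvalue < 0.05
    split += (sig_a != sig_b)

print(f"runs where exactly one subgroup is 'significant': {split / sims:.0%}")
# With identical true effects, roughly half the runs still offer a
# significant/non-significant contrast to build a subgroup story around.
```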

The more recent paper concludes in a blaze of interactions. From a substantive point of view, I’m supportive of this effort: As I’ve said many times, interactions are important, and we should expect large effects to have large interactions. But from a statistical perspective, I’m wary of methods that search for interactions by sifting through data and pulling out statistically significant comparisons. What you end up with is one particular story that fits various aspects of the data, without a recognition that many other, completely different stories would also fit. I guess this is a good candidate for a preregistered replication, but I wouldn’t be so optimistic about the results.
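
In the same spirit, here is a sketch, again with invented data, of what happens when you sift through candidate interactions in pure noise, testing each with a simple z-comparison of subgroup slopes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 200, 20          # subjects, candidate moderators to sift through
y = rng.normal(size=n)  # outcome: pure noise, unrelated to anything
x = rng.normal(size=n)  # predictor, e.g. "income"

hits = 0
for _ in range(k):
    m = rng.integers(0, 2, n)  # a candidate binary moderator
    r0 = stats.linregress(x[m == 0], y[m == 0])
    r1 = stats.linregress(x[m == 1], y[m == 1])
    # z-test for a difference between the two subgroup slopes
    z = (r0.slope - r1.slope) / np.hypot(r0.stderr, r1.stderr)
    hits += (2 * stats.norm.sf(abs(z)) < 0.05)

print(f"'significant' interactions found in pure noise: {hits} of {k}")
# About 1 in 20 will clear p < .05 by chance; search enough candidate
# moderators and some interaction story will nearly always present itself.
```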

P.S. Vosgerau adds:

In discussions about the crisis in the behavioral sciences we talk a lot about the devastating consequences of small sample studies, p-hacking, forking paths, and fraudulent data. There is, however, another much simpler “cheating” technique which seems to be equally damaging to the self-correcting mechanism of science: bluntly ignoring or grossly misrepresenting counter-evidence. I am curious about your perspectives on this.

To illustrate my point with an example: Paul Piff, Stéphane Côté, and colleagues published a paper in JPSP (2010) alleging that low socio-economic status (SES, i.e., lower income and education) causes people to behave more prosocially, because people of low SES are more interdependent and socially attuned than high-SES people. Two years later Piff et al. doubled down on their claim that the rich and educated are bad people by further demonstrating that people of high SES are more likely to behave unethically (PNAS 2012). The findings proved popular in the media (NYT 2010, Economist 2014) and led to more papers being published in JPSP (2015) and PNAS (2015). All the evidence in these papers is based on small-sample studies.

The surprising thing is that there is overwhelming counter-evidence showing that people of lower SES behave less rather than more prosocially than those of higher SES. From an economics perspective, it should be the rich who can afford to be generous, behave morally, and donate more time and money than the poor (Trautmann et al. 2013). Korndörfer et al. (PLOS ONE 2015) give a comprehensive review of the extant counter-evidence. They also present 8 new studies testing the SES–prosocial behavior link with large, representative, international samples; for example, they show that higher-SES individuals are more likely to volunteer in charitable activities across 30 countries.

The counter-evidence presented by Korndörfer et al. is completely misrepresented in a new PNAS paper by Stephane Côté et al. (2015) who hypothesize that higher SES leads to less prosocial behavior only when there is a lot of economic inequality (operationalized as GINI). The authors write (p. 15838): “Past findings are inconclusive, yet generally consistent, with the prediction that a negative relation between income and generosity emerges under conditions of higher inequality, and is attenuated or reversed under conditions of lower inequality. Many studies finding that higher-income individuals are less generous were conducted in California […], one of the most unequal US states […]. In contrast, recent investigations conducted in the Netherlands and Germany, where there is considerably less inequality […], found […] a positive association […] between income and how much participants in a “trust game” reciprocated the cooperative behavior of their partner (Korndörfer et al. 2015).”

This is a gross misrepresentation of Korndörfer et al.’s findings. Korndörfer et al. tested not only whether SES is related to trust, but also whether SES is related to donating, volunteering, and helping. And they tested the SES–volunteering link not only in the Netherlands and Germany but in 30 countries, including the US!

It seems that such misrepresentation of counter-evidence does as much damage as small-sample studies, p-hacking, forking paths, and fraudulent data do. But nobody talks about it, and authors face no repercussions when publishing papers that ignore and/or misrepresent counter-evidence. Should we do something about this, and if so, what?

PPNAS . . . The whole thing reminds me of the “air rage” episode in that people seem eager to find evidence in support of a social theory that fits their model of the world. That’s fine, but if you want to do this, I think it’s better to set up a model of your theory, fit it, and test it. Not to reject a null model that you never wanted to work with in the first place.

18 thoughts on “The problems are everywhere, once you know to look”

  1. The Piff et al. (2012) PNAS paper was one of my early reports about a set of findings appearing “too good to be true”. The analysis is at

    http://www.pnas.org/content/109/25/E1587.full?ijkey=f5e34f999dbaf769e2c6130b80167ca1ef624ca1&keytype2=tf_ipsecsha

    A response from Piff et al. is at

    http://www.pnas.org/content/109/25/E1588.short?trendmd-shared=0

    My rebuttal is at

    http://www2.psych.purdue.edu/~gfrancis/Publications/FrancisRebuttal2012.pdf

    • Greg:

      I’m curious . . . did the editor for that PPNAS paper, Richard Nisbett, ever comment on the episode, for example saying that he regretted accepting the paper, or that he agreed with you on the paper’s flaws but he thought it was reasonable to accept it at the time of submission because he (Nisbett) had not at that time been so attuned to publication bias?

      • The submission process did not even identify the editor who handled my Letter, so I do not know that Nisbett even saw it. Except for boilerplate comments, the feedback from the editor was: “This is a valuable letter, and I would like to see it appear in PNAS. It has been reviewed by one of the world’s leading authorities in meta-analysis, who agrees with the main point but suggests some ways in which the statement of the point might be honed.” That anonymous reviewer did have some good suggestions, which I largely followed.

        • Greg:

          I was thinking that, even if Nisbett didn’t handle your letter, he’d be interested in learning about this because he erred in accepting the original paper for the journal. We usually like to learn from our mistakes.

        • Greg:

          That one of the “world’s leading authorities in meta-analysis” was consulted on replication assessment seems promising for two reasons.

          1. It avoids the usual misperception that assessing replication is a secondary rather than a primary task, i.e., “_In addition_ to providing an estimate of the unknown common truth, meta-analysis has the capacity to contrast results from different studies” (https://en.wikipedia.org/wiki/Meta-analysis). Why in the world would one want to combine non-replicating studies to get spurious precision?! (See the toy sketch below.)

          2. Meta-analysis might come to be seen as a worthwhile topic to publish in mainstream statistical journals and even teach in graduate programs.

          (I think meta-analysis is likely doing better in Psychology, although the need to use effect sizes makes for a largely different technology.)
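
          To make the spurious-precision point concrete, here is a toy fixed-effect pooling calculation (all numbers invented): combining two studies that flatly contradict each other yields a tight interval around a value that neither study supports.

          ```python
          import numpy as np

          # Two made-up "studies" that flatly contradict each other.
          est = np.array([0.80, -0.75])  # effect estimates
          se  = np.array([0.20,  0.20])  # standard errors

          # Fixed-effect (inverse-variance) pooling, ignoring heterogeneity:
          w = 1 / se**2
          pooled = np.sum(w * est) / np.sum(w)
          pooled_se = np.sqrt(1 / np.sum(w))

          print(f"pooled: {pooled:.3f} +/- {1.96 * pooled_se:.3f}")
          # 0.025 +/- 0.277: a tight interval around a value neither study
          # supports. Checking replication first is the primary task; pooling
          # non-replicating studies just manufactures precision.
          ```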

    • Greg: that’s an excellent piece of academic jiu jitsu in the conclusion of your rebuttal, namely that if you don’t buy the evidence of bias (from whatever source) then you don’t buy the underlying inference logic of the original piece. Nicely done.

    • It seems like Dr. Francis’s method could be a sensible filter for any paper with more than one hypothesis reported. Or Dr. Gelman’s S and M tests. Either would be relatively simple for an editor or journal staff to conduct before sending a paper to reviewers: a statistical smell test.
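
      As a rough sketch of how such a smell test works (the excess-success logic behind Francis-style checks; the power values below are hypothetical, not taken from any actual analysis):

      ```python
      # Hypothetical per-study power estimates: if each study has only
      # modest power, the probability that every study in a multi-study
      # paper succeeds is small.
      powers = [0.55, 0.60, 0.50, 0.65, 0.55, 0.60, 0.50]

      p_all = 1.0
      for p in powers:
          p_all *= p

      print(f"P(all {len(powers)} studies succeed) = {p_all:.3f}")
      # ~0.018 here. Francis treats values below roughly 0.1 as a sign that
      # a uniform string of successes is "too good to be true", pointing to
      # publication bias or flexible analysis.
      ```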

  2. I think the answer to “should we do something about this, and if so, what?” is pretty clearly ‘improve the peer review process’. The only way to know if someone is misrepresenting (or entirely leaving out) relevant background material or counter-evidence is for a person knowledgeable in the area to say so. Whether it’s pre- or post-publication, someone should have said ‘hey, all this stuff about Korndörfer is off base’. The article’s authors would presumably have some comment, and eventually people (journal editors, readers) could decide to change the paper, devalue it, whatever.

  3. >”Should we do something about this, and if so, what?”

    1) Accept that, with a small percentage of holdouts, modern research institutions and their satellite industries (e.g., publishing) have been taken over by cranks, quacks, and (a smaller number of) snake-oil salesmen.

    2) Don’t bother trying to fix it; if this were doable, it would have happened decades ago. They didn’t reason themselves into their self-created mess, and they won’t be reasoned out of it. Instead, make that model of research and information dissemination obsolete by working outside it. It is probably too much to hope for that system to fade into complete obscurity, but hopefully it can be limited to a role somewhat similar to that of the Catholic Church in the US. They won’t have the general legitimacy to tell everyone what to do, litigate how you need to live your life, use taxes to fund their efforts to push pseudoscience, etc.

    There is progress on this front, especially via the internet, which allows remote lectures and easier propagation of good information through people sharing links to papers, etc. Of course, the overall signal-to-noise ratio is still quite low, so there is a lot of room for improvement.

  4. Côté et al. wrote: “We propose that this pattern emerges only under conditions of high economic inequality, contexts that can foster a sense of entitlement among higher-income individuals that, in turn, reduces their generosity.”

    Of course, had the opposite pattern held (i.e., rich people in more equal states are less generous), they could have written, “We propose that this pattern emerges only under conditions of low economic inequality, contexts that can foster a sense among higher-income individuals that they do not need to be generous”. Andrew, your insight into this issue, as documented in several of your previous posts, has become a very useful part of my skeptical armoury.

    Another possibility, I suppose, is that income inequality is, at least partly, a mechanical consequence of people not giving their money away. As the Bill Gates character in an episode of The Simpsons (not voiced by the actual Bill Gates, although I suspect he might get an appearance today if he asked) said, “I didn’t get rich by writing a lot of cheques”.

  5. I’m surprised Doob and Gross (1968) weren’t mentioned in PPNAS 2012…? And Josh Miller might want to rethink his “clunker” behavior based upon some of the data of Doob and Gross. ;-)

    Status of frustrator as an inhibitor of horn-honking responses.
    Doob, Anthony N.; Gross, Alan E.
    The Journal of Social Psychology, Vol 76(2), 1968, 213-218. http://dx.doi.org/10.1080/00224545.1968.9933615

    Best,
    Willy

    • Hi Willy

      Interesting paper, looks like a fun study to run.

      I love these two quotes from the paper:

      – “Two low status cars were used: a rusty 1954 Ford station wagon and an unobtrusive gray 1961 Rambler sedan. The Rambler was substituted at noon because it was felt that subjects might reasonably attribute the Ford’s failure to move to mechanical breakdown.”

      – “two cars in the low status condition, instead of honking, hit the back bumper of the experimental car, and the [experimental] driver did not wish to wait for a honk.”

      A quick look shows their tests are redundant (e.g. “When no honks are counted as a latency of 12 seconds”), so it boils down to 18 out of 36 honking at the high-status car vs. 32 out of 38 honking at the low-status car. Not a terribly compelling sample size given the field setting, lack of randomization, and the study selection bias which I am sure existed in 1968.
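
      For what it’s worth, a quick check of that 2x2 table (a sketch using scipy, with the counts as summarized above):

      ```python
      from scipy.stats import fisher_exact

      # Honking counts as summarized above: 18 of 36 at the high-status
      # car, 32 of 38 at the low-status car.
      table = [[18, 36 - 18],   # high status: honked, didn't honk
               [32, 38 - 32]]   # low status:  honked, didn't honk

      odds_ratio, p = fisher_exact(table)
      print(f"honk rates: {18/36:.0%} vs {32/38:.0%}, Fisher exact p = {p:.4f}")
      # The gap (50% vs 84%) clears conventional significance, but without
      # randomization of which drivers met which car, the p-value says less
      # than it appears to.
      ```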

      Interesting that they chose the Imperial. Is it status, is it the smaller back window, or is it because no one wants to mess with the Black Beauty?! (Maybe it works the same in Italy—the BMW is more likely to be mafia!)

      Actually, I doubt that people are more jerky in BMWs because they expect deference to their status. I bet they are more likely to be in a hurry, or stressed out, because of what they need to do to maintain their status.

  6. Much of the problem is that they don’t realize they’re starting with a biased coin, or rather that they think they’re counting the first coin flip from zero when they’re actually starting with some version of heads or tails. That alone creates the potential for “hot hand” thinking, in which a short train of measured events, such as a series of interviews in which a question chain is started and ended, can generate this result just because the next coin flip agrees with the start value. I say “version of heads or tails” because we’re dealing with some summary statistic to begin with, and I assume the mechanism for flipping answers is too complex to be modeled accurately at this summary level. I particularly enjoy seeing this confusion when people set up opposing start points like “high-inequality” or “low-inequality”: that’s guaranteed to create an apparent chain, or rather a series of apparent chains, that you then evaluate as though they’re meaningful.

    This reminds me of the misunderstanding of “natural selection” versus “cultural evolution”: a coin flip is “natural,” but any context established by people includes the natural bias effect of any context that isn’t exactly a coin flip. This includes any proposal that this or that context is “natural”: unless it begins and ends with the very basic numbers, stuff like 1 and 0 and base 10, there’s a transform required to make it fit the ideal. That’s where probability comes in: it attempts to translate the various biases built into the context, into the selection of start and end points, into terms that can be used to evaluate magnitude, effect, and sometimes position in a causal chain. But it’s so easy to be misled when you don’t even realize you’re starting some place, which, at the barest minimum, is equivalent to a heads-or-tails result; any further heads-or-tails result when you cut the context off is going to land somewhere in a band defined by the sum of all the coin-flip chains that fit.
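
    There is, for what it’s worth, a precise version of the start-point bias gestured at here: within short sequences, the proportion of heads immediately following a head is biased below 1/2 even for a fair coin, the selection issue at the center of the hot-hand literature. A short illustrative simulation:

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    sims, length = 100_000, 4
    props = []

    for _ in range(sims):
        flips = rng.integers(0, 2, length)        # fair coin: 1 = heads
        after_heads = flips[1:][flips[:-1] == 1]  # flips that follow a head
        if after_heads.size:                      # need a head to condition on
            props.append(after_heads.mean())

    print(f"mean P(heads | previous flip heads): {np.mean(props):.3f}")
    # ~0.40 for length-4 sequences, not 0.50: picking your starting points
    # inside a finite window biases the count before any data arrive.
    ```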

  7. Here in response to Vosgerau’s comments we explain the logic of our literature review in Côté, House, and Willer (2015) and why we reviewed past findings the way that we did. We also describe how findings that we did not include in our literature review fit with our model.

    Given the short format of this article, we restricted our literature review to only prior studies that used what we considered the most valid measures of generosity: Studies featuring directly observed generous behavior (e.g., allocation behavior in a Dictator game or as second mover in a Trust game). We only reviewed Study 8 from Korndörfer, Egloff, and Schmukle (2015) because it is the only study in that paper featuring such a measure. We focused on measures of this type (and conducted our research using such measures) because of the limitations of alternative indices (self-reports of, for example, past charitable giving and time volunteered). Self-reports may be influenced by factors such as memory limitations and social desirability response bias (Côté, 2010; Podsakoff & Organ, 1986), and generous acts are among the most socially desirable acts (Willer, 2009). Further, there is reason to think that social desirability response bias may vary by social class. Recent studies found that higher income individuals were more prosocial than lower income individuals in a condition where they (ostensibly) communicated their name and city of residence to the recipient, which presumably created an incentive to engage in socially desirable behavior, but not in a condition where they remained anonymous (Kraus & Callaghan, in press). It is for these reasons that we omitted Studies 1-7 in Korndörfer et al. (as well as other studies on income and charitable giving, see Wiepking & Bekkers, 2012, for a review of this body of research).

    We also believe that volunteering, which Korndörfer et al. examined in Studies 4-6, may be a prosocial behavior more available to higher income individuals. Higher income individuals likely have greater time flexibility than lower income individuals who face greater financial pressures and are less able to afford child care. Further, as Korndörfer et al. themselves note (pp. 31-32), higher income individuals may have greater access to organizations that provide volunteering opportunities. Therefore, the relationship between income and volunteering may be different than the relationship between income and most other forms of generosity.

    In retrospect, we should have more clearly articulated the scope of our literature review, the limitations to the validity of self-report measures of generous behavior, and our belief that volunteering is a unique form of generosity that is more available to higher income individuals.

    Though likely less valid than directly observed generous behavior, we nonetheless see value in studies examining self-reported charitable donations such as Korndörfer et al.’s Studies 1-3. We believe that our model and results fit with their findings that, overall in the U.S. (Studies 2 and 3) and Germany (Study 1), upper class individuals donate a larger proportion of their income. In our work, we argued that the relationship between income and generosity varies by levels of inequality, so that the relationship becomes more negative at higher levels of inequality. In our studies, we also do not find a negative relationship between income and generosity in the U.S. as a whole (though in our data the relationship is null, not positive). The income-generosity relationship was negative only in U.S. states where inequality was above the average. Taken together with Korndörfer et al.’s studies, these findings suggest that the threshold level of inequality above which the relationship between income and generosity becomes negative is relatively high, higher than the level of inequality in the U.S. as a whole.
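
    To make the threshold idea concrete: in a linear moderation model, generosity = b0 + b1*income + b2*gini + b3*(income x gini), the income slope is b1 + b3*gini, which changes sign at gini = -b1/b3. A minimal sketch with made-up coefficients:

    ```python
    # Made-up coefficients for the linear moderation model
    #   generosity = b0 + b1*income + b2*gini + b3*income*gini
    b1 = 0.06    # hypothetical income slope at gini = 0
    b3 = -0.15   # hypothetical income x inequality interaction

    print(f"slope flips sign at gini = {-b1 / b3:.2f}")
    for gini in (0.30, 0.40, 0.50):
        slope = b1 + b3 * gini
        print(f"  gini = {gini:.2f}: income slope = {slope:+.3f}")
    # With these invented numbers the income-generosity slope is positive
    # at gini = 0.30, about zero at 0.40, and negative at 0.50, matching
    # the qualitative pattern described above.
    ```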

    The motivation for our research was to reconcile disparate findings on income and generosity in a way that assumed good faith on the part of all researchers and in the spirit of finding a contextual variable that reconciles past conflicting findings. Two recent laboratory studies also found that individuals who have more resources are less generous when inequality is high rather than low (Hargreaves Heap, Ramalingam, & Stoddard, in press; Nishi, Shirado, Rand, & Christakis, 2015). We believe that inequality is a promising moderating variable that can make sense of disparate findings on the link between income and generosity. We thus disagree with the characterization of our research as being reducible to “the rich and educated are bad people,” as we have clearly argued for a more nuanced view than this.

    In closing, we object to any suggestion that the limited frame of our literature review, which reflects our concerns about the validity of self-reported measures of generous behavior, is somehow akin to questionable research practices and the fabrication of data. We hope this note clarifies why we worded this sentence from our literature review the way we did, and how our model accommodates other studies on income and generosity.

    Stéphane Côté and Robb Willer

    References

    Côté, S. (2010). Taking the “intelligence” in emotional intelligence seriously. Industrial and Organizational Psychology: Perspectives on Science and Practice, 3, 127-130.

    Côté, S., House, J., & Willer, R. (2015). High economic inequality leads higher-income individuals to be less generous. Proceedings of the National Academy of Sciences, 112, 15838-15843.

    Hargreaves Heap, S., Ramalingam, A., & Stoddard, B. (in press). Endowment inequality in public goods games: A re-examination. Economics Letters.

    Korndörfer, M., Egloff, B. & Schmukle, S. C. (2015). A large scale test of the effect of social class on prosocial behavior. PLoS ONE, 10, e0133193.

    Kraus, M. K., & Callaghan, B. (in press). Social class and prosocial behavior: The moderating role of public versus private contexts. Social Psychological and Personality Science.

    Nishi, A., Shirado, H., Rand, D. G., & Christakis, N. A. (2015). Inequality and visibility of wealth in experimental social networks. Nature, 526, 426–429.

    Podsakoff, P. M., & Organ, D. W. (1986). Self-reports in organizational research: Problems and prospects. Journal of Management, 12, 531-544.

    Wiepking, P., & Bekkers, R. (2012). Who gives? A literature review of predictors of charitable giving. Part Two: Gender, family composition and income. Voluntary Sector Review, 3, 217-245.

    Willer, R. (2009). Groups reward individual sacrifice: The status solution to the collective action problem. American Sociological Review, 74, 23-43.
