Crossfire

The Kangaroo with a feather effect

OK, guess the year of this quote:

Experimental social psychology today seems dominated by values that suggest the following slogan: “Social psychology ought to be and is a lot of fun.” The fun comes not from the learning, but from the doing. Clever experimentation on exotic topics with a zany manipulation seems to be the guaranteed formula for success which, in turn, appears to be defined as being able to effect a tour de force. One sometimes gets the impression that an ever-growing coterie of social psychologists is playing (largely for one another’s benefit) a game of “can you top this?” Whoever can conduct the most contrived, flamboyant, and mirth-producing experiments receives the highest score on the kudometer. There is, in short, a distinctly exhibitionistic flavor to much current experimentation, while the experimenters themselves often seem to equate notoriety with achievement.

It’s from Kenneth Ring, Journal of Experimental Social Psychology, 1967.

Except for the somewhat old-fashioned words (“zany,” “mirth”), the old-fashioned neologism (“kudometer”) and the lack of any reference to himmicanes, power pose, or “cute-o-nomics,” the above paragraph could’ve been written yesterday, or five years ago, or any time during the career of Paul Meehl.

Or, as authority figures Susan Fiske, Daniel Schacter, and Shelley Taylor would say, “Every few decades, critics declare a crisis, point out problems, and sometimes motivate solutions.”

I learned about the above Kenneth Ring quote from this recent post by Richard Morey who goes positively medieval on the recently retracted paper by psychology professor Will Hart, a case that was particularly ridiculous because it seems that the analysis in that paper was faked by the student who collected the data . . . but was not listed as a coauthor or even thanked in the paper’s acknowledgments!

In his post, Morey describes how bad this article was, as science, even if all the data had been reported correctly. In particular, he describes how the hypothesized effect sizes were much larger than could make sense based on common-sense reasoning, and how the measurements were too noisy to possibly detect reasonable-sized effects. These are problems we see over and over again; they’re central to the Type M and Type S error epidemic and the “What does not kill my statistical significance makes it stronger” fallacy. I feel kinda bad that Morey has to use, as an example, a retracted paper by a young scholar who probably doesn’t know any better . . . but I don’t feel so bad. The public record is the public record. If the author of that paper was willing to publish it, he should be willing to let it be criticized. Indeed, from the standpoint of the scientist (not the careerist), getting your papers criticized by complete strangers is one of the big benefits of publication. I’ve often found it difficult to get anyone to read my draft articles, and it’s a real privilege to get people like Richard Morey to notice your work and take the trouble to point out its fatal flaws.
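To make the Type M / Type S point concrete, here’s a minimal simulation sketch in Python. The numbers are made up for illustration only (a true effect of 0.1 standard deviations, 20 subjects per condition, a two-sample t-test); nothing here is taken from Hart’s paper. The general point: with a design this noisy, the estimates that happen to cross the significance threshold are necessarily gross overestimates, and some of them point in the wrong direction.

```python
import numpy as np
from scipy import stats

# Illustrative (made-up) numbers: a small true effect measured with a
# noisy outcome and a small sample per condition.
rng = np.random.default_rng(0)
true_effect = 0.1     # true difference between conditions, in SD units
n_per_group = 20      # subjects per condition
n_sims = 10_000       # simulated replications of the study

sig_estimates = []
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    stat, p = stats.ttest_ind(treated, control)
    if p < 0.05:  # the study "finds" an effect only when it hits significance
        sig_estimates.append(treated.mean() - control.mean())

sig_estimates = np.array(sig_estimates)
power = len(sig_estimates) / n_sims
exaggeration = np.abs(sig_estimates).mean() / true_effect  # Type M error
wrong_sign = (sig_estimates < 0).mean()                    # Type S error

print(f"power: {power:.2f}")
print(f"average exaggeration of significant estimates (Type M): {exaggeration:.1f}x")
print(f"share of significant estimates with the wrong sign (Type S): {wrong_sign:.2f}")
```

With these made-up numbers the power is in the single digits, the statistically significant estimates overstate the true effect severalfold, and a noticeable fraction of them even have the wrong sign. That is the sense in which a statistically significant result from a noisy design tells you very little.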

Oh, and by the way, Morey did not find these flaws in response to that well-publicized retraction. The story actually happened in the opposite order. Here’s Morey:

When I got done reading the paper, I immediately requested the data from the author. When I heard nothing, I escalated it within the University of Alabama. After many, many months with no useful response (“We’ll get back to you!”), I sent a report to Steve Lindsay at Psychological Science, who, to his credit, acted quickly and requested the data himself. The University then told him that they were going to retract the paper…and we never even had to say why we were asking for the data in the first place. . . .

The basic problem here is not the results, but the basic implausibility of the methods combined with the results. Presumably, the graduate student did not force Hart to measure memory using four lexical decision trials per condition. If someone claims to have hit a bullseye from 500m in hurricane-force winds with a pea-shooter, and then claims years later that a previously-unmentioned assistant faked the bullseye, you’ve got a right to look at them askance.

At this point I’d like to say that Hart’s paper should never have been accepted for publication in the first place—but that misses the point, as everything will get published, if you just keep submitting it to journal after journal. If you can’t get it into Nature, go for Plos-One, and if they turn you down, there’s always Psychological Science or JPSP (but that’ll probably only work if you’re (a) already famous and (b) write something on ESP).

The real problem is that this sort of work is standard operating practice in the field of psychology, no better and no worse (except for the faked data) than the papers on himmicanes, air rage, etc., endorsed by the prestigious National Academy of Sciences. As long as this stuff is taken seriously, it’s still a crisis, folks.

31 thoughts on “Crossfire”

  1. That 1967 Ring quote is from a few years before Fiske (and I) did graduate study, and it was well known at the time.

    But … was there a paradigm shift (as Fiske et al. claim)? It’s not clear to me that there was, and I don’t know what Fiske et al. mean specifically by “Researchers rose to the challenges, and psychological science soldiered on.” It’s possible they meant that an article is now more likely to involve multiple experiments (looking for convergent validity), but that is an imperfect approach that didn’t save us from Bem/ESP.

    Perhaps the problem is that there isn’t much real theory in the field of social psychology. This isn’t unique; marketing is another social research field that doesn’t have a great deal of theory (and is similarly culture-bound), although in marketing there is the discipline of the marketplace — does this result help us sell stuff? No? Not useful.

    • Zbicyclist:

      Someone could perhaps ask Fiske et al. directly what they meant by “Researchers rose to the challenges, and psychological science soldiered on.” My guess is that they meant nothing specific here, that this was just an empty platitude. I looked up the phrase “soldiered on” and found this: “To continue to do something, especially when it is difficult or tedious; persevere.” Your guess is as good as mine how this applies to “psychological science.”

      • I don’t know about soldiering on, but my understanding is that there really were big changes to social psychology in response to one of the previous crises. In the early to mid 1970s, some social psych work was criticized as basically directing subjects to behave a certain way. If you tell people to think like a racist and then have them answer a survey about racism, it turns out that they seem to exhibit racist traits. (I am simplifying here, but subjects are pretty good at guessing what researchers are looking for, and they often try to provide it for them.) Even without blogs and PubPeer, critics of the time had pretty harsh judgments on this kind of research.

        The development of priming in social psychology was (to some extent) a response to those critiques. The idea was to make it so that subjects were not aware of what was being measured and thus could not “collaborate” with the researcher. Thus, you get studies like Srull & Wyer (1979) where exposure to scrambled sentences with kindness-related or hostility-related words subsequently influenced judgments about a neutral target person. From this perspective, weak primes were better than strong primes; and the best prime was one that the subject was completely unaware of! This is why social psych priming studies often have multiple experiments, with each subsequent study showing an effect for a weaker manipulation. The researchers are not just going for crazier manipulations; they are systematically showing that subjects are unaware of what is affecting them.

        So, it really is the case that there have been previous crises and that the field genuinely did respond to them. The problem is that the response (in this case) ended up being to generate a bunch of effects by using p-hacking and the garden of forking paths.

        It must be frustrating for Fiske et al.; they fought for legitimacy by responding to the critics. Now they find they have to do it all over again. On the other hand, just because they fixed one problem does not mean they fixed them all; and good science seems to be much harder than many people realize. Moreover, they have had pretty good careers even if their work is being discounted now.

        I should clarify that all of this was before my time (I was in grade school in the 1970s), and I am not aware of any formal description of the above story. I’ve gleaned it from conversations and discussion sections in papers. I may be wrong in many details.

        • Greg Francis: I don’t remember the 1970s social-psychology crisis, either, but a search turned up this:

          Muzafer Sherif (1977). Crisis in social psychology: Some remarks towards breaking through the crisis. Personality and Social Psychology Bulletin, 3, 368-382. It’s available online.

        • Direct link to Sherif paper: http://journals.sagepub.com/doi/pdf/10.1177/014616727700300305

          At least some of the response was to suggest multiple experiments for converging validity: “Triandis … analyzed the current crisis in social psychology and offered his ideas to overcome the impasse and parochial approaches through the use of cross-cultural comparisons and through the use of a combination of methods to insure validity.”

          On the whole, the Sherif complaints could be recycled today, just replacing the examples in the paper. One might substitute “power pose” for “small groups” in the paragraph below without much else needing change:

          “Yet what is the yield of substance established in the way of the social psychology of small groups in terms of valid generalizations and principles by the mid-seventies, even after the additional hundreds of research items since 1969? In appearance, this huge harvest looks impressive especially in technical surface, in the announced levels of significance of the findings. Yet when we start to separate golden kernels from the chaff to store them for the future, it is quite a different story. It does not add up to much. It is fragmented, too incoherent to fit together as a whole. Various evaluators among us commented on the defectiveness of the yield in substance and irrelevance of a considerable portion of it.”

        • I vaguely remember some things – I left university in 1977 and returned in 1979 – a psychology course I had looked forward to taking was no longer being offered, as the topic was discovered to be based on flawed science. And the Personality Psy course in 1980 was essentially a postmortem overview of failed research on personality. In the Social Psy course, I remember students raising concerns that the studies being reviewed in the course were all flawed, to which the lecturer replied “trust me, other better studies have verified the main claims” (after which I dropped the course). There should be a history of this somewhere?

          > just because they fixed one problem does not mean they fixed them all; and good science seems to be much harder
          Science is a constant (desperate) struggle to get (even a little bit) less wrong. Fixing one problem is like trying to sweeten all the oceans with a single spoonful of sugar!

          (Had a conversation with a wet lab science colleague a few years ago that really bothered me – they said “a couple years ago I spent the whole summer learning about the correct statistics to use in my work, so now I never need to worry about that again”.)

      • I like the idea of “the standpoint of the scientist (not the careerist)”. In the scientific (!) management tradition (F.W. Taylor and forward), “soldiering” has a pejorative meaning. Merriam-Webster defines it as “to make a pretense of working while really loafing”. I’m sure that wasn’t what Fiske meant. But it does seem as though many psychologists have made a pretense of studying while really careering.

  2. And during the Roman Republic, writers constantly complained about the decline of Latin as a language. And Alexander Pope railed against the hacks having access to the printing press. I don’t see that “science” is different because most “science” is just stuff being done by people in their careers, for their own benefit and maybe with an eye toward some minor place in some small niche in a form of eternity, sort of like having a drinking fountain named for them at the local synagogue. Social science is an easy target but harder science is also full of papers which don’t have any real meaning to them. Trying to say this clearly and likely failing: many papers in math, physics, etc. are interpretations that have no value outside the paragraphs typed out because even if correct they don’t mean anything more than this is an interpretation, sort of like this is how I play Rachmaninov except with music there’s at least typically an actual score instead of a series of assumptions rooted in accepting various conjectures and theorems that may be correct or not or which may not truly be appropriate here. So more sort of like this is how I play Rachmaninov based not on a score but on a rough approximation described to me by someone who may or may not have heard the music. Another metaphor I sometimes use is as accurate as a police sketch: it may be we’re looking for a black guy with a beard unless of course he’s white and doesn’t have a beard.

    Thing really is that people are people. But further, it’s not only impossible to limit the access of hacks to the printing press; if we were able to do this, odds are the hacks would control access and actual science would be inhibited, because actual science illuminates what isn’t actual science. With science, as with the rest of life, there’s way more noise than signal.

    To be clear, so you don’t feel I’m criticizing you or your efforts (not that that matters): your railing about these really inexcusable failures is important, but I don’t expect the disciplines themselves to change, let alone the players within them. I truly enjoy your blog; it’s the only place I offer comments on the internet.

    • Jonathan:

      Maybe I take exception to this assertion just because I’m a mathematician:

      _many papers in math … are interpretations that have no value outside the paragraphs typed out because even if correct they don’t mean anything more than this is an interpretation,_

      Lots of published mathematics won’t be looked at by many folks after they’ve been skimmed in the journal, but the results are almost certainly correct and interesting to some, since the peer review system in mathematics really works. For the most part their value is in the effort it takes to do something even a little new and get it right. That’s what mathematicians do for fun and to stay sharp. The result is more than just “the paragraphs typed out” and more than just “an interpretation”. Probably true for the other (hard) sciences too.

      • Why do you think published math is almost certainly correct? Math’s only cleaner in the sense that there’s no data.

        Math gets retracted. The most recent post on Retraction Watch,

        http://retractionwatch.com/2017/02/13/journal-retracts-paper-state-senator-former-mathematician/

        quotes a retraction from Topology and its Applications as saying

        This article has been retracted at the request of the Editors-in-Chief after receiving a complaint about anomalies in this paper. The editors solicited further independent reviews which indicated that the definitions in the paper are ambiguous and most results are false. The author was contacted and does not dispute these findings.

        The author was a faculty member at U. Chicago and has a Ph.D. in math from MIT. So it’s not like the problem’s restricted to the minor leagues.

        One reason there may not be as many retractions in math is that relatively few people read math papers. Like other academic papers, they’re written for insiders, but unlike many other disciplines, it takes a rather deep dive to become an insider in math.

        • Yes, it happens rarely. I don’t think the rarity is because there are few readers and little data. I think it’s because there are few blunders – it’s harder to fool oneself in mathematics than in the statistics this blog handles so eloquently. One of the best known is the nonproof of Dehn’s Lemma.

          From Wikipedia:

          This theorem was thought to be proven by Max Dehn (1910), but Hellmuth Kneser (1929, page 260) found a gap in the proof. The status of Dehn’s lemma remained in doubt until Christos Papakyriakopoulos (1957, 1957b) proved it using his “tower construction”.

          (https://en.wikipedia.org/wiki/Dehn's_lemma)

          When Wiles first claimed to have proved the Fermat Conjecture, journal referees found a gap; it took a year and a collaborator to finish the proof.

        • Hey, I published a false theorem once! OK, in that case I guess it wasn’t really a “theorem,” but we’d mistakenly labeled it as one. In this case there was no “gap in the proof.” Nope, there was a plain old counterexample.

    • OK… so Sturgeon’s Law is accurate. https://en.wikipedia.org/wiki/Sturgeon's_law But when Sturgeon’s Law is too low by 10 percent or so, you have a much bigger problem. Stated another way… your Rachmaninov performances (I assume) aren’t recorded and sold to the public and, worse, to other professional musicians (including the aspiring ones) as exemplars.

    • Jonathan:

      The problem is not that this stuff gets published—as I wrote, everything will get published somewhere. The problem is that this sort of work is endorsed by the National Academy of Sciences, which is . . . hmmm, let me google it . . . ok, here it is:

      The National Academy of Sciences (NAS) is a private, nonprofit organization of the country’s leading researchers. The NAS recognizes and promotes outstanding science through election to membership; publication in its journal, PNAS; and its awards, programs, and special activities. Through the National Academies of Sciences, Engineering, and Medicine, the NAS provides objective, science-based advice on critical issues affecting the nation.

      I don’t want Susan Fiske advising the nation on critical issues such as himmicanes.

    • Another mathematician chiming in on “many papers in math, physics, etc. are interpretations that have no value outside the paragraphs typed out because even if correct they don’t mean anything more than this is an interpretation”:

      I’d say this applies more often to physics than to math. But within math, my impression is that it’s more likely to apply to applied math than to pure math, and within pure math, it’s more likely to apply to topology and geometry than to other areas of pure math.

      Yes, math papers with errors are often published, but my impression is that math referees (we use “reviewer” only for someone who writes a post-publication review, rather than for someone who looks at the paper and gives comments/recommendations to the editor) are more likely to look at details than in other fields.

      I’ve had two instances of a paper being accepted for publication with an error. In the first case, the error was a gap; I noticed it myself and asked that the paper be withdrawn. I figured out how to fill in the gap a few months later, resubmitted the paper, and it was published (probably the best paper I ever published — it was a new proof of an old theorem, but the reason I came up with the new proof was that I couldn’t figure out the proof in the existing paper. Turns out that lots of other people couldn’t follow the original proof, either — so citations to the original paper are usually accompanied by a reference to mine.)

      The second was a real blooper. A colleague to whom I sent a preprint noticed it and sent me a nice letter pointing out the error, so I asked to have the paper withdrawn. (This points out that the common practice of providing preprints of math papers is another factor that reduces the incidence of errors in published papers.) The theorem was proved a couple of years later by the colleague who pointed out my error, along with someone else.

    • I certainly hope not; why should the useful info move from free blogs to the holdings of “Medium Corporation”? It is like biomed people want to move back behind a paywall… Also, all “gene editing” articles need to include cell count timecourses so we can estimate survival and division rates.

      • Anon:

        Don’t undervalue the benefits of feeling like part of a community. I’m part of the statistics community and the political science community and the Bayesian community and the Stan community and the blogging community. Jordan is less connected, and if he just starts his own blog he might feel a bit isolated. So there’s a plus for him to feel like part of the Medium community. I understand your suspicions about corporate influence, but it’s only fair to say that the corporation is offering something in return. I’ve heard that Facebook helps people feel like part of a community too!

        • From this it looks like Medium plans on soon implementing some kind of micropayments scheme (i.e., a paywall):
          https://blog.medium.com/renewing-mediums-focus-98f374a960be

          That sounds great in general; I would love to see this attempted for many topics. But why would you want to put scientific criticism behind a paywall? Don’t you really want people to read it? Also, there are ethical issues if you are funded by taxes, same as with the legacy publishers.

      • I archive my own writing in case Medium decides to remove it for whatever reason. I also duplicate posts at omnesres.com, although lately I’ve just been linking to Medium. One benefit of Medium is that it is kind of like Switzerland: if I criticized someone on my own blog, I could disable comments, or perhaps remove comments I don’t like. By posting criticisms at a neutral site, anyone can comment and voice their concerns.

      • Personally, I post on medium because the interface is clean and fantastic for simple pieces. It is not a general purpose scientific blogging platform, for sure.

  3. Yes, these social psych experiments are about who can do the zaniest study, and it’s been that way for >50 yrs. Who says it has to be more than that for human interest and entertainment? The ordinary, nonspecialist reader knows they’re “chump” effects, and 50 years of moving through new techniques (tests, CIs, meta-analysis) that were supposed to make social psych scientific hasn’t helped. The level of statistics and self-criticism was quite a lot higher in Morrison and Henkel’s day. Bringing out the heavy machinery to criticize much of this work makes little sense. If people aren’t prepared to falsify many of the inquiry types (the measurements, and the assumptions that the tests are even observing results due to the manipulation), then there’s no reason to think the pseudoscience will stop. Either falsify the methodology and assumptions (which in many cases wouldn’t be difficult) or accept the enterprise as human interest and largely for entertainment only.

    • Reminds me of a quote from Freud – roughly – “There won’t be a scientific psychology in my lifetime – but I have no intention of changing careers!”

  4. I have been teaching a course on human judgment and decision making for the last 5 years. I use Kahneman’s Thinking, Fast and Slow and was growing increasingly frustrated about having to cautiously teach that some of the studies discussed in his book are not *good* studies. Finally, the man himself officially acknowledged this in a comment on the following R-index blog post.

    https://replicationindex.wordpress.com/2017/02/02/reconstruction-of-a-train-wreck-how-priming-research-went-of-the-rails/comment-page-1/#comment-1454

    FYI
