
Click here to find out how these 2 top researchers hyped their work in a NYT op-ed!

Gur Huberman pointed me to this NYT op-ed entitled “Would You Go to a Republican Doctor?”, written by two professors describing their own research, that begins as follows:

Suppose you need to see a dermatologist. Your friend recommends a doctor, explaining that “she trained at the best hospital in the country and is regarded as one of the top dermatologists in town.” You respond: “How wonderful. How do you know her?”

Your friend’s answer: “We met at the Republican convention.”

Knowing a person’s political leanings should not affect your assessment of how good a doctor she is — or whether she is likely to be a good accountant or a talented architect. But in practice, does it?

Recently we conducted an experiment to answer that question. Our study . . . found that knowing about people’s political beliefs did interfere with the ability to assess those people’s expertise in other, unrelated domains.

Oh, interesting, so they asked people to assess the quality of doctors, accountants, and architects, and saw how these assessments were colored by the political beliefs of these professionals?

I wonder how they did it?

The simple answer: They didn’t.

I followed the link to the research article and did a quick search. The words “doctor,” “accountant,” and “architect” appear . . . exactly zero times.

Nope. Actually, “Participants were required to learn through trial and error to classify shapes as ‘blaps’ or ‘not blaps’, ostensibly based on the shape’s features. Unbeknownst to the participants, whether a shape was a blap was in fact random.” And later they had to choose “who the participant wanted to hear from about blaps and how they used the information they received.”

Also this: “Unbeknownst to the participants these sources were not in fact other people but algorithms designed to respond in the following pattern.”

OK, fine. It’s a lab experiment. No problem. Lab experiments can be great. But then consider this quote from the abstract of the research paper:

Participants had multiple opportunities to learn about others’ (1) political opinions and (2) ability to categorize geometric shapes.

No. Participants did not learn about others’ opinions. There were no others; these were algorithms. And, no, there was no way to learn about anyone’s ability to categorize geometric shapes, as all these categorizations were purely random. [That last statement was an error on my part. Apparently the experiment was set up so that some of the bots performed better than others. Some commenters below pointed out my mistake. — AG.]

And this, also from the abstract:

Our findings have implications for political polarization and social learning in the midst of political divisions.

Huh? I’ll remember that, next time I go to a doctor and ask him or her to categorize geometric shapes for me.

And this, from the op-ed:

To make the most money, the participants should have chosen to hear from the co-player who had best demonstrated an ability to identify blaps, regardless of that co-player’s political views.

No, that’s false. By construction, all the predictions were noise. [To be clear: all the participants’ predictions were noise. But the predictions they were given from the bots varied in accuracy. — AG.] Indeed, it says so right in the research article: “Because in reality participant performance was held constant at 50% all participants who completed the entire experiment were paid a $5 bonus.”
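
To make that bracketed correction concrete, here is a minimal sketch in Python. It is not the authors' code. The function name `simulate`, its parameters, and the 80% bot accuracy are all assumptions chosen for illustration, since the bots' actual success rates are not quoted here. The sketch shows only the logic: a participant guessing from shape features lands near 50%, while copying a bot that is scripted to match the random label more often lands near that bot's rate.

```python
import random


def simulate(n_trials=10_000, bot_accuracy=0.8, seed=0):
    """Return (accuracy guessing alone, accuracy copying the bot).

    bot_accuracy is an assumed value, not a number taken from the paper.
    """
    rng = random.Random(seed)
    alone_hits = 0
    bot_hits = 0
    for _ in range(n_trials):
        # Whether a shape is a "blap" is a coin flip, as in the experiment.
        is_blap = rng.random() < 0.5
        # Shape features carry no information, so the participant's own
        # guess is an independent coin flip.
        own_guess = rng.random() < 0.5
        # The bot is scripted to match the label with probability bot_accuracy.
        bot_says = is_blap if rng.random() < bot_accuracy else not is_blap
        alone_hits += own_guess == is_blap
        bot_hits += bot_says == is_blap
    return alone_hits / n_trials, bot_hits / n_trials


if __name__ == "__main__":
    alone, with_bot = simulate()
    print(f"guessing alone: {alone:.3f}, copying the bot: {with_bot:.3f}")
```

Under these assumptions the first number hovers around 0.5 and the second around whatever `bot_accuracy` is set to, which is the sense in which the bots carried signal even though the participants' own classifications were noise.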

Paging Gerd Gigerenzer. Gerd Gigerenzer, come to the phone. Researchers are taking perfectly rational behavior—showing more trust in statements given by people who share their values, in this case in a zero-cost setting—and giving it a cute name: “the halo effect.”

But what really bugs me is the title and opening of the op-ed. Especially that “But in practice, does it?” You can say a lot about this experiment, but “in practice” it’s not.

Look. If you want to do an experiment and write it up, fine.

But what’s with all the hype? Why do you have to describe it as what it’s not? If the experiment is as wonderful as you think it is, let it stand on its own two feet.

If you want to hype your work, fine. Just do honest hype. For example, “We did this experiment and it was the coolest thing ever. Sure, it was just a study of shapes on the computer, but we think it has broader implications.” Not: “Knowing a person’s political leanings should not affect your assessment of how good a doctor she is — or whether she is likely to be a good accountant or a talented architect. But in practice, does it? Recently we conducted an experiment to answer that question.”

I understand the moral reasoning, I think. If you think that (a) your experiment is of high quality, and (b) it has some relevance to the real world, then it’s appropriate to publicize it. Indeed, one might call it a moral obligation, if you have media connections, to use them to make the world aware of your important research findings.

And then the moral reasoning continues . . . If you’re gonna publicize it, you want the maximum influence. So, don’t publish the research paper in Plos One, publish it in Cognition. And, don’t run the op-ed in Vox, run it in the New York Times. But, to get it into Cognition, you need a hook. So you exaggerate a little in the abstract; don’t worry, your virtue is preserved because you present the honest details in the full paper. (And it seems they did.) And, to get it into the Times, you need a hook. So you give your op-ed a misleading intro, but that’s OK, you qualify things later on.

So, sure, looking at it a certain way, this sort of hype is the only moral way to go. But I don’t buy it. Partly because, as a scientist, I have to deal with the backwash from all of this: on one hand, a justified distrust of scientists’ claims, and, on the other, a general sloppiness in how scientific work is reported in the popular press.

To put it another way: How can we criticize David Brooks for repeatedly mangling data, when professional researchers do it too, when promoting their work?

I’m getting tired of neuroscientists lecturing to us about politics.

P.S. Regarding honest hype, Huberman gives us this classic example:

It has not escaped our notice that the specific pairing we have postulated irrimediately suggests a possible copying mechanism for the genetic material. Full details of the structure, including the conditions assumed in building it, together with a set of co-ordinates for the atoms, will be published elsewhere.

That’s how to do it.

P.P.S. Tali Sharot, the first author of the op-ed in question, responds here.

P.P.P.S. Lots of discussion in comments, including on my mistake in understanding some aspects of the experiment (see strike-throughs above). I remain disturbed by the hype, but I should clarify that being bothered by the hype does not imply that I think the research paper or even the op-ed are fatally flawed or useless or anything like that. The research is what it is, and I see some relevance of the research to the larger political question of interest. I don’t think it would’ve taken much tweaking of the op-ed to have kept it in line with the work being described. I also agree with commenters that, when criticizing work, I have the responsibility to try extra hard to avoid mistakes. Just as I don’t like to see sloppiness in public research summaries, I also don’t like to see sloppiness in public research criticism, especially when it’s coming from me!

33 Comments

  1. You don’t need Gerd Gigerenzer to tell you that the “halo effect”, as illustrated both in the NYT and the paper, is perfectly rational. I am as hard-nosed a defender of the classic view of rationality – and of the heuristics and biases view that GG doesn’t like – as anyone I know, yet I wrote the first author about this very problem. I cited a forthcoming paper of mine (in Perspectives on Psychological Science, with John Jost) criticizing another paper on a very similar point (see especially p. 14 ff in our paper):
    http://www.sas.upenn.edu/~baron/papers/dittoresp.pdf

    • Andrew says:

      Jonathan:

      I didn’t mean to imply that Gigerenzer is the only person who’s studied this problem, just that he’s particularly associated with skepticism about the “people are stupid” attitude regarding cognitive illusions.

  2. Jeff says:

    “No, that’s false. By construction, all the predictions were noise. Indeed, it says so right in the research article: ‘Because in reality participant performance was held constant at 50% all participants who completed the entire experiment were paid a $5 bonus.'”

    The predictions of actual participants were noise, yes, but I think the point was that the fake “co-players” weren’t really making predictions–they were presented to participants as having differing success rates for a task that was presented as a learnable skill. So, the reasoning goes, participants should have paid more attention to the guesses of “co-players” with higher apparent success rates over those with similar political leanings.

  3. Dale Lehman says:

    I admit that this post confuses me greatly. From what I’ve seen of this manuscript, I don’t care for this type of research and there appears to be plenty of room for noisy measurements and forking paths. But, Andrew’s criticism is what confuses me. It seems that subjects were given a fairly extensive questionnaire about political issues and these were matched to machine generated views that were designed to differentially agree or disagree with the subject’s own views. It is true that these were machine generated views and not real people. Is that the problem with this study – that it presents computer-generated opinions as if they are real people’s opinions? I would rather see them accurately report what they did, but I’m not sure that it is terrible to present a machine-generated set of opinions as if they were a real person. If I (as subject) believe that these opinions represent those of a real person, does it really matter that they were generated by a computer and not a person?

    There is a lot of experimental research that is similar in that an artificial environment is created and behavior in that environment is then used to make inferences about how people behave in real world environments. For example, experimental economics often puts subjects in artificial settings with different incentive structures and then draws conclusions about how they behave in real circumstances that have similar incentives (e.g., an experiment such as the ultimatum game, where payoffs depend on whether or not you behave “rationally,” is then used to infer how people behave selfishly (or not) around real people in real situations). I am generally bothered by the issue of whether the artificial circumstances really tell us anything about the real circumstances they supposedly parallel.

    But this issue seems less of a concern in this particular study. If I am asked questions about lowering the voting age and then shown fake people’s responses to this question, does it really matter that they are fake people and not real people?

    Is this what Andrew is objecting to, or am I missing the point?

    Part of my confusion may be due to the fact that the paper is horribly written. In the text it says “participants indicated whether they agreed or disagreed with one of 84 social/political cause-and-effect statements.” But the supplementary appendix does not show any of these statements, but instead has demographic information about “political ideology (on a sliding scale from 0=‘Liberal’ to 1=‘Conservative’)”. It is entirely unclear to me whether this is self-identified ideology, whether it is derived from the one question out of 84 social/political statements, or how they are related. And, if there is a set of 84 such questions, and only one is used for each subject, why introduce the extra noise of variability of the 84 potential questions? So, there is plenty I dislike about this paper, but I’m not sure that Andrew’s focus is the real problem here – or I’m not sure what Andrew’s focus actually is.

    • Andrew says:

      Dale:

      I don’t dislike the research project. What I dislike is the way that the authors misrepresented what they did. In their op-ed, they wrote, “Knowing a person’s political leanings should not affect your assessment of how good a doctor she is — or whether she is likely to be a good accountant or a talented architect. But in practice, does it?” But their paper did not ever mention doctors, accountants, or architects, nor did it have anything to do with “in practice.” If the work is so great, why can’t the authors present it straight, with no exaggeration?

      Part of the problem, I think, is selection bias. Take the very same research project and remove the overstatements, and maybe it doesn’t get published in a top journal. Take the very same op-ed and remove the misrepresentations, and maybe it doesn’t appear in the NYT. So it’s the exaggerated claims that we end up seeing.

      • Dale Lehman says:

        I don’t see it as such a great leap from an experiment that investigates how opinion agreement influences a choice of whose advice to trust in the experiment to speculating about how people might choose accountants or architects in relation to their political opinions. Yes, it is not the same thing, but that is the gist of their research question.

        Still, the glaring contrast between those statements and the writing in the publication does speak to your point. Since the paper is written in dry, relatively technical terms, inserting provocative statements that go well beyond what their experiment actually entailed is a striking contrast. So, I guess I agree with being disturbed by the inflation of the language to garner headline publicity. But I’m more disturbed by the poor description of what they actually did and the measurement issues therein.

      • Anonymous says:

        But Andrew – it seems you are misrepresenting what they did.

        Dale is highlighting statements in your blog that are simply inaccurate and should really be corrected in a visible manner. It is very clear from the paper that there was indeed a way to learn about the ability of the “co-players” to categorize geometric shapes. I believe you misunderstood the methods.

        Further, the supplementary material states that the participants believed these co-players were actual humans, which makes the objection about people interacting with “algorithms” moot.

        As Dale mentions the article does not have overstatements – it is written in a relatively dry technical manner, yet still published in a good quality journal. So can it really be selection bias?

        As I see it the only sin of the authors was to open the first paragraph of the op-ed with a real-life example that can help the reader understand the point of what is yet to come. That is then followed by a detailed (again relatively dry) description of the experiment that is perfectly honest.

        I think you jumped the gun on this one. At least the straightforward inaccuracies in the blog should be altered.

        • Andrew says:

          Anon:

          1. I think the title of the op-ed is misleading, as the research has nothing about doctors or anything related to doctors.

          2. I think the following from the op-ed is misleading: “Knowing a person’s political leanings should not affect your assessment of how good a doctor she is — or whether she is likely to be a good accountant or a talented architect. But in practice, does it? Recently we conducted an experiment to answer that question.” There’s nothing “in practice” in the research.

          3. The statement, “To make the most money, the participants should . . .” is wrong. It seems silly to fault the participants for not doing something that might have theoretically earned them an additional $2.50, given that the amount of money they could actually make was fixed.

          4. The participants did not actually “learn about others’ (1) political opinions and (2) ability to categorize geometric shapes.” Rather, the participants were fed fake information about bots. That’s fine, it may be a wonderful experimental design, but then the authors should just describe it accurately.

          Again, my quarrel is not with the experiment, it is with the various hypey aspects of the description of the experiment. Maybe the results would still have appeared in the Times without the hype; we’ll never know.

          • Anonymous says:

            Andrew – you are ignoring the main point. You have false statements in your blog that should be corrected. As Jeff points out, the “co-players” were not performing at random; they had differing success rates and thus did provide signal to subjects. The reason it is important for you to acknowledge this is that, if your statement were true, it would have been the only objection of substance in the blog.

            As for your objections above.
            1. Authors do not select the titles for these types of op-eds (often they are not even shown them in advance).
            2. Misleading is a bit harsh. Yes, it should have said “to answer such questions”.
            3. Subjects were told they would be paid according to how well they do on the task. Thus, the authors’ statement is correct and I do not read it as faulting anyone.
            4. Really Andrew – the authors repeat at least 3 times in the op-ed that the others were bots as well as multiple times in the paper. The op-ed is a perfectly accurate description of what was done. So I really do not understand you picking on the use of the word “others” in this sentence.

            I don’t know, it seems to me your blog is misleading and the derogatory tone and clickbait title are neither helpful nor necessary. You make severe accusations (using the words “mangling” and “not honest”) about a very straightforward account of an experiment, whose only fault is to suggest that it may be related to other decisions people make in life. These are irresponsible accusations that you really should correct.

            p.s. – the (senior) co-author of the op-ed was the Administrator of the White House Office of Information and Regulatory Affairs in the Obama administration. I do not believe the op-ed is lecturing anyone, but if it was – wouldn’t he be more qualified than most to do so? (in response to the last sentence of your blog).

            • Andrew says:

              Anon:

              I take your point that the bots were given different success rates and I’ll correct the above post.

              Regarding your other points:

              1. My problem is not just with the title but with the entire intro to the op-ed, which frames the whole thing in terms of judgments of professionals such as doctors etc. Given the intro to the op-ed, the title makes a lot of sense. So I don’t think it’s appropriate to blame the headline writer; I think the responsibility here is on the authors of the piece.

              2. I think it’s misleading to say “in practice” when there’s no practice in the experiment.

              3. You write, “Subjects were told they would be paid according to how well they do on the task.” But how well they did was completely random.

              4. The op-ed contains an accurate description of the experiment and it also leads with an inaccurate discussion. I think the accurate description is fine; I’m unhappy with the inaccurate discussion.

              That said, I’m sorry about missing that point regarding the performance of the bots, and I appreciate the patience that you and other commenters have shown in correcting me on this point.

    • AB says:

      Dale,
      Each subject observed multiple political statements. End of page 4:
      “The learning stage consisted of 8 blocks of 20 trials each (10 blap and 10 political trials interleaved). Responses from one of the four sources were shown for the duration of a block”. So 80 political statements each (and 4 political statements were used in the practice trials as mentioned earlier).

      You are absolutely right to point out to the rest of us the confusion about Andrew’s objections – thank you.

  4. Paul Alper says:

    Andrew wrote:

    “I wonder how they did it?

    The simple answer: They didn’t.”

    In the medical field, this is sort of common: great and revolutionary treatments/results are promised and only later do we find out that the participants in the study were mice and/or not that many of them.

  5. Mark Palko says:

    “How can we criticize David Brooks for repeatedly mangling data [in the NYT op-ed section], when professional researchers do it too [in the NYT op-ed section], when promoting their work?”

    Perhaps there’s a common element in these two cases we could focus our criticism on.

  6. Martha (Smith) says:

    Of possible related interest if anyone is interested in trends in political affiliations of doctors: https://www.nytimes.com/2016/10/07/upshot/your-surgeon-is-probably-a-republican-your-psychiatrist-probably-a-democrat.html

  7. Roy says:

    See also this Twitter thread by Lior Pachter about Andrew Ng’s claim that his AI for radiology will make radiologists unnecessary:

    https://twitter.com/lpachter/status/999772323359080449

    Particularly the third item in the thread, where he looks at the actual results. Talk about a scientist over-hyping their results. And it is not even psychology.

    Was also re-tweeted by Dan Simpson and Michael Betancourt

  8. Eli Rabett says:

    The Journal of the American Physicians and Surgeons, no better than a fake journal, but real.

    http://scienceblogs.com/ethicsandscience/2009/05/05/fake-journals-versus-bad-journ/

  9. Anonymous says:

    In the quote from the double helix paper I find this:

    irrimediately

    The “rri” does look like an “m” until you zoom in.

    How did that happen?

  10. Anonymous 3 says:

    I am a big (as in really big) fan of this blog, but posts like these drive me away. Maybe a future post reflecting on why the original post of yours was both factually wrong and misleading could be interesting?

    • Andrew says:

      Anon:

      I made one mistake in reading that paper which was not to catch that the bots’ success rates were set to values far from 50%. Mistakes happen. I don’t think my main point was wrong, though, that the headline and lead-in of the op-ed were hype-y, in that the research said nothing about assessments of doctors and other professionals. The hype really bothered me.

      That said, I respect that others, including you, have different views on this, and I think open discussions are important. I’m glad that people feel comfortable criticizing me here, and I wouldn’t want it any other way.

      Also, I think there’s one general point of confusion regarding scale. Sometimes we talk about research products such as that ESP paper or that beauty-and-sex-ratio paper that have, to me, no redeeming scientific value, as they’re essentially nothing but manipulations of noise. Other times we talk about potentially interesting and important research that’s presented in a misleading way. These two things are different. I’m bothered by hype, but that doesn’t mean that I’m saying that the research is no good.

  11. Guillermo says:

    OK, this may be a little bit shallow but … the “paper” was written in Word (not LaTeX)… that says something (and it’s not good)
