Recently I had a disagreement with Larry Bartels which I think is worth sharing with you. Larry and I took opposite positions on the hot topic of science criticism.
To put things in a positive way, Larry was writing about some interesting recent research which I then constructively criticized.
To be more negative, Larry was hyping some sexy research and I was engaging in mindless criticism.
The balance between promotion and criticism is always worth discussing, but particularly so in this case because of three factors:
1. The research in question is on the borderline. The conclusions in question are not rock-solid—they depend on how you look at the data and are associated with p-values like 0.10 rather than 0.0001—but neither are they silly. Some of the findings definitely seem real, and the debate is more about how far to take it than whether there’s anything there at all. Nobody in the debate is claiming that the findings are empty; there’s only a dispute about their implications.
2. The topic—the effect of unperceived messages on political attitudes—is important.
3. And, finally, Larry and I generally respect each other, both as scholars and as critics. So, even though we might be talking past each other regarding the details of this particular debate, we each recognize that the other has something valuable to say, both regarding methods and public opinion.
What it’s all about
The background is here:
We had a discussion last month on the sister blog regarding the effects of subliminal messages on political attitudes. It started with a Larry Bartels post entitled “Here’s how a cartoon smiley face punched a big hole in democratic theory,” with the subtitle, “Fleeting exposure to ‘irrelevant stimuli’ powerfully shapes our assessments of policy arguments,” discussing the results of an experiment conducted a few years ago and recently published by Cengiz Erisen, Milton Lodge and Charles Taber. Larry wrote:
What were these powerful “irrelevant stimuli” that were outweighing the impact of subjects’ prior policy views? Before seeing each policy statement, each subject was subliminally exposed (for 39 milliseconds — well below the threshold of conscious awareness) to one of three images: a smiling cartoon face, a frowning cartoon face, or a neutral cartoon face. . . . the subliminal cartoon faces substantially altered their assessments of the policy statements . . .
I followed up with a post expressing some skepticism:
Unfortunately they don’t give the data or any clear summary of the data from experiment No. 2, so I can’t evaluate it. I respect Larry Bartels, and I see that he characterized the results as the “subliminal cartoon faces substantially altered their assessments of the policy statements — and the resulting negative and positive thoughts produced substantial changes in policy attitudes.” But based on the evidence given in the paper, I can’t evaluate this claim. I’m not saying it’s wrong. I’m just saying that I can’t express judgment on it, given the information provided.
Larry then followed up with a post saying that further information was in chapter 3 of Erisen’s Ph.D. dissertation, and he presented as evidence a path analysis along with this summary:
In this case, subliminal exposure to a smiley cartoon face reduced negative thoughts about illegal immigration, increased positive thoughts about illegal immigration, and (crucially for Gelman) substantially shifted policy attitudes.
And Erisen sent along a note with further explanation, the centerpiece of which was another path analysis.
Unfortunately I still wasn’t convinced. The trouble is, I just get confused whenever I see these path diagrams. What I really want to see is a direct comparison of the political attitudes with and without the intervention. No amount of path diagrams will convince me until I see the direct comparison.
However, I had not read all of the relevant chapter of Erisen’s dissertation in detail. I’d looked at the graphs (which had results of path analyses, and data summaries on positive and negative thoughts, but no direct data summaries of issue attitudes) and at some of the tables. It turns out, though, that there were some direct comparisons of issue attitudes in the text of the dissertation, just not in the tables and figures.
I’ll get back to that in a bit, but first let me return to what I wrote at the time, in response to Erisen and Bartels:
I’m not saying that Erisen is wrong in his claims, just that the evidence he [and Larry] have shown me is too abstract to convince me. I realize that he knows a lot more about his experiment and his data than I do, and I’m pretty sure that he is much more informed on this literature than I am, so I respect that he feels he can draw certain strong conclusions from his data. But, for me, I have to go with the information that is available to me.
Why do these claims from path analysis confuse me? An example is given in a comment by David Harris, who reports that Erisen et al. “seem to acknowledge that the effect of their priming on people’s actual policy evaluations is nil” but that they then follow up with a convoluted explanation involving a series of interactions.
Convoluted can be OK—real life is convoluted—but I’d like to see some simple comparisons. If someone wants to claim that “Fleeting exposure to ‘irrelevant stimuli’ powerfully shapes our assessments of policy arguments,” I’d like to see if these fleeting exposures indeed have powerful effects. In an observational setting, such effects can be hard to “tease out,” as the saying goes. But in this case the researchers did a controlled experiment, and I’d like to see the direct comparison as a starting point.
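To make concrete what such a “direct comparison” looks like, here is a minimal sketch on simulated data (the numbers are invented for illustration, not Erisen’s): mean attitude under the smiley prime versus the neutral prime, with a standard error on the difference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data, purely illustrative (not Erisen's):
# attitude on a six-point scale, smiley prime vs. neutral prime,
# with an assumed true effect of 0.3.
n = 100
neutral = rng.normal(loc=3.0, scale=1.2, size=n)
smiley = rng.normal(loc=3.3, scale=1.2, size=n)

# The "direct comparison": difference in mean attitude across conditions,
# with a standard error for that difference.
diff = smiley.mean() - neutral.mean()
se = np.sqrt(smiley.var(ddof=1) / n + neutral.var(ddof=1) / n)
print(f"estimated effect: {diff:.2f} (se {se:.2f}), t = {diff / se:.2f}")
```

Nothing fancy: two group means and a standard error. The point is that in a controlled experiment this comparison is available for free, and it is the natural starting point before any mediation modeling.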
Commenter Dean Eckles wrote:
The answer is that those effects are not significant at conventional levels in Exp 2. From ch. 3 (pages 89-91) of Cengiz Erisen’s dissertation (from https://dspace.sunyconnect.suny.edu/handle/1951/52338) we have:
Illegal Immigration: “In the first step of the mediation model a simple regression shows the effect of affective prime on the attitude (beta=.34; p [less than] .07). Although not hypothesized, this confirms the direct influence of the affective prime on the illegal immigration attitude.”
Energy Security: “As before, the first step of the mediation model ought to present the effect of the prime on one’s attitude. In this mediation model, however, the affective prime does not change energy security attitude directly (beta=-.10; p [greater than] .10). Yet, as discussed before, the first step of mediation analysis is not required to establish the model (Shrout & Bolger 2002; MacKinnon 2008).”
So (the cynic in me says), this pretty much covers it. The direct result was not statistically significant. When it went in the expected direction and was not statistically significant, it was taken as a confirmation of the hypothesis. When it went in the wrong direction and was not statistically significant, it was dismissed as not being required.
Back to the debate
OK, so here you have the story as I see it: Larry heard of an interesting study regarding subliminal messages, a study that made a lot of sense especially in light of the work of Larry and others regarding the ways in which voters can be swayed by information that logically should be irrelevant to voting decisions or policy positions (and, indeed, consistent with the work of Kahneman, Slovic, and Tversky regarding shortcuts and heuristics in decision making). The work seemed solid and was supported by several statistical analyses. And there does seem to be something there (in particular, Erisen shows strong evidence of the stimulus affecting the numbers of positive and negative thoughts expressed by the students in his experiment). But the evidence for the headline claim—that the subliminal smiley-faces affect political attitudes themselves, not just positive and negative expressions—is not so clear.
That’s my perspective. Now for Larry’s. As he saw it, my posts were sloppy: I reacted to the path analyses presented by him and Erisen and did not look carefully within Erisen’s Ph.D. thesis to find the direct comparisons. Here’s what Larry wrote:
Now it seems that one of your commenters has read (part of) the dissertation chapter and found two tests of the sort you claimed were lacking, one of which indicates a substantial effect (.34 on a six-point scale) and the other of which indicates no effect. If you or your commenter bothered to keep reading, you would find four more tests, two of which (involving different issues) indicate substantial effects (.40 and .51) and two of which indicate no effects. The three substantial effects (out of six) have reported p-values of <.07, <.08, and >.10. How likely is that set of results to occur by chance? Do you really want to argue that the appropriate way to assess this evidence is one .05 test at a time?
Hmmm, I’ll have to think about this one.
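Larry’s question—how likely is that whole set of p-values under the null?—can be made concrete with Fisher’s method for combining p-values. The sketch below takes the three reported bounds at their ceilings and, as a pure assumption, sets the three unreported null results to p = 0.5:

```python
import math

def fisher_combined_p(pvals):
    """Fisher's method: X = -2 * sum(log p_i) ~ chi-square with 2k df.
    For even df (here 2k) the chi-square survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!."""
    x = -2.0 * sum(math.log(p) for p in pvals)
    k = len(pvals)  # df = 2k, so the closed form sums k terms
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))

# Reported bounds with assumed placeholder values: the three
# "substantial" results taken at their reported ceilings, and the
# three null results arbitrarily set to p = 0.5 (the text gives
# only p > .10 for those).
pvals = [0.07, 0.08, 0.10, 0.5, 0.5, 0.5]
print(f"Fisher combined p ~= {fisher_combined_p(pvals):.3f}")
```

With these assumed inputs the combined p comes out around 0.085—suggestive, but hardly overwhelming—and the answer moves around depending on what one plugs in for the unreported p-values, which is exactly why the bounds-only reporting is frustrating.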
My quick response is as follows:
1. Sure, if we accept the general quality of the measurements in this study (no big systematic errors, etc.) then there’s very clear evidence of the subliminal stimuli having effects on positive and negative expressions, hence it’s completely reasonable to expect effects on other survey responses including issue attitudes.
2. That is, we’re not in “Bem” territory here. Conditional on the experiments being done competently, there are real effects here.
3. Given that the stimuli can affect issue attitudes, it’s reasonable to expect variation, to expect some positive and some negative effects, and for the effects to vary across people and across situations.
4. So if I wanted to study these effects, I’d be inclined to fit a multilevel model to allow for the variation and to better estimate average effects in the context of variation.
5. When it comes to specific effects, and to specific claims of large effects (recall the original claim that the stimulus “powerfully [emphasis added] shapes our assessments of policy arguments,” elsewhere “substantially altered,” elsewhere “significantly and consistently altered,” elsewhere “punched a big hole in democratic theory”), I’d like to see some strong evidence. And these “p less than .07” and “p greater than .10” things don’t look like strong evidence to me.
6. I agree that these results are consistent with some effect on issue attitudes but I don’t see the evidence for the large effects that have been claimed.
7. Finally, I respect the path analyses for what they are, and I’m not saying Erisen shouldn’t have done them, but I think it’s fair to say that these are the sorts of analyses that are used to understand large effects that are known to exist; they don’t directly address the question of the effects of the stimulus on policy attitudes (which is how we could end up with explanations of large effects that cancel out).
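To illustrate point 4, here is a minimal sketch of partial pooling across the six issue-level estimates. The coefficients .34, .40, .51, and -.10 are the ones reported above; the two zeros are placeholders for the unreported null results, and the standard errors and between-issue variation are assumptions (p ~ .07 for an estimate of .34 implies an se of roughly .19):

```python
import numpy as np

# Reported effect estimates for six issues; the two 0.0 values are
# placeholders for the unreported nulls. Standard errors are assumed.
est = np.array([0.34, 0.40, 0.51, -0.10, 0.0, 0.0])
se = np.full(6, 0.19)  # assumed, for illustration only

# Partial pooling with an assumed between-issue sd tau: each issue's
# estimate is shrunk toward the precision-weighted average effect.
tau = 0.15  # assumed; in a real analysis tau would be estimated
w = 1.0 / (se**2 + tau**2)
mu = np.sum(w * est) / np.sum(w)  # estimated average effect
shrink = (1.0 / se**2) / (1.0 / se**2 + 1.0 / tau**2)
pooled = mu + shrink * (est - mu)  # shrunken per-issue effects
print(f"average effect: {mu:.2f}")
print("partially pooled effects:", np.round(pooled, 2))
```

The multilevel estimate pulls the extreme coefficients (.51) and the wrong-signed one (-.10) toward a modest average, which is roughly the substantive conclusion I’d expect from noisy, variable effects.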
As a Bayesian, I do accept Larry’s criticism that it was odd for me to claim that there was no evidence just because p was not less than 0.05. Even weak evidence should shift my priors a bit, no?
And I agree that weak evidence is not the same as zero evidence.
So let me clarify that, conditional on accepting the quality of Erisen’s experimental protocols (which I have no reason to question), I have no doubt that some effects are there. The question is about the size and the direction of the effects.
In some sense, the post-publication review process worked well: Larry promoted the original work on the sister blog, which gave it a wider audience. I read Larry’s post and offered my objection on the sister blog and here, and, in turn, Erisen and various commenters replied. And, eventually, after a couple of email exchanges, I finally got the point that Larry had been trying to explain to me: Erisen did have the direct comparisons I’d been asking for; they were just in the text of his dissertation rather than in the tables and figures.
This post-publication discussion was slow and frustrating (especially for Larry, who was rightly annoyed that I kept saying that the information wasn’t available to me, when it was there in the dissertation all along), but I still think it moved forward in a better way than would’ve happened without the open exchange, if, for example, all we’d had were a series of static, published articles presenting one position or another.
But these questions are difficult and somewhat unstable because of the massive selection effects in play. This discussion had its frustrating aspects on both sides but things are typically much worse! Most studies in political science don’t get discussed on the Monkey Cage or on this blog, and what we see is typically bimodal: a mix of studies that we like and think are worth sharing, and studies that we dislike and think are worth taking the time to debunk.
But I don’t go around looking for studies to shoot down! What typically happens is they get hyped by somebody else (whether it be Freakonomics, or David Brooks, or whoever) and then I react.
In this case, Larry posted on a research finding that he thought was important and perhaps had not received enough attention. I was skeptical. After all the dust has settled, I remain skeptical about any effects of the subliminal message on political attitudes. I think Larry remains convinced, and maybe our disagreement ultimately comes down to priors, which makes sense given that the evidence from the data is weak.
Meanwhile, new studies get published, and get neglected, or hyped, or both. I offer no general solution to how to handle these—clearly, the standard system of scientific publishing has its limitations—here I just wanted to raise some of these issues in a context where I see no easy answers.
To put it another way, I think social science can—and should—do better than we usually do. For a notorious example, consider “Reinhart and Rogoff”: a high-profile paper published in a top journal with serious errors that were not corrected for several years after publication.
On one hand, the model of discourse described in my above post is not at all scalable—Larry Bartels and I are just 2 guys, after all, and we have finite time available for this sort of thing. On the other hand, consider the many thousands of researchers who spend so many hours refereeing papers for journals. Surely this effort could be channeled in a more useful way.