How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?

Posted on March 6, 2014 9:13 AM by Andrew

I had a brief email exchange with Jeff Leek regarding our recent discussions of replication, criticism, and the self-correcting process of science.

Jeff writes:

(1) I can see the problem with serious, evidence-based criticisms not being published in the same journal as (and linked to) the studies that are shown to be incorrect. I have been mostly seeing these sorts of things show up in blogs. But I’m not sure that is a bad thing. I think people read blogs more than they read the literature. I wonder if this means that blogs will eventually be a sort of “shadow literature”?

(2) I think there is a ton of bad literature out there, just like there is a ton of bad stuff on Google. If we focus too much on the bad stuff we will be paralyzed. I still manage to find good papers despite all the bad papers.

(3) I think one positive solution to this problem is to incentivize/publish referee reports and give people credit for a good referee report just like they get credit for a good paper. Then, hopefully the criticisms will be directly published with the paper, plus it will improve peer review.

A key decision point is what to do when we encounter bad research that gets publicity. Should we hype it up (the “Psychological Science” strategy), slam it (which is often what I do), ignore it (Jeff’s suggestion), or do further research to contextualize it (as Dan Kahan sometimes does)?

OK, I’m not planning to take that last option any time soon: research requires work, and I have enough work to do already. And we’re not in the business of hype here (unless the topic is Stan). So let’s talk about the other two options: slamming bad research or ignoring it. Slamming can be fun but it can carry an unpleasant whiff of vigilantism. So maybe ignoring the bad stuff is the better option. As I wrote earlier:

Ultimately, though, I don’t know if the approach of “the critics” (including myself) is the right one. What if, every time someone pointed me to a bad paper, I were to just ignore it and instead post on something good? Maybe that would be better. The good news blog, just like the happy newspaper that only prints stories of firemen who rescue cats stuck in trees and cures for cancer. But . . . the only trouble is that newspapers, even serious newspapers, can have low standards for reporting “cures for cancer” etc. For example, here’s the Washington Post and here’s the New York Times. Unfortunately, these major news organizations seem often to follow the “if it’s published in a top journal, it must be correct” rule.

Still and all, maybe it would be best for me, Ivan Oransky, Uri Simonsohn, and all the rest of us to just turn the other cheek, ignore the bad stuff and just resolutely focus on good news. It would be a reasonable choice, I think, and I would fully respect someone who were to blog just on stuff that he or she likes.

Why, then?

Why, then, do I spend time criticizing research mistakes and misconduct, given that it could even be counterproductive by drawing attention to sorry efforts that otherwise might be more quickly forgotten?

The easiest answer is education. When certain mistakes are made over and over, I can make a contribution by naming, exploring, and understanding the error (as in this famous example or, indeed, many of the items on the lexicon).

Beyond this, exploring errors can be a useful research direction. For example, our criticism in 2007 of the notorious beauty-and-sex-ratio study led in 2009 to a more general exploration of the issue of statistical significance, which in turn led to a currently-in-the-revise-and-resubmit-stage article on a new approach to design analysis.

Similarly, the anti-plagiarism rants of Thomas Basbøll and myself led to a paper on the connection between plagiarism and ideas of statistical evidence, and another paper on storytelling as model checking. So, for me, criticism can open doors to new research.

But it’s not just about research

One more thing, and it’s a biggie. People talk about the self-correcting nature of the scientific process. But this self-correction only happens if people do the correction. And, in the meantime, bad ideas can have consequences.

The most extreme example was the infamous Excel error by Reinhart and Rogoff, which may well have influenced government macroeconomic policy. In a culture of open data and open criticism, the problem might well have been caught. Recall that the paper was published in 2009, its errors came to light in 2013, but as early as 2010, Dean Baker was publicly asking for the data.

Scientific errors and misrepresentations can also have indirect influences. Consider …, where Stephen Jay Gould notoriously… And evolutionary psychology continues to be a fertile area for pseudoscience. Just the other day, Tyler Cowen posted on a paper called “Money, Status, and the Ovulatory Cycle,” which he labeled the “politically incorrect paper of the month.”

The trouble is that the first two authors are Kristina Durante and Vladas Griskevicius, and I can’t really believe anything that comes out of that research team, given that they earlier published the ridiculous claim that among women in relationships, 40% in the ovulation period supported Romney, compared to 23% in the non-fertile part of their cycle. (For more on this issue, see section 5 of this paper.)

Does publication and publicity of ridiculous research cause problems (besides wasting researchers’ time)? Maybe so. Two malign effects that I can certainly imagine coming from this sort of work are (a) a reinforcing of gender stereotypes, and (b) a cynical attitude about voting and political participation. Some stereotypes reflect reality, I’m sure of that, and I’m with Steven Pinker on not wanting to stop people from working in controversial areas. But I don’t think anything is gained from the sort of noise-mining that allows researchers to find whatever they want. At this point we as statisticians can contribute usefully by stepping in and saying: Hey, this stuff is bogus! There ain’t no 24% vote swings. If you think it’s important to demonstrate that people are affected in unexpected ways by hormones, then fine, do it. But do some actual scientific research. Finding “p less than 0.05” patterns in a nonrepresentative between-subjects study doesn’t cut it if your goal is to estimate within-person effects.
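To make the noise-mining point concrete, here is a minimal simulation sketch in Python (standard library only). It assumes a hypothetical study with no true effect at all, two groups of 50, and k independent outcome measures, any one of which the researcher could choose to report. The group size, the number of outcomes, the known-variance z-test, and the helper names are illustrative assumptions, not details of the ovulation studies. With k independent tests at the 0.05 level, the chance that at least one comes up “significant” is 1 - 0.95^k.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sample_p(a, b):
    """Two-sided p-value for a difference in means, assuming unit variance (z-test)."""
    diff = fmean(a) - fmean(b)
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def any_significant(k, n, rng, alpha=0.05):
    """Simulate one pure-noise study with k independent outcomes.

    Returns True if at least one outcome gives p < alpha, i.e. the
    researcher has "something to report" even though nothing is there.
    """
    for _ in range(k):
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        if two_sample_p(group_a, group_b) < alpha:
            return True
    return False


def false_positive_rate(k, n=50, sims=2000, seed=1):
    """Fraction of simulated null studies with at least one p < 0.05."""
    rng = random.Random(seed)
    hits = sum(any_significant(k, n, rng) for _ in range(sims))
    return hits / sims


if __name__ == "__main__":
    for k in (1, 3, 5, 10):
        analytic = 1 - 0.95 ** k
        print(f"k={k:2d}  simulated={false_positive_rate(k):.3f}  analytic={analytic:.3f}")
```

Under these assumptions, with k = 10 about 40% of pure-noise studies produce at least one p < 0.05. Real forking paths (subgroups, interactions, alternative outcome codings) are correlated rather than independent, so the analytic formula is only a rough guide to how much flexibility inflates the false-positive rate.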

What about meeeeeeeee?

Should I be spending time on this? That’s another question. All sorts of things are worth doing by somebody but not necessarily by me. Maybe I’d be doing more for humanity by working on Stan, or studying public opinion trends in more detail, or working harder on pharmacokinetic modeling, or figuring out survey weighting, or going into cancer research. Or maybe I should chuck it all and do direct services with poor people, or get a million-dollar job, make a ton of money, and then give it all away. Lots of possibilities. For this, all I can say is that these little investigations can be interesting and fruitful for my general understanding of statistics (see the items under the heading “Why, then?” above). But, sure, too much criticism would be too much.

“Bumblers and pointers”

A few months ago after I published an article criticizing some low-quality published research, I received the following email:

There are two kinds of people in science: bumblers and pointers. Bumblers are the people who get up every morning and make mistakes, trying to find truth but mainly tripping over their own feet, occasionally getting it right but typically getting it wrong. Pointers are the people who stand on the sidelines, point at them, and say “You bumbled, you bumbled.” These are our only choices in life.

The sad thing is, this email came from a psychology professor! Pretty sad to think that he thought those were our two choices in life. I hope he doesn’t teach this to his students. I like to do both, indeed at the same time: When I do research (“bumble”), I aim criticism at myself, poking holes in everything I do (“pointing”). And when I criticize (“pointing”), I do so in the spirit of trying to find truth (“bumbling”).

If you’re a researcher and think you can do only one or the other of these two things, you’re really missing out.

48 thoughts on “How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?”

Fernando on March 6, 2014 at 10:00 am said:

Andrew: “Scientific errors and misrepresentations can also have indirect influences.”

Yes, and I think you are only scratching the surface. This is a tragedy of the commons, one where the pasture being trampled is none other than the scientific method.

If academics are rewarded according to the size of the newspaper headline, then the incentive is to find results like: “If all women ovulated simultaneously all hell would break loose”. At the same time, those who don’t pursue that kind of headline will have a harder time publishing, thriving, and getting tenure. So the incentive is to trample all over the scientific-method pasture.

It would be interesting to run simulations using agent-based modeling or whatever to get a sense of the rotting process over time. Perhaps Andrew ought to write a post on the relation between academia and newspapers. I don’t think it is very healthy.
Paul O on March 6, 2014 at 10:23 am said:

RE: Jeff’s point 1: “I think people read blogs more than they read the literature”. Are the data out there to investigate that? It’d be empowering to quantifiably add blogs to the marketplace of scientific ideas.
Dan Wright on March 6, 2014 at 10:26 am said:

Since science involves bumbling, criticism, and lots of other things (good science even!), perhaps the psychology professor should have asked what good and bad outcomes result from lots of bumbling studies being in journals and headlines and what good and bad outcomes result from being critical of bumbling studies. It seems journals grow as bumbling increases, but this doesn’t necessarily advance science (of course the occasional bumble leads to more research that does make advances). Compare this to being critical of research, which seems to spark debate and advance methods. As with you, I agree everybody does a bunch of things including criticism and exploration, so the psychologist’s theory that there are two and only two completely separate taxa seems … worth publishing.
Choola on March 6, 2014 at 11:19 am said:

Wow, this Kristina Durante has another amazing finding on menstrual cycle effects here:

http://psychcentral.com/news/2014/02/28/women-are-more-competitive-during-a-certain-time-of-the-month/66520.html
- Andrew on March 6, 2014 at 11:37 am said:
  
  Choola:
  
  I hate to waste a whole blog post on this one, so let me just pull out from the press release some sentences that scream “multiple comparisons”:
  
  “We found that ovulating women were much less willing to share when the other person was another woman. They became meaner to other women” . . .
  
  The study found that ovulating women preferred Option B, choosing products that would give them higher standing compared to other women. . . .
  
  But, the studies find that ovulation doesn’t always make women want more status.
  
  When women played against a man rather than a woman in the dictator game, the researchers found an even more surprising result.
  
  Whereas ovulating women became meaner to women, they became nicer to men. . . .
  
  “These findings are unlike anything we have ever seen in the dictator game. You just don’t see people giving away more than half of their money” . . .
  
  You know, if this is unlike anything you’ve ever seen, maybe it’s just noise. And look at all the flip-flops above. Suppose that women were “meaner” to both men and women: that would be newsworthy also. Or if married women were mean and single women were nice (recall that the authors’ earlier paper leaned on such an interaction). Or if the statistically significant finding were an interaction with socioeconomic status (as in the arm-circumference-and-political-attitudes paper). Or . . .
  
  But at this point I think we can expect to see papers from this research group forever. They’ve figured out how to get statistical significance and how to package such results with a straight face.
  
  And I expect they believe it all themselves. I imagine that it’s harder to fool others if you can’t fool yourself.
  - Daniel Lakeland on March 6, 2014 at 1:21 pm said:
    
    Andrew, on the other hand, I believe there is actually a fairly long history of theoretical background to these kinds of claims. Although the multiple comparisons issue you bring up is valid, I suspect a fair number of these claims would replicate in larger sample sizes and not turn out to be “just noise”. The overriding theory behind it all is relatively sound: evolutionary pressure produces more of the type of people who do things that tend to cause them to have children. Favoring potential mates during high-fertility periods and disfavoring potential rivals is consistent with this basic hypothesis, and was no doubt in their minds when they designed the experiment.
    
    What this group needs is a follow-up pre-registered large-ish sample size study that tests and confirms some of their core theories. The fact that resources are probably not available for this kind of un-sexy confirmatory research is a problem for evolutionary psych.
  - Daniel Lakeland on March 6, 2014 at 1:37 pm said:
    
    Also, “if this is unlike anything you’ve ever seen, maybe it’s just noise” doesn’t really make too much sense to me. I mean, I have to assume they’ve seen plenty of noise in non-ovulation-related versions of the lab games. Most likely these ovulation specific results don’t fit with that basic background noise. If they have any stats training at all, that’d be the most likely NHST they’d be using.
    
    I know you usually point out that you’re not specifically shooting down each individual result but overall complaining about NHST and simplistic models, and low-powered studies and so forth becoming more ubiquitous. But I’m personally not so convinced that these people are the best example. I suspect there’s a large unstated but well-understood theoretical basis within this field which would make these specific experiments quite obvious to other participants in this field. The point that women might have been “meaner” to both women and men and this would be newsworthy is valid, but although it would be newsworthy I suspect it would be counter-hypothesis for most evo-psych people, unless you could show that the specific men were somehow ruled out from being potential mates (like they were extremely ugly, old, had low SES, were happily married, whatever).
    - Andrew on March 6, 2014 at 1:53 pm said:
      
      Daniel:
      
      There’s a theoretical basis that ovulation is connected to all sorts of things. The effects could just as well go in the opposite direction. These are within-person effects being studied using between-person designs with very high noise levels. As a result, there’s nothing there. And I completely disagree when you say that “The point that women might have been “meaner” to both women and men . . . would be counter-hypothesis for most evo-psych people.” The papers in this field are full of arbitrary choices of whether to study main effects or interactions. If two groups show the same effect, the researchers can find good theoretical reasons for the main effect; if two groups differ, researchers can find good theoretical reasons for the interaction.
      
      For a super-clear example, consider that the central point of the earlier Durante et al. paper was a comparison of partnered to unpartnered women. But this one (and the related paper by Tracy and Beall) did not. The theory is sufficiently flexible that it can explain all sorts of things.
    - Daniel Lakeland on March 6, 2014 at 2:27 pm said:
      
      You’re right, of course, that one can bend this theory a lot and encompass a lot of “significant” effects, especially in the context of NHST with multiple comparisons. That’s why I agree that they need a preregistered, high-powered confirmation study for some of the basic theories (i.e., ovulation makes women prefer to enhance certain hues in their appearance, makes women more competitive with other women, and makes women more friendly to men). But I think your criticism comes across as basically “there’s nothing there” rather than “they haven’t really shown anything we can strongly believe”.
      
      In my mind the way to do this is longitudinally (across time for several months), preferably with non-binary outcomes (i.e., measure the hue angle of a photo of the upper torso in reference lighting against a reference gray background using some appropriate color space, measure dollar outcomes from “economic games” rather than A/B choices, etc.). And pre-register the analysis they’ll be doing with an explanation of what their theory expects to find. But, if they do that, I suspect they will find that the basic theory does have some consistent reliable effects. As someone who is familiar with the extent to which biological conditions can cause all sorts of things, I think it’s just not that likely that “nothing’s there”.
      
      Just as you’ve mentioned many times, we already know “the effect” is not exactly 0. The questions are whether the effect has the theoretically implied sign (i.e., favor men, favor signs of fertility, disfavor women), and how big it is, especially relative to other basic effects such as, say, variability in responses between different randomly chosen women, variation by age, or variation by basal testosterone levels (testosterone partially controls sexual desire in women, if I remember correctly).
    - Andrew on March 6, 2014 at 2:34 pm said:
      
      Daniel:
      
      By “nothing there,” I mean there’s no statistical evidence to speak of. The theory is there. I just don’t think you can learn anything to refine the theory with this kind of high-variance between-subject study. The theorizing is fine but I think the researchers are kidding themselves if they think that these sorts of results tell them anything in addition to the theory they already have.
    - Daniel Lakeland on March 6, 2014 at 2:36 pm said:
      
      That’s a much clearer position that I can mostly agree with.
    - K? O'Rourke on March 7, 2014 at 8:32 am said:
      
      But avoiding hard work (finding out if effects can be replicated) by buying a lottery ticket and wishing (theorizing) often is too tempting.
      
      Not to be too flippant, but the theorizing (assessing the credibility of the arguments, the theoretically implied direction, the reputation of the investigators, the quality of the peer review, the toughness of the editor, etc.) does seem like wishful thinking to avoid the need for others to replicate _and_ correct.
      
      As Peirce put it, roughly: “Good thing we die, otherwise we would eventually find out that anything we thought we knew, we were wrong about.”
Kaiser on March 6, 2014 at 11:36 am said:

With blogging, TED talks, popular science books, etc., we now have a bypass of the traditional peer review process and the traditional way of diffusing new research. So it is more important for bloggers and others to participate in putting such materials to the smell test. Some of these efforts are tied to PR efforts in which the sole objective is to generate attention, clicks, etc., and those are particular cases in which the science is bent and misappropriated.
The other thing to bear in mind is that people reading this blog might think the fraud is obvious, but the average person does not have the training to notice the fraud; in fact, the average person is often sold the idea that “conventional wisdom” is wrong.
- Andrew on September 1, 2014 at 10:48 pm said:
  
  Kaiser:
  
  It’s not quite the case that TED etc. bypass the traditional peer-review process. The peer-review process gives credibility to the TED-type claims. Consider, for example, Gladwell’s promotion of Gottman’s peer-reviewed papers, or for that matter the publicity given to the various peer-reviewed noise studies of the “Psychological Science” variety. TED and other publicity mechanisms are amplifiers, but it seems to me that they rely very heavily on peer review, to the extent of often seeming to take the position that if a claim is peer-reviewed then it must be worthy of respect.
Daniel Lakeland on March 6, 2014 at 1:03 pm said:

The real question is, can *I* too get away with just saying “consider …. where …. clearly made a large contribution to …. “?? (see 3rd paragraph of “it’s not just about research”)
Rahul on March 6, 2014 at 1:18 pm said:

Why don’t we routinely publish referee reports alongside every published paper? Is there a good argument against this? Maybe as supplementary information? Would such a measure adversely impact the frankness of referee reports?
Wonks Anonymous on March 6, 2014 at 2:15 pm said:

“infamous Excel error”
The Excel error is infamous because it’s obviously an error, but it did not significantly impact their results. Significant differences came from deliberate choices in how to go about finding a correlation. But in my mind the most important critique is Miles Kimball‘s, since he actually investigated causality rather than just looking at correlation. But almost nobody talks about that, instead focusing on the insignificant (but easier to understand) Excel error.
Nony on March 6, 2014 at 3:21 pm said:

Andrew:

I believe that you have a didactic bent and that pushing for truth and ethics and good methods is something that you enjoy and that you do well. I think you should continue to do so and encourage you to publish articles or books on this. Of course there is already stuff out there (Katzoff, Feynman, E. Bright Wilson), but there are new young researchers coming up all the time. It is good to influence them and to bonk the middle-aged careerist hacks on the nose. You’ve got tenure.

You can still keep doing some normal research and it is good to keep your hand in it (for the reasons, for instance, that Hotelling recommended stats professors do so). But I don’t think you would be self-actualized by solely studying the statistics of tsetse flies or whatever.

I don’t know about the million-dollar job either. You still die at the end either way, and who cares if you have a fancier apartment or house or the like. If you were going to go that way, you would have already done so. There are some people who love to figure out blackjack and then actually make money on it. Others love to figure it out and then write a book, like Thorp. (And I realize Ken Uston is sort of in between…it’s analog, not digital.)

Keep up the good work and don’t let the careerist fucks get you down. If you get a little depressed from dealing with recalcitrant liars then take a break and work on tsetse flies for a while.

P.P.S. I REALLY LOVED the commenter who talked about generalizing as a mechanism to criticize. IOW, instead of fighting head-on about a study you disagree with, try to generalize to another area (show it flawed there) and then come back and re-examine the original. I just think this really gets past the “pedestal” effect of accepted papers and the high hurdle of pure rebuttal papers. It’s almost just psychological…but it works (plus there are other learnings, more LPUs on the resume, etc.). The meta-analysis replication of the psychology work was also interesting, just for the psychological way it was more accepted than, and different from, a pure rebuttal paper.
Corey on March 6, 2014 at 3:57 pm said:

The Reinhart and Rogoff Excel error was relatable (lots of people have screwed up a spreadsheet), so it drew most of the news coverage. But the Excel error was not the main driver of the results of the paper; the results were mostly driven by questionable data-analytic choices R&R made. Fixing the Excel error would not have changed the direction of R&R’s conclusions, although it did increase the magnitude. The main questionable data-analytic choice was to summarize a period of growth/decline by its average rate without accounting for the length of time being averaged, leading to, e.g. and IIRC, a growth period of 17 years being counted as a single data point, equal in evidential weight to decline periods that lasted a single year.

Just doing my part to correct the record.
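Corey’s weighting point can be illustrated with a toy calculation. The sketch below uses three hypothetical high-debt episodes; the country labels, episode lengths, and growth rates are made up for illustration and are not the Reinhart and Rogoff data. It compares averaging one value per episode (each episode counts once, however long) with averaging one value per country-year.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    country: str       # hypothetical label, not a real country
    years: int         # length of the episode in years
    avg_growth: float  # average annual GDP growth (%) during the episode


def episode_weighted_mean(episodes):
    """Each episode counts once, regardless of how many years it covers."""
    return sum(e.avg_growth for e in episodes) / len(episodes)


def year_weighted_mean(episodes):
    """Each country-year counts once, so long episodes carry more weight."""
    total_years = sum(e.years for e in episodes)
    return sum(e.years * e.avg_growth for e in episodes) / total_years


# Made-up numbers: one long growth episode, one single bad year, one short episode.
episodes = [
    Episode("A", years=17, avg_growth=2.0),
    Episode("B", years=1, avg_growth=-7.0),
    Episode("C", years=4, avg_growth=1.0),
]

print(f"episode-weighted: {episode_weighted_mean(episodes):.2f}%")  # -1.33%
print(f"year-weighted:    {year_weighted_mean(episodes):.2f}%")     # 1.41%
```

With these made-up numbers, the single bad year pulls the per-episode average below zero while the per-year average stays positive. Which weighting is appropriate is a substantive modeling choice, the kind of analytic decision Corey argues mattered more than the spreadsheet slip.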
digithead on March 6, 2014 at 5:45 pm said:

The real problem beyond the over-hyping of these papers is the poor statistical education given in social science. We’re not being taught statistics; we’re being taught how to work statistical software in a cookbook fashion, without emphasizing that each application of such software is unique to the data being analyzed. Hence, we have several generations of scholars who claim expertise in statistics when what they really have is expertise in software. They really have no understanding of the methods themselves beyond the software and output and, moreover, no understanding of when such methods become useless because of severe theoretical violations, especially non-randomness.

Are there social science folks good at stats/methods? Of course, but they’re the exception and not the rule. And with several generations of scholars having a false sense of ability where people are really only taught how to get a p-value from software, we get the crap and will continue to get the crap that Andrew and others rightly criticize until we figure out a way to undo this in statistical education.

But I fear that with the balkanization of statistical education in social sciences, where every department has its own statistics courses, we will never be able to fix this properly. I’ve been sitting on several doctoral committees in other social science departments, ostensibly because of my stats background, and I’ve been saddened by the fact that all of these ABD folks, all of them, didn’t know where to start their analyses or merely wanted to skip the beginning despite having had multiple methods and stats courses throughout their undergraduate and graduate coursework. One of them had done an HLM model and only wanted me to bless her output as correct (her words). I didn’t, and made her start at the beginning and work her way up to the model both theoretically (why the particular set of variables she chose for the model is important) and methodologically (do these variables have validity). She and the others had never been taught this and other important factors in data analysis, such as data quality problems that will compromise results. Having her check her variables caught serious problems with her age variable (negative ages, or ages > 25 for what was a juvenile justice project). The other sad fact is that I replaced an ill faculty member on her committee who probably would’ve blessed her original analysis.

So unless someone can come up with a good solution to fix the statistical education problem, the reality is that Andrew and others are the only real effort at exposing this, however minimal the effect.
Nony on March 6, 2014 at 6:11 pm said:

Hotelling called out the same issue of stats departments (he was a stats professor, wanted to defend that).

I do wonder about having stats professors teaching nurses and the like though. The really useful lessons are things like making sure your sample is representative, watching your metrics, looking for confounding factors. And the way to teach them is much more practical than theoretical. Definitely don’t want some overly philosophical Bayesian versus frequentist or super advanced math person discussion with these young ladies.

What I actually am more interested in is “analysis”. I see it more and more in daily life, in business, in office workers. The ability to slap together a pie chart, a very simple (no rsq) regression. Basic trending. Etc. Doing it at all (over not doing it) can be an advance. Then there can be ways to do it better. And sure, it can be wrong, so ways to watch for that. But then lots of decisions are being made in business off of imperfect data and imperfect analysis. Even things like how to do a little “adapt and overcome” and collect primary data when there’s not time/money for proper market research. I just think this is an interesting area and that society is actually becoming more astute (at least in terms of more people being aware…not in terms of anything being new to the race). ANDREW THIS IS WHAT YOU SHOULD DO. Don’t go off and be a millionaire or a tsetse fly analyzer. ;)
- reallY? on March 6, 2014 at 9:07 pm said:
  
  “Definitely don’t want some overly philosophical Bayesian versus frequentist or super advanced math person discussion with these young ladies.”
  
  Wow, sexist AND condescending. A two-fer.
  - Nony on March 6, 2014 at 9:09 pm said:
    
    Are they independent or confounded? ;)
Russell Lyons on March 6, 2014 at 6:11 pm said:

“in a process well documented by Blalock and Duncan, positivist sociology, like so many other professions, has tended to become immune to the recognition of flaws in its work”: Positivism’s Twilight?, by Bernd Baldus, Can. J. Sociol. 15 (2), 1990, 149–163 http://www.jstor.org/discover/10.2307/3340748?uid=3739664&uid=2&uid=4&uid=3739256&sid=21103601495367
- Nony on March 6, 2014 at 7:09 pm said:
  
  What’s your take on this, Russell/’drew?
  
  http://www.amazon.com/Making-Count-Improvement-Social-Research-ebook/dp/B003AU4FHQ/ref=la_B001IXMOCU_1_1?s=books&ie=UTF8&qid=1394150694&sr=1-1
  - Russell Lyons on March 7, 2014 at 8:21 pm said:
    
    I’ve never read it. A cursory glance shows it has some things in it with which I agree, but if you can specify some details about which you’d like an opinion, I’ll try to oblige.
    - Nony on March 7, 2014 at 9:07 pm said:
      
      That was good enough.
WB on March 6, 2014 at 9:45 pm said:

Totally irrelevant point, but … love the Death Wish image. That final scene in the Chicago train station always made my skin crawl: the point at which I felt guilty and uncomfortable for enjoying a brutal revenge fantasy.
Martin on March 7, 2014 at 7:48 am said:

“People talk about the self-correcting nature of the scientific process. But this self-correction only happens if people do the correction. And, in the meantime, bad ideas can have consequences.”

I sometimes think about this when it comes to the discussion papers and working papers that are so pervasive in economics. First of all, they make sense from a rather practical POV: policy-relevant papers often have direct implications for policy makers. A paper about countercyclical measures written in 2008 surely comes late when published in, say, 2011. On the other hand, if it’s wrong, it might have sorry consequences because it has not been checked (you mention Reinhart/Rogoff).

But I think there is more. Often enough, a working/discussion paper gets shot down even before entering a formal peer review process. But it’s still around, for everybody to point to and use in a discussion. One might have to wait a decade or so to definitely be able to say that this and that discussion paper is dead – without being revived in a revised form, even. Simply abandoned. Until then, it’s difficult to say if one does not happen to be an expert in the very field.

A rather complicated example pertains to Acemoglu/Johnson/Robinson, “The Colonial Origins of Comparative Development: An Empirical Investigation.” The authors used settler mortality data in former colonies as an instrumental variable for early institutions, and subsequently showed the importance of the latter for comparative development (among former colonies, of course). Their original working paper is here:

http://ideas.repec.org/p/nbr/nberwo/7771.html

Now, their approach amounted to an implicit (the institutional approach) and explicit (as concerns several specific claims) critique of much of the work on development by Jeffrey Sachs. Sachs quickly wrote a comment in the form of a working paper, also published via NBER (Sachs is not listed at ideas/repec, so here is the direct link):

http://www.nber.org/papers/w8114

But Sachs also included the colonizers themselves in his analysis, which does not make much sense given A/J/R’s approach. Still, if you read the final version of A/J/R as published in the AER, you’ll find that they addressed Sachs’ criticism:

http://ideas.repec.org/a/aea/aecrev/v91y2001i5p1369-1401.html

To quote: “In contrast to McArthur and Sachs’ results, we find that only institutions are significant. This difference is due to the fact that McArthur and Sachs include Britain and France in their sample. Britain and France are not in our sample, which consists of only ex-colonies (there is no reason for variation in the mortality rates of British and French troops at home to be related to their institutional development). It turns out that once Britain and France are left out, the McArthur and Sachs’ specification generates no evidence that geography/health variables have an important effect on economic performance.”

So, in a sense, this is awesome: this is as close as it gets to “open review” without having it institutionalised, I guess. A working paper is criticised by another, and the criticism, even if it is rebutted, is taken up in the published version. And everything is out there for anyone to follow the discussion, with no anonymous reviewers whose opinions are known to nobody outside the review process. On the other hand, the Sachs working paper died an early death: it has not subsequently been published. But it’s right there, accessible via NBER, not withdrawn, but obviously also abandoned by its authors.
- Andrew on March 7, 2014 at 9:47 am said:
  
  Martin:
  
  I agree. I like how they do things in econ, where a researcher will give a bunch of talks on a working paper and then get feedback which gets incorporated into the final version.
  
  There is a risk, though, and that is that economists can assume that a paper, once published in a top journal, has been thoroughly vetted and thus does not need to be checked. This may have been one of the reasons why Reinhart and Rogoff felt no need to share their data for all those years, and we also saw this in our discussion of a controversial (and, in my opinion, fatally flawed) paper on genetic diversity and economic development that appeared in the American Economic Review. One commenter suggested that all the many critics of this paper (including me) must be wrong because:
  
  this paper passed to scrutiny of 5 referees, and it was presented in about 50 leading universities in the world. With all due respect, a-priori, the likelihood that your comments are off-mark is higher than the likelihood that 1000s of people who heard the paper in seminar and conferences, and referees and editors who read it very carefully as part of your editorial duties.
  
  Of course, once you start with that attitude, it’s game over. We have to recognize the existence of groupthink, blind spots, and bandwagon effects—as well as the willingness of journals to publish papers that are seriously flawed, if they are provocative enough.
  - Martin on March 7, 2014 at 10:43 am said:
    
    Well, yes, and I think there is a general misconception about peer review out there. As I understand it, if not from the concept itself then from all the crap that passes peer review, it should be clear that peer review is not some seal of quality but an absolute minimum sign of independent appraisal. One might thusly expect that the most obvious blunder to be filtered out, but not much more. Also, as the economist Richard Tol once mentioned in a paper (I do not recall which one) about estimates of total damages due to climate change, this is an extremely narrow research focus, with the inevitable consequence that perhaps a handful of researchers have real expertise on the topic, that they know one another rather well, and that they probably even have some sort of (former) teacher-student relationship. So real “outside” scrutiny is, IMO, not even to be expected at this stage, and it’s easy to imagine how a close-knit group of researchers can go off track for quite some time until someone outside the field, but with the necessary methodological expertise, looks at the whole mess (might this be an explanation for the kerfuffle concerning psychological journals in recent years?).
    
    Anyway, I remember the genetic diversity/economic development paper and debates thereof. I also think that this episode (another economist veering off into “interdisciplinary” research, though he only has expertise in one of the disciplines involved) sounds somewhat strange:
    
    http://languagelog.ldc.upenn.edu/nll/?p=3756
    
    It’s difficult to say who can actually judge such things. To me, such things sound rather like “thought-provoking ideas” that are needlessly bloated by mathturbation. Generally speaking, it sounds like a bad idea to fish outside one’s field without having studied the other field seriously enough to assess at least the plausibility of the claims involved. Again, no claim to expertise on my part, so I might be completely wrong here, but following Taleb, at least, one should be prepared to find spurious relationships in Big Data. Keeping this in mind, it seems like a very bad start to make a claim about linguistics that makes actual linguists roll their eyes…
    - K? O'Rourke on March 7, 2014 at 11:28 am said:
      
      Martin:
      
      You can’t rule out a hypothesis based on how it was generated or who generated it.
      
      As I vaguely recall (it was my first-year undergrad philosophy paper), the Whorfian hypothesis failed due to lack of empirical support. Maybe someone has found a way to get better data?
      
      But you do need to decide on how to most productively spend your time.
      
      And I am not going to spend any of my time on the Whorfian hypothesis!
      (unless there are multiple data sets not yet _gamed_, and qualified others do the analysis and usually get similar support)
    - Martin on March 7, 2014 at 2:17 pm said:
      
      Actually, no, Whorfian hypotheses did not “fail” in any way; they are still discussed, though not without controversy. You might, for example, be interested in the research (and the opposition to it) by Berlin and Kay on color terms. Also, the linked Language Log post provides links to discussion of the topic. What is true is that Whorfian hypotheses have little to do with Whorf’s original formulation. That is not only because it is somehow wrong, but because it is so vague and truly esoteric that one cannot possibly work with it as formulated.
      
      Anyway, I did not mean to say that one should rule out a hypothesis based on who generated it or how she did it, and I don’t think I said so. What I said is that, under the assumption that one can rather easily find spurious relationships in Big Data (correct me if I’m wrong), one should have some idea about the plausibility of one’s claim. Specifically, the Chen paper claims to show a relationship between time preference and (overt) future tense marking. There is a crucial implicit assumption here: that future tense marking is in any way related to whether future action is expressed or not. This seems obviously untrue given the perfectly plausible English sentence “Meg’s mother arrives tomorrow.”, as quoted in the Language Log comment. Time of action: tomorrow; morphological marking: present tense. So, while it is true that it does not matter who made the claim, the claim is utterly nonsensical regardless. The other problem mentioned is that it is unclear to begin with what counts as future tense marking according to Chen, as the English “going to” construction indicates future but is grammatically present tense! Right there, you can shift definitions around until you get the relationship you want. The danger of tricking oneself is arguably more pronounced when one looks at languages one does not know. In English, the claim that a “going to” future construction is present in action or shows morphological future tense marking is too obviously wrong not to force you to choose a clear criterion, but how do you judge such claims in, e.g., Khoekhoegowab?
      
      That also reminds me that I have a blog.
    - Fernando on March 7, 2014 at 11:41 am said:
      
      Martin: “One might thusly expect that the most obvious blunder to be filtered out, but not much more.”
      
      Actually, the whole publication bias approach suggests blunders are “filtered in”.
      
      As presently implemented, peer review is probably a net negative.
    - Martin on March 7, 2014 at 1:49 pm said:
      
      Not sure how to show that peer review is a net negative. Anyway, by “obvious blunder” I really just meant stuff like a wrong formula, or, say, an estimate of the total externalities of coal-fired power plants that does not consider CO2 emissions. Also (see below) the idea that an instrumental variable is chosen for no other reason than ‘just the assumption’ that it’s a suitable instrument, without excluding other effects. That is, there can still be egregious mistakes after taking out those obvious blunders, but high-school stuff and simple oversights should be filtered out, at least as a rule of thumb. But granted, this is more of a guess.
- Wonks Anonymous on March 7, 2014 at 10:33 am said:
  
  What I find most bizarre about Acemoglu/Robinson is that, in a paper about the effects of colonialism, they don’t bother to compare examples like Thailand or Ethiopia, which evaded colonization, with colonized neighboring countries. Just assume that mortality rates from disease have an effect only on colonial institutions, it couldn’t possibly have any other effect in the absence of colonialism! Ethiopia is actually included in their sample of colonized countries even though it retained its emperor until he was overthrown by communist officers in the second half of the 20th century.
  - Martin on March 7, 2014 at 10:53 am said:
    
    Wonks Anonymous,
    
    I did not choose the A/J/R paper because I feel it is so awesome, and I also think that a discussion about its content is completely off-topic here. Rather, I presented it as an example of a somewhat informal review system that has developed in economics, for whatever reasons. If you have serious concerns about their approach and conclusions (e.g. that excluding Ethiopia and Thailand topples their results) and also feel that you really understood what they/you are talking about: they all have email addresses.
    - Andrew on March 7, 2014 at 11:02 am said:
      
      Martin:
      
      I don’t know A, J, or R, but I recently sent two emails to an economist about a paper of his that I had blogged on and had questions about. I first sent him an email directly, then I sent another one through an intermediary (and I know it reached him). He did not respond to me in either case. I find that sort of thing very frustrating! Even though (as has been discussed in various threads on this blog) the authors have no obligation to respond to me, I find it disturbing when they avoid responding to criticism. Even a simple, “I don’t find your criticism plausible because . . .” would be something.
    - Martin on March 7, 2014 at 11:26 am said:
      
      Point taken! (Though Acemoglu tends to respond.)
      
      But again, I feel a specific criticism of their paper is somewhat off-topic here (but it’s your blog, so perhaps I should shut up :) However, I would have taken Wonks Anonymous’ criticism a bit more seriously if he hadn’t given away the fact that he has not read the paper with the claim “Just assume that mortality rates from disease have an effect only on colonial institutions, it couldn’t possibly have any other effect in the absence of colonialism!” Actually, they discuss the effect of settler mortality on colonial institutions at rather patience-exhausting length. Specifically, they discuss why the disease environment should not be expected to have had an effect on the indigenous population that matters in this context. So, whatever the (perhaps convincing) grounds for thinking their reasoning is faulty, the “Just assume” is a dead giveaway here. Also, the paper has been intensively discussed for over a decade now, and serious criticism has been raised (Easterly, for example, attributes the effect on development more to human capital than to institutions alone, and wrote a paper to this effect). So there is no need to invent something just because one dislikes A/J/R’s approach without really having read it; there is really enough out there to choose from.
      
      Another way to show up those non-responders is to write a rebuttal.
Nick Brown on March 8, 2014 at 5:45 am said:

I think the amount of time spent on rebutting bad work can usefully be guided by the impact of that work, whether that’s theoretical or in the public sphere.

For example, here’s the actual timeline of a recent psychological study:
Day 0: Study write-up article accepted by (major) journal.
Day 28: The Economist publishes an article about the research, strongly implying that there were significant health-related main effects that were not, in fact, found by the study.
Day 75: One of the co-authors launches a mass-market self-help book which makes extensive use of the principles of this research.
Day 120: The same co-author writes a New York Times op-ed piece based on the premise that this research discovered the same significant health-related main effect that it did not, in fact, find.
Day 175: Journal article published online.
Day 177: Popular newspaper write-ups of this finding begin, with headlines suggesting that this psychological intervention replaces the need for exercise.
Day 178: Journal’s official press release appears, making much the same claims.

I think it would be irresponsible for someone with time available who became aware of the limitations of this study not to at least attempt to get their concerns published in a peer-reviewed forum, even if this may not lead to retractions either of the academic article or of the dangerous nonsense published in the popular media as a result.
Pingback: Briefly | Stats Chat
Denny on March 28, 2014 at 11:39 pm said:

“I think the amount of time spent on rebutting bad work can usefully be guided by the impact of that work, whether that’s theoretical or in the public sphere.”

Also, I think that there is a greater moral duty to critique poor/spun/crappy research that works against the interests of those in a weak position in society. A poor-quality paper arguing that bankers benefit from being psychosocially managed by the state is less likely to harm the interests of bankers (who are quite able to defend themselves from attempts to encroach on their liberty) than poor-quality work arguing that disabled people should be psychosocially managed by the state (the effect of which we’ve seen through recent reforms to Britain’s welfare system).
Pingback: The Notorious N.H.S.T. presents: Mo P-values Mo Problems « Statistical Modeling, Causal Inference, and Social Science
Dan Rice on April 6, 2014 at 8:44 am said:

Andrew, great point at the end about how all research ultimately should be viewed as bumbling (in other words, prone to error). I would also argue that this is true of the development of new methodology. But we need to be careful to pinpoint the reasons for the error when we criticize, as I know you do.

What I find especially irritating and unproductive, though, is simply labeling something “lame statistics” without giving any reason for the charge. In my ranting over the past few days against your colleague’s characterization of our methodology as “lame statistics,” what set me off was that he gave no reason for the charge, not the fact that he made it; I strongly encourage such criticism as long as it is supported by reasons.

Thanks for the excellent commentary.

Dan
Pingback: Discussion with Steven Pinker on research that is attached to data that are so noisy as to be essentially uninformative « Statistical Modeling, Causal Inference, and Social Science
jackinthebox on May 8, 2014 at 10:29 am said:

Social science *did* use to be full of people who were mainly pointers. They called themselves theorists. They called themselves methodologists. Mostly they called other people stupid. So I get the psychology professor’s point, but I think Andrew is right that if we dig deep we can both bumble and point in some fair proportion.
Pingback: Bad Statistics: Ignore or Call Out? « Statistical Modeling, Causal Inference, and Social Science
Pingback: “The Critic as Artist,” by Oscar Wilde « Statistical Modeling, Causal Inference, and Social Science
