“I would like to share some sad stories from economics related to these issues”

Per Pettersson-Lidbom from the Department of Economics at Stockholm University writes:

I have followed your discussions about replication, criticism, and the self-correcting process of science. I would like to share some sad stories from economics related to these issues. They concern three papers published in highly respected journals: the study by Dahlberg, Edmark and Lundqvist (2012, henceforth DEL), published in the Journal of Political Economy; the study by Lundqvist, Dahlberg and Mork (2014, henceforth LDM), published in American Economic Journal: Economic Policy; and the study by Aidt and Shvets (2012, henceforth AS), also published in AEJ: Economic Policy. I decided to write comments on all three papers after discovering that they all have serious flaws. Here are my stories (I will try to keep them as short as possible).

Starting with DEL’s analysis of whether there is a causal relationship between ethnic diversity and preferences for redistribution, we (myself and Lena Nekby) discovered three significant problems with their statistical analysis: (i) an unreliable and potentially invalid measure of preferences for redistribution, (ii) an endogenously selected sample, and (iii) a mismeasurement of the instrumental variable (the refugee placement policy). We made DEL aware of some of these problems before they resubmitted their paper to JPE. However, they did not pay any attention to our critique. Thus, we decided to write a comment for JPE (we had to collect all the raw data ourselves since DEL refused to share theirs). When we re-analyzed the data, we found that correcting for any one of these three problems reveals no evidence of any relationship between ethnic diversity and preferences for redistribution. However, JPE desk-rejected our paper twice without sending it to referees (the first time by the same editor who handled DEL, the second time by another editor after the original editor had stepped down). We then submitted our paper to six other respected economics journals, but it was always rejected, typically without being sent to referees. Nonetheless, most of the editors agreed with our critique but said that it was JPE’s responsibility to publish it. Eventually, the Scandinavian Journal of Economics decided to publish our paper.

The second example is AS, which studies the effect of electoral incentives on the allocation of public services across U.S. legislative districts. I realized that there are three serious problems in their difference-in-differences design: (i) serial correlation in the errors, (ii) functional-form issues, and (iii) omitted time-invariant factors at the district level, since AS do not control for district fixed effects. When I reanalyzed their data (posted on the journal’s website), I found that correcting for any one of these three problems reveals no evidence of any relationship. I submitted my comment to AEJ: Policy long before the paper was published, but I was told by the editor that they do not publish comments. Instead, I was told to post a comment on their website. So that is what I did (see https://www.aeaweb.org/articles.php?doi=10.1257/pol.4.3.1).
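
For concreteness, here is a rough sketch of a specification that addresses the clustering and fixed-effects points at once. It is only an illustration under assumed data; every name in it (districts.csv, spending, treated, district, year) is hypothetical and not taken from AS.

    # Sketch only: two-way fixed-effects regression with district-clustered
    # standard errors.  File and column names are hypothetical, not AS's data.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("districts.csv")  # hypothetical district-by-year panel

    fit = smf.ols(
        "spending ~ treated + C(district) + C(year)",  # district FEs absorb time-invariant district factors
        data=df,
    ).fit(
        cov_type="cluster",                   # clustering by district allows serially correlated errors
        cov_kwds={"groups": df["district"]},
    )
    print(fit.summary())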

The third example is LDM, which uses a type of regression-discontinuity design (a kink design) to estimate causal effects of intergovernmental grants on local public employment. I discovered that their results depend on (i) extremely large bandwidths and (ii) mis-specified functional forms of the forcing variable, since they omit interactions in the second- and third-order polynomial specifications. I show that when correcting for either of these problems there is no regression kink that can be used for identification. I again wrote to the editor of AEJ: Policy (another editor this time) long before the paper was published, making them aware of this problem, but I was once more told that AEJ: Policy does not publish comments. Again, I was told to post my comment on their website, and so I did (see https://www.aeaweb.org/articles.php?doi=10.1257/pol.6.1.167).
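
To see why the omitted interactions matter: in a kink design the slope of the forcing variable must be allowed to change at the threshold, so each polynomial term has to be interacted with an above-threshold indicator, and the regression should be run within a narrow bandwidth. A rough sketch under assumed data follows; the names (municipalities.csv, grants, forcing) are hypothetical, not LDM’s.

    # Sketch only: regression-kink first stage with a second-order polynomial
    # fully interacted with the above-kink indicator, within a narrow bandwidth.
    # All file and variable names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("municipalities.csv")
    bw = 0.05                                      # narrow bandwidth around the kink at 0
    local = df[df["forcing"].abs() <= bw].copy()
    local["above"] = (local["forcing"] > 0).astype(int)

    # No main effect of 'above': the outcome is continuous at the kink;
    # only the slope (and curvature) may change there.
    kink = smf.ols(
        "grants ~ forcing + I(forcing**2) + above:forcing + above:I(forcing**2)",
        data=local,
    ).fit(cov_type="HC1")

    # The coefficient on above:forcing is the estimated slope change at the
    # kink, i.e., the first-stage relationship the design relies on.
    print(kink.summary())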

What bothers me most about my experience with replicating and checking the robustness of other people’s work is two things: (i) the reluctance of economics journals to publish comments on papers that are found to be completely and indisputably wrong (I don’t think posting a comment on a journal’s website is a satisfactory procedure; I am probably the only one stupid enough to do it!) and (ii) that researchers can get away with scientific fraud. The last point concerns my discovery that both DEL and LDM (the two papers have two authors in common) intentionally misreport their results. For example, DEL analyze at least 9 outcomes but choose to report only the 3 that confirm their hypothesis. Had they reported the other results, it would have been clear that there is no relationship between ethnic diversity and attitudes toward redistribution. DEL also make a number of sample restrictions, often unreported, which reduce the number of observations from 9620 to 3834, thereby creating a huge sample-selection problem. Again, had they reported results from the full sample, it would have been very clear that there is no relationship. DEL also misreport the definition of their instrumental variable, even though previous work has used exactly the same variable with the correct definition. Had they reported the correct definition, it would have been obvious that their instrument is actually a poor instrument, since it does not measure what it is purported to measure.

Turning to LDM, there are 4 estimates in their Table 2 (which shows the first-stage relationship) that have been left out intentionally. Had they reported these 4 estimates, it would have been very clear that the first-stage relationship is not robust, since the sign of the estimate switches from positive (about 3) to negative (about -3). Moreover, had they reported smaller bandwidths (for example, a data-driven optimal RD bandwidth), it would also have been clear that there is no first-stage relationship, since for smaller bandwidths almost all the estimates are negative. And had they reported correctly specified polynomial functions, it would also have been very clear that the first-stage estimate is not robust.

So the bottom line of all this is that “the self-correcting process of science” does not work very well in economics. I wonder if you have any suggestions for how I should handle this type of problem, since you have had similar experiences.

I don’t have the energy to look into the above cases in detail.

But, stepping back and thinking about these issues more generally, I do think there’s an unfortunate “incumbency advantage” by which published papers with “p less than .05” are taken as true unless a large effort is amassed to take them down. Criticisms are often held to a much higher standard than was applied in reviewing the original paper, and, as noted above, many journals don’t publish letters at all. Other problems include various forms of fraud (as alleged above) and a more general reluctance of authors even to admit honest mistakes (as in the defensive reaction of Case and Deaton to our relatively minor technical corrections to their death-rate-trends paper).

Hence, I’m sharing Pettersson-Lidbom’s stories, neither endorsing nor disputing their particulars, but as an example of how criticisms in scholarly research just hang in the air, unresolved. Scientific journals are set up to promote discoveries, not to handle corrections.

In journals, it’s all about the wedding, never about the marriage.

33 thoughts on ““I would like to share some sad stories from economics related to these issues””

  1. While I was a grad student at Columbia (Econ) I worked as an RA for a business-school professor. He asked a fellow grad student and me to replicate a paper that had just come out in the Journal of Finance. We got the data (not that hard, it was a time-series regression) and found that the results of the paper were due to a mistake: the author had regressed the variable of interest y(t) on x(t+1) instead of x(t-1). Excited about this, we proposed to our employer that we send a note to the journal pointing this out. His reaction: no one is interested in this, it would just be “throwing water”.
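
    To see how easy the mix-up is, here is a toy version of the two regressions; everything in it (the file timeseries.csv and the columns date, y, x) is invented for illustration, not the actual Journal of Finance data:

      # Toy illustration of the lag-vs-lead mistake; all names are invented.
      import pandas as pd
      import statsmodels.formula.api as smf

      df = pd.read_csv("timeseries.csv").sort_values("date")
      df["x_lag"] = df["x"].shift(1)     # x(t-1): the intended regressor
      df["x_lead"] = df["x"].shift(-1)   # x(t+1): what the paper effectively used

      intended = smf.ols("y ~ x_lag", data=df.dropna()).fit()
      mistaken = smf.ols("y ~ x_lead", data=df.dropna()).fit()
      print(intended.params, mistaken.params)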

  2. Of course many editors are not interested in making sure that the articles they publish are replicable. The JPE has a long-standing history in this regard. See

    http://www.handelsblatt.com/politik/konjunktur/oekonomie/nachrichten/steven-levitt-blocks-an-undesired-statement-no-comment-please/2976444.html

    for an example with a striking resemblance to the presently-discussed incident.

    It is not surprising either that the editors say, “It is JPE’s responsibility to publish the comment.” Being the editor of an economics journal is like belonging to a club where most of the members implicitly agree not to keep each other honest, thus perpetuating the stream of unreplicable research.

    Until the editors really are interested in publishing replicable research, economics cannot be a science.

    • Bruce:

      Levitt in particular is on record with some very cynical attitudes toward the scientific publishing process. He wrote, “Is it surprising that scientists would try to keep work that disagrees with their findings out of journals? . . . Within the field of economics, academics work behind the scenes constantly trying to undermine each other.”

      Levitt seems to have been correct in a descriptive sense (including describing his own behavior at times), but I was bothered by what seemed to me his calm acceptance of this state. When scientists behave badly, it upsets me and I want to scream about it.

  3. In journals, it’s all about the wedding, never about the marriage.

    This is a wonderful line. I’d love to quote it in a book I’m writing, How to be a Good Professor (http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2656871). Should I cite this post, or have you or others used it before? (A quick search didn’t reveal anything helpful.)

        • Also funny that in English “marriage” can be used synonymously with “wedding” (e.g., the marriage took place on such and such a date) or for the state of having a spouse. (Wow, I had to stop and think about how to say the second usage clearly.)

        • Hey, Bob Carpenter is a really big-shot linguist himself. I learnt categorial grammar from his textbook, Type Logical Semantics.

          Unfortunately, my expertise does not extend to the lexical level; I only work on syntax-related issues. Words are only feature bundles to me.

        • Rahul, how about shaadi vs shaadi-shuda in Hindi? The first means wedding, except when one adds a marker for a completed action (meri shaadi ho cukii hai), when it means “I’m married”. But the latter can only be used to say “(I’m) married”.

  4. There are various mechanisms that can make science path-dependent rather than self-correcting. Here we discuss various such phenomena in the context of model-based problem solving (with mainly operations research examples). Paper: http://www.sciencedirect.com/science/article/pii/S2214716015300117 Poster: http://sal.aalto.fi/publications/pdf-files/clah15b.pdf

    This also connects strongly to Andrew’s work on the garden of forking paths. (I will mention the link in future work; unfortunately I was unaware of it when the above-mentioned paper was being written.)

    Perhaps I can here also express my gratitude for this great blog. It has taught me much. I try to take the message forward. E.g. here is my presentation related to statistical analysis and experimental design in behavioral Operations Research studies. http://bor.aalto.fi/experiments_tuomas_bor.pptx

  5. Surely this old joke must have been cited here before. If so, I apologize for the duplication (replication?):
    A man gets thrown into a jail cell with a long-term occupant and begins a series of attempts to escape, each by some different method. He fails every time, getting captured and thrown back in the cell. The older prisoner looks at him silently after each failure. Finally, after six or seven attempts, the man loses patience with the old prisoner and says, “Well, couldn’t you help me a little?” “Oh,” says the old guy, “I’ve tried all the ways you thought of — they don’t work.” “Well why the hell didn’t you tell me?!” shouts the man. The old prisoner replies, “Who reports negative results?”

  6. A really important point that needs to come in here is that many of the errors described above are things that generally could not have been caught in peer review, even by a very dedicated and careful reviewer. There are a lot of statistical missteps you just can’t catch until you actually have the replication data in front of you to work with and look at.

    Andrew, do you think we will ever see a system implemented where you have to submit the replication code with the initial submission of the paper, rather than only upon publication (or not at all)? If reviewers had the replication files, they could catch many more of these types of arbitrary specification and fishing problems that produce erroneous results, saving the journal from the need for a correction.

    Obviously there would need to be very careful safeguards and rules in place to prevent reviewers from stealing a rival’s data from an unpublished paper and using it for themselves before the first paper gets published. Nobody would ever agree to this system unless there were some mechanism to protect against this kind of thing. But assuming there could be some way to figure this out (maybe reviewers could only analyze the data through a remote desktop session on a computer hosted by the journal?), I think this would be a very positive step in peer review.

    I review papers all the time, and sometimes I suspect there might be something weird going on in the data, but without the data itself I often just have to take the authors’ word for it that when they say they did X, they actually did X, and so on. And there are limits to how much I can reasonably speculate in a review about what doing Y would have looked like instead, without being able to just do Y and see. It’s not really fair to the authors to reject an otherwise excellent paper on the grounds of pure speculation about a hypothetical error, so reviewers usually don’t. But then bad science gets through and people can only catch the mistakes post-publication, triggering all this BS from journals about not publishing corrections.

    • Lewis:

      There might be some computer science journals that require runnable code to be submitted? I don’t know. As a reviewer I am not going to want to spend the time finding flaws in a submitted paper. I’ve always been told that it is the author, not the journal, who is responsible for the correctness of the claims. As a reviewer, I will, however, write that the paper does not give enough information and I can’t figure out what it’s doing.

      Ultimately I think the only only only solution here is post-publication review. The advantage of post-publication review is that its resources are channeled to the more important cases: papers on important topics (such as Reinhart and Rogoff) or papers that get lots of publicity (such as power pose). In contrast, with regular journal submission, every paper gets reviewed, and it would be a huge waste of effort for all these papers to be carefully scrutinized. We have better things to do.
