Against optimism about social science

[Image: tip of the iceberg]

Social science research has been getting pretty bad press recently, what with the Excel buccaneers who didn’t know how to handle data with different numbers of observations per country, and the psychologist who published dozens of papers based on fabricated data, and the Evilicious guy who wouldn’t let people review his data tapes, etc etc. And that’s not even considering Dr. Anil Potti.

On the other hand, the revelation of all these problems can be taken as evidence that things are getting better. Psychology researcher Gary Marcus writes:

There is something positive that has come out of the crisis of replicability—something vitally important for all experimental sciences. For years, it was extremely difficult to publish a direct replication, or a failure to replicate an experiment, in a good journal. . . . Now, happily, the scientific culture has changed. . . . The Reproducibility Project, from the Center for Open Science is now underway . . .

And sociologist Fabio Rojas writes:

People may sneer at the social sciences, but they hold up as well. Recently, a well known study in economics was found to be in error. People may laugh because it was an Excel error, but there’s a deeper point. There was data, it could be obtained, and it could be replicated. Fixing errors and looking for mistakes is the hallmark of science. . . .

I agree with Marcus and Rojas that attention to problems of replication is a good thing. It’s bad that people are running incompetent analyses or faking data all over the place, but it’s good that they’re getting caught. And, to the extent that scientific practices are improving to help detect error and fraud, and to reduce the incentives for publishing erroneous and fraudulent results in the first place, that’s good too.

But I worry about a sense of complacency. I think we should be careful not to overstate the importance of our first steps. We may be going in the right direction but we have a lot further to go. Here are some examples:

1. Marcus writes of the new culture of publishing replications. I assume he’d support the ready publication of corrections, too. But we’re not there yet, as this story indicates:

Recently I sent a letter to the editor of a major social science journal pointing out a problem in an article they’d published. They refused to publish my letter, not because of any argument that I was incorrect, but because they judged my letter to not be in the top 10% of submissions to the journal. I’m sure my letter was indeed not in the top 10% of submissions, but the journal’s attitude presents a serious problem if the bar to publication of a correction is so high. That’s a disincentive for the journal to publish corrections, a disincentive for outsiders such as myself to write corrections, and a disincentive for researchers to be careful in the first place. Just to be clear: I’m not complaining about how I was treated here; rather, I’m griping about the system in which a known error can stand uncorrected in a top journal, just because nobody managed to send in a correction that’s in the top 10% of journal submissions.

2. Rojas writes of the notorious Reinhart and Rogoff study that, “There was data, it could be obtained, and it could be replicated.” Not so fast:

It was over two years before those economists shared the data that allowed people to find the problems in their study. If the system really worked, people wouldn’t have had to struggle for years to try to replicate an unreplicable analysis.

And, remember, the problem with that paper was not just a silly computer error. Reinhart and Rogoff also made serious mistakes in handling their time-series cross-sectional data, notably in how they averaged across countries that contributed very different numbers of observations.
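To make the weighting issue concrete, here’s a minimal sketch in Python, with made-up numbers rather than their actual dataset, of how a pooled mean over country-years and an unweighted mean of country means can point in opposite directions when one country contributes many more observations than another:

```python
# Toy illustration of the weighting problem in an unbalanced country panel.
# The numbers are hypothetical, not Reinhart and Rogoff's data: country A
# contributes 19 high-debt years, country B contributes just one.
import pandas as pd

panel = pd.DataFrame({
    "country": ["A"] * 19 + ["B"],
    "growth":  [2.5] * 19 + [-7.9],   # hypothetical GDP growth rates, in percent
})

# Weight every country-year equally:
pooled_mean = panel["growth"].mean()

# Weight every country equally, no matter how many years it contributes:
mean_of_country_means = panel.groupby("country")["growth"].mean().mean()

print(f"pooled mean over country-years:   {pooled_mean:.2f}")            # about  1.98
print(f"unweighted mean of country means: {mean_of_country_means:.2f}")  # about -2.70
```

The particular numbers don’t matter; the point is that with an unbalanced panel the weighting scheme is a substantive modeling choice, not a bookkeeping detail, and it has to be justified and documented.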

3. Marcus writes in a confident tone about progress in methodology: “just last week, Uri Simonsohn [and Leif Nelson and Joseph Simmons] released a paper on coping with the famous file-drawer problem, in which failed studies have historically been underreported.” I think Uri Simonsohn is great, but I agree with the recent paper by Christopher Ferguson and Moritz Heene that the so-called file-drawer problem is not a little technical issue that can be easily cleaned up; rather, it’s fundamental to our current practice of statistically-based science.
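To see why I regard the file-drawer problem as structural rather than as a technical nuisance, here’s a toy simulation, my own sketch and not anything from the Simonsohn, Nelson, and Simmons paper: every study is analyzed honestly, but only the studies that come out positive and statistically significant make it out of the drawer.

```python
# Toy simulation of the file-drawer problem: honest studies, but a publication
# filter that only lets through positive, statistically significant results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, sd, n_per_arm, n_studies = 0.2, 1.0, 30, 10_000

published_effects = []
for _ in range(n_studies):
    treated = rng.normal(true_effect, sd, n_per_arm)
    control = rng.normal(0.0, sd, n_per_arm)
    t_stat, p_value = stats.ttest_ind(treated, control)
    if p_value < 0.05 and t_stat > 0:          # the only studies that get written up
        published_effects.append(treated.mean() - control.mean())

print(f"true effect:                   {true_effect}")
print(f"mean published estimate:       {np.mean(published_effects):.2f}")  # roughly 3x the truth here
print(f"fraction of studies published: {len(published_effects) / n_studies:.0%}")
```

With these settings only a small fraction of the studies clears the filter, and the average published estimate comes out at roughly three times the true effect, even though no individual researcher did anything wrong. That selection step, not any single analysis, is what makes the problem hard to clean up after the fact.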

And there’s pushback. Biostatisticians Leah Jager and Jeffrey Leek wrote a paper, which I strongly disagree with, called “Empirical estimates suggest most published medical research is true.” I won’t go into the details here—my take on their work is that they’re applying a method that can make sense in the context of a single large study but which won’t generally work with meta-analysis—my point is that there remains a constituency for arguments that science is basically OK already.

I respect the view of Marcus, Rojas, Jager, Leek, and others that the current environment of criticism has in some ways gone too far. All those people do serious, respected research, and those of us who do serious research know how difficult it can be to publish in good journals, how hard we work—out of necessity—to consider all possible alternative explanations for any results we find, how carefully we document the steps of our data collection and analysis, and so forth. But many problems still remain.

Thomas Basbøll analogizes the difficulties of publishing scientific criticism to problems with the subprime mortgage market before the crash. He quotes Michael Lewis:

To sell a stock or bond short you need to borrow it, and [the bonds they were interested in] were tiny and impossible to find. You could buy them or not buy them but you couldn’t bet explicitly against them; the market for subprime mortgages simply had no place for people in it who took a dim view of them. You might know with certainty that the entire mortgage bond market was doomed, but you could do nothing about it.

And now here’s Basbøll:

I had a shock of recognition when I read that. I’ve been trying to “bet against” a number of stories that have been told in the organization studies literature for years now, and the thing I’m learning is that there’s no place in the literature for people who take a dim view of them. There isn’t really a genre (in the area of management studies) of papers that only points out errors in other people’s work. You have to make a “contribution” too. In a sense, you can buy the stories people are telling you or not buy them but you can’t criticize them.

This got me thinking about the difference between faith and knowledge. Knowledge, it seems to me, is a belief held in a critical environment. Faith, we might say, is a belief held in an “evangelical” environment. The mortgage bond market was an evangelical environment in which to hold beliefs about housing prices, default rates, and credit ratings on CDOs. There was no simple way to critique the “good news” . . .

Eventually, as Lewis reports, people were able to bet against the subprime mortgage market, but it wasn’t easy. And the fact that some investors, with great difficulty, were able to do it, doesn’t mean the financial system is A-OK.

Basbøll’s analogy may be going too far, but I agree with his general point that the existence of a few cases of exposure should not make us complacent. Marcus’s suggestions on cleaning up science are good ones, and we have a ways to go before they are generally implemented.

P.S. Coincidentally, Jeff Leek posted something today on the same topic, but with a slightly different perspective (he refers to “the current over-pessimism about science”). Leek argues, reasonably enough, “that people are using a few high-profile cases to hyperventilate about the real, solvable, and recognized problems in the scientific process” and he worries that “the rational reasonable problems we have, with enough hyperbole, will make it look like the scientific process ‘sky is falling’” and lend support to political attacks on science more generally. I think Jeff and I should be able to agree to the following:

- Science is hard, we all make mistakes, the system has problems but all human systems have problems, and in working to fix these problems we shouldn’t throw the research baby out with the bathwater that is the changing rules of scientific communication.

- We’re not there yet; we still live in a world in which it’s easier to publish and hype an elaborate flawed claim than to report a simple correction, a world in which data sharing is far from the norm, and one in which social and statistical biases lead to systematic overreporting of dramatic claims and systematic overestimation of effect sizes.

Leek is making the valid point that the sort of doomsaying that has been needed to draw attention to problems in scientific communication and to motivate improvements can also be used, in a guilt-by-association sense, to disparage good science. And, even in popular culture, my impression is that things aren’t as bad as they used to be. Sure, vaccine deniers and global warming deniers and all the other deniers are out there, but it’s not like the 70s when people were buying millions of copies of Chariots of the Gods, The Jupiter Effect, and The Bermuda Triangle, right?

29 Comments

  1. LemmusLemmus says:

    Interesting take. Is it possible to see the letter you mention?

  2. WB says:

    Why don’t journals simply post serious criticisms and important corrections on their websites? Any reluctance to admit mistakes and publish corrections seems inexcusable given how easy it is to post items online. Obviously, websites don’t face the strict space limitations of print journals. So online sections could be used to publish items that aren’t “in the top 10% of submissions to the journal,” but are nonetheless important and worth the attention of readers.

    • Andrew says:

      Wb:

      I agree completely, and I find it frustrating that this is not done. I suppose one reason they don’t do it is that it would take effort and expense to set up the website. Another difficulty is the need to review the critiques. If it were easier to publish a letter to the editor, I suppose the journal would get more submissions, then they’d need to find more reviewers, etc.

  3. gwern says:

    > I had a shock of recognition when I read that. I’ve been trying to “bet against” a number of stories that have been told in the organization studies literature for years now, and the thing I’m learning is that there’s no place in the literature for people who take a dim view of them. There isn’t really a genre (in the area of management studies) of papers that only points out errors in other people’s work. You have to make a “contribution” too. In a sense, you can buy the stories people are telling you or not buy them but you can’t criticize them.

    So, if only there were active prediction markets on scientific claims…

    • Thomas says:

I think that may be stretching the analogy a bit. I’m saying that there’s an analogy to be drawn between financial markets and academic discourses. There was no market in which to express doubts about mortgage-backed CDOs. Likewise, there is not much of a discourse in which to express doubts about the results of social science.

      • K? O'Rourke says:

I think the analogy is insightful.

        > you can buy the stories people are telling you or not buy them but you can’t criticize them

There is a group of statisticians who work on methods for evidence-based medicine, methods that critically question published findings and their interpretation. At the same time, the methods they developed to do this are often protected from criticism. Sort of a “you can use our methods or help us improve them but you can’t criticise them nor suggest anything too different”.

The same folks seem to control the agenda, the _too different suggestions_ keep falling off the agenda, various people complain about the _group think_ privately while asking to remain anonymous, their email list is private and they have suspended members for sending confrontational emails that detract from the usual collaborative spirit, etc.

I have had trouble understanding why those who promote critically questioning evidence would do the exact opposite with regard to questioning their own methods. But I think that we all do this at some level, effectively preventing rather than facilitating getting less wrong because, in some perceived sense, it’s too costly.

Some similarities possibly here: Stelfox HT, Chua G, O’Rourke K, Detsky AS. Conflict of interest in the calcium channel antagonist debate. The New England Journal of Medicine 338(2): 101-106, 1998.

  4. Robert says:

Richard Feynman’s talk on Cargo Cult Science (http://neurotheory.columbia.edu/~ken/cargo_cult.html) is worth reading and makes some good points.

  5. Wonks Anonymous says:

    My recollection was that Dr. Anil Potti worked in medicine, not social science.

Other than the Excel error, the choices made by R&R are not necessarily “mistakes”.

  6. Conan DeWitt says:

    >This got me thinking about the difference between faith and knowledge. Knowledge, it seems to me, is a belief held in a critical environment. Faith, we might say, is a belief held in an “evangelical” environment.

    Before we go too far on the faith/knowledge dichotomy here, it might be useful to acknowledge that, for most of human history (until the modern age), these two entities were not generally viewed as opposite ends of a dilemma. To borrow a random sample from a major pre-modern philosopher, where “faith” and “doubt” are explicitly acknowledged to share fundamental characteristics:

    http://www.newadvent.org/summa/3002.htm#article1

I believe the original poster was attempting to oppose, in the pre-modern lingo, “rational demonstration (science)” against some other non-demonstrative form of knowledge (nous, suspicion, opinion, etc.)

  7. Fernando says:

    Andrew: “but because they judged my letter to not be in the top 10% of submissions to the journal”

I find this attitude on the part of journals ridiculous. Surely if a paper was in the top 10% to be published, then showing that said paper is wrong, or flawed, or whatever, ought also to be in the top 10%. For example, finding that X is a cure for HIV is an important finding, but showing that X is not a cure after all is also as important, or no?

The asymmetry is the telltale sign of an _inconsequential_ science. Or, at least, of editors who treat social science as if it were inconsequential. This is easy to see. Currently, evidence that updates my prior from A to B is published, but additional evidence that updates it back to A is not. Yet if we were betting money on A vs. B we would want to hear about the correction. Ignoring the correction suggests we don’t care about being wrong, presumably because the findings are inconsequential.

    • Fernando says:

      PS If a branch of science is inconsequential, it may receive less funding. It is incumbent on social scientists to treat their science as consequential if they want others to do so too.

    • Thomas says:

“Surely if a paper was in the top 10% to be published, then showing that said paper is wrong, or flawed, or whatever, ought also to be in the top 10%.”

      This is really the basic principle I hope will one day be applied. One of the frustrating aspects of my work is watching “appreciations” of influential scholars get published in journals that reject my work because they don’t publish “critical essays”.

      I’m not sure how to measure it, but there seems to be a strong bias in the publishing world towards promoting new truths over exposing old falsehoods. But I actually think our time would be better spent dismantling false views than inventing new ones. Suppose truths could be counted somehow. Now imagine a situation where we believe 40 truths and 60 falsehoods. Would you rather “fix” this problem by discovering 20 new truths to believe (making it 60/60) or by discovering the falsehood of 20 existing beliefs (making it 40/40)? I’d prefer the latter.

  8. jonathan says:

If I may, about R&R, I think the issue with data is more complicated. Putting aside when they put the data set up on their website, issues weren’t discovered until they handed over their actual Excel work. But the interesting thing to me is this: they seem to have examined the years of data omitted from their famous 2010 paper and used the corrected data in a 2012 paper without ever noting the data was different. This has been cited to me as “proof” of their goodness. It is to me absolutely damning: they knew before 2012 that their 2010 paper was wrong in key ways and said nothing, posted nothing on their website, etc. (There is more to this, read on.) It looks to me like they wanted to keep their famous paper at the top of the heap while hiding the changes so they’d never be put together. To be clear, the key ways are not that growth doesn’t correlate with debt but that the impact is not as large as the 2010 paper claimed and there is no dramatic cliff at 90% debt levels.

    Now for the neat part: turns out the corrected data set is still wrong. With additional corrections, the results change somewhat again – but not that much. Even the recent critical UMass paper wasn’t aware of the latest revealed data set errors – and they’ve been good about making changes on their website. This makes me wonder about something far more substantial than spreadsheet errors: the integrity and reliability of data sets. The very first thing I do when going through a paper is to identify the data set. I sometimes have to look up what it really is so I can understand its limitations. I have a general sense of the paper’s intent so I effectively construct in my head a prior for the reliability of the manipulations performed on the data given its apparent limitations.

  9. Steve Sailer says:

    Dr. B. says:

    “The mortgage bond market was an evangelical environment in which to hold beliefs about housing prices, default rates, and credit ratings on CDOs. There was no simple way to critique the “good news” . . .”

    The mortgage bond market of the mid-2000s was to a large extent a bet on the wealth generating capability of Hispanic immigrants. There was no socially acceptable way to critique the “good news” preached by George W. Bush, Angelo Mozilo, and Henry Cisneros.

    A general problem with the social sciences is that there’s little market for unwelcome news. For example, one of the great triumphs of the social sciences over the last century has been the field of IQ. But how many people want to hear about that?

    • Thomas says:

I don’t really believe it was a bet on anything. I think it was at some levels an entirely conscious (and callous) attempt to pump money out of the economy pretty quickly. I don’t think the people who were making the key decisions were thinking long term enough to think about anybody’s “wealth generating capability”. By the time the bet was “lost” (i.e., when Lewis’s short sellers won), the real winners had already cashed out.

      • Steve Sailer says:

As Michael Lewis pointed out, the Housing Bubble of 2004-2007 was overwhelmingly concentrated in four states, what Wall Street guys who traded securitized mortgages called the Sand States: California, Arizona, Nevada, and Florida. By one estimate, about 7/8ths of the nationwide decline in home values in 2008 was in those four states, all of which had seen huge influxes of Hispanics.

You can trace the beginning of the worst phase of the Housing Bubble to George W. Bush’s October 15, 2002 White House Conference on Increasing Minority Homeownership, in which he told his federal regulators that down payment and documentation requirements were standing in the way of racial equality in homeownership. (In his memoirs, Bush tersely apologized for the role his Ownership Society initiatives played in the economic debacle.)

        Numerous studies since the Bubble burst have shown that the Bubble was heavily driven by minority home buyers, and that minorities defaulted on mortgages at a much higher rate.

    • Anonymous says:

Really? You’re attributing the housing crisis to some sort of neo-liberal bet on political correctness? Short-term profit and misaligned incentives are a perfectly sufficient explanation (and much more plausible considering the actors involved).

Also, IQ has got to be one of the prime examples of statistical reification stupidity in academia. Cosma Shalizi is pretty good at dissecting this kind of garbage – for example – http://vserver1.cscs.lsa.umich.edu/~crshalizi/weblog/520.html

      • gwern says:

Cosma Shalizi does nothing of the sort! Read through your link: all this material on heredity is well and good, but besides being nitpicking methodological points which only open up small *possibilities* that the twin studies are not conclusive (and the page ignores that genome-wide association studies *are showing genetic influences on IQ*), it’s not even relevant to the question of whether IQ predicts real-world outcomes like health, longevity, income, crime, education, and so on and so forth!

        • Anonymous says:

          The “scientific contribution” of IQ is that it is supposedly something intrinsic, heritable and causative (not just predictive) of those real world outcomes. If it’s a made up number which is a marker for an individual’s environment (which will be correlated, or “predictive” of those outcomes by design), it’s as useful a scientific contribution as a zip code.

          “I hope to persuade you that the current estimates are not reliable, that the notion of a value for IQ’s heritability is silly, and that we do, indeed, know squat about that question. ” That says a lot about the sum total of contributions made by the IQ literature. The “evidence” of it as an intrinsic genetically heritable property of individuals is, as I said, garbage.

          • gwern says:

            > The “scientific contribution” of IQ is that it is supposedly something intrinsic,

‘Intrinsic’? As opposed to ‘extrinsic’, which is what, ‘IQ scores are generated by a coin flip every time’?

            > heritable

            IQ *is* heritable. This is a fact disputed by no one at all. What Shalizi is illustrating, at tedious length in your link, is that ‘heritable’ is a technical term which does not mean what you apparently think it means and does not necessarily show anything like a genetic contribution. This is true, but increasingly irrelevant due to the genetics studies.

            > and causative (not just predictive) of those real world outcomes

            Merely being ‘predictive’ is extremely interesting to both psychologists and anyone interested in prediction (like employers such as the military).

            > If it’s a made up number which is a marker for an individual’s environment (which will be correlated, or “predictive” of those outcomes by design), it’s as useful a scientific contribution as a zip code.

            Your sarcasm about the invaluable zip code aside, even if it were nothing but a ‘marker for an individual’s environment’, no one has yet come up with another marker that outpredicts IQ across such a wide variety of environments or roles or variables and so IQ would still be a major success and a scientific contribution.

            > “I hope to persuade you that the current estimates are not reliable, that the notion of a value for IQ’s heritability is silly, and that we do, indeed, know squat about that question. ” That says a lot about the sum total of contributions made by the IQ literature. The “evidence” of it as an intrinsic genetically heritable property of individuals is, as I said, garbage.

‘know squat’ is an extraordinary assertion; one wonders if Shalizi really does think that all the studies have led to absolutely no information and the best guess about the question is still a blank ignorant guess of 50%… (So if we grabbed 20 things which are heritable according to twin studies and all the other kinds of studies done, we would expect to find that 10 were simple ‘zip code’ delusions?) And again, Shalizi ignores the actual genetics studies which are exactly in line with the estimates compiled over all these years (although to be fair, I should note that I think his posts were written before several of the studies, so I actually mean, you are ignoring them).

            Garbage indeed. No doubt we’ll be seeing links to Shalizi even years after the studies nail down particular genes.

            • Anonymous says:

              “‘Intrinsic’? As opposed to ‘extrinsic’, which is what, ‘IQ scores are generated by a coin flip every time ‘” – this is a great example of how null hypothesis testing gets misused. No, extrinsic is the opposite of intrinsic, not ‘generated by coin flips’ or whatever other null distribution you might be imagining. This is precisely the kind of flawed logic that leads people to interpret IQ correlations incorrectly.

              “increasingly irrelevant due to the genetics studies.” – it’s extremely relevant. Genetic studies never estimate the causal effect of genetic mutations unless you’re doing something like site-directed mutagenesis (which is light-years away from the association studies you’re talking about). The problems relating to environment and confounding are everywhere in genetic association studies.

“IQ *is* heritable. This is a fact disputed by no one at all. What Shalizi is illustrating, at tedious length in your link, is that ‘heritable’ is a technical term which does not mean what you apparently think it means and does not necessarily show anything like a genetic contribution. This is true, but increasingly irrelevant due to the genetics studies.”

If you stretch the definition of heritability to include things like “lives in the same place and income”, heritability ceases to be a well-defined, falsifiable hypothesis (i.e. it is no longer scientific). Likewise, if IQ is merely a description, something like a principal component for various outcomes, it ceases to be falsifiable as a scientific concept. It doesn’t play a role in the things you’re relating it to, it simply _is_ those things.

              Furthermore, if one is fine accepting IQ as a projection for all these environmental factors, it’s rather ethically questionable to discriminate based on them, isn’t it? Unless one’s goal is to have an excuse to discriminate based on circumstances and environment, in which case, reifying a number that’s correlated with them makes perfect sense.

“studies have led to absolutely no information and the best guess about the question is still a blank ignorant guess of 50%… (So if we grabbed 20 things which are heritable according to twin studies and all the other kinds of studies done, we would expect to find that 10 were simple ‘zip code’ delusions?) And again, Shalizi ignores the actual genetics studies which are exactly in line with the estimates compiled over all these years (although to be fair, I should note that I think his posts were written before several of the studies, so I actually mean, you are ignoring them).”

              Your statement is a misuse of null hypothesis testing. Also, as I mention above, genetic association studies are as susceptible to ecological confounding as any other observational correlation (which is what they are).

      • Steve Sailer says:

“Also, IQ has got to be one of the prime examples of statistical reification stupidity in academia. Cosma Shalizi is pretty good at dissecting this kind of garbage”

        Here’s a recent discussion of one of Dr. Shalizi’s older posts on IQ:

        http://isteve.blogspot.com/2013/04/is-g-factor-myth.html

  10. Andrew, I agree (of course!) on both publishing replications (positive or negative) and null results, and I also agree (of course) that journals ought to publish legitimate criticisms of published results (even given the fact that reviewing them is non-trivial). One thing I am interested in is: Are these two things related? In some ways they both call for a change to what the journal is doing.

For example, sometimes I have written to a journal editor and said “it would be good if the journal published more of X” or “less of Y” and the response has always been, quite sensibly, “we go with what our reviewers say; we don’t over-rule or micro-manage our reviewers”. This is a key idea; in some sense it relates to academic freedom: The journals publish what the peer community says they should publish. If we insist that the journal also publish replications and third-party retraction requests, we are insisting that it go beyond peer review and be more active, in some sense.

Of course this is a longer argument that doesn’t fit into a blog comment (it involves the 10-percent number you mention in your post, which seems like a strange request to put on a reviewer, etc.), but in some sense what we need is for the journals to be actively more sensible than the peer reviewing community (which by assumption is not as interested in replications and nulls and retractions as it should be; maybe I’m wrong), and that is a hard thing to make happen, at least in my field, where there is a strong tradition of letting the peer reviewers make all final calls on content.

    There is some deep idea in my field (which has few journals, almost all of which are scientific-society-run) that the journal should be a relatively neutral entity; it might have strange or bad long-term consequences to mess with that. Not to imply, in any way, that I am against publishing nulls, replications, and retraction requests! Just to say that messing with the journal must be done with care. Certainly there should be a part of the journal that is totally peer-review-run, if for tradition’s sake at least. Not that I am a huge fan of peer review either; it sure doesn’t catch most of the wrongness in my field.

    Etc. I would love to call out Basbøll on the “activeness” of the journal relative to peer control.

  11. Manuel says:

Andrew, I’ll extend your argument beyond the social sciences. As a biologist, I see a similar reluctance to develop explicit protocols for handling criticisms/corrections or replication in my field. In addition to the problem with publishing c/c or replication studies in journals, the funding model at federal agencies works against replication with its consistent emphasis that proposed research be novel. Perhaps we need a focused effort to re-define elegant replication as one form of acceptable novelty in grant proposals. As it stands now, we are thoroughly disincentivized from replicating research by the funding models in play.

  12. Rademaker says:

Excel errors happen in both the social sciences and the technical sciences; there isn’t anything that should make them any less easy to detect in the former area than in the latter. So it doesn’t seem right to attribute the problem to social science in principle, even though a contingent problem with such things may exist to a greater extent in establishment social science than it does in physics, etc.

  13. [...] you are the easiest person to fool.” But the whole thing is well worth your time (HT to a commenter on Andrew Gelman’s [...]

  14. [...] pessimist and cautionary tale is from Andrew Gelman, who is my go-to blogger  for all things [...]