Insecure researchers aren’t sharing their data

Jelte Wicherts writes:

I thought you might be interested in reading this paper that is to appear this week in PLoS ONE.

In it we [Wicherts, Marjan Bakker, and Dylan Molenaar] show that the willingness to share data from published psychological research is associated both with “the strength of the evidence” (against H0) and the prevalence of errors in the reporting of p-values.

The issue of data archiving will likely be put on the agenda of granting bodies and the APA/APS because of what Diederik Stapel did.

I hate hate hate hate hate when people don’t share their data. In fact, that’s the subject of my very first column on ethics for Chance magazine. I have a story from 22 years ago, when I contacted some scientists and showed them how I could reanalyze their data more efficiently (based on a preliminary analysis of their published summary statistics). They seemed to feel threatened by the suggestion and refused to send me their raw data. (It was an animal experiment, so no issues with confidentiality etc., and it was a government lab, so it had nothing to do with trade secrets or proprietary data.)

P.S. Regular readers will not be surprised to find that I hate the pie charts, dislike the bar chart, and absolutely detest the tables in the above-linked article. It’s great work but I think the paper would’ve been improved through a more careful graphical presentation in which the authors thought a bit about their goals in presenting the numbers and their comparisons of interest.

32 thoughts on “Insecure researchers aren’t sharing their data”

      • Indeed, and I also learned recently that a single Y-axis is best; ggplot taught me that.

        Let’s hope that the Stapel incident will encourage more data-sharing (IRB concerns notwithstanding). Not having much experience with IRBs (and it’s been a fair number of years since I took the test for Columbia’s IRB), I don’t know how much extra effort it takes to maintain the confidentiality of shared data, but shouldn’t a protocol for that already have been drafted and put in place for the study to be approved in the first place? If IRBs were to require that every submission detail a data-sharing protocol (how much data will be shared, how long before the data are made public, a mandatory citation to the original study, and so on), that would help jump-start the movement. I also wonder how much raw data is sitting in file drawers just begging to be cleaned and analyzed.

        But enough about that, back to ggplot2 and (trying to make) elegant graphics for data analysis.

  1. It’s unfortunate that IRB concerns are not mentioned; they are a common reason for not being able to share data. (Though these concerns may not apply to the studies in the paper.)

    • Good point, but none of these authors failed to share data because of IRB concerns and/or because the data were part of an ongoing study. That’s the main reason we chose these journals rather than the other two in the original request study.

      • Yes, I saw that bit – and I liked the paper.

        My concern comes from experience with researchers from areas where data-sharing is free of IRB concerns; when some of them cross over into other areas, legal (and legitimate) non-disclosure problems get confounded with the unhelpfulness your paper illustrates. The debates can quickly get testy and unproductive; mentioning IRB concerns, even briefly, can be a helpful prophylactic.

    • Tom:

      Replotting the data would take effort which could otherwise be directed to blogging! It would be an excellent project for a student with some free time, however.

    • Tufty advertises HSI, written by Andrew Montford, aka Bishop Hill, who seems to have some identity confusion:
      (“Bishop Hill is not a bishop. He’s not actually called Hill either. He is an Englishman who lives in rural Scotland.”)

      I will hereafter use an accurate label, He Who Quotes Dog Astrology Journal, or HWQDAJ (later).*

      Since I study anti-science, I collect books like this, yet another channeling of Steve McIntyre. Having written SSWR, a 250-page analysis of the 91-page Wegman Report, and found it riddled with errors, distortions, biases, misuses of statistics, and other problems, I wasn’t going to bother doing the same with the 482-page HSI, especially given its relative unimportance. Life is short. But I noticed a few items that might help people assess the credibility of HWQDAJ and his HSI.

      1) HWQDAJ is fond of the Wegman Report and everything around it (see Chapter 9), but compare that with SSWR. Among other things, p.254 adapts Wegman’s social network analysis, which used plagiarized text and incompetent SNA; see Strange Tales and Emails. HWQDAJ didn’t know it was incompetent; he just repeated it. I knew it was junk, but just to make sure I contacted 3 SNA experts, 2 of whom are quoted (and the 3rd agreed). This is the material re-used in Said, Wegman, et al. (2008), whose retraction was forced by Elsevier earlier this year. Relying on Wegman’s credibility may turn out not to have been optimal.

      2) HSI p.153 shows the “Twelve hockey sticks” from McIntyre (related to Figure 4.4 in the Wegman Report). These were generated with wrong parameters and then a 100:1 cherry-pick to find the most hockey-stick-like charts, and they are irrelevant anyway, given the relatively small fraction of the data for which PCA was used. Deep Climate analyzed this in some detail last year, and I’ve looked at the code, too. Statisticians might want to check this.

      3) But why HWQDAJ?

      The Journal of Scientific Exploration (JSE) is not Science magazine, although some try to confuse the two. It really has published An Empirical Study of Some Astrological Factors in Relation to Dog Behaviour Differences by Statistical Analysis and Compared with Human Characteristics. That has statistics! It has p-values! It does permutation tests on planets!

      HWQDAJ relied on an article from JSE (sort of) as a major theme element in HSI. (I say sort of because HSI cites the source but links to a copy of it at Fred Singer’s website. It really is a good idea to check the real source. Real scholars do.)

      The key article was a book review by petroleum geophysicist David Deming (read this) of Michael Crichton’s novel State of Fear.** See the description of that issue of JSE: crop circles (a good debunk, actually), reincarnation of Japanese soldiers as Myanmar children, sadness at the inability to get UFO articles into real journals, and a lament for the passing of PEAR (sadly; they did so much statistics).

      This is terrific stuff! What a credible journal! But all this was used earlier by McIntyre & McKitrick, in a talk that formed the blueprint for the Wegman Report, so it must have been OK. Sadly, Deming never brought forth evidence for his email claim, not even when testifying for Inhofe. Perhaps the fact that lying to Congress is a felony (18 USC 1001) was relevant.

      But then HWQDAJ goes off into (falsification OR inability to read simple English) for this key theme of the book.
      See details in the archived discussion from Wikipedia, “HSI pp.23-30, 421 … dog astrology.”

      A fierce battle had long raged over HSI’s Wikipedia page. HSI fans fought to include every positive review, from such folks as local business writers, and to minimize negative reviews. The *discussion* page averaged ~20 comments per day.

      I needed a quick break from writing SSWR, so for fun I wrote this. A stunned silence ensued for a day; then people started inventing rules to try to *remove* the comment from the discussion page, a no-no. No one actually seemed willing to refute it or even discuss it seriously, but they kept trying to delete it. Finally it got archived away, but Wikipedia keeps complete histories. It was entertaining to watch.

      Thanks to Tufty (a commenter at Bishop Hill) for reminding me, as this was an instructive episode, filled with anti-science and bad statistics, and offering interesting data on Internet behavior and the intense willingness to believe, no matter what.

      * A Harry Potter film had recently appeared.
      ** OK, James Schlesinger liked Crichton’s book, also. See Weird Anti-Science, A.5 He’s on Sandia’s Board as well and quoted Crichton often.

      • Wow. I didn’t expect such a rabid attack. But then, I’ve just looked up who John Mashey is and it now makes sense. Has anyone who actually knows about statistics and is not totally barking mad taken a serious look at the hockey stick and come to a conclusion?

  2. Ironically, I’ve had trouble with PLoS itself in terms of backing up its data-sharing policies. I asked the authors of the paper “Non-Visual Effects of Light on Melatonin, Alertness and Cognitive Performance: Can Blue-Enriched Light Keep Us Alert?” (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016429) for their data, and at first they said no. After I informed them that they were bound by PLoS ONE policy to share, they stalled and eventually stopped responding to my emails. I eventually emailed the journal itself, which did intervene and sent me some data. Unfortunately the data were useless because they lacked critical condition information. I emailed PLoS to say that the data were insufficient, but received no reply. Several re-sends later, I have still not received any response from PLoS.

  3. How is the method of data collection in this article substantively different from the ruse employed by Akinola and Milkman in the paper on discrimination by university professors? You strongly objected to that survey design, and it seems that most of your complaints (enumerated in a previous blog entry) apply just as well to this study. What gives?

      • I’d seen that section before posting my original question, of course. Even so, it’s not clear from your terse answer what you see as the critical difference between the two studies that makes this one “great” and Akinola and Milkman suited for “the trash heap” of “useless” research. My gut feeling is that you don’t actually make a distinction between the papers with respect to methods, but place a higher value on the subject matter of one. Neither team asked permission before wasting other researchers’ time, and neither provided compensation to the subjects. Wicherts et al. inconvenienced fewer people, but surely that’s not the deciding factor!

        If I’m wrong, feel free to elaborate on your one-sentence response. I’m genuinely interested in how you’re evaluating these projects. Before reading this blog entry, I thought that I understood.

        • I do think it would’ve been appropriate for them to compensate the researchers. To get back to the difference between the two studies: I agree that it’s all a matter of degree, no sharp lines, but to me the difference involves three factors: the effort of the authors of the paper, the effort required of the research subjects, and the potential gains from the research.

  4. Interesting paper. Since Jelte Wicherts seems to be reading these comments I’ll direct a couple of questions toward him…

    The supplemental analysis about power (Text S2) surprised me a bit. You say that researchers who shared their data did not have larger sample sizes than researchers who did not. If I’m understanding Figure 2 right, sharers had smaller p-values than non-sharers. Putting those two things together suggests that the papers whose researchers shared their data had bigger effect sizes. Is that right? I’m not sure what to make of that.
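    Here is a little sketch of the arithmetic I have in mind (my own toy example, not anything from the paper; the per-group n of 30 is made up): for a fixed sample size, the p-value of a two-sample t-test is a monotone function of the observed effect size, so equal n’s plus smaller p-values do imply larger observed effects.

    ```python
    # Sketch of the reasoning above (not the paper's analysis): for a fixed,
    # hypothetical per-group sample size, the two-sample t-test p-value falls
    # as the observed effect size grows, and vice versa.
    import numpy as np
    from scipy import stats

    n = 30  # hypothetical per-group sample size, identical for sharers and non-sharers

    def p_from_effect(d, n):
        """Two-sided p-value of an equal-n two-sample t-test given observed Cohen's d."""
        t = d * np.sqrt(n / 2)
        return 2 * stats.t.sf(abs(t), df=2 * n - 2)

    def effect_from_p(p, n):
        """Observed Cohen's d implied by a two-sided p-value at this sample size."""
        return stats.t.isf(p / 2, df=2 * n - 2) / np.sqrt(n / 2)

    for d in (0.2, 0.5, 0.8):
        print(f"observed d = {d:.1f} -> p = {p_from_effect(d, n):.3f}")

    print(f"p = 0.04  at n = {n} per group implies d of about {effect_from_p(0.04, n):.2f}")
    print(f"p = 0.001 at n = {n} per group implies d of about {effect_from_p(0.001, n):.2f}")
    ```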

    Also, did you look at the experience levels of the authors (e.g. students as corresponding author versus established researchers) or other author characteristics? For example, perhaps students are more likely to have changed email addresses (because they got jobs) and therefore not respond to requests for data; and perhaps students are also more likely to make mistakes. (Then again I could see it going the other way – established, powerful researchers might be more comfortable ignoring this kind of a request…)

  5. So… well, I’m not a scientist at all, but I’ve read enough about the philosophy of science to see how scientists think it’s supposed to work. What the aims are, what the mechanisms are. (Most scientists I’ve met seem to take a line through Popper, more or less.) And so I have a question.

    Why is it that sharing raw data alongside published research isn’t the usual, expected practice? (I mean in cases where there aren’t good reasons for confidentiality, of course.) I don’t mean to say that the data need to be carefully organized and labelled. Just throw it all in a zip file. How hard is that? Include the scraps of computer code you used to process the data; without those the study is not reproducible. Duh.
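    (And to be clear how little work I’m asking for, here is a toy sketch of the whole job; the directory names are hypothetical, not from any particular project.)

    ```python
    # Toy sketch of the "just throw it all in a zip file" idea: bundle the raw
    # data files and the analysis scripts into one replication archive.
    # The directory names below are hypothetical placeholders.
    import zipfile
    from pathlib import Path

    ARCHIVE = "replication_archive.zip"
    PATHS = [Path("data/raw"), Path("analysis")]  # hypothetical project layout

    with zipfile.ZipFile(ARCHIVE, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root in PATHS:
            for f in root.rglob("*"):
                if f.is_file():
                    # Store paths relative to the project root so the archive unpacks cleanly.
                    zf.write(f, arcname=f.as_posix())

    print(f"wrote {ARCHIVE}")
    ```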

    When I see an interesting study, if I want to reproduce the result or try a slightly different technique, I always have to ask—not just for the raw data, but also about details of the methodology. My experience is that the researchers are invariably super helpful and wonderful about it. I’ve never run across the reluctant bad-guy scientists described in the paper. Sometimes they even thank me for giving them a reason to go back and archive their data. And, you know… that’s great. But I always have to laugh to myself a little. Like, come on! It’s not 1820. We have good technology for sharing information now. Get your act together already!

    With just a little friendly self-policing by scientists, the situation described in this paper would be impossible in many fields.

    I don’t know that anyone even disagrees. It just seems like we’ve got a massive cultural failure of scientists to hold one another to a slightly higher standard of communication when communication technology improves.

    • Jason:

      I agree with you. I think the problem is that raw data are often a mess. A couple of times I’ve published in poli sci journals that, as a matter of policy, require replication materials for all empirical papers. We did it, but it took effort, and I wouldn’t have done it if I hadn’t been required to. I’ve published hundreds of papers but have posted the data for very few of them.

  6. Tufty, the Hockey Stick is not an illusion! Mann made statistical mistakes when he created the original plot, but he got lucky in the sense that even when you do the stats correctly you get the same answer. Wikipedia has a decent write-up of it.

    • I don’t think Mann made any ‘mistakes’; I think they were quite deliberate. If you do the stats correctly, the first thing you notice, among many, is that the correlations between the data used and actual temperatures are very poor indeed. This is enough to discredit the whole thing, even before getting to the complicated stuff about short centering and principal component analysis. My original point, relating to the point of the article above, is that Mann did not disclose the data or methods used when the paper was published, which by itself makes it all a bit fishy. Only after several years of badgering by others did he partially give out the details. It’s all in the Montford book.
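      For readers trying to follow the jargon: “short centering” means centering each proxy series on the mean of only its recent calibration segment rather than its full length. The toy simulation below (my own sketch with made-up sizes and an AR(1) noise model, not the original proxy data or code) shows why that choice tends to tilt the leading principal component of pure noise toward a hockey-stick shape; it illustrates the centering issue only, not who is right about the larger dispute.

      ```python
      # Toy sketch of "short centering" (decentered PCA), with made-up sizes and
      # an AR(1) noise model -- not the original proxy data or code. Each series
      # is centered either on its full-record mean or on the mean of only its
      # final "calibration" segment before extracting the leading component.
      import numpy as np

      rng = np.random.default_rng(0)
      n_series, n_years, calib = 50, 600, 100  # hypothetical network size and lengths

      # AR(1) "red noise" proxies containing no real signal
      noise = rng.standard_normal((n_series, n_years))
      proxies = np.empty_like(noise)
      proxies[:, 0] = noise[:, 0]
      for t in range(1, n_years):
          proxies[:, t] = 0.9 * proxies[:, t - 1] + noise[:, t]

      def leading_pc(data, centering):
          """Leading principal component (as a time series) after the chosen centering."""
          if centering == "full":
              centered = data - data.mean(axis=1, keepdims=True)
          else:  # "short": subtract only the mean of the final `calib` years
              centered = data - data[:, -calib:].mean(axis=1, keepdims=True)
          _, _, vt = np.linalg.svd(centered, full_matrices=False)
          return vt[0]

      def hockey_stick_index(pc):
          """How far the final segment (blade) departs from the rest (shaft), in shaft SDs."""
          shaft, blade = pc[:-calib], pc[-calib:]
          return abs(blade.mean() - shaft.mean()) / shaft.std()

      print("full centering :", round(hockey_stick_index(leading_pc(proxies, "full")), 2))
      print("short centering:", round(hockey_stick_index(leading_pc(proxies, "short")), 2))
      ```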

  7. This issue came up in the back-and-forth my co-author and I have had with the Millennium Villages Project (based at Columbia’s Earth Institute) over our published paper critiquing the MVP evaluation. Recently, we wrote this post:

    http://www.guardian.co.uk/global-development/poverty-matters/2011/oct/19/millennium-villages-project-proper-evaluation

    Here’s one paragraph:

    “The claims are also impossible to independently verify. A critical element of persuasive impact evaluation is that it is independent and transparent. An independent and transparent analysis of its data could make the MVP evaluation more persuasive. The MVP has told us, however, that it will only consider making data available to outside researchers after it has completed publishing all of its work on data collected through 2016. This suggests the MVP will not share any of the data it has collected until roughly 2020, 15 years after the project began.”

    Their response on this point is part of this post:
    http://blogs.millenniumpromise.org/index.php/2011/10/23/perspectives-on-monitoring-and-evaluation-in-the-african-millennium-villages/

    I’d be interested to get your perspective on this.

    • Gabriel:

      I know some of the people involved in the Millennium Village project but have not looked into the issue of their data sharing. I know there is controversy that they did not implement their interventions in the form of a controlled experiment, but I respect their position that the Millennium Village project is in essence a pilot study—what they’re trying to study is not the effectiveness of the interventions (which have already been shown to be effective) but rather they are trying to demonstrate the feasibility of implementation.

      Regarding data sharing, in your link, Paul Pronyk of the Millennium Village project justifies not sharing data on the following grounds:

      Intellectual investment: The basic principle here is that scientists who spend the years of investment in the fundraising, intervention design, data collection and analysis should be the first to analyze the results of their efforts. In academia, data is our currency – it is the fruit of innovation and exploration, and why many of us come to work in the morning. . . .

      Confidentiality and Institutional Review Boards

      Data archiving systems: Anyone who has worked with complex longitudinal data sets knows the amount of time, energy and resources required to transform a data set into something that can be a public resource. The MV is currently working with more than 1000 datasets. . . .

      I have very little sympathy for any of these three rationales; that said, I recognize the difficulty of dealing with the problems. For rationale #1, you have to convince piggy researchers to share their data, and it’s always easier to just let such people alone than to try to get something out of an unwilling participant. For rationale #2, what can I say? IRBs are horrible, and who wants to fight these professional academic infighters? For rationale #3, I’m sure that with a reasonable budget the data could be made available, but given the roadblocks coming from rationales 1 and 2, why spend the money? So I can see where they’re coming from on this, even though I deplore the obstacles that make it difficult for them to release the data.

      • Andrew,
        Thanks for the reply. I agree with your overall take, and this is something I may write about on the World Bank’s blog soon. The Bank is now in a big push for “open data” in general, and it would be good to focus some attention on the issue as it pertains to research in particular.

        On the IRB issue: I don’t actually understand what the problem is. In my experiences with IRBs at Berkeley and Columbia, I don’t recall the IRB insisting on any provision that data I collected would NOT be shared. I think the implicit understanding was that someone else coming along who would want to use the data would have to apply to their IRB for an OK. To the extent that the (original) IRB would have concerns, I would think this could be dealt with by the IRB application explicitly laying out procedures for sharing an anonymized version of the data. I know experiences with IRBs vary–certainly my experiences have been maddening encounters with bureaucracy–so maybe IRBs have genuinely created a problem for data sharing. Has that been your experience?

        • Gabriel:

          I don’t know about the details here. In my experience the IRB has been such a paperwork monster that its very existence can deter people from doing research on a topic. My colleagues with a lot of NIH experience seem to be able to handle IRBs, but they’re too much for me.

        • Andrew thanks very much for this thoughtful exchange. I’m a longtime fan of your work and I have the highest respect for your views. You make several points about data availability and IRBs with which I agree strongly.

          I do want to clarify one thing: It’s not the case that the Millennium Villages Project sees itself simply as a pilot study to demonstrate the feasibility of implementation. The project has made countless claims of measured, quantitative “impacts”, “results”, and “achievements”. It claims to have caused precisely quantified changes in school enrollment, vaccination rates, malaria prevalence, crop yields, cell phone ownership, and numerous other outcomes in the villages. Those claims appear here and here, among other places. Both of those reports are prominently labeled as products of Columbia University. The Columbia researchers running the project have not retracted or adjusted any of those strong quantitative claims, insisting that all of them are fully backed by “peer-reviewed science“.

          Yes, the project has issued statements to the effect that it isn’t trying to achieve specific impacts, it’s just trying to demonstrate feasibility of service delivery. Those statements are not possible to square with the project’s other statements regarding its impacts. It’s like claiming to my funders: “My conversation with Andrew Gelman’s dean caused Andrew’s salary to rise by $93,587”, and when you ask me for evidence of my strong quantitative claim, I turn around and say “I was never trying to specifically raise Andrew’s salary, I was just trying to demonstrate the feasibility of conversing with the dean.” If that’s true then why did I make my first statement, and why do I steadfastly refuse to retract or modify it?

          While doublespeak is common in the political world of governments and NGOs, research products of Columbia should be held to a higher standard. As the child of a former Columbia professor, I have the highest respect for the disinterested and transparent search for truth that the institution represents.

  8. I’m struck by how much more energy has been devoted over the years to denouncing a book based on publicly available data, the National Longitudinal Study of Youth, 1979 — namely “The Bell Curve” — than to looking into claims based on nonpublic data. You might almost think that political correctness has something to do with it.

  9. 1) It seems there are at least 3 separate issues here:
    a) Data collection/quality
    b) Statistical analysis
    c) Presentation

    2) I do wonder at the extent to which this varies by field. Sciences like physics get to deal with identical fundamental entities and also have all sorts of limits and conservation laws that quickly identify bad results. [While it would be delightful if neutrinos turned out to be faster than light, I’d doubt it.] The social sciences have a much harder task, given that there is no “Standard Model” for humans and it is all too easy to miss confounders. [How many results are based on college undergrads? Are they representative? :-)]

    3) This does show up elsewhere. Computer benchmarking long used proprietary codes (for lack of anything better) and awful metrics like Dhrystones. Everybody used mixtures of different tests. In 1988, we had to create SPEC to improve this, developing standard tests and requiring that people report *all* the results, not just the ones they liked.
    They also had to report hardware and software configurations, flags, etc. Anyway, this formalized an informal behavior in which performance analysts often traded code and data across rival companies, behind their marketeers’ backs.

    4) Of course, there is always a tradeoff between getting data and generating useful results, and the extent to which the data is not only archived, but usefully so, which may involve archiving the software and taking the extra effort to do good software engineering. All that isn’t necessarily easy, given the velocity of computing environments. At Bell Labs, there was a wide range of handling, as one expected pure researchers to generate results, and having them spend a lot of time software engineering and archiving probably didn’t make sense. At the other extreme, managing data and software engineering were crucial. I think John Chambers’ S at least in part grew out of the wish to not have researchers spending their time writing C code to do statistics.

  10. Pingback: More frustrations trying to replicate an analysis published in a reputable journal « Statistical Modeling, Causal Inference, and Social Science
