Data sharing update

Posted on December 13, 2011 10:27 PM by Andrew

Fred Oswald reports that Sian Beilock sent him enough raw data from her research study to allow him to answer his questions about the large effects that were observed. This sort of collegiality is central to the collective scientific enterprise.

The bad news is that IRBs are still getting in the way. Beilock was very helpful, but she had to work within the constraints of her IRB, which apparently advised her not to share data, even de-identified data, without obtaining many more permissions.

Oswald writes:

It is a little concerning that the IRB bars the sharing of de-identified data, particularly in light of the specific guidelines of the journal Science, which appears to say that when you submit a study to the journal for publication, you are allowing for the sharing of de-identified data — unless you expressly say otherwise at the point that you submit the paper for consideration.

Again, I don’t blame Beilock and Ramirez—they appear to have been as helpful as can be given their IRB. I hope the journal’s rules will have some impact on IRB decisions. Maybe once the university realizes that they won’t be getting articles in Science, they’ll curb the IRB’s arbitrary exercises of power.

P.S. Russell makes a good comment:

I think the problem is not the IRB, but rather the terms of the consent agreement. The IRB is objecting not to the sharing of data, but rather to the fact that the participants in the study did not consent to be included in the database. I can understand that. I would participate in a study about political views that somebody might be using for a paper, but not necessarily one in which the database was published, because of the risks of identification described above and because if my political views were widely known it could affect my chances of tenure (i.e., somebody might vote against me for purely political reasons).

The cure is not to change IRBs, but rather to change our consent letters so that we talk about building the database in addition to the journal publications. This is what we need to raise the consciousness of researchers about.

8 thoughts on “Data sharing update”

Andrew on December 13, 2011 at 11:01 PM said:

I think some of this might stem from the possibility of re-identifying study participants. Given enough demographic covariates, it can be possible to map them to a person or a small set of people, even just using publicly available databases such as voter registration rolls. Bradley Malin at Vanderbilt has done some work on this (see http://dl.acm.org/citation.cfm?id=1882992.1883017&coll=DL&dl=GUIDE&CFID=73239050&CFTOKEN=59301272 for one such paper that uses the above logic). Taking a quick look at the study referenced, I doubt that it would be possible there, but these are the sorts of things that IRBs have to weigh.
- C Ryan King on December 14, 2011 at 11:39 AM said:
  
  In genetics this is a huge problem. It’s easy to identify someone in a dataset with lots of SNPs, or even just SNP rates in several groups. Similar to clinical data, we’re having to look at trusted intermediaries to execute limited analyses rather than just shipping data around.
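The linkage idea in the first comment above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any paper cited here: every record and name below is invented, and ZIP code, birth year, and sex are assumed to be the quasi-identifiers left in a “de-identified” file. A study record whose quasi-identifier combination matches exactly one person on a public list (a hypothetical stand-in for something like a voter roll) is re-identified.

```python
# Hypothetical "de-identified" study records: names removed, but
# quasi-identifiers (ZIP code, birth year, sex) retained.
study = [
    {"zip": "10027", "birth_year": 1975, "sex": "F", "response": "A"},
    {"zip": "10027", "birth_year": 1980, "sex": "M", "response": "B"},
    {"zip": "10025", "birth_year": 1975, "sex": "F", "response": "C"},
]

# Hypothetical public list (e.g. a voter roll) with names and the same fields.
public_roll = [
    {"name": "P. Smith", "zip": "10027", "birth_year": 1975, "sex": "F"},
    {"name": "Q. Jones", "zip": "10027", "birth_year": 1980, "sex": "M"},
    {"name": "R. Brown", "zip": "10027", "birth_year": 1980, "sex": "M"},
    {"name": "S. Green", "zip": "10025", "birth_year": 1975, "sex": "F"},
]

QUASI_IDS = ("zip", "birth_year", "sex")


def key(record, fields=QUASI_IDS):
    """Tuple of the record's quasi-identifier values, used as a join key."""
    return tuple(record[f] for f in fields)


def link(study, public_roll, fields=QUASI_IDS):
    """For each study record, list the public-roll names sharing its quasi-identifiers."""
    index = {}
    for person in public_roll:
        index.setdefault(key(person, fields), []).append(person["name"])
    return [(rec["response"], index.get(key(rec, fields), [])) for rec in study]


for response, candidates in link(study, public_roll):
    status = "re-identified" if len(candidates) == 1 else f"{len(candidates)} candidates"
    print(response, candidates, status)
```

Here responses A and C each link to a single named person, while B is hidden among two candidates; adding one more shared covariate to both files could be enough to single B out as well.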
DavidC on December 14, 2011 at 12:36 AM said:

I was also going to point out that anonymizing data can be harder than it seems. I think Netflix canceled plans to host a second round of their prediction competition because a few cases turned up of people being identified.

Some more examples: http://arstechnica.com/tech-policy/news/2009/09/your-secrets-live-online-in-databases-of-ruin.ars
freddy on December 14, 2011 at 3:03 AM said:

Here’s a review article on the same problem:

http://www.i-journals.org/ss/viewarticle.php?id=74&layout=abstract

Saying exactly what is and is not de-identified data is hard when masses of covariates have been measured. IRBs need help from statisticians to figure out what the risks actually are; it’s often not easy.
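The point that de-identification gets harder as covariates accumulate can be made concrete with k-anonymity, a standard measure: a table is k-anonymous on a set of columns if every combination of values in those columns is shared by at least k rows. The sketch below is illustrative only. It uses synthetic, uniformly random data for 1,000 hypothetical participants; the column names and numbers of levels are invented. As columns are added one at a time, it reports the share of rows that are unique and the resulting k.

```python
import random
from collections import Counter


def fraction_unique(rows, columns):
    """Share of rows whose combination of values in `columns` occurs only once."""
    combos = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(combos)
    return sum(1 for combo in combos if counts[combo] == 1) / len(rows)


def k_anonymity(rows, columns):
    """Smallest group size over the value combinations in `columns` (the k in k-anonymity)."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())


# Synthetic data: 1,000 hypothetical participants, each with a handful of
# innocuous-looking categorical covariates (levels chosen arbitrarily).
rng = random.Random(0)
levels = {"sex": 2, "age_band": 10, "region": 20, "education": 5, "party": 4, "income_band": 8}
rows = [{c: rng.randrange(n) for c, n in levels.items()} for _ in range(1000)]

columns = list(levels)
for n_cols in range(1, len(columns) + 1):
    used = columns[:n_cols]
    print(n_cols, used[-1], round(fraction_unique(rows, used), 3), k_anonymity(rows, used))
```

Because each added column can only split existing groups, the unique share can only rise and k can only fall. With all six columns there are 2 × 10 × 20 × 5 × 4 × 8 = 64,000 possible combinations for 1,000 people, so most rows end up unique even though no single column identifies anyone.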
Alex on December 14, 2011 at 8:26 AM said:

“Maybe once the university realizes that they won’t be getting articles in Science, they’ll curb the IRB’s arbitrary exercises of power.”
Yeah, but Science doesn’t actually require its authors to share data. They say they do, but try requesting data from some authors and see the success rate.
Bruce McCullough on December 14, 2011 at 8:32 AM said:

Andrew,

While it may be true that “this sort of collegiality is central to the collective scientific enterprise,” sharing of data and code should not depend on collegiality at all. Each journal should have a mechanism whereby anyone (e.g., an antagonistic colleague with a diametrically opposed theory) can get the data and code to vet them for errors, regardless of whether the researcher and the replicator “like” each other.

Regards,

Bruce
- Andrew on December 14, 2011 at 8:46 AM said:
  
  Bruce:
  
  I agree. But I try to be collegial even with colleagues who have a diametrically opposed theory. In this case, I don’t think it was that the theories were so opposed, it was more of a suspicion about the magnitudes of reported results.
ralmond on December 14, 2011 at 1:46 PM said:

I think the problem is not the IRB, but rather the terms of the consent agreement. The IRB is objecting not to the sharing of data, but rather to the fact that the participants in the study did not consent to be included in the database. I can understand that. I would participate in a study about political views that somebody might be using for a paper, but not necessarily one in which the database was published, because of the risks of identification described above and because if my political views were widely known it could affect my chances of tenure (i.e., somebody might vote against me for purely political reasons).

The cure is not to change IRBs, but rather to change our consent letters so that we talk about building the database in addition to the journal publications. This is what we need to raise the consciousness of researchers about.
