Jon Zelner writes:
Reproducibility is becoming more and more a part of the conversation when it comes to public health and social science research. . . .
But comparatively little has been said about another dimension of the reproducibility crisis, which is the difficulty of re-generating already-complete analyses using the exact same input data. But as far as I can tell, the ability to do this is a necessary precondition to the new-data replication. . . .
But I think the ease with which we can re-generate complete analyses, on our own computers and those of others, plays directly into the bigger questions of openness and integrity that underlie some of the challenges to reproducibility.
Many of you will have experienced the shiver of fear that comes from reviewer comments suggesting that a group of cases should or should not have been dropped, or that a variable should have been coded in a different way.
My first reaction in such situations has historically involved a slightly queasy feeling as I imagine laboriously stepping through each of the downstream things that has to happen (re-run models, re-generate figures, re-construct tables!) all as a result of a small modification of the way input data were cleaned or transformed.
The friction involved in making these changes increases the incentive to cut corners and to not take potentially useful feedback seriously. It also makes it difficult to incorporate new data as it becomes available, perform sensitivity analysis by re-running the analysis on perturbed datasets, etc. . . . we end up with finished papers backed by a morass of spaghetti code that we hope never to have to run again.
That is soooo true.
Zelner then gets into details:
So, all of the elements of reproducibility I will discuss over the next series of posts are collected here, which is a git repository demonstrating a toy example of a project using R and Stan that can be fully replicated.
Specifically, it’s focused on fitting a Gaussian finite mixture model to simulated data. Here’s what the output should look like.
I picked R and Stan for this because they are what I live and breathe in my day-to-day research. . . .
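To make the idea concrete, here is a rough sketch of what "re-generating a complete analysis from the exact same inputs" can look like for a toy problem of this kind. It is not Zelner's code: it is written in Python rather than R and Stan, it fits the mixture by plain EM rather than by Bayesian inference, and every name and parameter value in it (`simulate`, `fit_em`, `run_analysis`, the mixture weights and means, the seed) is an assumption made up for illustration. The only point is that the whole analysis is one function call whose result depends on nothing but its arguments.

```python
import math
import random


def simulate(n, seed):
    """Draw n points from a two-component Gaussian mixture.

    The mixture parameters below are hypothetical values chosen for this sketch.
    """
    rng = random.Random(seed)  # local seeded generator, so runs are repeatable
    data = []
    for _ in range(n):
        if rng.random() < 0.4:
            data.append(rng.gauss(-2.0, 1.0))
        else:
            data.append(rng.gauss(3.0, 1.0))
    return data


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_em(data, iters=200):
    """Fit a two-component Gaussian mixture by EM.

    Returns (weight of component 1, mu1, sigma1, mu2, sigma2).
    """
    # Deterministic starting values: no hidden randomness in the fit.
    w, mu1, mu2 = 0.5, min(data), max(data)
    s1 = s2 = 1.0
    n = len(data)
    for _ in range(iters):
        # E-step: probability that each point belongs to component 1.
        r = []
        for x in data:
            a = w * normal_pdf(x, mu1, s1)
            b = (1.0 - w) * normal_pdf(x, mu2, s2)
            r.append(a / (a + b))
        # M-step: update weight, means, and standard deviations.
        n1 = sum(r)
        n2 = n - n1
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1.0 - ri) * x for ri, x in zip(r, data)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1)
        s2 = math.sqrt(sum((1.0 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2)
        w = n1 / n
    return w, mu1, s1, mu2, s2


def run_analysis(n=500, seed=1):
    """The whole analysis as one call: simulate the data, then fit the model."""
    return fit_em(simulate(n, seed))


if __name__ == "__main__":
    print(run_analysis())
```

Calling `run_analysis()` twice with the same arguments returns identical numbers, and changing one upstream choice (say, `n` or `seed`) re-runs everything downstream without any manual steps, which is the property the posts are after.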
And this is what Zelner has so far:
Part 1: Your R script is a program!
Part 2: Makefiles for fun and profit
Part 3: Knotes on Knitr
P.S. As indicated in the title above, I like the term “workflow” for this sort of thing.