
They want help designing a crowdsourcing data analysis project

Michael Feldman writes:

My collaborators and I are doing research to understand the reasons for variability in data analysis (“the garden of forking paths”). Our goal is to understand why scientists make different decisions in their analyses and, in doing so, reach different results.

In a project called “Crowdsourcing data analysis: Gender, status, and science”, we have recruited a large group of independent analysts to test the same hypotheses on the same dataset using a platform we developed.

The platform is essentially RStudio running online, with a few additions:

· We record all executed commands, even those that do not appear in the final code

· We ask analysts to explain these commands by creating semantic blocks that describe the rationale and the alternatives

· We allow analysts to create a graphical workflow of their work by using and restructuring these blocks

A more complete description of the experiment is available here, along with a short video tutorial of the platform.

Of course, this experiment does not cover all considerations that might lead to variability (e.g., R users might differ from Python users), but we believe it is a step towards better understanding how defensible yet subjective analytic choices may shape research results. The experiment is still running, but we expect to receive about 40-60 submissions of code, logs, comments, and explanations of decisions. We are also collecting information about the analysts, such as their background, the methods they usually use, and the way they operationalized the hypotheses.

Our current plan is to analyze the data from this crowdsourced project using inductive coding, splitting participants into groups that reached similar results (effect size and direction). We then plan to identify factors that explain the various decisions, as well as the similarities between participants.

We would love to receive any feedback and suggestions from readers of your blog regarding our planned approach to account for variability in results across different analysts.
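As a minimal sketch of the grouping step in the plan, the Python below buckets analysts by the direction and rough size of a standardized effect estimate. The input format (a dict from a hypothetical analyst ID to one effect size) and the cutoffs `null_band=0.1` and `large=0.5` are assumptions for illustration, not part of the project's design.

```python
from collections import defaultdict


def classify_effect(effect, null_band=0.1, large=0.5):
    """Label one standardized effect by direction and rough size.

    The cutoffs are illustrative assumptions: |effect| < null_band is "null",
    |effect| >= large is "large", and anything in between is "small".
    """
    if abs(effect) < null_band:
        return "null"
    direction = "positive" if effect > 0 else "negative"
    size = "large" if abs(effect) >= large else "small"
    return f"{direction}/{size}"


def group_by_result(results, null_band=0.1, large=0.5):
    """Group hypothetical analyst IDs by the label of their reported effect."""
    groups = defaultdict(list)
    for analyst, effect in results.items():
        groups[classify_effect(effect, null_band, large)].append(analyst)
    return dict(groups)


# Example with made-up effect sizes for four hypothetical analysts.
print(group_by_result({"a1": 0.3, "a2": -0.7, "a3": 0.05, "a4": 0.2}))
```

Hard cutoffs are a design choice; estimates that land near a threshold could arguably be compared using their uncertainty intervals instead.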

If anyone has suggestions, feel free to respond in the comments.


  1. This is the kind of experiment we very much need. I wish you the best of luck in getting many responses that give us insight into variation in approaches. I have been concerned for many years about the variance of statisticians.

  2. Anoneuoid says:

    Hypothesis 1: “A woman’s tendency to participate actively in the conversation correlates positively with the number of females in the discussion.”
    Hypothesis 2: “Higher status participants are more verbose than are lower status participants.”

    How about a better hypothesis? I personally would not even bother “testing” these positive/negative correlation hypotheses.

    Something like: “The word frequency distribution in this dataset follows Zipf’s law”.

    Replace Zipf’s law with whatever “laws” are known about gender and status (if there are none, just explore the data to look for universalities rather than testing a vague, meaningless hypothesis).

    • Mason says:

      That’s not helpful, Anoneuoid. The goal is to see sources of variability, given a basic prompt. It isn’t to understand the relationship between gender and status. It’s to understand how one goes from hypothesis to conclusion, given the same data. I wouldn’t be surprised if there were 40 to 50 unique solutions.

      • Anoneuoid says:

        It isn’t to understand the relationship between gender and status. It’s to understand how one goes from hypothesis to conclusion, given the same data.

        Then it’d be good to include the (precise) Zipf’s law hypothesis; you don’t even need to modify it. I bet the process used to analyze the data, and the results, will be very different from those for the two (vague) NHST hypotheses.
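To make the contrast with the correlation hypotheses concrete, here is a rough Python sketch of one way an analyst might check the Zipf’s law hypothesis: fit a least-squares line to log frequency against log rank and see whether the slope is near -1. The tokenized word list is an assumed input, and a log-log regression is only a crude check; more careful work would estimate the exponent by maximum likelihood.

```python
import math
from collections import Counter


def zipf_slope(words):
    """Least-squares slope of log(frequency) on log(rank) for a list of tokens.

    Zipf's law says frequency is roughly proportional to 1 / rank,
    which corresponds to a slope close to -1 on the log-log scale.
    """
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic corpus: word w{r} appears 60 / r times, an exact Zipf pattern.
corpus = [f"w{r}" for r in range(1, 7) for _ in range(60 // r)]
print(round(zipf_slope(corpus), 6))
```

Unlike a correlation "test", this hypothesis predicts a specific number, so analysts can disagree about how close is close enough rather than only about the sign.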

  3. Elin says:

    I am actually not so sure that sorting initially by similarity of results is the best way to go. It’s a version of sampling on the outcome. I think it might make sense to code things like whether graphics were used, how much recoding (if any) was done and when, and what kinds of models were fit, but most of all I’d like to see some kind of sequential analysis and what patterns emerge from it.
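One simple form of the sequential analysis suggested here is a first-order transition table: for each coded step type, how often each other type comes next. The step labels below ("load", "recode", "plot", "model") are hypothetical codes, and the sketch assumes each analyst's log has already been coded into an ordered list of such labels.

```python
from collections import Counter


def transition_counts(step_sequences):
    """Count how often step type a is immediately followed by step type b.

    `step_sequences` holds one ordered list of coded step labels per analyst.
    """
    counts = Counter()
    for steps in step_sequences:
        counts.update(zip(steps, steps[1:]))
    return counts


def transition_probs(counts):
    """Convert pair counts to P(next = b | current = a)."""
    totals = Counter()
    for (a, _b), c in counts.items():
        totals[a] += c
    return {(a, b): c / totals[a] for (a, b), c in counts.items()}


# Two hypothetical analysts: one recodes before plotting, the other after.
sequences = [
    ["load", "recode", "plot", "model"],
    ["load", "plot", "recode", "model"],
]
probs = transition_probs(transition_counts(sequences))
print(probs[("load", "recode")], probs[("load", "plot")])
```

Comparing such tables across analysts would show whether, say, those who plot before recoding tend to reach different conclusions; longer-range patterns would need n-grams or a proper sequence model.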

  4. D Kane says:

    I tried my best to participate in this project. But, IMHO, it is already a failure because it makes honest participation essentially impossible.

    The main flaw is the authors’ refusal/inability to allow participants to record their activities in a sensible way.

    If other readers were able to participate, I would appreciate hearing about your experience.

    • Martha (Smith) says:

      Please elaborate on your sentence “The main flaw is the authors’ refusal/inability to allow participants to record their activities in a sensible way.” Specific details would be more informative than this bottom-line sentence.

      • I just completed my analysis, and I agree that the platform used for the analysis (an online version of RStudio running on a server adapted to prompt / require analysts to “log” the steps they carried out) was problematic in some ways. That said, I’m not sure that they either refused to allow a sensible analysis or made one difficult. I analyzed the data the way I wanted to, and only logged the key steps I took.

  5. D Kane says:

    Perhaps they have changed this since I tried three weeks ago, or I could be an outlier in my behavior, but:

    1) I work with small data sets like this by writing a script line by line and regularly re-running the entire script. With small data sets, this runs instantaneously and avoids errors from stale objects floating around the workspace. I generally re-run the script after adding each line, usually by just hitting Ctrl-Shift-Enter or, sometimes, by stepping through each line.

    2) This means that all those lines are executed in R at the console.

    3) Their system records each of those lines. And that is OK! Although the record is redundant (and littered with errors as I go along), it is an accurate record of everything that I did.

    4) But it also generates hundreds/thousands of lines. Again, that is OK! I really did instruct R to evaluate all those commands.

    5) The problem is that their system requires you, after every 30 evaluated lines, to fill out a bunch of forms about what you are doing and why. But most of the time, what I am doing is just re-running the script I ran a moment before, with a new line added! There is no there there.

    6) If they provided a way to easily edit the record, keep only the meaningful lines, and then answer questions about those key code chunks 10 or so times, that would be fine. But I could find no way to do that.

    JR writes: “only logged the key steps I took.” Perhaps I missed this option. How did you do that? I got a prompt, every 20 or 30 lines, requiring me to log stuff.
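The cleanup D Kane asks for in point 6 could start from something as simple as the Python sketch below. It assumes, hypothetically, that the platform's log can be exported as a plain list of command strings; when a script is re-run from the top after each new line, keeping only the first occurrence of each distinct command recovers the growing script. Abandoned or erroneous lines survive this filter, so they would still need manual review.

```python
def distinct_commands(log):
    """Return each distinct non-blank command once, in order of first execution.

    dict.fromkeys preserves insertion order (Python 3.7+), so repeated
    re-runs of earlier lines collapse onto their first appearance.
    """
    stripped = (cmd.strip() for cmd in log)
    return list(dict.fromkeys(cmd for cmd in stripped if cmd))


# Hypothetical log from re-running a growing script three times.
log = [
    "library(dplyr)",
    "library(dplyr)",
    "d <- read.csv('talk.csv')",
    "library(dplyr)",
    "d <- read.csv('talk.csv')",
    "summary(d)",
    "",
]
print(distinct_commands(log))
```

Prompts for explanations could then be tied to new distinct commands rather than to every 20 or 30 raw evaluations.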
