Here’s what I wrote:
The general idea of gathering comprehensive data seems reasonable to me. I’ve often made the point that careful data collection and measurement are important. Data analysis is the glamour boy of statistics, but you can’t do much if your data are no good. Regarding your other question: I prefer not to use the pejorative term “fishing”; rather, I’d say it’s great that they’re gathering a large dataset that will allow them to make discoveries and formulate hypotheses that can always be tested on new data. In general, hypothesis testing is overrated and hypothesis generation is underrated, so it’s fine for these data to be collected with exploration in mind.
Finally, I’m skeptical about their claims that with “big data” they will somehow be able to resolve causal issues. For example, there’s this quote: “while correlations have been found between retirement and cognitive decline, it is not known whether this is due to retirees’ lower levels of mental engagement, reverse causation whereby cognitive decline induces retirement, or the resulting reduction in social contact. . . . The inability to resolve issues of causation reflects, to put it bluntly, a data limitation.” I disagree, and I think it’s naive of them to think that just by gathering a bunch of observational data they’ll be able to answer such questions. To answer such causal questions you need experimental data or quasi-experiments. Sure, some genetic inputs can be taken as approximately equivalent to randomization, so there’s some leverage you can get from such large-scale observational studies, but ultimately if you want to understand the effects of a reduction in social contact, you have to experiment on this or observe some natural experiment. Getting a big pile of data can help you formulate some hypotheses, but I do think it’s naive, or maybe just hype-y, to attribute the inability to resolve issues of causation to a “data limitation.”
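To make the point concrete, here is a toy simulation. It is a sketch under made-up assumptions, not anything from the project itself: three hypothetical linear-Gaussian models, one for each explanation in the quote (engagement affects cognition, cognition affects retirement, or a common cause such as social contact drives both). The coefficients were chosen so that all three produce exactly the same joint distribution of the observed variables, so no amount of observational data can tell them apart. Only randomizing x (the functions `observe`, `experiment`, and `slope` are illustrative names) reveals the difference.

```python
import numpy as np

# Toy example with hypothetical, standardized variables:
#   x = something like "retirement / lower mental engagement"
#   y = a cognition score
# All three models below imply var(x) = var(y) = 1 and cov(x, y) = 0.5,
# so their observational distributions are identical.
MODELS = ("x_causes_y", "y_causes_x", "common_cause")
rng = np.random.default_rng(2024)
N = 200_000  # a "big data" sample size

def observe(model, n):
    """Draw n observational (x, y) pairs from one hypothetical model."""
    e1, e2, e3 = rng.standard_normal((3, n))
    if model == "x_causes_y":      # engagement -> cognition
        x = e1
        y = 0.5 * x + np.sqrt(0.75) * e2
    elif model == "y_causes_x":    # reverse causation: cognition -> retirement
        y = e1
        x = 0.5 * y + np.sqrt(0.75) * e2
    elif model == "common_cause":  # z (e.g. social contact) drives both
        z = e1
        x = np.sqrt(0.5) * z + np.sqrt(0.5) * e2
        y = np.sqrt(0.5) * z + np.sqrt(0.5) * e3
    else:
        raise ValueError(model)
    return x, y

def experiment(model, n):
    """Randomize x (an intervention) and record how y responds."""
    e1, e2, e3 = rng.standard_normal((3, n))
    x = e1  # set by the experimenter, independent of everything else
    if model == "x_causes_y":
        y = 0.5 * x + np.sqrt(0.75) * e2  # y still responds to x
    elif model == "y_causes_x":
        y = e2  # y is upstream of x, so setting x leaves y unchanged
    elif model == "common_cause":
        z = e2  # z still drives y, but no longer drives x
        y = np.sqrt(0.5) * z + np.sqrt(0.5) * e3
    else:
        raise ValueError(model)
    return x, y

def slope(x, y):
    """Least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

for m in MODELS:
    obs = slope(*observe(m, N))
    exp = slope(*experiment(m, N))
    print(f"{m:13s} observational slope {obs:.2f}   experimental slope {exp:.2f}")
```

All three observational slopes come out near 0.5, however large N is made; the experimental slopes come out near 0.5, 0, and 0. The extra data only sharpen the estimate of a correlation that all three stories already agree on.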
It’s too bad they had to spoil this interesting project with all that hype. I’m putting up a little fight against the hype by waiting until February to post my thoughts on this big big October news story.
P.S. Hannah Bayer, chief scientist of the project, gives some details here on their plans for data sharing.