About 50 people pointed me to this press release or the underlying PPNAS research article, “Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates,” by Anders Eklund, Thomas Nichols, and Hans Knutsson, who write:
Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data. Here, we used resting-state fMRI data from 499 healthy controls to conduct 3 million task group analyses. Using this null data with different experimental designs, we estimate the incidence of significant results. In theory, we should find 5% false positives (for a significance threshold of 5%), but instead we found that the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.
This is all fine (I got various emails with lines such as, “Finally, a PPNAS paper you’ll appreciate”), and I’m guessing it won’t surprise Vul, Harris, Winkielman, and Pashler one bit.
I continue to think that the false-positive, false-negative thing is a horrible way to look at something like brain activity, which is happening all over the place all the time. The paper discussed above looks like a valuable contribution and I hope people follow up by studying the consequences of these fMRI issues using continuous models.
But without a framework for identifying false-positives, how could I be 95% confident that your brain is actually capable of thought? ;p
If you want a slightly more accurate cat-fMRI picture, go through the slide show at http://www.simplyphysics.com/flying_objects.html until you find the one labeled ‘Tony Tiger’.
Andrew, please correct the spelling: Thomas Nichols (University of Warwick).
Tom gave a talk about this issue and the subsequent reactions at the CRiSM workshop in which you virtually took part.
I believe that this statement “These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results”
has been edited in the original paper (see the correction) to read
“These results question the validity of a number of fMRI studies and may have a large impact on the interpretation of weakly significant neuroimaging results.”
I believe the reason is that relatively few of the 40,000 studies used the problematic methodology.
Indeed, see for example:
http://www.ohbmbrainmappingblog.com/blog/keep-calm-and-scan-on
Lion… cheatin…. hurtin…
https://youtu.be/qZVsGxa_Vzk?t=59s
I, on the other hand, love the big cat picture. A true-positive in brain response studies.
Sometimes I spend more time wondering where oh where Andrew finds the pictures than on anything else (or sometimes what the link is to the topic). But I think this one takes the cake. How do you come up with that picture?
And maybe that explains the results – they only thought the scans were human scans!
Roy:
It’s simple, I just use Google to find the images.
Have you heard about the dead fish fMRI?
Worth reading about how the authors tried to correct the article and how PPNAS initially rejected the errata:
http://blogs.warwick.ac.uk/nichols/entry/errata_for_cluster/
As well as the added bibliographic analysis they did:
http://blogs.warwick.ac.uk/nichols/entry/bibliometrics_of_cluster
As I understand it, the problem was that they were using a spatial smoother with a Gaussian tail when the data were in fact heavy-tailed. This could be handled in Stan by implementing a Gaussian process with a covariance function that includes a “tail-heaviness” parameter (like the df parameter in the Student-t), and then letting the model do inference on that parameter.
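For concreteness, here is a minimal Python sketch (not the Stan model described above, and not anything the paper’s authors implemented) of one covariance family with the kind of tail-heaviness parameter the comment has in mind: the rational-quadratic kernel, whose alpha plays a role analogous to a Student-t df and which recovers the Gaussian-tailed squared-exponential kernel as alpha goes to infinity. Function names and parameter values are purely illustrative.

```python
import numpy as np

def sq_exp_cov(d, sigma=1.0, rho=1.0):
    """Squared-exponential (Gaussian-tailed) spatial covariance."""
    return sigma**2 * np.exp(-d**2 / (2 * rho**2))

def rational_quadratic_cov(d, sigma=1.0, rho=1.0, alpha=1.0):
    """Heavy-tailed covariance: alpha is the 'tail-heaviness' parameter.

    Small alpha gives heavy tails; as alpha -> infinity this converges
    to the squared-exponential kernel above.
    """
    return sigma**2 * (1 + d**2 / (2 * alpha * rho**2)) ** (-alpha)

# At large separations the heavy-tailed kernel keeps much more correlation
# than the Gaussian-tailed one, which is the mismatch described above.
d = np.linspace(0.0, 10.0, 6)
print(np.round(sq_exp_cov(d), 4))
print(np.round(rational_quadratic_cov(d, alpha=0.5), 4))
```

In a Stan fit one would put a prior on alpha and let the data say how heavy the tails of the spatial correlation need to be, rather than fixing a Gaussian shape in advance.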
I think the description “A bug in fMRI software” is unfortunate.
As I understand it, the problem isn’t that the software had errors in its implementation of the algorithms or that the algorithms didn’t produce the intended estimator. The problem is that the model was wrong.
>”The problem is that the model was wrong.”
Yep, this issue has been very mischaracterized, even by those who discovered it. It isn’t a bug, it isn’t a “false positive” problem. It is a “true positive” problem. The stats machinery correctly identified their null model as a poor fit for the data.
Anon:
Yup. The null model in this case is always wrong, except for that example where they scanned a dead fish. So the false positive rate is 0/0.
At the risk of self-promotion, along with a few others I put out a preprint (https://arxiv.org/abs/1608.01274) arguing that while in many cases family-wise error (FWE) control was indeed not achieved using parametric random field theory (RFT), FWE is perhaps an unnecessarily high bar to clear. We propose that control of the false discovery rate is a more natural target, and introduce a nonparametric method to accomplish this. When we look at the same task data considered in this paper, we see that things are not quite as bleak as they appear (similar to what others, including the authors, have since suggested).
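To make the FWE-versus-FDR distinction concrete, here is a small Python sketch of the standard Benjamini-Hochberg step-up procedure for controlling the false discovery rate. This is the classic FDR procedure, not the nonparametric method proposed in the preprint, and the toy p-values are made up purely for illustration.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: control FDR at level q.

    Returns a boolean array marking which hypotheses are rejected.
    """
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m   # q * rank / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # largest rank meeting its threshold
        rejected[order[:k + 1]] = True         # reject everything up to that rank
    return rejected

# Toy example: a few genuinely small p-values mixed with uniform "null" ones.
rng = np.random.default_rng(0)
pvals = np.concatenate([[1e-4, 5e-4, 0.003], rng.uniform(size=50)])
print(benjamini_hochberg(pvals, q=0.05).sum(), "discoveries at FDR 0.05")
```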
Daniel:
1. Self-promotion is fine on this blog!
2. I think the so-called familywise error rate is pretty much always useless; see this paper for further discussion of this point.