Following our recent discussion of p-values, Anne commented:
We use p-values for something different: setting detection thresholds for pulsar searches. If you’re looking at, say, a million independent Fourier frequencies, and you want to bring up an expected one for further study, you look for a power high enough that its p-value is less than one in a million. (Similarly if you’re adding multiple harmonics, coherently or incoherently, though counting your “number of trials” becomes more difficult.) I don’t know whether there’s another tool that can really do the job. (The low computing cost is also important, since in fact those million Fourier frequencies are multiplied by ten thousand dispersion measure trials and five thousand beams.)
That said, we don’t really use p-values: in practice, radio-frequency interference means we have no real grasp on the statistics of our problem. There are basically always many signals that are statistically significant but not real, so we rely on ad-hoc methods to try to manage the detection rates.
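To make the threshold arithmetic in Anne's comment concrete, here is a minimal sketch of the idealized case. It assumes pure Gaussian noise with powers normalized to unit mean, so that a single Fourier power exceeds a value P with probability exp(-P). The function name `power_threshold` and its parameters are hypothetical, and the trial counts are the round numbers from the comment. As she notes, interference breaks these assumptions in real data, so this shows only the textbook calculation.

```python
import math

def power_threshold(n_trials, expected_false_alarms=1.0):
    """Normalized power whose single-trial p-value is expected_false_alarms / n_trials.

    Assumes independent trials and exponentially distributed noise powers
    with unit mean, so that P(power > P0) = exp(-P0).
    """
    p_single = expected_false_alarms / n_trials
    return -math.log(p_single)

# Round numbers quoted in the comment.
n_freq = 1_000_000   # independent Fourier frequencies
n_dm = 10_000        # dispersion measure trials
n_beams = 5_000      # beams

# Threshold for roughly one noise-only candidate per spectrum.
thr_freq_only = power_threshold(n_freq)

# Threshold if every frequency, DM trial, and beam counts as a separate trial.
thr_all = power_threshold(n_freq * n_dm * n_beams)

print(f"one spectrum: power > {thr_freq_only:.2f}")
print(f"all trials:   power > {thr_all:.2f}")
```

Because the threshold grows only logarithmically with the number of trials, multiplying the trial count by tens of millions raises the required power by a modest factor, which is part of why the calculation is so cheap.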
I don’t know anything about astronomy–just for example, I can’t remember which way the crescent moon curves in its different phases during the month–but I can offer some general statistical thoughts.
My sense is that p-values are not the best tool for this job. I recommend my paper with Jennifer and Masanao on multiple comparisons; you can also see my talks on the topic. (There’s even a video version where you can hear people laughing at my jokes!) Our general advice is to model the underlying effects rather than thinking of them as a million completely unrelated outcomes.
The idea is to get away from sterile p-value/Bayes-factor math games and move toward statistical modeling.
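As a rough illustration of what modeling the underlying effects can look like, here is a minimal sketch of partial pooling under a simple normal-normal model. It is not the method from the paper: it is a crude moment-based stand-in that assumes each raw estimate has the same known noise standard deviation `sigma` and that the true effects come from a common normal distribution. The function `partial_pool` and the simulated data are hypothetical.

```python
import random
import statistics

def partial_pool(estimates, sigma):
    """Shrink noisy estimates toward their common mean.

    Assumed model: y_j ~ Normal(theta_j, sigma^2), theta_j ~ Normal(mu, tau^2).
    mu and tau^2 are estimated by simple moment matching.
    """
    mu_hat = statistics.fmean(estimates)
    total_var = statistics.pvariance(estimates, mu_hat)
    # Variance of the raw estimates is roughly tau^2 + sigma^2.
    tau2_hat = max(0.0, total_var - sigma**2)
    shrink = tau2_hat / (tau2_hat + sigma**2)
    pooled = [mu_hat + shrink * (y - mu_hat) for y in estimates]
    return pooled, shrink

# Simulated example: many small true effects observed with larger noise.
random.seed(0)
sigma = 1.0
true_effects = [random.gauss(0.0, 0.5) for _ in range(1000)]
raw = [t + random.gauss(0.0, sigma) for t in true_effects]

pooled, shrink = partial_pool(raw, sigma)
print(f"shrinkage factor: {shrink:.2f}")
print(f"largest raw estimate:    {max(raw):.2f}")
print(f"largest pooled estimate: {max(pooled):.2f}")
```

When the noise is large relative to the spread of true effects, the shrinkage factor is small and extreme raw estimates are pulled strongly toward the mean, so fewer of them stand out as apparent detections.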
Another idea that’s often effective is to select a subset of your million possibilities for screening and then analyze that subset more carefully. The work of Tian Zheng and Shaw-Hwa Lo on feature selection (see the Statistics category here) might be relevant for this purpose.