An experimental psychologist wrote in, wondering about his field's standards for “acceptable reliability” when assessing inter-rater reliability in coded data. He wondered, for example, whether some variation on signal detectability theory might be applied to adjust for inter-rater differences in the criterion for saying that a given code is present.
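For concreteness, here is how the standard equal-variance signal detection indices separate a rater's sensitivity (d′) from their response criterion. The hit and false-alarm rates below are invented, and, as the question notes, this computation presumes a ground truth that coding studies typically lack:

```python
from statistics import NormalDist

def sdt_stats(hit_rate, fa_rate):
    """Return (d_prime, criterion) for one rater under equal-variance SDT."""
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion

# Two hypothetical raters with similar sensitivity but different criteria:
# rater A is neutral; rater B is more conservative about saying "present".
print(sdt_stats(0.80, 0.20))  # rater A: criterion near zero
print(sdt_stats(0.60, 0.05))  # rater B: positive (conservative) criterion
```

Two raters can thus disagree substantially while having nearly the same sensitivity, purely through different criteria.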
What about Cohen’s kappa? The psychologist wrote:
Cohen’s kappa does adjust for “guessing,” but its assumptions are not well motivated; they are perhaps no better motivated than standard corrections for guessing, as compared with applying signal detectability theory where it can be applied. But one can’t do a straightforward application of signal detectability theory to reliability, because you don’t know whether the signal is present or not.
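For reference, kappa's chance correction is simple to state: it compares observed agreement with the agreement expected if each rater coded independently at their own marginal base rate. A minimal sketch, with an invented 2×2 agreement table:

```python
def cohens_kappa(table):
    """Cohen's kappa for a 2x2 agreement table [[a, b], [c, d]],
    where rows are rater 1's codes and columns are rater 2's."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_obs = (a + d) / n  # observed agreement (both present or both absent)
    # chance agreement, from the product of the raters' marginal rates
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_obs - p_chance) / (1 - p_chance)

# Invented example: 100 items, raters agree on 85; kappa is about 0.7.
print(cohens_kappa([[40, 10], [5, 45]]))
```

The "guessing" assumption the questioner objects to is visible in `p_chance`: it treats each rater as coding at random according to their marginal rates whenever chance agreement occurs, which is a strong and arguably arbitrary model of rater behavior.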
I think measurement issues are important, but I don’t have enough experience in this area to answer without knowing more about the specific problem this researcher is working on.
I’m posting it here because I imagine that some of the psychometricians out there might have some comments.