Justin Kinney writes:
I wanted to let you know that the critique Mickey Atwal and I wrote regarding equitability and the maximal information coefficient has just been published.
We discussed this paper last year, under the heading, "Too many MC’s not enough MIC’s, or What principles should govern attempts to summarize bivariate associations in large multivariate datasets?"
Kinney and Atwal’s paper is interesting, with my only criticism being that in some places they seem to aim for what might not be possible. For example, they write that “mutual information is already widely believed to quantify dependencies without bias for relationships of one type or another,” which seems a bit vague to me. And later they write, “How to compute such an estimate that does not bias the resulting mutual information value remains an open problem,” which seems to me to miss the point in that unbiased statistical estimates are not generally possible and indeed are often not desirable.
Their criticisms of the MIC measure of Reshef et al. (see above link to that earlier post for background) may well be reasonable, but, again, there are points where they (Kinney and Atwal) may be missing some possibilities. For example, they write, “nonmonotonic relationships have systematically reduced MIC values relative to monotonic ones” and refer to this as a “bias.” But it seems to me that nonmonotonic relationships really are less predictable. Consider scatterplots A and B of the Kinney and Atwal paper. The two distributions have the same residual error sd(y|x), but in plot B (the nonmonotonic example) sd(x|y) is much bigger. Not that sd is necessarily the correct measure; in my earlier post, I asked what would be the appropriate measure of association between two variables whose scatterplot looked like a circle (that is, y = +/- sqrt(1-x^2)).

More generally, I fear that Kinney and Atwal could be painting themselves into a corner if they are defining the strength of association between two variables in terms of the distribution of y given x. I’m not so much bothered by the asymmetry as by the implicit dismissal of any smoothness in x. One could, for example, consider a function where sd(y|x)=0, that is, y is a deterministic function of x, but in a really jumpy way with lots of big discontinuities going up and down. This to me would be a weaker association than a simple y=a+bx.
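The sd(y|x)-versus-sd(x|y) point is easy to check numerically. Here is a minimal sketch, not taken from either paper: I simulate a monotonic relationship (y = x + noise) and a nonmonotonic one (y = |x| + noise) with the same residual noise, and compare the average conditional standard deviations in each direction. The bin count, noise level, and sample size are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(-1, 1, n)
noise = 0.1 * rng.standard_normal(n)

y_mono = x + noise           # monotonic relationship
y_nonmono = np.abs(x) + noise  # nonmonotonic, same residual sd(y|x)

def mean_conditional_sd(a, b, bins=50):
    """Average standard deviation of b within equal-width bins of a."""
    edges = np.linspace(a.min(), a.max(), bins + 1)
    idx = np.clip(np.digitize(a, edges) - 1, 0, bins - 1)
    sds = [b[idx == k].std() for k in range(bins) if np.sum(idx == k) > 1]
    return float(np.mean(sds))

# sd(y|x) is about 0.1 in both cases by construction.
sd_y_given_x_mono = mean_conditional_sd(x, y_mono)
sd_y_given_x_non = mean_conditional_sd(x, y_nonmono)

# sd(x|y) is much larger in the nonmonotonic case,
# because each value of y corresponds to two values of x.
sd_x_given_y_mono = mean_conditional_sd(y_mono, x)
sd_x_given_y_non = mean_conditional_sd(y_nonmono, x)

print(sd_y_given_x_mono, sd_y_given_x_non)
print(sd_x_given_y_mono, sd_x_given_y_non)
```

The two relationships look equally tight if you only condition on x, but conditioning on y reveals the asymmetry: in the nonmonotonic case the two branches (x and -x) inflate sd(x|y).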
The two papers under discussion differ not just in their methods and assumptions but in their focus. My impression of Reshef et al. was that they were interested in quickly summarizing pairwise relations in large sets of variables. In contrast, Kinney and Atwal focus on getting an efficient estimate of the mutual information for a single pair of variables. I suppose that Kinney and Atwal could apply their method to a larger structure in the manner of Reshef et al., and I’d be interested in seeing how it looks.
I’d also be interested in a discussion of the idea that the measure of dependence can depend on the scale of discretization, as discussed in my earlier post.
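To make that discretization point concrete, here is a small sketch (my own illustration, not from either paper) of the ordinary plug-in histogram estimate of mutual information applied to the same bivariate normal data at three different bin counts. The sample size and bin counts are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.standard_normal(n)
y = x + 0.5 * rng.standard_normal(n)

def plugin_mi(x, y, bins):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x
    py = pxy.sum(axis=0, keepdims=True)  # marginal of y
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

mi_coarse = plugin_mi(x, y, 5)
mi_medium = plugin_mi(x, y, 20)
mi_fine = plugin_mi(x, y, 80)
print(mi_coarse, mi_medium, mi_fine)
```

The same data yield noticeably different numbers: coarse binning discards dependence (pushing the estimate down), while fine binning introduces a well-known upward finite-sample bias. So "the" mutual information of a sample is not defined until the discretization scale is fixed, which is exactly the issue.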
In any case, lots of good stuff here, and I imagine that different measures of dependence could be useful for different purposes.