Josh Cherry writes:

This isn’t in the social sciences, but it’s an egregious example of statistical malpractice:

Below the abstract you can find my [Cherry’s] comment on the problem, which was submitted as a letter to the journal, but rejected on the grounds that the issue does not affect the main conclusions of the article (sheesh!). These folks had variables with Spearman correlations ranging from 0.02 to 0.07, but they report “strong correlations” (0.74-0.96) that they obtain by binning and averaging, essentially averaging away unexplained variance. This sort of thing has been done in other fields as well.
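To see how binning manufactures a high correlation, here's a quick simulation sketch (my own toy data, nothing to do with the paper's actual measurements): two variables with a true correlation around 0.05 yield a near-perfect correlation once you sort by one variable, bin, and correlate the bin averages.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate two weakly related variables: raw correlation ~ 0.05.
n = 1_000_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)

raw_r = np.corrcoef(x, y)[0, 1]  # roughly 0.05

# Sort by x, split into 50 equal-size bins, and average within bins.
# Averaging shrinks the within-bin noise by sqrt(bin size), leaving
# mostly the (tiny) systematic trend -- so the bin means line up.
n_bins = 50
order = np.argsort(x)
x_binned = x[order].reshape(n_bins, -1).mean(axis=1)
y_binned = y[order].reshape(n_bins, -1).mean(axis=1)

binned_r = np.corrcoef(x_binned, y_binned)[0, 1]  # close to 1
```

The unexplained variance hasn't gone anywhere; it's just been averaged out of sight before the correlation was computed.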

The paper in question, by A. Diament, R. Y. Pinter, and T. Tuller, is called, “Three-dimensional eukaryotic genomic organization is strongly correlated with codon usage expression and function.” I don’t know from eukaryotic genomic organization, nor have I ever heard of “codon”—maybe I’m using the stuff all the time without realizing it!—but I have heard of “strongly correlated.” Actually, in the abstract of the paper it gets upgraded to “very strongly correlated.”

In the months since Cherry sent this to me, more comments have appeared at the above Pubmed commons link, including this one by Joshua Plotkin, in which he shares the referee report he wrote with Premal Shah in 2014 recommending rejection of the paper. Key comment:

Every single correlation reported in the paper is based on binned data. Although it is sometimes appropriate to bin the data for visualization purposes, it is entirely without merit to report correlation coefficients (and associated p-values) on binned data . . . Based on their own figures 3D and S2A, it seems clear that their results either have a very small effect or do not hold at all when analyzing the actual raw data.

And this:

Moreover, the correlation coefficients reported in most of their plots make no sense whatsoever. For instance, in Fig1B, the best-fit regression line of CUBS vs PPI barely passes through the bulk of the data, and yet the authors report a perfect correlation of R=1.

A follow-up comment by Plotkin has some numbers:

In the paper by Diament et al 2014, the authors never reported the actual correlation (r = 0.022) between two genomic measurements; instead they reported correlations on binned data (r = 0.86).

I think we can all agree that .022 is a low correlation and .86 is a high correlation.

But then there’s this from Tuller:

In Shah P, 2013 Plotkin & Shah report in the abstract a correlation which is in fact very weak (according to their definitions here), r = 0.12, without controlling for relevant additional fundamental variables, and include a figure of binned values related to this correlation. This correlation (0.12) is reported in their study as “a strong positive correlation”.

So now I’m thinking that everyone in this field should just stop calling correlations high or low or strong or weak. Better just to report the damn number.

Tuller also writes:

If the number of points in a typical systems biology study is ~300, the number of points analyzed in our study is 1,230,000-fold higher (!); a priori, a researcher with some minimal experience in the field should not expect to see similar levels of correlations in the two cases. Everyone also knows that increasing the number of points, specifically when dealing with non-trivial NGS data, also tends to very significantly decrease the correlation.

Huh? I have no idea what they’re talking about here.
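For what it's worth, a simple simulation (again my own toy data, not anything from these papers) shows why the claim is puzzling: if the underlying relationship is fixed, the sample correlation doesn't shrink as the number of points grows. The estimate just gets less noisy around the same value.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.05  # fixed population correlation

# Estimate the correlation at three very different sample sizes.
estimates = {}
for n in (300, 10_000, 1_000_000):
    x = rng.normal(size=n)
    y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)
    estimates[n] = np.corrcoef(x, y)[0, 1]

# The n = 1,000,000 estimate sits within a hair of 0.05; the n = 300
# estimate is noisier but not systematically larger or smaller.
```

So more points don't "decrease the correlation"; they just reveal what it actually is.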

But, in all seriousness, it sounds to me like all these researchers should stop talking about correlation. If you have a measure that gets weaker and weaker as your sample size increases, that doesn’t seem like good science to me! I’m glad that Cherry put in the effort to fight this one.