In politics we’re familiar with the non-apology apology (well described in Wikipedia as “a statement that has the form of an apology but does not express the expected contrition”). Here’s the scientific equivalent: the non-retraction retraction.

Sanjay Srivastava points to an amusing yet barfable story of a pair of researchers who (inadvertently, I assume) made a data coding error and were eventually moved to issue a correction notice, but even then refused to fully admit their error. As Srivastava puts it, the story “ended up with Lew [Goldberg] and colleagues [Kibeom Lee and Michael Ashton] publishing a comment on an erratum – the only time I’ve ever heard of that happening in a scientific journal.”

From the comment on the erratum:

In their “erratum and addendum,” Anderson and Ones (this issue) explained that we had brought their attention to the “potential” of a “possible” misalignment and described the results computed from re-aligned data as being based on a “post-hoc” analysis of a “proposed” data shift. That is, Anderson and Ones did not plainly describe the mismatch problem as an error; instead they presented the new results merely as an alternative, supplementary reanalysis.

And here’s the unusual rejoinder to the comment on the correction. It’s pretty annoying that, even to the end, they refuse to admit their mistake, instead referring to “clerical errors as those alleged by Goldberg et al.” and concluding:

When any call is made for the retraction of two peer-reviewed and published articles, the onus of proof is on the claimant and the duty of scientific care and caution is manifestly high. Yet, Goldberg et al. (2008) have offered only circumstantial and superficial explanations . . . As detailed above, Goldberg et al. do not and cannot provide irrefutable proof of the alleged clerical errors. To call for the retraction of peer-reviewed, published papers on the basis of alleged clerical errors in data handling is sanctimoniously misguided. We continue to stand by the analyses, findings and conclusions reported in our earlier publications.

That’s the best they can do: “no irrefutable proof”?? That’s like something the killer says in the last act of a Columbo episode, right before the detective tricks him into giving himself away. Once you say “no irrefutable proof,” you’ve already effectively admitted that you did it. And in science, that should be enough.

By the way, here’s the “sanctimonious” graph from Goldberg et al. featuring the “no irrefutable proof”:

It’s unscientific behavior not to admit error.

One of the implications of the rejoinder is that peer review is infallible, or at least close to it. While I am a big supporter of the existence of peer review, I find it slightly annoying when people treat refereed papers as infinitely more secure and unassailable than unrefereed papers, as if the referee had a magic ability to locate every possible flaw in a paper. This (erroneous) view of refereeing is often the basis of arguments against publication on the arXiv and other open-science initiatives, when in fact those open-science initiatives increase the probability that errors get caught and wrong results get corrected before entering the refereed literature. And so on. Sorry, I should [insert joke here], but that rejoinder was so annoying!

My one grudge with arXiv and similar repositories is that they almost go out of their way to make themselves “flat”, i.e., one arXiv paper looks just like all the others.

There ought to be a way to do public peer review or rating. arXiv no doubt increases the probability of errors being caught, but there’s no good way to tell a well-vetted paper from crap that no one has read.

Rahul: Agreed!

Wow. Sanctimonious it may be… but it’s also nigh-on irrefutable. An interesting example of the use of graphics, Andrew. It was puzzling over the graph that made me go read the text of the rejoinder.

Can someone explain the argument a bit more? I can understand intuitively why a misaligned dataset will give low correlations, but why is the “standard deviation of those cross-inventory correlations” the metric of interest?

Pardon my ignorance.

A misaligned dataset gives low correlations. Thus, if a relatively high correlation is expected, it will appear only in the correctly aligned dataset. Exactly one possible alignment shows the high correlation, and it wasn’t the one used in the original study. QED. (Note that if none of the alignments had shown high correlations, the original authors’ conclusions would have been essentially correct even given the misalignment, though only by chance.)

I guess my question (probably naive) was: why not plot the average correlation for each alignment? Why plot the standard deviation?

Shouldn’t the correlation of every mis-alignment be close to zero?

Rahul: Anderson and Ones (2003) and Ones and Anderson (2003) presented results showing, among other things, that three separate personality inventories that ostensibly measured the same traits showed only modest correlations where high ones were expected. The authors interpreted this finding as resulting from different test authors’ divergent conceptualizations of the traits being measured: what is meant by “Conscientiousness” differs across the three inventories under examination, for example. While this was a novel result in 2002, several studies since then have similarly shown that there is relatively poor convergent validity across different inventories (Connelly & Ones, 2007; Ones & Connelly, 2007; much of Lew Goldberg’s work on the International Personality Item Pool and the Eugene Springfield Community Sample). The authors of the three inventories Anderson and Ones examined were, obviously, not pleased with the results. They claimed that such results could only have come from a clerical error that mismatched respondents’ scores with the scales they were supposed to measure. They then relabeled the scores of the tests, one shift after another, until they found a labeling scheme that gave them the results they preferred. Their argument for the graph is that misalignment of the scores and labels would drive the correlations down to very low values. If all of the correlations are low, their standard deviation will also be small. If some correlations are large (e.g., if all of the Extraversion scales are highly correlated with each other, while the other scales do not correlate with them), the standard deviation of the correlations will be larger. Goldberg, Lee, and Ashton assert that, because they could find a single arrangement of scores that produced their expected results, Anderson and Ones’ failure to find these results must have come from the data being entered improperly.
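A minimal R sketch of the standard-deviation metric described above, run on simulated data rather than the real inventories. It assumes two hypothetical inventories (inv_a, inv_b) that each measure the same k latent traits with noise, and it models the alleged clerical error as a cyclic shift of inventory B’s respondent rows; n, k, the noise level, and the seed are arbitrary choices. For each shift it computes the full k × k matrix of cross-inventory correlations and takes the SD of its entries. One plausible reason to prefer the SD over the mean (our reading, not something stated in the thread) is that under correct alignment a few large convergent correlations sit among many near-zero, possibly negative ones, so the mean can stay small while the spread is large; every misaligned shift, by contrast, yields a uniformly small set of correlations.

```r
# Hypothetical simulation: two inventories measuring the same k traits.
set.seed(1)
n <- 200   # respondents (assumed)
k <- 5     # scales per inventory (assumed)

traits <- matrix(rnorm(n * k), n, k)                     # latent trait scores
inv_a  <- traits + matrix(rnorm(n * k, sd = 0.7), n, k)  # inventory A scales
inv_b  <- traits + matrix(rnorm(n * k, sd = 0.7), n, k)  # inventory B scales

# Cyclically shift the rows of a matrix by s positions (s = 0 is no shift).
shift_rows <- function(m, s) {
  idx <- ((seq_len(nrow(m)) - 1 + s) %% nrow(m)) + 1
  m[idx, , drop = FALSE]
}

# SD of all k * k cross-inventory correlations for a given row shift.
sd_cross_cor <- function(a, b, s) {
  sd(as.vector(cor(a, shift_rows(b, s))))
}

shifts <- 0:(n - 1)
sds <- sapply(shifts, function(s) sd_cross_cor(inv_a, inv_b, s))

plot(shifts, sds, type = "l",
     xlab = "row shift applied to inventory B",
     ylab = "SD of cross-inventory correlations")
points(0, sds[1], pch = 15)  # the correctly aligned (unshifted) data
```

With these settings the unshifted alignment should appear as a single isolated peak, with every misaligned shift clustered near the sampling-noise floor.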

I think their argument holds no water. The “non-retraction retraction” Anderson and Ones give would more correctly be understood as a polite “non-retraction”. In their addendum, Anderson and Ones fairly clearly state that they reject the accusation that their data were entered improperly. Given that a team of several research assistants and the authors entered and checked the data many times, I would be inclined to believe that statement. In their addendum, they acknowledge that, if you systematically relabel the data, you will eventually find a correlation matrix that more resembles the expected relationships. They don’t, however, condone this practice, as it amounts to changing one’s data to fit one’s hypotheses. In their addendum, Anderson and Ones are calling out the poor practice evident in Goldberg et al.’s commentary. They are stating that it is inappropriate to hold to one’s hypothesis when the data say otherwise. The evidence that Goldberg et al. present does not prove that the data were misaligned, and given the conscientious reputations of the authors, such a possibility is unlikely.

I think this post is incorrectly identifying the research sins. It’s unscientific behavior to suppress results that disagree with your theory and to change data to match one’s hypothesis. Refusing to let others do so is not.

Brenton: did you see the graph? While it’s clearly true that “if you systematically relabel the data, you will eventually find a correlation matrix that more closely resembles the expected relationships,” the graph clearly and unambiguously rejects the “finding what you’re looking for” hypothesis. There is an obvious (and irrefutable) null: all but one of the realignments must have near-zero expected correlations. So it is finding one result many SDs away from all the others, with none in between, that refutes that particular theory, or so it would seem to me.
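To make the “many SDs away” point concrete, here is a hedged R sketch on simulated data (not the Anderson and Ones data). It treats the correlations from all misaligned cyclic shifts as a null distribution and expresses the correctly aligned correlation as a z-score against that null. The sample size, the effect size in y, and the seed are arbitrary assumptions.

```r
set.seed(2)
n <- 500
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # assumed true correlation of about 0.45

# Correlation between x and y after cyclically shifting y by s positions.
shifted_cor <- function(s) {
  idx <- ((seq_len(n) - 1 + s) %% n) + 1
  cor(x, y[idx])
}
cors <- sapply(0:(n - 1), shifted_cor)   # cors[1] is the unshifted alignment

# Misaligned shifts form the null; measure how far the true alignment sits.
null_cors <- cors[-1]
z <- (cors[1] - mean(null_cors)) / sd(null_cors)
print(z)
```

If the critics had merely picked the best of many similar values, cors[1] would sit near the top of a continuous spread and z would be modest; a single value far outside a tight cluster is what a genuine misalignment looks like.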

Thanks Brenton!

So what was the erratum that Anderson and Ones did issue?

I am stunned that someone would write what Brenton writes above: “They don’t however condone this practice, as it amounts to changing one’s data to fit one’s hypotheses.” Damn straight you fix your data when someone points out a coding error!

This is the only comment that I fully understand in the whole conversation!

Your argument that the critics were fishing would hold more water if the graph they present showed a whole normal distribution of values and they had picked the highest one, but it doesn’t. The realignment they suggest is far and away the best one.

As for the idea that because a whole team of people checked it, it can’t be wrong, that is rather naive. What probably happened, in my experience, is that the first person entered the data and the rest of the team checked their analyses as they went along, but they checked them against the data as entered, never checking the original data entry itself, because they might not even have known who did it in the first place.

I completely agree. Here is some R code to illustrate:

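#-----------------------------------
set.seed(123)                 # for reproducibility
sigma <- 20                   # noise SD in y; vary this to see how results change
num <- 500                    # number of observations
x <- rnorm(num, 0, 10)
y <- x + rnorm(num, 0, sigma)
# cors[i] is cor(x, y) after cyclically shifting y by i - 1 positions;
# i = 1 is the correct (unshifted) alignment
cors <- rep(NA, num)
for (i in 1:num) {
  idx <- ((seq_len(num) - 1 + (i - 1)) %% num) + 1
  cors[i] <- cor(x, y[idx])
}
plot(cors, type = "l")
points(1, cors[1], pch = 15)  # mark the correctly aligned correlation
#-----------------------------------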

You can vary sigma to see how the results change. Obviously this is just a simple correlation between two variables, but I think it illustrates the problem.

I’m shocked that Andrew posted the graph without commenting on the fact that the y-axis nonsensically reaches down to negative numbers (beyond the niggling, though, I agree with Jonathan: that’s one very effective graph…)

It’s good to see that this story is becoming more widely known.

When Anderson and Ones wrote a reply to our comment, we weren’t pleased with it, and we sent a response to the journal.

However, the editor (who had been very reluctant to publish anything about this matter) rejected it.

Here is a link to that still-unpublished response. One challenge in writing it was to find out the p-value associated with z = 18.6 (see the graph above). It’s p < 10^-76. We think that's pretty close to "irrefutable proof".
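To check that arithmetic, here is a short R sketch (our own check, not the authors’ calculation). It assumes z = 18.6 is referred to a standard normal distribution; the thread doesn’t say whether the reported p-value was one- or two-sided, so both are computed. Note that 1 - pnorm(18.6) evaluates to exactly 0 in double precision, so the upper tail has to be requested directly with lower.tail = FALSE.

```r
z <- 18.6

# Upper-tail probability computed directly; 1 - pnorm(z) would round to 0.
p_one <- pnorm(z, lower.tail = FALSE)
p_two <- 2 * p_one                  # two-sided version

# Order-of-magnitude check via the Mills-ratio approximation phi(z) / z.
p_approx <- dnorm(z) / z

print(c(one_sided = p_one, two_sided = p_two, approx = p_approx))
print(p_two < 1e-76)
```

Both versions come out on the order of 10^-77, consistent with the p < 10^-76 quoted above.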

The unpublished rejoinder can be downloaded from http://people.ucalgary.ca/~kibeom/Anderson%20Ones/AndersonOnes.html