
It’s Too Hard to Publish Criticisms and Obtain Data for Replication

Peter Swan writes:

The problem you allude to in the above reference and in your other papers on ethics is a broad and serious one. I and my students have attempted to replicate a number of top articles in the major finance journals. Either they cannot be replicated due to missing data, or what might appear to be relatively minor improvements in methodology may remove or sometimes reverse the findings. Almost invariably, the journal is reluctant to publish a comment. Due to the introduction of a new journal, Critical Finance Review, by Ivo Welch, which insists on the provision of data/code and encourages the original authors to comment further, this poor outlook is improving in the finance discipline.

See for example: Gavin S. Smith and Peter L. Swan, Do Concentrated Institutional Investors Really Reduce Executive Compensation Whilst Raising Incentives?, CFR 3-1, 49-83.

and the response:

Jay C. Hartzell and Laura T. Starks, Institutional Investors and Executive Compensation Redux: A Comment on “Do Concentrated Institutional Investors Really Reduce Executive Compensation Whilst Raising Incentives”, CFR 3-1, 85-97.

The model of criticism and rebuttal is fine, but it’s disturbing that the people criticized never seem to back down and say they were wrong. I don’t think people should always admit they’re wrong, because sometimes they’re not. But everybody makes mistakes, while the rate of admission of mistakes seems suspiciously low!

Sokal: “science is not merely a bag of clever tricks . . . Rather, the natural sciences are nothing more or less than one particular application — albeit an unusually successful one — of a more general rationalist worldview”

Alan Sokal writes:

We know perfectly well that our politicians (or at least some of them) lie to us; we take it for granted; we are inured to it. And that may be precisely the problem. Perhaps we have become so inured to political lies — so hard-headedly cynical — that we have lost our ability to become appropriately outraged. We have lost our ability to call a spade a spade, a lie a lie, a fraud a fraud. Instead we call it “spin”.

We have now travelled a long way from “science,” understood narrowly as physics, chemistry, biology and the like. But the whole point is that any such narrow definition of science is misguided. We live in a single real world; the administrative divisions used for convenience in our universities do not in fact correspond to any natural philosophical boundaries. It makes no sense to use one set of standards of evidence in physics, chemistry and biology, and then suddenly relax your standards when it comes to medicine, religion or politics. Lest this sound to you like a scientist’s imperialism, I want to stress that it is exactly the contrary. . . .

The bottom line is that science is not merely a bag of clever tricks that turn out to be useful in investigating some arcane questions about the inanimate and biological worlds. Rather, the natural sciences are nothing more or less than one particular application — albeit an unusually successful one — of a more general rationalist worldview, centered on the modest insistence that empirical claims must be substantiated by empirical evidence. [emphasis added]

Well put.

Sokal continues:

Conversely, the philosophical lessons learned from four centuries of work in the natural sciences can be of real value — if properly understood — in other domains of human life. Of course, I am not suggesting that historians or policy-makers should use exactly the same methods as physicists — that would be absurd. But neither do biologists use precisely the same methods as physicists; nor, for that matter, do biochemists use the same methods as ecologists, or solid-state physicists as elementary-particle physicists. The detailed methods of inquiry must of course be adapted to the subject matter at hand. What remains unchanged in all areas of life, however, is the underlying philosophy: namely, to constrain our theories as strongly as possible by empirical evidence, and to modify or reject those theories that fail to conform to the evidence. That is what I mean by the scientific worldview.

And then he discusses criticism:

The affirmative side of science, consisting of its well-verified claims about the physical and biological world, may be what first springs to mind when people think about “science”; but it is the critical and skeptical side of science that is the most profound, and the most intellectually subversive. The scientific worldview inevitably comes into conflict with all non-scientific modes of thought that make purportedly factual claims about the world.

He might also discuss certain pseudo-scientific modes of thought, those methods that follow various forms of science but which lack the elements of criticism. I’m thinking in particular of what we’ve been calling “Psychological Science”-style work in which a researcher manages to find a statistically significant p-value and uses this to make an affirmative claim about the world. This is not so much a “non-scientific mode of thought” as a scientific mode of thought that doesn’t work.

The Use of Sampling Weights in Bayesian Hierarchical Models for Small Area Estimation

All this discussion of plagiarism is leaving a bad taste in my mouth (or, I guess I should say, a bad feeling in my fingers, given that I’m expressing all this on the keyboard) so I wanted to close off the workweek with something more interesting.

I happened to come across the above-titled paper by Cici Chen, Thomas Lumley, and Jon Wakefield. I haven’t had a chance to read it in detail but these people know what they’re doing and so it seems like it could be worth a look.

And here’s some related work:

- On the applied side, this paper with Yair in the American Journal of Political Science from 2013, on deep interactions with MRP. In particular, take a look at the section on Accounting for Survey Weights on p. 765. I wonder how this relates to the Chen, Lumley, and Wakefield approach.

- From the more theoretical direction, this paper with Yajuan and Natesh, to appear in Bayesian Analysis, on Bayesian nonparametric weighted sampling inference.

I think we, as a field, are getting closer on this problem but we’re still not quite there.

Defense by escalation

Basbøll has another post regarding some copying-without-attribution by the somewhat-famous academic entertainer Slavoj Zizek. In his post, Basbøll links to theologian and professor Adam Kotsko (cool: who knew there were still theologians out and about in academia?) who defends Zizek, in part on the grounds that Zizek’s critics were being too harsh. Kotsko writes of “another set of trumped-up complaints about [Zizek’s] supposed ‘self-plagiarism.’ Apparently he needs to write things fresh every single time he publishes, or else he’s doing something akin to the most serious ethical violation in academia.”

Now, my goal here is not to pick a fight with Kotsko, someone whom I’ve only heard of through Basbøll’s blog. But I do want to disagree with that above-quoted statement, because I see it as symptomatic of a more general problem in how people sometimes respond to criticism.

Here’s what I wrote on Basbøll’s blog:

I followed the link, and Kotsko characterizes plagiarism as “the most serious ethical violation in academia.”

I disagree. I think that making shit up or falsifying data is a more serious ethical violation. The two violations can go together, for example Karl Weick, by plagiarizing the Alps story, was then free to make shit up, in a way that he couldn’t have done so easily had he cited his source.

Beyond this, Kotsko seems to me to be doing something that I find very annoying: when someone defends himself, or a friend, from some criticism by first exaggerating the criticism (and perhaps characterizing it as an “accusation”) and then denying the larger claim.

Kotsko did this by taking concerns about Zizek’s misleading lack of attribution of quotes, and interpreting this as the position, “Apparently he needs to write things fresh every single time he publishes, or else he’s doing something akin to the most serious ethical violation in academia.” Nobody’s saying this (or, at least, you’re not saying this!) but now Kotsko can argue against it. (Remember, with plagiarism, it’s not about the copying, it’s about the attribution.)

I felt a similar frustration after Eric Loken and I raised methodological problems with that fecundity-and-clothing-color study: the authors of that study (Alec Beall and Jessica Tracy) responded that we “imply that [they] likely analyzed our results in all kinds of different ways before selecting the one analysis that confirmed [their] hypothesis.” They then defended themselves against this claim, or implication, which we never made.

Beall and Tracy’s response was more understandable to me than Kotsko’s—after all, Eric and I were criticizing their research and saying (correctly, I believe) that their experiments are dead on arrival, essentially too noisy for them to ever learn anything interesting about the research questions they’re studying, so that’s bad news even though we were not accusing them of ethical violations. In contrast, Kotsko is a third party so it seems particularly ridiculous to see him first exaggerating the criticisms of Zizek, and then shooting down the exaggeration.

But in any case, perhaps it would be useful to give a name to this sort of behavior (or maybe it already has a name)?

P.S. Before we slam these postmodernists too much, let me remind you of this excellent quote from Frederic Jameson. He speaks truth.

Message to Booleans: It’s an additive world, we just live in it

Boolean models (“it’s either A or (B and C)”) seem to be the natural way that we think, but additive models (“10 points if you have A, 3 points if you have B, 2 points if you have C”) seem to describe reality better—at least, the aspects of reality that I study in my research.

Additive models come naturally to political scientists and economists, including myself. We think of your political attitudes, for example, as a sum of various influences (as for example in this paper with Yair). Similarly for economists’ models of decisions in terms of latent continuous variables. But my impression is that “civilians” think in a much more Boolean way, with different factors being switches that flip you to one state or another.

And, when it comes to statistics, applied people often think Booleanly or lexicographically (“Use rule A, with rule B as a tiebreaker”) and, I think, make mistakes as a result. For example, consider the attitude that seems to be prevalent in econometrics, that you want to use an unbiased estimate and then reduce variance only as a secondary concern. As we’ve discussed elsewhere in this space, such an attitude is incoherent because in practice the only way to get an unbiased estimate is to pool data and thus assume the effect of interest does not vary. Also recall the foolish survey researchers who don’t want to let go of the fiction that they are doing theoretically-justified inference using the principles of probability sampling.

We live in an additive world that our minds try to model Booleanly. Sort of like how Mandelbrot pointed out that mountains and trees are fractals but we like to think of them as triangles, circles, and sticks (as exemplified so clearly in children’s drawings).
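Here’s a toy illustration of the contrast, using the two rules from the opening paragraph (the factor names and point values are made up for illustration):

```python
# Toy contrast between a Boolean rule and an additive score.
# Factors A, B, C and the weights 10/3/2 are hypothetical.

def boolean_rule(a: bool, b: bool, c: bool) -> bool:
    """Switch-like thinking: it's either A, or (B and C)."""
    return a or (b and c)

def additive_score(a: bool, b: bool, c: bool) -> int:
    """Additive thinking: each factor contributes points."""
    return 10 * a + 3 * b + 2 * c

# The Boolean rule collapses cases that the additive score distinguishes:
print(boolean_rule(True, False, False))    # True
print(boolean_rule(True, True, True))      # True -- same answer
print(additive_score(True, False, False))  # 10
print(additive_score(True, True, True))    # 15 -- different strengths
```

The point of the sketch: the Boolean rule maps every configuration onto one of two states, while the additive score preserves gradations of strength, which is closer to how these latent-variable models treat attitudes and decisions.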

Hey, I just wrote my April Fool’s post!

(scheduled to appear in a few months, of course).

I think you’ll like it. Or hate it. Depending on who you are.

Wegman Frey Hauser Weick Fischer Dr. Anil Potti Stapel comes clean

Thomas Leeper points me to Diederik Stapel’s memoir, “Faking Science: A True Story of Academic Fraud,” translated by Nick Brown and available online for free download.

I’d like to see a preregistered replication on this one

Under the heading, “Results too good to be true,” Lee Sechrest points me to this discussion by “Neuroskeptic” of a discussion by psychology researcher Greg Francis of a published (and publicized) claim by biologists Brian Dias and Kerry Ressler that “Parental olfactory experience [in mice] influences behavior and neural structure in subsequent generations.” That’s a pretty big and surprising claim, and Dias and Ressler support it with some data: p=0.043, p=0.003, p=0.020, p=0.005, etc.

Francis’s key ground for suspicion is that Dias and Ressler in their paper present 10 successful (statistically significant) results in a row, and, given the effect sizes they estimated, it would be unlikely to see such an unbroken string of successes.
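The arithmetic behind this kind of excess-success argument is simple: under independence, the probability of an unbroken run of significant results is the product of the per-test powers. The power values below are illustrative, not Francis’s actual estimates for this paper:

```python
# Back-of-the-envelope version of the excess-success argument:
# even with fairly high power on each test, 10 significant results
# in a row is an unlikely event. (Powers here are illustrative.)

def prob_all_significant(powers):
    """Probability that every one of several independent tests is significant."""
    p = 1.0
    for power in powers:
        p *= power
    return p

print(prob_all_significant([0.8] * 10))  # ~0.107
print(prob_all_significant([0.5] * 10))  # ~0.00098
```

So even at 80% power per test, a perfect 10-for-10 record happens only about one time in ten; at 50% power it is about one in a thousand.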

Dias and Ressler replied that they did actually report negative results:

While we wish that all our behavioral, neuroanatomical, and epigenetic data were successful and statistically significant, one only need look at the Supporting Information in the article to see that data generated for all four figures in the Supporting Information did not yield significant results. We do not believe that nonsignificant data support our theoretical claims as suggested.

Francis followed up:

The non-significant effects reported by Dias & Ressler were not characterised by them as being “unsuccessful” but were either integrated into their theoretical ideas or were deemed irrelevant (some were controls that helped them make other arguments). Of course scientists have to change theories to match data, but if the data are noisy then this practice means the theory chases noise (and the findings show excess success relative to the theory).

I would also like to say that it’s probably not a good idea for Dias and Ressler to wish that all their data are “successful and statistically significant.” With small samples and small effects, this just isn’t gonna happen—indeed, it shouldn’t happen. Variation implies that not every small experiment will be statistically significant (or even in the desired direction), and I think it’s a mistake to define “success” in this way.

Do a large preregistered replication

In any case, the solution here seems pretty clear to me. Do a large preregistered replication. This is obvious, but it’s not clear that it’s really being done. For example, in a news article from 2013, Virginia Hughes describes the research in question as “tantalizing,” reports that “other researchers seem convinced . . . neuroscientists, too, are enthusiastic about what these results might mean for understanding the brain,” and talks about further research (“A good next step in resolving these pesky mechanistic questions would be to use chromatography to see whether odorant molecules like acetophenone actually get into the animals’ bloodstream . . . First, though, Dias and Ressler are working on another behavioral experiment. . . . Scientists, I have to assume, will be furiously working on what that something is for many decades to come . . .”), but I see no mention of any plan for a preregistered replication.

I’d like to see a clean, pure, large, preregistered replication such as Nosek, Spies, and Motyl did in their “50 shades of gray” paper. I recognize that this costs time, effort, and money. Still, replication in a biological study of mice seems so much easier than replication in political science or economics, and it would resolve a lot of statistical issues.

Expectation propagation as a way of life

Aki Vehtari, Pasi Jylänki, Christian Robert, Nicolas Chopin, John Cunningham, and I write:

We revisit expectation propagation (EP) as a prototype for scalable algorithms that partition big datasets into many parts and analyze each part in parallel to perform inference of shared parameters. The algorithm should be particularly efficient for hierarchical models, for which the EP algorithm works on the shared parameters (hyperparameters) of the model.

The central idea of EP is to work at each step with a “tilted distribution” that combines the likelihood for a part of the data with the “cavity distribution,” which is the approximate model for the prior and all other parts of the data. EP iteratively approximates the moments of the tilted distributions and incorporates those approximations into a global posterior approximation. As such, EP can be used to divide the computation for large models into manageable sizes. The computation for each partition can be made parallel with occasional exchanging of information between processes through the global posterior approximation. Moments of multivariate tilted distributions can be approximated in various ways, including MCMC, Laplace approximations, and importance sampling.

I love love love love love this. The idea is to forget about the usual derivation of EP (the Kullback-Leibler discrepancy, etc.) and to instead start at the other end, with Bayesian data-splitting algorithms, with the idea of taking a big problem and dividing it into K little pieces, performing inference on each of the K pieces, and then putting them together to get an approximate posterior inference.

The difficulty with such algorithms, as usually constructed, is that each of the K pieces has only partial information; as a result, for any of these pieces, you’re wasting a lot of computation in places that are contradicted by the other K-1 pieces.

This sketch (with K=5) shows the story:

[figure]

We’d like to do our computation in the region of overlap.

And that’s how the EP-like algorithm works! When performing the inference for each piece, we use, as a prior, the cavity distribution based on the approximation to the other K-1 pieces.
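The cavity/tilted update loop can be sketched numerically in the simplest possible setting: a shared Gaussian mean with Gaussian likelihood factors, where the tilted moments are exact. This is a toy sketch of the update structure, not the algorithm from the paper; the data summaries and prior below are made up for illustration:

```python
import math

# Toy EP-style loop for a shared mean theta, all-Gaussian case.
# Each "piece" k is summarized by (mean, variance) of its likelihood
# factor for theta. Values are made up for illustration.
pieces = [(1.2, 0.5), (0.8, 0.4), (1.5, 0.6)]

# Site approximations in natural parameters (precision, precision*mean),
# initialized flat; the prior is N(0, 10^2).
sites = [[0.0, 0.0] for _ in pieces]
prior = (1 / 100.0, 0.0)

for _ in range(5):  # a few EP sweeps
    for k, (m_k, v_k) in enumerate(pieces):
        # Cavity distribution: global approximation minus site k,
        # i.e., the prior plus the other K-1 pieces.
        tau_cav = prior[0] + sum(s[0] for s in sites) - sites[k][0]
        mu_cav = prior[1] + sum(s[1] for s in sites) - sites[k][1]
        # Tilted distribution = cavity * likelihood_k; Gaussian here,
        # so its moments come from adding natural parameters.
        tau_tilt = tau_cav + 1 / v_k
        mu_tilt = mu_cav + m_k / v_k
        # Update site k so that cavity * site matches the tilted moments.
        sites[k][0] = tau_tilt - tau_cav
        sites[k][1] = mu_tilt - mu_cav

tau = prior[0] + sum(s[0] for s in sites)
mu = (prior[1] + sum(s[1] for s in sites)) / tau
print(mu, math.sqrt(1 / tau))  # posterior mean ~1.117, sd ~0.402
```

In this all-Gaussian toy case the loop converges in one sweep and reproduces the exact posterior; the point of EP proper is that the same cavity-as-prior structure still applies when the tilted moments must be approximated by MCMC, Laplace, or importance sampling.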

Here’s a quick picture of how the cavity distribution works. This picture shows how the EP-like approximation is not the same as simply approximating each likelihood separately. The cavity distribution serves to focus the approximation in the zone of inference of parameter space:

[figure]

But the real killer app of this approach is hierarchical models, because then we’re partitioning the parameters at the same time as we’re partitioning the data, so we get real savings in complexity and computation time:

[figure]

EP. It’s a way of life. And a new way of thinking about data-partitioning algorithms.

Damn, I was off by a factor of 2!

I hate when that happens. Demography is tricky.

Oh well, as they say in astronomy, who cares, it was less than an order of magnitude!