In my [Keith] previous post that criticised a published paper, the first author commented that they wanted some time to respond, and I agreed. I also suggested that if the response came in after most readers had moved on, I would re-post it as a new post pointing back to the previous one. So here we are.

Now, there has been a lot of discussion on this blog about public versus private criticism and their cost/benefit trade-offs. One change I am making is to refer to the first, second, or third author rather than using names. I should also clarify that I have previously worked with the first and second authors (so they are not strangers) and that the first author posted the paper in a comment on my blog (they brought it to my attention). My three main points in that original post were: 1. failing to distinguish what something is versus what to make of it, 2. ignoring the ensemble of similar studies (completed, ongoing, and future), and 3. neglecting important non-random errors. So when the first author brought the paper to my attention, I thought it was going to be an example of not neglecting those three things. But when I read the paper, I felt it pretty much did neglect 1 and 3.

One of the main points in the response by all three authors was a clarification of the goal of the paper (all other responses were by the first author alone). They claimed the goal was simply to clarify that the fixed effects estimate is an estimate of *some* population’s average, though not necessarily one that would be of interest (depending on the context). Quoting the response: “Given the didactic goal of the paper, the issue is not so much whether such a population is of interest, but just the realization that the analysis is informing us about such a population.” I fully agree with that (with a caveat below). In my experience, getting an adequate sense of that population and generalising from it to a population of interest is a very big stretch. The first author responded that his experience is different: especially given the epidemiologists he works with, it is often doable. Fair enough; experiences and research team expertise differ. Now, my reading of the paper was that it suggested more than just that clarification and gave the impression that fixed effects should be used much more often and were often more scientifically relevant. But those are just my interpretations, and interpretations can vary, as apparently do our views on what is meant by scientifically relevant, if not also by a population. So I was expecting the paper not to fail to distinguish what something is versus what to make of it, but apparently the authors never intended to make that distinction.

One of my points the first author chose not to respond to was my review of Rubin’s conceptual ideas. Again, fair enough. However, that is where I believe there is a serious technical disagreement. This became clearer in the first author’s response, e.g. “[Keith referring to] varying study quality… [first author] This is a misconception. The paper goes to considerable lengths to allow for underlying effects to differ”. That is the caveat I referred to above: the fixed effects estimate is an estimate of *some* population’s average only if the between-study variation is not importantly driven by design variation (AKA study quality or methodological variation). This kind of variation is usually/mostly the result of haphazard biases, and it has different implications for what is to be made of the variation and the expectation. Briefly, the variation needs to be included in the uncertainty quantification, and the expectation is no longer of direct interest (more below). Science-driven variation (AKA clinical or biological variation), by contrast, can be excluded from the uncertainty quantification, and the expectation is of direct interest as the average of some population. These are very different.

Fisher was one of the first to bring attention to this issue (that is why I gave the reference), the Rubin references discuss it, and it is even in the Cochrane Handbook (section 9.5.1), which was edited by the second author (I believe I wrote the first draft of that section and the second author revised it). There was also a full discussion of this issue at an RSS meeting in 2002. A simple example may make the issue clear. If a fixed object is measured with three measuring instruments with differing (haphazard) biases, there will be variation in the measurements that is not from the object, and the average will not be estimating the object but the object plus the average bias (whatever that is). With three fixed objects each measured with the same unbiased measuring instrument, there will be variation in the measurements that is “real”, but the average of the three objects is fixed. Here, the average measurement will be estimating the average of the three objects, and the measurement variation (above the instrument’s variance) need not be included in the uncertainty of the estimated average. And this would be true for any population involving various proportions of the three objects: a fixed population average that the properly weighted average would estimate. Now, one could argue the expectation with varying bias is of indirect interest, being the population’s true average plus some weighted combination of the biases. But then one should clearly warn of the presence of such biases, even in the absence of a way to address them. So here the paper is neglecting important non-random errors. The authors, if they agree, may wish to consider adding a note to their paper about this issue.
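The two scenarios can be simulated in a few lines. This is a minimal sketch with made-up numbers: the object values, instrument biases, and noise level below are all arbitrary assumptions chosen only to illustrate the contrast.

```python
import random

random.seed(1)

def measure(true_value, bias, sd=0.1, n=10_000):
    """Average of n noisy measurements from an instrument with a fixed bias."""
    return sum(true_value + bias + random.gauss(0, sd) for _ in range(n)) / n

# Scenario 1: one fixed object, three instruments with haphazard biases.
# The average measurement estimates object + average bias, not the object.
obj = 5.0
biases = [0.3, -0.1, 0.5]          # unknown, haphazard instrument biases
avg1 = sum(measure(obj, b) for b in biases) / 3
# avg1 lands near obj + mean(biases) = 5.233..., not near 5.0

# Scenario 2: three fixed objects, one unbiased instrument.
# The between-object variation is "real", and the average measurement
# estimates the (fixed) average of the three objects.
objs = [4.0, 5.0, 6.0]
avg2 = sum(measure(o, 0.0) for o in objs) / 3
# avg2 lands near mean(objs) = 5.0
```

In the first scenario the between-measurement variation has to enter the uncertainty about the object; in the second it does not enter the uncertainty about the average, which is what the paragraph above is getting at.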

One of my other points the first author chose not to respond to was to work through the likelihood mechanics involved. Again, fair enough. But they referred to a paper by Danyu Lin and a co-author, claimed that it was pivotal, and suggested the mechanics may not be as straightforward as I thought. When I read that paper I saw exactly the likelihood mechanics being worked out as I expected, but there was something I had not seen before. The paper worked through the properties of a mis-specified model, which is what a fixed effects model implemented with the full raw data actually is. You consider the effects to vary by study, but in the model implemented with all the raw data you purposely (mis)specify the parameter as being exactly the same (give it the exact same symbol) for all studies. Now, when I first started doing meta-analysis in the 1980s, the outcomes were usually binary. I had just taken a generalised linear models course, so I implemented meta-analysis using logistic regression with a common odds ratio parameter and differing control rate parameters by study. To formally test for heterogeneity (which we explicitly argued should not be depended upon), the common odds ratio parameter would be replaced with differing odds ratio parameters by study. (See Model 4: Partial Pooling (Log Odds) in Bob’s extensive tutorial for a full Bayesian approach to this.) But I knew that with binary outcomes, whether I coded the data by individual patient (0 or 1) or as numbers of failures and successes, the answers would be exactly the same (given the same parameter specifications and the magic of sufficiency), so this summary data versus raw data meta-analysis question (given you had both) seemed completely moot to me.
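The sufficiency point can be checked directly: for the logistic model with study-specific control rates and a common log odds ratio, the Bernoulli log-likelihood from individual-patient records and the log-likelihood from the 2x2 summary counts are the same function of the parameters (up to the binomial coefficient, which does not involve them), so the maximum likelihood estimates coincide. A minimal sketch, with hypothetical counts and an arbitrary evaluation point:

```python
import math

def loglik_raw(beta, alphas, data):
    """Bernoulli log-likelihood from individual-patient records.
    data: list of (study, treated, outcome) with outcome in {0, 1}."""
    ll = 0.0
    for study, treated, y in data:
        eta = alphas[study] + beta * treated        # logit scale
        p = 1.0 / (1.0 + math.exp(-eta))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

def loglik_summary(beta, alphas, tables):
    """Same model from per-study summary counts.
    tables: list of (study, treated, successes, failures)."""
    ll = 0.0
    for study, treated, s, f in tables:
        eta = alphas[study] + beta * treated
        p = 1.0 / (1.0 + math.exp(-eta))
        ll += s * math.log(p) + f * math.log(1 - p)
    return ll

# Hypothetical two-study counts; expand them into raw 0/1 records.
tables = [(0, 0, 3, 7), (0, 1, 5, 5), (1, 0, 6, 4), (1, 1, 8, 2)]
data = [(st, tr, 1) for st, tr, s, f in tables for _ in range(s)] + \
       [(st, tr, 0) for st, tr, s, f in tables for _ in range(f)]

alphas, beta = {0: -0.8, 1: 0.4}, 0.9   # arbitrary evaluation point
ll_raw = loglik_raw(beta, alphas, data)
ll_sum = loglik_summary(beta, alphas, tables)
# ll_raw == ll_sum at every parameter value, so maximising either
# gives identical estimates of beta and the alphas.
```

Since the two log-likelihoods agree at every parameter value, any fitting routine applied to either form returns the same answer, which is the "completely moot" point above.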

So I learned something. The fixed effects model implemented with the raw data actually is mis-specified, that is, wrong, but it estimates the correct average for some population (this depends on sufficiency and other technicalities, but it often does). That must have been puzzling to many, and it needed to be sorted out. No doubt there are also asymptotic issues that needed to be sorted out, but Charlie Geyer has convinced me that such considerations are not a good use of my time.
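What population the mis-specified model targets can be seen in a stripped-down normal-means sketch (my illustration, not the Lin paper's setup, with made-up effects and variances): when the underlying effects differ but a single common parameter is fitted, the estimate is the precision-weighted average of the differing effects, which is an average for *some* population, whether or not it is the one of interest.

```python
# Study effects differ, but the model (mis)specifies one common theta.
thetas = [0.2, 0.5, 0.8]   # differing underlying effects (made up)
vs = [0.04, 0.01, 0.09]    # within-study variances (made up)
ys = list(thetas)          # noise-free study estimates, for clarity

# Maximising the common-theta normal likelihood gives the
# precision-weighted mean of the study estimates.
ws = [1.0 / v for v in vs]
theta_hat = sum(w * y for w, y in zip(ws, ys)) / sum(ws)

# Its target: the same precision-weighted average of the true effects,
# i.e. the "correct average for some population" -- one defined by the
# studies' precisions rather than by scientific interest.
target = sum(w * t for w, t in zip(ws, thetas)) / sum(ws)
```

Here `theta_hat` equals `target` exactly in the noise-free limit, and with sampling noise it would estimate that same precision-weighted quantity.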

As before, I (the first author of the paper) will respond, but it will take time. Briefly, the claims of “neglect” here don’t seem to be supported by what’s actually in the paper.

Given that neither post on this paper has been prefaced with a summary of what it actually says, I encourage anyone reading the posts to first tackle the paper, and perhaps also the slide set available under “Supporting Information”.