Lack of free lunch again rears ugly head

We had some discussion on the blog the other day of prior distributions in settings such as small experiments where the available data do not give a strong inference on their own, and commenter Rahul wrote:

In real settings I rarely see experts agree anywhere close to a consensus about the prior. Estimates are all over the place.

Fair enough. And that’s the way it is. When data are weak and there’s no prior consensus, there will be no posterior consensus. It would be unscientific to think otherwise.

Also worth remembering that this is true of the data model as well. In real settings there may be a consensus on the data model (logistic regression or whatever) but that consensus may be inappropriate. Which could be worse than no consensus at all.

This post is not a criticism of Rahul’s comment. Rather, I’m just clarifying its implications: When data are weak and there is a lack of consensus about the prior, Bayesian inferences can be indeterminate, nothing so clean as the classical options:

– p less than .05. Win. A discovery. Wrap it in some theory and send it off to PPNAS!

– p greater than .05. Keep playing around with new analyses and new experiments until you get the win.

26 thoughts on “Lack of free lunch again rears ugly head”

  1. To clarify, I agree with Andrew on this mostly but the scenario I watch for is people cherry-picking a prior from the set of all reasonable priors & then selectively using the posterior that results to further whatever be their pet agenda.

    I.e., this would then be a flavor of fishing that we need to guard against: not declaring the lack of a consensus prior would be a new fork in the path.

    I didn’t intend the comment as a Bayes vs. NHST argument but was more pointing out the possibility of promoting favorable conclusions using Bayesian models.

    • “but the scenario I watch for is people cherry-picking a prior from the set of all reasonable priors & then selectively using the posterior that results to further whatever be their pet agenda.”

      Is there an example of anyone having done that? What I have seen is opposing opinions (about the prior) being used to derive divergent posteriors, in order to understand better how to interpret the data given different prior beliefs.

  2. This *ought* to be fairly easy to spot by an informed, engaged referee who requests to see the analysis under alternative reasonable priors, but it’s always an issue. Worse, though, is the prior that *looks* reasonable but really isn’t, owing to some mathematical subtlety…

      • I’m not sure I have a great example, but it’s certainly possible for a prior to have a reasonable mean (which is the main thing people not paying close attention look at) but put substantial probability mass in implausible regions, mass which is nonetheless influential in the integration. Andrew has certainly written about how uniform priors *sound* reasonable but really aren’t in some problems.

      • Hugely diffuse BUGS-style priors for logistic regression favor extreme outcomes. Not subtle enough? People do it a lot, especially with hierarchical models.
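        To see how, here’s a minimal prior-predictive sketch (the normal(0, 100^2) prior and the single-coefficient setup are illustrative assumptions, not anyone’s actual analysis):

        ## What a "noninformative" normal(0, 100^2) prior on a logistic-regression
        ## coefficient implies on the probability scale, before any data arrive.
        set.seed(123)
        beta <- rnorm(1e5, 0, 100)   # hugely diffuse BUGS-style prior
        p <- plogis(beta)            # implied Pr(y = 1)
        hist(p, breaks = 50)         # U-shaped: the mass piles up at 0 and 1
        mean(p < 0.01 | p > 0.99)    # roughly 0.96 of the prior mass is extreme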

      • In high dimensions it can be especially problematic. Imagine for example a function on [0,1]x[0,1] which is non-decreasing and Lipschitz with an unknown maximum slope. You’re going to approximate this as 2000 piecewise-linear segments. Each one should have non-negative slope, but if you put an independent prior on each of these, you’re going to wind up favoring certain types of functions and not others; you may also wind up favoring impossible functions (i.e., ones which exceed 1 in their range).

        • Andrew has a nice paper about this, “Bayesian model building by pure thought,” from the mid-90s. One of his examples is very much like yours, only even simpler: you’re modeling survival as a function of age. You assume that at each age some fraction f(age) of the remaining survivors will die. You estimate a different f for each age, with a uniform prior from f(age-1) to 1; that is, you assume that a higher fraction of people of age 80 will die than of age 79, and that that will be higher than at age 78, and so on. It all seems totally simple and reasonable. Now you put in some data from a real cohort, and you find that pretty much no matter what, the estimated fraction increases almost perfectly quadratically as a function of age for high age, even if this doesn’t fit the data very well. It turns out that modeling the priors like that implicitly builds in a pretty strong assumed relationship between them.
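          A minimal simulation of that implicit structure (the 40-step age grid is just for illustration):

          ## Draws from the prior f(a) ~ Uniform(f(a-1), 1), before any data arrive.
          set.seed(1)
          n_ages <- 40
          draws <- replicate(1e4, {
            f <- numeric(n_ages)
            f[1] <- runif(1)
            for (a in 2:n_ages) f[a] <- runif(1, f[a - 1], 1)
            f
          })
          ## The marginal prior mean at step a is E[f_a] = 1 - (1/2)^a: the prior
          ## alone drives the fraction toward 1 at a fixed geometric rate.
          round(rowMeans(draws)[c(1, 2, 5, 10, 20)], 3)   # approx. 0.500, 0.750, 0.969, 0.999, 1.000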

        • Well, you’d probably make it converge via data and some kind of likelihood, but it would converge into an extreme tail region of the prior. Draws from a “good” prior should cover the full range of functions that you might plausibly believe are real. You could be “blanketing” that range by including additional functions that you know are impossible, but if almost none of the draws from the prior are reasonable then it’s a bad prior; and similarly, if the range of the draws doesn’t cover the range of plausible values then you have a problem.

          Phil’s example is perfect. To favor functions that are reasonable you’d need something like a Gaussian process, and the covariance matrix of such a process looks nothing like a diagonal matrix; dependence is necessary for modeling realistic functions. And yet, at first, constructing the prior independently seems reasonable, because you are using real, valid information; you’re just not using enough of it.

          For example: it’s on [0,1]x[0,1], so the slope should be about 1 on average, so let’s put an exponential(1) prior on each increment…

          plot(x = (0:2000)/2000, y = c(0, cumsum(rexp(2000, 1)))/2000, type = "l")

          Basically the only function you’re picking out is the line y=x: with 2000 independent increments, the law of large numbers makes every draw hug the mean path (the endpoint’s standard deviation is 1/sqrt(2000) ≈ 0.02). Independence here is actually a WAY TOO STRONG assumption.
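          A sketch that makes the collapse visible, plus one way to put the dependence back in (the squared-exponential kernel on the log-slopes is just an illustrative choice, not the only fix):

          ## Grey: 20 draws with independent exponential(1) increments -- they all
          ## collapse onto the line y = x. Red: 20 draws whose log-slopes come
          ## from a Gaussian process, giving genuinely different function shapes.
          set.seed(42)
          n <- 2000; x <- (0:n) / n
          plot(NULL, xlim = c(0, 1), ylim = c(0, 3), xlab = "x", ylab = "f(x)")
          for (i in 1:20) lines(x, c(0, cumsum(rexp(n, 1))) / n, col = "grey")
          m <- 50; xg <- (1:m) / m                # coarser grid for the GP draws
          K <- 0.25 * exp(-outer(xg, xg, "-")^2 / (2 * 0.2^2)) + diag(1e-8, m)
          R <- chol(K)                            # K = R'R, so z %*% R ~ N(0, K)
          for (i in 1:20) {
            slope <- exp(drop(rnorm(m) %*% R))    # lognormal, positively correlated
            lines(c(0, xg), c(0, cumsum(slope)) / m, col = "red")   # still non-decreasing
          }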

      • The dangers of being naive are nicely discussed with worked examples here – http://www.tandfonline.com/doi/abs/10.1080/00031305.2012.695938

        Lessening these dangers is discussed by X here – http://arxiv.org/pdf/1402.6257.pdf

        Still, getting most folks to plot the critical marginal priors or do realistic sensitivity analyses of priors is like pulling teeth from an angry alligator. As one of Andrew’s co-authors once explained to me: “Most Bayesians just want to turn the Bayesian crank and get an answer; they don’t want to think hard about answering the question, and that’s what you’re asking them to do, so don’t be surprised.”
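        For what it’s worth, the minimal version of such a sensitivity analysis is cheap. A conjugate beta-binomial toy, with made-up data and priors, just to show the shape of the exercise:

        ## Same data, several defensible priors: report the whole set of posteriors,
        ## rather than quietly picking the one you like.
        y <- 7; n <- 20   # hypothetical data: 7 successes in 20 trials
        priors <- list(flat = c(1, 1), sceptical = c(2, 8), favorable = c(8, 2))
        for (nm in names(priors)) {
          a <- priors[[nm]][1] + y; b <- priors[[nm]][2] + n - y
          cat(sprintf("%-9s posterior mean %.2f, 95%% interval (%.2f, %.2f)\n",
                      nm, a / (a + b), qbeta(0.025, a, b), qbeta(0.975, a, b)))
        }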

    • Well, if we had “informed, engaged referees” I’m sure we wouldn’t publish most of the noisy-measurement, p-value shenanigans Andrew blogs about (fat arms, red dresses, power poses, etc.).

      Then again, when people start refereeing papers in 15 minutes all bets are off about what will be caught.

  3. But if people disagree about the prior, is that not data?

    E.g. can we not go all meta on this and do some kind of averaging?

    Or would this also require a prior nobody agrees on, and so on ad infinitum?

    Put differently, is there not some level / recasting of the problem at which we might all agree on a prior?
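    E.g., one standard formalization of “some kind of averaging” is a linear opinion pool: treat the experts’ priors as components of a mixture. A minimal sketch, with two made-up expert priors and made-up data:

    ## Average two experts' priors for a proportion, then update with the data.
    theta <- seq(0, 1, length.out = 501)
    pool  <- 0.5 * dbeta(theta, 2, 8) + 0.5 * dbeta(theta, 8, 2)   # equal weights: an assumption
    post  <- pool * dbinom(7, 20, theta)          # say, 7 successes in 20 trials
    post  <- post / (sum(post) * (theta[2] - theta[1]))            # normalize numerically
    plot(theta, post, type = "l")                 # posterior under the pooled prior
    lines(theta, pool, lty = 2)                   # dashed: the pooled prior itself

    Of course this just pushes the disagreement one level up, into the pooling weights, which nobody may agree on either.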

        • Yes, there are other groups as well; it is much harder than meta-analysis of data (e.g., multiplying independent likelihoods), but it is not hopeless.

        • In principle, I agree. But I did have one bad experience trying this approach. A panel of 6 experts who were hired to participate in a facilitated discussion for an entire day divided into two camps (4 on one side, 2 on the other). Suffice it to say that in the end, the supports of their priors were disjoint!

          I learned a lot about “expertise,” the difficulty of getting non-statisticians to understand the concept of a prior, and a lot more about ego!

        • Clyde:

          The trick is to control the timing of the dinner for the panel and have it delayed until they _converge_.

          More seriously, an expert group facilitator can be key.

          But even given that, the outcome may be the same as in a meta-analysis that concludes the studies cannot be made sense of jointly (combined in whole or in any part): it has arrived at a very important conclusion/contribution – there is no acceptable empirical evidence at present!

        • Keith:

          I like your dinner suggestion or, more generally, putting in place strong incentives for pooling as opposed to posturing. Although it implies an odd weighting scheme whereby priors are weighted by the degree of hunger of the persons deciding. E.g., had Joe not eaten a Twinkie that morning, the afternoon’s decision might well have been different.

          Incidentally, my sense is that the papal conclave that selects the Pope works under this kind of incentive. My understanding is the electors are all locked in a room until a decision is made. Nobody leaves.

        • Fernando:

          The dinner suggestion was not mine but came from the author of group consensus methods for developing judgemental prediction rules, who told us in person it was very important – but they were too reluctant to put that in a published paper.

          We replicated the group consensus process three times (different panels, same prediction problem and same facilitator).

          Very fun research: the group processes/dynamics were prominent, different, and important in each, but the prediction rules were fairly similar (recalling from memory).

          Naglie G, Silberfeld M, O’Rourke K, Fried B, Durham N, Bombardier C, Detsky A. Convening expert panels to identify mental capacity assessment items. Canadian Journal on Aging 1995; 14(4): 697–705.
