Philosophy of Bayesian statistics: my reactions to Senn

Continuing with my discussion of the articles in the special issue of the journal Rationality, Markets and Morals on the philosophy of Bayesian statistics:

Stephen Senn, “You May Believe You Are a Bayesian But You Are Probably Wrong”:

I agree with Senn’s comments on the impossibility of the de Finetti subjective Bayesian approach. As I wrote in 2008, if you could really construct a subjective prior you believe in, why not just look at the data and write down your subjective posterior? The immense practical difficulties with any serious system of inference render it absurd to think that it would be possible to just write down a probability distribution to represent uncertainty. I wish, however, that Senn would recognize my Bayesian approach (which is also that of John Carlin, Hal Stern, Don Rubin, and, I believe, others). De Finetti is no longer around, but we are!

I have to admit that my own Bayesian views and practices have changed. In particular, I resonate with Senn’s point that conventional flat priors miss a lot and that Bayesian inference can work better when real prior information is used. Here I’m not talking about a subjective prior that is meant to express a personal belief but rather a distribution that represents a summary of prior scientific knowledge. Such an expression can only be approximate (as, indeed, assumptions such as logistic regressions, additive treatment effects, and all the rest, are only approximations too), and I agree with Senn that it would be rash to let philosophical foundations be a justification for using Bayesian methods. Rather, my work on the philosophy of statistics is intended to demonstrate how Bayesian inference can fit into a falsificationist philosophy that I am comfortable with on general grounds.

28 thoughts on “Philosophy of Bayesian statistics: my reactions to Senn”

  1. “if you could really construct a subjective prior you believe in, why not just look at the data and write down your subjective posterior” – because people are bad at math and aren’t going to update the prior optimally. After all, I can recognize 582984190785401 as a number, and 25.2907197841 as a number, but I wouldn’t care to raise the first to the power of the second in my head if I wanted an accurate answer.

    I am working with a dataset of about 800 observations and ten variables that interact in occasionally mildly complex ways. If the engineering expert gives me a prior on the degradation rate of the stuff I’m looking at, it’s only an approximation to a prior. But even if it were a real, true, exact prior, the only part of the Bayesian modeling process I’d skip would be a sensitivity analysis of the choice of prior. I certainly wouldn’t skip the rest of it.

    • John:

      I think you misunderstood my point. I was not claiming that people could write down their subjective posterior distributions! My point was the contrapositive: Just as it is silly to imagine that people could internalize all the data and write down a subjective posterior, it’s similarly silly to imagine they could write down a subjective prior (except in some special cases such as coin flipping where you can determine the probability distribution from physical principles).

      • I know that my car is longer than me and shorter than my garage. So my prior for the length of my car is N(mean = 13 ft, sd = 7 ft).

        As long as the true length is in the high probability manifold of this prior – which it is guaranteed to be – this prior will work fine (i.e. the true length will be in the high probability manifold of the posterior and a reasonable Bayesian interval will contain the true length).
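
        A minimal sketch of this claim, assuming a true length of 14 ft (the figure used later in the thread) and taking the central 95% interval of the N(mean = 13 ft, sd = 7 ft) prior as its “high probability manifold”; both the cutoff and the 14 ft value are illustrative assumptions:

        ```python
        # Check that an assumed true length of 14 ft lies in the central 95%
        # region of the N(mean = 13 ft, sd = 7 ft) prior described above.
        from scipy.stats import norm

        prior = norm(loc=13, scale=7)        # prior on car length, in feet
        lo, hi = prior.ppf([0.025, 0.975])   # central 95% region, about [-0.7, 26.7] ft

        true_length = 14.0                   # assumed for illustration only
        print(f"95% prior region: [{lo:.1f}, {hi:.1f}] ft")
        print("contains the assumed true length?", lo <= true_length <= hi)
        ```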

        • Joseph:

          Yup, priors are great. My problem is with the doctrine of subjective priors, in which one’s prior is considered unrefutable because it is said by definition to represent one’s subjective information.

        • Are these two different points?

          (1) the difficulty of constructing and expressing priors that capture all prior knowledge (prior to beginning the experiments, capturing the data, or dealing specifically with the supplied data under consideration)

          (2) allowing critical pushback against one’s subjective prior, not using subjectivity as an excuse to dodge the responsibility of defending it against fair criticism – because the prior must be substantial enough to allow work to be done with weak data towards a controversial inference, and a worker may (in good faith) have a prior of *excessive* strength that would tilt the calculation to a preferred outcome.

          Against the de Finetti subjective Bayesian approach, (1) is a strike because of practical impossibility, and (2) is a strike against applicability when seeking interesting, sound inferences where there isn’t enough data from enough related situations to make frequentist best practices possible.

          Am I right that you are making two different points, (1) & (2), in the posting and comment thread?

        • But you still need to say clearly what the role of the prior is in the particular problem in order to even understand what it would mean to “check” it. Next, you need your account to be able to distinguish the different obstacles to its serving that role. If, say, the model is serving to adequately capture the statistical information, that might allow the check to proceed in a stringent manner. If the prior-model combo is just left as a vague combination of a number of things you might want, e.g., adequacy, simplicity, introducing some background info, assurance of data-prior fit, then the notion of testing its adequacy, or “refuting” it, is left hopelessly vague.

        • Andrew, I do not understand this comment. Maybe I do not understand the “doctrine of subjective priors.”

          It seems to me that your prior has to reflect your subjective information before you look at the data. How can it not?

          But this does not mean that the (subjective) prior that you choose is irrefutable; surely a prior that reflects prior information merely has to be consistent with that information. But that still leaves a range of priors that are consistent with it, the sort of priors that one would use in a sensitivity analysis, for example.

          Maybe I am missing something…

        • Mayo:
          Let me remove the vagueness for at least one example. Suppose you want to measure the length of the car and so have a model of the form “measurements = length + errors”. Here then is precisely what I require:

          (A) “The actual length of my car has to be in the high probability region of the posterior”

          I want this because then any reasonable Bayesian interval from this posterior will contain the true length. All I require of the prior is the following:

          (B) “The actual length of the car has to be in the high probability region of the prior”

          I know for certain that the length of my car is between 6 and 20 ft. So any prior that puts its high probability region over that interval is guaranteed to meet requirement B. No further testing of the prior is required. If 6–20 ft were just a guess, I could test it by verifying that my car is longer than 6 feet and shorter than 20 feet, for example by seeing that it is longer than me and fits in my garage.

          Admittedly there are an infinite number of priors that satisfy B, but that isn’t a problem. Each of these priors will lead to A (assuming the likelihood hasn’t been screwed up) and therefore will lead to Bayesian intervals that contain the true length. The length of these intervals can vary greatly depending on which prior was used – which is a matter of great practical importance – but from a philosophical perspective they will all be correct since they contain the true value 100% of the time.
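
          A minimal sketch of requirements (A) and (B) under this “measurements = length + errors” setup, assuming (purely for illustration) a true length of 14 ft, a measurement noise sd of 0.5 ft, five measurements, and a conjugate normal prior whose 95% region roughly covers [5, 21] ft:

          ```python
          # Simulate noisy length measurements and do the standard normal-normal
          # conjugate update. All numbers below are illustrative assumptions.
          import numpy as np
          from scipy.stats import norm

          rng = np.random.default_rng(0)
          true_length, noise_sd, n = 14.0, 0.5, 5    # assumed for the demo
          prior_mean, prior_sd = 13.0, 4.0           # 95% prior region roughly [5, 21] ft

          y = true_length + rng.normal(0, noise_sd, size=n)   # simulated measurements

          # Conjugate update with known noise sd:
          post_var = 1 / (1 / prior_sd**2 + n / noise_sd**2)
          post_mean = post_var * (prior_mean / prior_sd**2 + y.sum() / noise_sd**2)
          lo, hi = norm.ppf([0.025, 0.975], loc=post_mean, scale=post_var**0.5)

          print(f"95% posterior interval: [{lo:.2f}, {hi:.2f}] ft")
          print("true length in the interval?", lo <= true_length <= hi)
          ```

          Here the likelihood dominates the diffuse prior, so the interval ends up tightly centered near the simulated measurements.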

        • No it wasn’t guesswork. I examined 10^20 copies of this universe and measured the length of my car in each one. Then I made a histogram of all the answers I got. It’s all very Frequentist and Objective, but definitely not tongue-in-cheek.

        • joseph,
          I find your guesswork more interesting than your car. Let’s guess the posterior distribution and estimate what the best prior distribution ought to be. We can do that just the same, no?

          The posterior depends on unknown data (measurements) which haven’t been taken yet. The prior however was based on the known fact (not guesswork!) that my car’s length is between 6 and 20 ft. So the prior can be known now, but the posterior has to wait for the data.

          I reiterate that any prior will do that puts the true value of my car’s length in its high probability region. Priors that satisfy this condition will lead (with a good likelihood) to posterior intervals that catch the actual length of my car.

          There are an infinite number of priors that satisfy this condition, but not every prior does. So priors are both non-unique and refutable.

          I seem to be the only person here who holds this view. Everybody else is either a Subjectivist (priors are non-unique and irrefutable) or an Objectivist (there is only one true prior and it’s refutable).

          All I can say is I hope you Objectivists find that “real, true, exact prior” for the length of my car one day, and I hope you Subjectivists have fun with the following personalistic prior for my car’s length: prior ~ N(mean=1,000,000 ft, sd=1ft).
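
          A short sketch of what makes that last prior refutable in the sense used above: the verifiable fact that the car fits in a 20 ft garage has essentially zero probability under N(mean = 1,000,000 ft, sd = 1 ft), while a prior whose high probability region covers [6, 20] ft is consistent with it (the N(13, 4) below is an illustrative choice):

          ```python
          # Prior probability of the checkable fact "length <= 20 ft" under the
          # joke prior versus a prior that covers the physically possible range.
          from scipy.stats import norm

          joke_prior = norm(loc=1_000_000, scale=1)   # the "personalistic" prior above
          wide_prior = norm(loc=13, scale=4)          # illustrative; 95% region ~ [5, 21] ft

          print("P(length <= 20 ft), joke prior:", joke_prior.cdf(20))   # ~0
          print("P(length <= 20 ft), wide prior:", wide_prior.cdf(20))   # ~0.96
          ```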

        • Joseph,
          I think I understand your points and can see there is no perfect objective prior. It also appears that a subjective prior is useful mostly to the person who conjured it up, and they are the only person who can afford to trust it for anything important. To wit, if you have a reason to invent a prob manifold for the length of your car, then you might find this swag acceptable for the task at hand. But I know that while you NEED the length of the car to be in the high density part of the manifold, there is no reason for me to ACCEPT that the length is ACTUALLY PRESENT in that high density interval. From my perspective, it would be equally reasonable to guess the posterior distribution and determine an optimal prior distribution using the likelihood. Either direction leaves me with an uncomfortable distrust of the method employed.

        • johnbyrd,

          My car is sitting in my garage. My garage is 20ft long. So my car must be less than 20ft. This is a simple physical fact. Right off the bat, I know

          0 ft < car length < 20 ft.

          So if I create a P(length) whose high probability region is equal to the interval [0,20], then the true length of the car has to be in the high probability region of P(length). That is just a trivial mathematical fact.

          There is nothing subjective about this. It is based on trivial physical and mathematical facts which can be tested and verified. So if you tell me you see no reason to accept it, I don't know what to say. That's kinda bizarre.

        • Joseph,
          The minimum and maximum are only just that. You have nothing that gives a prob manifold, unless it is a rectangle. I fear you are passing off cheap speculation as something more meaningful. The mathematical tidiness for organizing your thoughts does not translate into making inferences others can trust. I like my ability to guess a car length better than yours. That is a problem for scientists.

        • Johnbyrd,
          I have no idea where you get this “guessing” stuff from. I haven’t mentioned one word about anybody guessing the length of cars. Nothing I’ve said pertains to guessing in any way, shape, or form.

          It’s true that the max and min are only partial information, but all statistics is based on partial information, since if we already knew the true length of the car we wouldn’t bother with statistics.

          It’s also true that the requirement that P(length) place its high probability manifold over [0,20] doesn’t define a unique prior. I might for example have two distributions P_1 and P_2 that satisfy this.

          But here’s the thing you can rely on, because it’s a simple mathematical fact: both priors will lead to Bayesian posterior intervals that contain the true value (assuming the Likelihood hasn’t been screwed up).

          P_1 might lead to [13,16] while P_2 leads to [12,15]. Both of these contain the true value (14ft) and so both are right!

          For that matter, I might know that my car is longer than me, so I choose a different prior P_3 that places its high probability manifold over [6,20]. This new prior P_3 might lead to a Bayesian posterior interval [13.5,14.5]. But guess what? It’s right too!

          Verifiable physics facts and the mathematical consequences worked out from them are not “speculation”. Cheap or otherwise.

        • johnbyrd,

          I’ll try one more time. I claim you’ll get posterior intervals that cover the true length of my car if you combine the following three pieces:

          Part 1) Information such as length < 20 ft. This is either known or can be verified. Usually such pieces of information are known as "facts".

          Part 2) Construct a distribution P(length) whose high probability region covers the interval [0,20]. This prior is guaranteed to contain the true length of my car in its high probability region. This is true by construction/definition. Most people would call this a "tautology".

          Part 3) Compute the posterior distribution with a good Likelihood. Then the true length of my car will be in the high probability region of the posterior. Any reasonable posterior interval will then contain the true length. This is just mathematics (given precise definitions of what I'm talking about). Most people call this a "theorem".

          So everything is based on a fact, a tautology, and a theorem. Now if you think one or more of these suffers from the sin of subjectivity, then respectfully, you need to think harder about what I'm saying.

        • joseph,
          Your Part 2 is hopelessly bankrupt from my view (save for any self-satisfaction it might give as you stand pondering outside the garage). Agreed, you appear to assign 30% probability to the tails outside your min and max, but worst of all to me is that you appear to say you can guess the density values of all possible lengths of the car in advance. That is guesswork cloaked behind math. I am afraid it fools a lot of people into thinking it is born of more objective methods.

        • You keep talking about guesses and a subjective Bayes approach which I’ve never once mentioned (and don’t believe in). Since you can’t drop the “guesses”, let’s just see what happens when two people do guess priors.

          Suppose you come up with a guess P_johnbyrd and I come up with a guess P_joseph. Further suppose our two guesses differ in many respects (especially since you are a better guesser than me), but they agree on one thing: they both put their respective high probability regions over the interval [0,20].

          Then I claim the following is a mathematical fact (with some assumptions about the Likelihood):

          if the true length of the car is actually in the interval [0,20] => the true length of the car will be in the posterior interval constructed from either P_johnbyrd or P_joseph

          I’m not claiming the posterior intervals constructed by you or me will be identical. And I definitely don’t claim I can find the one true, exact prior for the length of my car. What I’m claiming is that I can find a prior that works in the one way I need it to work. It may not be the best, it may not even be “right”, but it will do the job.

          Why am I so sure I can achieve this very limited goal? Because it’s trivially possible to construct a distribution which puts its high probability region over the interval [0,20], and I can verify that my car actually is less than 20 ft by driving it into my garage. The above theorem does the rest.

          It’s an empirical fact that these kinds of priors do work in practice. Why do you think that is? Neither of us believes the Subjective Bayesians. But there’s no hope of interpreting a prior on the length of my car as a frequency. And the Objective Bayesians claim there is such a thing as the “true, exact prior” for the length of my car, but no one has ever actually seen it.

          If you think carefully about the mathematical fact above you’ll find the reason they work. Or better yet, just verify it yourself with some simulated data. After all, I’m not arguing a belief or a philosophy. I’m making a claim which can be proved or disproved.
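
          A sketch of that suggested simulation, assuming a true length of 14 ft, a noise sd of 0.5 ft, and two deliberately different priors that both put their probability on [0, 20] ft; the posterior is computed on a grid, and the prior shapes and other numbers are illustrative assumptions:

          ```python
          # Repeatedly simulate measurements and check whether 95% posterior
          # intervals built from two different priors contain the assumed truth.
          import numpy as np
          from scipy.stats import norm

          rng = np.random.default_rng(2)
          grid = np.linspace(0, 20, 2001)                  # candidate lengths, ft
          true_length, noise_sd, n_meas = 14.0, 0.5, 5     # illustrative assumptions

          priors = {
              "P_johnbyrd": np.ones_like(grid),               # flat over [0, 20]
              "P_joseph": norm.pdf(grid, loc=10, scale=4),    # peaked near 10 ft, spread over [0, 20]
          }

          def central_interval(prior, y):
              """95% central posterior interval for the length, on the grid."""
              loglik = norm.logpdf(y[:, None], loc=grid, scale=noise_sd).sum(axis=0)
              post = prior * np.exp(loglik - loglik.max())
              cdf = np.cumsum(post) / post.sum()
              return grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)]

          trials = 200
          hits = {name: 0 for name in priors}
          for _ in range(trials):
              y = true_length + rng.normal(0, noise_sd, size=n_meas)
              for name, prior in priors.items():
                  lo, hi = central_interval(prior, y)
                  hits[name] += lo <= true_length <= hi

          for name, k in hits.items():
              print(f"{name}: interval contained 14 ft in {k} of {trials} simulated datasets")
          ```

          The two priors give intervals of different widths; the printed counts show how often each one covers the assumed 14 ft in this particular setup.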

        • Joseph, that is potentially a horrible prior – you are assigning 30% probability to something that you think is basically impossible – that the car is either shorter than you, or too big to fit in your garage. Depending on the decisions you are considering, this might be an expensively wrong belief.

          This sort of problem is not just a hypothetical and theoretical detail, but has been a major feature of a lot of climate science in recent years, where it has been popular to use a broad (supposedly “ignorant”) prior that assigns high probability to catastrophe. When updated with limited and weak data, the posterior probability of catastrophe is still somewhat worrying (albeit rather lower than in the prior), and the researcher says “look – the data say there’s a high chance of disaster”.
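
          A quick check of the figure being debated here, for the N(mean = 13 ft, sd = 7 ft) prior mentioned earlier in the thread and the [6, 20] ft bounds:

          ```python
          # Probability the N(13, 7) prior assigns to lengths outside [6, 20] ft.
          from scipy.stats import norm

          prior = norm(loc=13, scale=7)
          leak = prior.cdf(6) + prior.sf(20)   # P(length < 6 ft) + P(length > 20 ft)
          print(f"prior mass outside [6, 20] ft: {leak:.1%}")   # about 32%
          ```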

        • Agreed. You simply can’t go wrong putting zero probability on values that are known to be impossible, and it is all too easy to run into disasters if you don’t. This seems to be pretty well known – I saw a paper by William Briggs on “probability leakage” about this topic just the other day for example.

          Although it should be noted this almost always occurs with the Normal Distribution. Have you ever seen a real example where X ~ Normal Distribution but where X divided by its standard deviation (a dimensionless quantity) really could take on any value between -infinity and +infinity?

          When I talk about a prior being ok above, I’m only referring to whether it leads to interval estimates that contain the true value. The intervals [6,20] or [13,15] or [13.9,14.1] are all “right” in the sense that they contain the true value of car length=14ft. I make no other claim as to their usefulness for any particular purpose.
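
          A minimal sketch of one way to put zero prior probability on impossible values in this example, using a normal prior truncated to the [6, 20] ft range from earlier in the thread (the truncated-normal construction itself is an illustrative choice, not something proposed above):

          ```python
          # Truncate the N(13, 7) prior to [6, 20] ft so that no probability
          # "leaks" onto lengths known to be impossible.
          from scipy.stats import norm, truncnorm

          mu, sd, lo, hi = 13.0, 7.0, 6.0, 20.0
          a, b = (lo - mu) / sd, (hi - mu) / sd       # truncnorm takes standardized bounds
          prior = truncnorm(a, b, loc=mu, scale=sd)

          print("truncated prior mass outside [6, 20] ft:", prior.cdf(lo) + prior.sf(hi))   # 0.0
          print("untruncated leakage:", norm(mu, sd).cdf(lo) + norm(mu, sd).sf(hi))         # ~0.32
          ```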

      • No, I don’t think JohnB is missing it. The idea is that understanding logical or mathematical implications is hard, and that is what the Bayesian machinery does for you: it gives you statements of the form “if I were to believe [prior and likelihood] and I observed [data], then this would imply […]”. The only difference of this from a regular deductive argument is that the prior is a distribution, not a proposition, so the posterior is too. But nobody assumes that proving theorems is as easy as stating axioms.

        • Will:

          I strongly believe in stating one’s assumptions clearly, and as my work demonstrates, I’m a big fan of using Bayesian inference to combine assumptions with data to get conclusions. My problem is with the doctrine of subjective priors, in which one’s prior is considered unrefutable because it is said by definition to represent one’s subjective information. I am much more comfortable thinking about the prior (and, for that matter, the likelihood) as the product of assumptions rather than as a subjective belief.

        • Andrew:

          I was assuming that the view I sketched substantially agrees with your practice, right down to the demand for posterior checks. Your model tells you what would be reasonable to conclude under some coherent assumptions – which may not in fact be yours – but not whether they are in fact reasonable. And the checking allows you to probe whether or in what respects they _are_ actually reasonable. I suppose that it is after this second step that MAYO might say that a mere (logical) implication turns into a (psychological) inference i.e. something that you personally might conclude and walk around believing thereafter. Or not. Perhaps she’d like to say.

          In any case, the thing seems to me to be defused when, as I think MAYO might be suggesting, we allow that a subjective prior is the set of coherent beliefs of a subject who need not, and perhaps never quite could be, us. This decoupling is useful. For example, when considering some claim we might index a set of otherwise identical hypothetical subjects according to the extremity of their prior presumptions against it and allow the Bayesian machinery to tell us how extreme their prior beliefs would have to be to be ‘persuaded’. The priors in this exercise are still possible coherent sets of beliefs, that would, I suppose, be ‘irrefutable’ if any of the imaginary subjects existed, but since they don’t (and even if they do somewhere) they would seem here to operate as innocent statements of assumptions in the way you prefer.

        • The problem is that stating “if I were to believe…and I observed then this would imply” is NOT yet to infer anything. This is what many people miss. As with any other argument, in order to “detach” a conclusion, however qualified, the premises must be at least approximately true. This involves empirical assumptions and inductive (evidence-transcending) inferences and claims. Unless you can do that, you don’t have an account of inference.

        • Can you give a single example of a real research project using the type of inference you are talking about? I’d like to read it because I can’t, for the life of me, figure out what you mean in practical terms even going through your website.

  2. Hi—I just found you from Mayo’s errorstatistics blog and I think it is amazing that we can have these types of discussions not only in but across blogs! I have been reading her posts on Senn (Jan 14, 15, 23 & 24) but didn’t realize you were also posting on him until her Feb 3 note about it, and I was so happy to get that cross link. So I just wanted to comment that it is really helpful for us readers to access the full conversation between blogs when they include cross links (esp. for those of us who have low “prior” info on “auxiliary blogs” :).

  3. Pingback: What is a prior distribution? « Statistical Modeling, Causal Inference, and Social Science

Comments are closed.