
Prior information, not prior belief

The prior distribution p(theta) in a Bayesian analysis is often presented as a researcher’s beliefs about theta. I prefer to think of p(theta) as an expression of information about theta.

Consider this sort of question that a classically-trained statistician asked me the other day:

If two Bayesians are given the same data, they will come to two conclusions. What do you think about that? Does it bother you?

My response is that the statistician has nothing to do with it. I’d prefer to say that if two different analyses are done using different information, they will come to different conclusions. This different information can come in the prior distribution p(theta), it could come in the data model p(y|theta), it could come in the choice of how to set up the model and what data to include in the first place. I’ve listed these in roughly increasing order of importance.

Sure, we could refer to all statistical models as “beliefs”: we have a belief that certain measurements are statistically independent with a common mean, we have a belief that a response function is additive and linear, we have a belief that our measurements are unbiased, etc. Fine. But I don’t think this adds anything beyond just calling this a “model.” Indeed, referring to “belief” can be misleading. When I fit a regression model, I don’t typically believe in additivity or linearity at all, I’m just fitting a model, using available information and making assumptions, compromising the goal of including all available information because of the practical difficulties of fitting and understanding a huge model.

Same with the prior distribution. When putting together any part of a statistical model, we use some information without wanting to claim that this represents our beliefs about the world.

55 Comments

  1. Anonymous says:

    The appropriate response to that question is “if two different frequentists use the same information, they can come to two conclusions. Does that bother you?”

    Frequentist significance tests are sensitive to the choice of test function, even when all physical data+assumptions are unchanged. If there’s a series of tests, as is usually the case in an extended analysis, the outcomes are typically sensitive to the order in which the tests are performed. If data consists of two parts D1+D2 and they are processed in different orders under the same assumptions (information), it can lead to different conclusions. I could go on.

    I wonder if any of them have ever considered what formalism results if you start requiring that the same information should lead to the same conclusions? What’s a good word for this property anyway? Coherence?

    • Andrew says:

      Anon:

      It’s difficult to require that the same scientific information lead to the same inferences, because in practice there is only a loose link between scientific information and a statistical model. To put it another way: a model is of course only an approximation to reality, but it is also only an approximation to the information that goes into it.

      • george says:

        …so two statisticians using approximately the same information (regardless of which formalism they are using) should come to approximately the same conclusions?

        • Andrew says:

          George:

          I should hope so. But some methods are better adapted to certain information. For example, if you want to make a prediction using a linear additive model and you have lots of predictors and your only tool is least squares, you will lose. You need to regularize in some way. So it’s not that all methods are equivalent. Of course there’s a reason we do all this research into coming up with new methods: we need new methods to handle all the information we’d like to include in our models.

      • Anonymous says:

        I’m not talking about two scientists with different information. I’m talking about two statistical analyses based on the same precisely defined information. Whatever might be in your head, you have to set down specific assumptions to carry out any calculation. Based on those well-specified assumptions, frequentist conclusions are highly sensitive to arbitrary choices such as the order the tests are performed and all the rest.

        It is in fact easy to require mathematically that results from the same precise data+assumptions be the same.

      • Commenter #? says:

        Thank you for the comment: “in practice there is only a loose link between scientific information and a statistical model.” This seems like a very important point. I’m wondering if this basically means that one can’t derive all the relevant scientific information starting from just the statistical model.

        And could the point about the model being only an approximation to the information that goes into it be illustrated in the following way? When one has a regression model with two explanatory variables and includes just the main effects, the model is only an approximation to the information, for example because the interaction term is left out.

        I’d like to read more on these topics. Does anyone have suggestions for relevant literature?

    • Corey says:

      ‘if two different frequentists use the same information, they can come to two conclusions. Does that bother you?’

      I vaguely recall reading about some research in which the investigators actually did this specifically to study the variability in research conclusions caused by statisticians’ personal idiosyncrasies. Can’t find any trace of it, alas.

        • Keith O'Rourke says:

          This is somewhat different in that no one has/had information needed to come to credible conclusions in these observational studies. (And although a Bayesian multiple bias analysis could incorporate such information, it was not available.)

          Nice slides – though very sad that the problem WG Cochran raised 50+ years ago – “Traditional p-values and confidence intervals require empirical calibration to account for bias in observational studies” is only now getting some pioneering work to address it.

          OK, I forgot many still think the distribution of p_values under the Null hypothesis in observational studies is Uniform(0,1) :-(

          • James says:

            I have the impression that something is flying over my head in your last sentence. Is this a point about discreteness of the statistic? Or something else entirely? Is there an example where the distribution, under the null hypothesis, of the p-value is continuous but is not Uniform(0,1)? Maybe there is a meaning in the phrase “in observational studies” that I am not getting.

            • fred says:

              James: not sure it’s what Keith means, but many situations with nuisance parameters (and other complications) will produce, under the null, p-values that are not U(0,1) but stochastically larger than this. Used in tests, the guarantee they provide is that the Type I error rate is controlled at some nominal value, not that it’s equal to the nominal value.

              Of course, with large samples the p-values do approximate U(0,1) well, again under the null.

              • Keith O'Rourke says:

                fred and James:
                > OK, I forgot many still think the distribution of p_values under the Null hypothesis in observational studies is Uniform(0,1) :-(

                I have commented on this a few times on this blog and it was a bit foolish of me to think most (of today’s) readers would be familiar with the topic.

                The slides I was commenting on – “Are Observational Studies Any Good? David Madigan, Columbia University” are concerned with learning about harms (and benefits) of drugs outside of randomized trials, which provide very little information.

                So the real interest is causality – do they cause harms in patients who use them? Given the groups being compared are not equal on all other things (not randomized comparisons) the estimates are biased, the bias is unknown and at least within a single standalone comparison nothing can be learned about it from the data.

                What does this mean for type one error?

                Well without bias, repeatedly draw 30 Normal(0,1) outcomes and do the t.test testing that the difference is not 0 and plot the p_values – they will look very Uniform(0,1).
                With bias, repeatedly draw 30 Normal(0,1) outcomes add the bias (e.g. + 2) and do the t.test testing that the difference is not 0 (again bias is unknown) – they will look not very Uniform(0,1) at all.
                Repeat using draws of 100 – without a bias still Uniform(0,1) but with a bias even less Uniform(0,1) than with 30 – things get worse with increasing sample sizes. Same with confidence intervals, with bias the coverage of the unbiased effect decreases with increasing sample size to 0.
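The simulation described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the commenter's own code: it uses Python's standard library rather than R's t.test, it summarizes the p-value distribution by the rejection rate at the 5% level (which would be about 0.05 if the p-values were Uniform(0,1)), it hard-codes the two-sided critical values of Student's t for 29 and 99 degrees of freedom, and it assumes a small bias of 0.2 instead of the +2 mentioned above so that the dependence on sample size stays visible.

```python
import random
import statistics

# Two-sided 5% critical values of Student's t for n - 1 = 29 and 99
# degrees of freedom (standard table values, hard-coded to avoid SciPy).
T_CRIT = {30: 2.045, 100: 1.984}


def rejection_rate(n, bias, reps=4000, seed=0):
    """Share of one-sample t-tests of 'mean = 0' that reject at the 5% level.

    Each replication draws n Normal(0, 1) outcomes and adds an unknown
    constant bias, so the true effect of interest is always 0.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) + bias for _ in range(n)]
        mean = statistics.fmean(sample)
        std_err = statistics.stdev(sample) / n ** 0.5
        if abs(mean / std_err) > T_CRIT[n]:
            rejections += 1
    return rejections / reps


# Bias of 0.2 is an assumed illustrative value, not taken from the comment.
rates = {
    (n, bias): rejection_rate(n, bias)
    for n in (30, 100)
    for bias in (0.0, 0.2)
}

for (n, bias), rate in sorted(rates.items()):
    print(f"n={n:3d}  bias={bias:.1f}  rejection rate at 5% level: {rate:.3f}")
```

Under these assumptions the unbiased rejection rates should sit near the nominal 5% at both sample sizes, while the biased ones exceed it and grow with n. Coverage of the true effect by the usual 95% interval is one minus the rejection rate, so it shrinks in the same way.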

                What is surprising to me is that some (apparently many) people who have learned a lot of other stuff about Statistics have completely missed this point; for an example see http://www.stat.columbia.edu/~gelman/research/published/GelmanORourkeBiostatistics.pdf

                As for nuisance parameters (other than bias parameters) the non-uniformity they cause is far less severe – in the reference above we give an example of small samples from perfect RCTs where using a t.test with unequal variances fails to produce p_values that are Uniform(0,1).

                Bayesian approaches utilize informative priors for the biases (using background knowledge and other studies) and my guess is Madigan’s work will lead to better calibrated priors for bias. In his current slides he seems to be mainly estimating null distributions that can be used to get p_values closer to Uniform(0,1) and CIs with better coverage.

        • Anonymous says:

          I think you can show it directly. It’s likely you can find tests and a data set that do the following. Take the standard stat 101 analysis where you have two populations. The typical scenario is:

          (1) First test equality of variances assuming possibly different means. Find they are equal variance.
          (2) Test equality of means assuming equal variance. Find the means are different.

          Now compare this to

          (1) Test equality of means assuming possibly different variances. Find they are equal means.
          (2) Test for equality of variance assuming a common mean. Find they have unequal variances.

          All tests being done at the same 5% level or whatever.

          Reversing the order of the tests then reverses the conclusions. Frequentists love to laugh at and dismiss Bayesian coherence. Well this is the price they pay.

          • fred says:

            Anon: Your “101” analysis is a straw man, that doesn’t acknowledge tests that cope with unequal variances, despite these having been around since at least the 1940s.

            It’s as unconvincing as arguing that Bayesians can only use conjugate priors, or that datasets have to be small enough to be entered by hand into a steam-driven difference engine.

            • Anonymous says:

              It’s possible for two sets of absolutely legitimate frequentist tests to reach different conclusions from the same data+assumptions. Nothing you said changes or challenges that fact in the slightest.

              • fred says:

                …and nothing you said provides a compelling example of incoherence actually mattering in practice.

                (For testing equality of means, your first scenario of pre-testing variances – and by naive use of “the same 5% or whatever” pretending you didn’t pre-test – is well-known to invalidate frequentist properties, see e.g. the first two references here. Its lack of coherence with another not-actually-frequentist scenario does not, therefore, provide an example that there is an important “price to pay”.)

                See e.g. here for a discussion on the tradeoff between coherence and calibration. Refusing, on philosophical grounds, to ever trade a bit of coherence for gains in calibration is a dead end.

              • Anonymous says:

                I don’t care if frequentists in practice can sometimes spot their absurd results and ad-hoc fudge everything back to something reasonable.

                This is a blog comment to illustrate a point, not a dissertation, but since you brought up “in practice”: it’s not unusual for applied stat papers to string together dozens of such tests and conclusions just like it. So “in practice” this little problem explodes in frequentists’ faces.

                To say it isn’t frequentist is laughable, since the first approach in my example appears and is covered ad nauseam in every introductory frequentist textbook I’ve ever seen.

                Frequentist calibration, whereby probabilities are matched to frequencies of occurrence, is a joke and not desirable at all. This is easily the most misunderstood folk belief in statistics. To cut a long story short, the predictive power and usefulness of distributions in no way requires that kind of calibration, and the most useful distributions for inference/prediction don’t have it. And that’s assuming they could achieve out-of-sample calibration, which rarely happens. So no, I won’t be trading any elementary and obvious requirements of consistency for frequentist calibration.

  2. Evan Warfel says:

    >> If two Bayesians are given the same data, they will come to two conclusions. What do you think about that? Does it bother you?

    I would answer this slightly differently: It depends on the degree of rationality of each Bayesian, as well as the information they have access to.

    To the extent that ‘knowledge’ is that which engenders accurate predictions, I am not convinced that there is any fundamental, categorical difference between ‘belief’ and ‘information’, with regards to research. A researcher’s hunch is just based on personal pattern recognition. The quality of a researcher’s hunches in part reflects the degree to which she or he isn’t fooling her or himself.

    There was a period of time where if you weren’t Charles Darwin, it would have been considered highly ‘rational’ not to believe in evolution. And thus even if Darwin and a stranger were the two most rational people in the world, based on their own evidence, they would have come to profoundly different conclusions. Still though, we can say that Darwin was more rational than his critics.

    I would also point out that if you have a computer run the same frequentist test a sufficiently large number of times, it will not always return the exact same result, due to minute differences in the random number generation process, heat stress on the machine, bit-flips due to cosmic rays, and so on.

    • Jonathan (another one) says:

      I’m not sure I understand your definition of rationality in the Darwin example. Has Darwin explained his rationale to the stranger? If the stranger has rejected it, why? While I’m certain we can say that Darwin proved more correct than his critics, what is the basis for arguing that he was more rational? Or at least more rational than those critics whose objections were fact-based.

  3. konrad says:

    The distinction between “belief” and “model/assumption” is an important one. It is perfectly reasonable (and in fact a very important part of science) for the same researcher to run two analyses using two different and mutually contradictory models, so as to explore the consequences of their associated assumptions. It is rather harder to hold two mutually contradictory beliefs at the same time.

    • JD says:

      I don’t see the distinction you are trying to make here.

      Don’t we all have a distribution of beliefs? There are very few things we believe almost surely. And quite often we believe that contradictory statements could be true with nonvanishing probabilities.

      While I feel that the distinction between belief and model is quite fuzzy, I would almost argue for more absolutism regarding models… that is, “All models are wrong.”

      • Anonymous says:

        Beliefs are irrelevant. The equations have no way of knowing whether you believe the inputs or not. They simply assume every input is true and produce results.

        If you assume only a single model, then the machinery will spit out results as if that model were the only possibility.

        If you allow for many possible models, then the machinery will consider the result for each model and weight them by the total evidence for the model.

        At no point are beliefs taken into account.
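One way to read "weight them by the total evidence for the model" is as Bayesian model averaging. The sketch below is a toy illustration only: the coin-flip data, the two candidate models (a fixed success probability of 0.5 versus a uniform prior on it), and the equal prior model weights are all assumptions made for the example, not anything specified in the comment above.

```python
from math import comb, factorial


def evidence_fixed(k, n, theta=0.5):
    """Marginal likelihood of k successes in n trials when theta is fixed."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)


def evidence_uniform(k, n):
    """Marginal likelihood when theta has a Uniform(0, 1) prior.

    Integrating C(n, k) * theta^k * (1 - theta)^(n - k) over theta gives
    C(n, k) * k! * (n - k)! / (n + 1)!, which simplifies to 1 / (n + 1).
    """
    return comb(n, k) * factorial(k) * factorial(n - k) / factorial(n + 1)


k, n = 7, 10  # hypothetical data: 7 successes in 10 trials

evidence = {"fixed": evidence_fixed(k, n), "uniform": evidence_uniform(k, n)}
prior_weight = {"fixed": 0.5, "uniform": 0.5}  # assumed equal prior weights

# Posterior model weights are proportional to prior weight times evidence.
total = sum(prior_weight[m] * evidence[m] for m in evidence)
post_weight = {m: prior_weight[m] * evidence[m] / total for m in evidence}

# Each model's predictive probability that the next trial is a success.
prediction = {"fixed": 0.5, "uniform": (k + 1) / (n + 2)}

# The averaged prediction weights each model's answer by its posterior weight.
averaged = sum(post_weight[m] * prediction[m] for m in evidence)

for m in evidence:
    print(f"{m:8s} evidence={evidence[m]:.4f}  weight={post_weight[m]:.4f}")
print(f"model-averaged P(next success) = {averaged:.4f}")
```

Nothing in the calculation depends on whether anyone believes either model; the inputs are the stated assumptions and the data, and the weights follow mechanically.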

      • konrad says:

        The reason the distinction is important is that Bayesian statistics is most commonly described (e.g. https://en.wikipedia.org/wiki/Bayesian_statistics) as expressing evidence in terms of degrees of belief. This, as Andrew pointed out in the OP, is misleading because the word “belief” already has a well-established meaning. It is perfectly fine to think of a belief state as described by a probability distribution, but most of the probability distributions we work with in practice (e.g. for expressing the evidence contained in data) do not correspond to the belief state of any actual individual. Also, it is often useful to calculate the different probability distributions resulting from different assumptions, and we do not require more than one scientist to calculate this (as would have been the case if the probability distributions had to correspond to the actual beliefs of the scientist).

        So when we want two conclusions, we don’t need two Bayesians. A single Bayesian willing to explore the implications of two different assumptions will suffice.
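The point that one analyst can compute the implications of several sets of assumptions can be sketched with a conjugate Beta-binomial update. This is a minimal example with made-up data and two arbitrarily chosen priors; the function names and the numbers are hypothetical and serve only to show the same data run through two different assumptions.

```python
def beta_binomial_posterior(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


successes, failures = 7, 3  # hypothetical data

# Two sets of assumptions explored by the same analyst.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "informative Beta(20, 20)": (20, 20),
}

posterior_means = {}
for label, (a, b) in priors.items():
    post_a, post_b = beta_binomial_posterior(a, b, successes, failures)
    posterior_means[label] = beta_mean(post_a, post_b)
    print(f"{label:26s} -> posterior Beta({post_a}, {post_b}), "
          f"mean {posterior_means[label]:.3f}")
```

Reporting both assumption-posterior pairs side by side is a simple form of the sensitivity analysis discussed in this thread.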

        • JD says:

          I really don’t see that much of a difficulty with the term “belief” being used like this in Bayesian statistical settings, because in that context it connotes the reality that even if two individuals have identical information, they will very often construct very different priors. In addition, if each individual is expressing these priors mathematically, then that expression is a “model”, an abstraction of each prior, and as such it would not be a perfectly correct representation of each personal prior.

          As a result of this process, I can see numerous shortcomings with using the term “information” to replace “belief”. “Information” also has a well-established meaning. And previous usage as a technical term in statistics is *nothing* like how it would be used to replace “belief”. Perhaps casually in statistics people might use “information” interchangeably with “data”, but there are also more technical usages, e.g. as a model-based quantification of “information” related by the data. Further, I don’t really know of a context where “information” takes on these unavoidably personal characteristics.

          A single Bayesian may realize the limiting nature of representing their prior in mathematics and choose to propose other plausible mathematical approximations to their prior in order to assess the sensitivity of their conclusions to the mathematical modeling of their prior beliefs.

          There is nothing implicit in using the term “belief” that prevents a Bayesian statistician from eliciting priors from other individuals and calculating their posteriors, or hypothesizing an individual with a particular prior and calculating that hypothetical person’s posterior.

          I am not at all concerned that people will hear the term “belief” and think that they can choose *any* prior they want, disregarding whatever prior data they have available to them. It is being used as a technical term in a technical context. Anyone calculating posteriors should know that priors are not chosen arbitrarily. Whereas to dismiss what “belief” connotes might lead people to conclude that everybody would quantify all past, relevant experimental evidence in exactly the same way. That just isn’t going to happen.

          • Andrew says:

            Jd:

            :%s/prior/model/g

            If the prior distribution represents “belief”, so does the family of data distributions.

            • JD says:

              Andrew,

              I accept that the choices made in mathematically modeling a likelihood are affected by personal experience. So I wouldn’t be averse to openly acknowledging that all modeling choices reflect something personal.

              The distinction, as I see it, but I am open to being convinced otherwise, is that there is something fundamentally personal in prior “belief”, separate from the choices we make in mathematically modeling that prior belief.

              I wonder if accepting both that “all models are wrong” and that modeling compromises may be affected by personal experience would make Bayesian updating incoherent. Gosh, I hope not.

              So here’s where I sense a distinction: On the one hand, a Bayesian’s prior belief today, if Bayes’ rule is accepted as the way of calculating posteriors, is always carrying around a little bit of influence from that prior they should have had when there was no prior data. So even if two Bayesians agreed on likelihoods since, each Bayesian’s “primal” prior belief would still influence their prior belief today. It seems like “primal” priors are unavoidably personal. But rather than agree exactly on how a prior belief should be updated in the presence of data, the two Bayesians should now have “primal” prior beliefs about whether to take the data “at face value”. So even if they would agree on likelihood if the experimental data could be taken “at face value”, their updated prior belief will reflect their personal assessment of biases, QRPs, etc.

              I realize that we don’t go back to our “primal prior” and try to construct today’s prior sequentially like that. We trust that we are being rational and that today’s prior belief properly reflects all of our “primal priors” and data. But I’m just trying to illustrate the distinction, as I see it, between the personal nature of “prior belief” vs the personal nature of making mathematical modeling decisions.

              JD

              • Andrew says:

                Jd:

                I see no reason to assume that two statisticians, Bayesian or otherwise, would agree on the data model, or that they would agree on what data to include in the model, or that they would agree on how to preprocess the data, any more than I would assume they would agree on the prior distribution.

                It’s not about the model being “affected by personal experience”; it’s more that all aspects of the model are chosen. Choosing the model and the data that go into the model, that’s what statistics is all about. If it were automatic, we wouldn’t need statisticians in the first place!

              • konrad says:

                JD:

                I sense a fundamental disagreement, which I assume is the disagreement between subjective and objective Bayesianism.

                Your language strongly implies a one-to-one correspondence between individuals and priors: “two individuals…will very often construct very different priors”, “A single Bayesian…their prior”, “eliciting priors from other individuals”, “hypothesizing an individual with a particular prior”. This is the position we are objecting to (at least when discussing statistics rather than psychology). If we want to write down and investigate the consequences of a list of models (including priors), there is no need to hypothesize individuals who might find the models plausible, or who might (before seeing the data) have held beliefs that were well approximated by the priors used in those models.

                “I am not at all concerned that people will…choose *any* prior they want”: but people *can* and *should* choose any prior they want, and the more they choose, the better! It’s just a part of the model – if the choice of model assumptions that affect the likelihood function is going to affect the conclusion, it is wise to consider multiple alternatives. The same goes for the choice of model assumptions that affect the prior! In the end you can choose to present one set of assumptions and the posterior it implies, but it is often better to present multiple sets of assumptions and the posterior implied by each.

                “…priors are not chosen arbitrarily”: in practice, priors are often chosen for reasons of convenience or convention, which sounds pretty arbitrary to me. Ultimately, the success of a model (including its prior) depends on its performance, not on where its author found inspiration.

              • JD says:

                @Konrad

                I too sensed that our disagreement was fundamentally about what is often described as the subjective vs. objective perspectives.

                I guess two points in my perspective I want to be clear on are (because I am hoping that I’ll learn from folks challenging them):

                Individuals possess a “pure” prior belief, but those beliefs inevitably get “corrupted” by the realities of mathematical modeling. So I may disagree with some “hardcore” subjective Bayesians who might argue that performing sensitivity analyses on the choice of prior doesn’t make sense.

                “Ultimately, the success of a model (including its prior) depends on its performance…” This is another point I’m not completely settled on. The reason is that ultimately, no experiment is “repeatable”. So if we accept that, what does “performance” even mean? For now, yes, I do evaluate and quantify the performance of models. But I do not think doing so is consistent with my perspective on probabilities. There are some “hardcore” subjective Bayesians who might argue that trying to assess the quality of an individual’s prior does not make sense. I am not quite there, but I understand their point.

                I’m compelled in forming my perspectives on the practice of statistics to start with a foundational perspective on “probabilities” which is coherent. And from what I’ve read in the philosophy of science literature on probability, it seems like “personal probability” has more appealing properties than so-called frequentist or hypothetical frequentist probability. But I’m not a philosopher, so if there are folks that can help me strengthen my foundational understanding, I’m totally receptive.

              • JD says:

                And regarding what has informed my perspective on the foundations of probability, I’m referring to, in part, some articles by Alan Hajek.

                See, for example “Fifteen Arguments Against Hypothetical Frequentism”
                http://philrsss.anu.edu.au/people-defaults/alanh/papers/fifteen.pdf

                But again, I’m not a philosopher, so I am not well equipped to evaluate the strength of those arguments, nor am I even sure that I have a sufficient understanding of them.

              • konrad says:

                JD:

                I agree that personal probability (or subjective Bayesianism) is a more appealing framework than frequentism, but those are not the only options.

                It is usually accepted that mathematics can only ever tell you the implications of a set of assumptions – if instead you want ground truth, you need to go out on a limb because mathematics will not provide guidance on what the assumptions should be. Objective Bayesianism sees probability as part of mathematics in this sense: for any set of assumptions A, the mathematical machinery will tell you its implications by providing a probability (or probability density) P(B|A) for the hypothesis of interest B. NB is that these implications are always conditional on a crisply specified set of assumptions; anything beyond that is not part of probability theory.

                Also NB is that probability distributions are functions of two inputs: the assumptions A and hypotheses B. Everyone is comfortable with treating B as a free variable, but it seems that only objective Bayesians are comfortable doing the same with A.

                To address your points:
                “Individuals possess a “pure” prior belief” – this is probably true, but I regard it as an empirical claim in the domain of psychology, and not part of probability theory or statistics. Some people agree with the claim while rejecting Bayesian statistics as a normative theory; some disagree with the claim while championing Bayesian statistics.

                “What does ‘performance’ even mean?” – this is within the non-mathematical part of statistics, but outside of probability theory. So it is something we are free to define as needed on a case-by-case basis and assess empirically. I agree that this is a tricky question for philosophy of science, but I don’t see it as a challenge for the basis of probability any more than it is a challenge for other areas of applied mathematics.

              • konrad says:

                Thanks for linking the article – I think most Bayesians are on board with the gist of these objections to frequentism, but the article does not address the question of what probability _should_ be taken to mean – and Bayesians are split into at least two camps on this. For an exposition of the objective Bayesian approach (based on early work by Cox, 1946), I highly recommend the book by Jaynes (2003). For some recent discussion on this blog, with references, search for comments by Alexandre Patriota in this thread:
                http://andrewgelman.com/2015/07/03/why-should-anyone-believe-that-why-does-it-make-sense-to-model-a-series-of-astronomical-events-as-though-they-were-spins-of-a-roulette-wheel-in-vegas/

              • JD says:

                Hi Konrad,

                Perhaps the fact that Alan Hajek has not yet written an article called “Fifteen Arguments Against Subjective Probability” is informative.

                At the foundations of probability, there doesn’t seem to be anything known as “objective Bayesian probability”, or anything in the spirit of the mathematical manipulations that are performed to construct priors in the practice of objective Bayesian statistics.

                http://plato.stanford.edu/entries/probability-interpret/

                To me, this seems like a major problem with adoption of the objective Bayesian perspective. That is, if you want to develop a coherent philosophy of science from the ground up, through to the practice of analyzing data, then I don’t see how objective Bayesian statistics will ever be the result.

                Yeah, I have read some Jaynes. Does he address the foundational issues with objective Bayesian statistics somewhere?

                JD

              • konrad says:

                I was only debating the foundation of probability, not of science. Sure, we would like to come up with an interpretation of probability that will be useful in the larger scheme of things, but I think it is enough to come up with something that is coherent and provides sensible normative guidance within its scope. If probability theory does not tell us which assumptions to make in a probability modelling context, that is not its fault any more than differential equation theory can be blamed for not telling us the same thing in a differential equation modelling context. If what you are after is a complete philosophy of science, that must surely contain more than just an answer to “what is probability?”

                When you refer to foundational issues, are they within the scope of this question, or in the broader (fuzzy) area where scientists have to decide which analyses to do? If the latter, I don’t have any answers but I do think that a key aim of science is to convince others so I certainly would not consider subjective approaches along the lines of “each scientist should pick an analysis according to their `primal` prior belief” satisfactory.

              • JD says:

                Hi Konrad,

                I think challenging me to really think about what I am saying that I require of “probability” to perform statistical inference is fair. Upon reflection, I’d take back what I said in my 9:29PM reply.

                But this “primal prior” thing I mentioned (this notion is not part of any subjective Bayes dogma, as far as I know), was simply a tool to illustrate that I have trouble believing that priors should (or can) be “objective”.

                But this lack of objectivity is in a sense that I still think is distinct from when two statisticians simply make different, but defensible, modeling assumptions.

                So as a result, I think that when statisticians elicit and use informative priors, they are practicing subjective Bayesian statistics. And that should be OK.

              • konrad says:

                Hi JD,

                I agree that priors (and model assumptions more generally) are not objective, but I do not see the distinction you are making between priors and other model assumptions. Ultimately, a statistical model is a set of equations defining a probability distribution for an observable variable. We follow certain naming conventions in referring to some of these equations as priors and others as likelihood functions, but I am not sure these conventions can even be formalized in general.

                What is (or should be) objective is how the analysis proceeds after all assumptions have been stated formally. I think the key point is that in presenting science a scientist should not present their personal conclusion (as inspired by the primal prior) in isolation. Rather, they should present one or more (objective) assumption-conclusion pairs, leaving the audience to draw their own (subjective) final conclusions.

                This is not to say that the scientist cannot also argue for a particular one of the possible subjective positions (provided such argument is clearly flagged as opinion), just that the inference of conclusions from assumptions is an objective component of the larger subjective process and that this objective component can and should be presented as such.

          • Martha says:

            A prior can be based on both information and belief, so restricting the phrase “prior belief” to the case where the prior is based solely on belief makes sense to me, while using the phrase in general seems misleading. In particular, trying to distinguish between (external) information and (internal) belief ideally can help prompt both the analyst and the reader to think carefully about which is which and have a better sense of the input into the analysis.

        • JD says:

          Konrad,

          I think your description of statistical modeling sounds a bit too reductionist for my perspective. The prior is an instrumental feature of Bayesian inference. It isn’t just another level in a hierarchy of mathematical formulas for the likelihood.

          I am not sure I know what you mean by “(objective) assumption-conclusion pair”, but guess the (objective) here is in reference to your expectation that the mathematical machinery of going from assumptions to conclusions would have to be “objective”. I don’t know if this is a point worth raising, but I use Bayes’ rule to update my prior. You probably consider that objective. But my posterior isn’t always my conclusion. Sometimes I want to calculate a point estimate. If I use a posterior mean, I am implicitly assuming a squared-error loss function. But I may have reason to use a different loss function. What do you consider objective here? The choice of loss function, probably not. Is choosing a point estimator that minimizes expected posterior loss “objective”? If so, why? Sure, it may be admissible but…
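
To make the loss-function point concrete, here is a minimal Python sketch. It assumes a made-up, right-skewed set of posterior draws (the numbers and all function names are illustrative, not taken from any analysis in this thread) and checks by brute force that the posterior mean minimizes expected squared-error loss while the posterior median minimizes expected absolute-error loss.

```python
import statistics

# Hypothetical posterior draws for theta (illustrative, right-skewed).
draws = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.0, 4.5, 8.0]

def expected_loss(estimate, samples, loss):
    """Average loss of a point estimate over the posterior draws."""
    return sum(loss(estimate, s) for s in samples) / len(samples)

def squared_error(a, theta):
    return (a - theta) ** 2

def absolute_error(a, theta):
    return abs(a - theta)

def best_estimate(samples, loss, grid):
    """Brute-force search for the candidate with the smallest expected loss."""
    return min(grid, key=lambda a: expected_loss(a, samples, loss))

grid = [i / 100 for i in range(0, 1001)]  # candidates 0.00, 0.01, ..., 10.00

best_sq = best_estimate(draws, squared_error, grid)
best_abs = best_estimate(draws, absolute_error, grid)
post_mean = statistics.mean(draws)
post_median = statistics.median(draws)

print("squared-error minimizer:", best_sq, "| posterior mean:", post_mean)
print("absolute-error minimizer:", best_abs, "| posterior median:", post_median)
```

The same posterior yields two different point estimates, and the only thing that changed is the loss function.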

          Also, if we are agreeing that all modeling decisions are implicitly subjective (yet clearly disagree on a distinction in the nature of the subjectivity present in priors), then why do you think the scientist advocating for a particular model would need to explicitly “flag” such advocacy as “opinion”? That would be something redundant to both of our understandings of modeling. In addition, to be consistent, every article that does any modeling would need to include such a disclaimer.

          Why don’t we just always elicit priors from domain experts?

          If the reader finds the expert to be a credible one, then their own prior shouldn’t be too far from that, in some sense, though in a strictly mathematical sense, perhaps we can’t say.

          If the data and code are made accessible, as they should be in the interest of open science, then the reader can (perhaps with some assistance from a computational statistician) go ahead and use that as a starting point for their own analysis.

          JD

  4. Anonymous says:

    I view beliefs as a combination of information and assumptions (where the assumptions are either made for convenience or based on some past experiences). And of course frequentists with different beliefs, by that definition, will also come to different conclusions. The main difference is not that the Bayesian will impose more outside information, but that the frequentist will usually impose more information silently.

    But then that depends on having a precise definition of ‘beliefs’.

  5. Ram says:

    It seems to me that two frequentists will only come to the same conclusion if they have the same data, the same likelihood function, the same loss function, and the same procedure for picking one admissible decision rule (e.g., minimaxity). A loose interpretation of the complete class theorems is that choosing a prior, given data, a likelihood, and a loss function, is observationally equivalent to picking one admissible decision rule. So, if we only observe the inputs the statisticians receive, and the outputs they produce, it is usually not going to be possible to tell whether they were a Bayesian or a frequentist.

  6. Steve Sailer says:

    What’s the difference between Prior Information and Prejudice?

    For example, James D. Watson felt he had “prior information,” but he was demonized for “prejudice.”

  7. Fernando says:

    Wrong question IMHO.

    What would bother me is if two statisticians using whatever method to bet on some predictable phenomenon against each other did not win at random.

  8. Chris G says:

    > If two Bayesians are given the same data, they will come to two conclusions. What do you think about that? Does it bother you?

    I’d examine their choices for p(theta) and p(y|theta). If those looked reasonable and the data appeared valid then different conclusions probably wouldn’t bother me at all. Why would they? If the conclusions were profoundly different that would certainly get my attention but it seems like a stretch that they would be.

    A little related: For the past few weeks I’ve been using bootstrapping to obtain parameter estimates and uncertainties where I’ve got a nonlinear model and non-Gaussian noise in my data. It’s great. FWIW, I tried predicting the parameter covariance matrices using Huber’s (?) recipes for robust estimation but estimates were typically low by about a factor of two – sometimes less, sometimes more like a factor of three. Not sure if the issue with the prediction was the nonlinearity (i.e., presumption of quadratic cost about the minimum of the cost function) or computation of the correction factor for the robust weighting function but the bootstrap yielded results which are self-consistent.
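
For readers unfamiliar with the approach, a minimal sketch of a case-resampling bootstrap follows. Everything in it is an assumption made for illustration: the one-parameter model y = exp(-k x), the simulated Laplace noise, the least-absolute-deviation fit by grid search, and all names. It is not the model or code described above, only an instance of the general technique.

```python
import math
import random
import statistics

random.seed(0)

def model(x, k):
    """Hypothetical nonlinear model: exponential decay with rate k."""
    return math.exp(-k * x)

def laplace_noise(scale):
    """Non-Gaussian (Laplace) noise: scaled difference of two exponential draws."""
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

# Simulated data with an assumed true rate of 0.7.
TRUE_K = 0.7
xs = [0.1 * i for i in range(40)]
data = [(x, model(x, TRUE_K) + laplace_noise(0.05)) for x in xs]

K_GRID = [0.02 * i for i in range(1, 101)]  # candidate rates 0.02 .. 2.00

def fit(points):
    """Least-absolute-deviation fit of k by grid search (a simple robust fit)."""
    def cost(k):
        return sum(abs(y - model(x, k)) for x, y in points)
    return min(K_GRID, key=cost)

def bootstrap(points, n_boot=200):
    """Refit on datasets resampled with replacement from the observed points."""
    estimates = []
    for _ in range(n_boot):
        resample = random.choices(points, k=len(points))
        estimates.append(fit(resample))
    return estimates

k_hat = fit(data)
boot = sorted(bootstrap(data))
std_err = statistics.stdev(boot)
# Approximate 95% percentile interval from the sorted bootstrap estimates.
ci_low = boot[int(0.025 * len(boot))]
ci_high = boot[int(0.975 * len(boot)) - 1]

print("estimate:", k_hat, "bootstrap SE:", std_err, "interval:", (ci_low, ci_high))
```

The bootstrap makes no quadratic approximation of the cost surface around its minimum, which is one reason it can behave better than analytic covariance formulas when the model is nonlinear.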

  9. hjk says:

    Two scientists are given the same data. They will [can] fit, using least squares, two different functions to it [depending on the additional constraints they use to specify the class of functions]. What do you think about that? Does it bother you?
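
A minimal sketch of this situation, assuming a small made-up data set: the same points are fit by least squares under two different constraints, a line forced through the origin and a line with a free intercept. The data and function names are illustrative assumptions.

```python
# Made-up data (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]

def fit_through_origin(xs, ys):
    """Least squares for y = b*x (constraint: intercept fixed at zero)."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return 0.0, b

def fit_with_intercept(xs, ys):
    """Least squares for y = a + b*x (intercept left free)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def sse(params, xs, ys):
    """Sum of squared residuals for a fitted (intercept, slope) pair."""
    a, b = params
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

origin_fit = fit_through_origin(xs, ys)
free_fit = fit_with_intercept(xs, ys)

print("through origin (a, b):", origin_fit, "SSE:", sse(origin_fit, xs, ys))
print("free intercept (a, b):", free_fit, "SSE:", sse(free_fit, xs, ys))
```

Both fits are "the" least-squares answer for their function class; the fitted slopes differ only because the constraints differ.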

  10. dmk38 says:

    A slightly different take: I’d say that disagreement about the likelihood ratio or weight to be assigned new information is much more important to the “disagreement” of the Bayesians than is disagreement on priors. The significance of the disagreement, however, isn’t one peculiar to Bayesianism; it’s a problem intrinsic to the project of using evidence to address collective uncertainty or lack of knowledge generally.

    1. If Bayesians reach different conclusions only b/c of different priors– who cares? So long as they agree on likelihood for new information, they’ll converge eventually, given enough new information.

    2. If they reach different conclusions *about* the likelihood, that’s more problematic. Bayesianism tells you what to do w/ prior odds & likelihood ratio: multiply them. It doesn’t tell you where priors come from or how the likelihood ratio is assigned. If determining the latter generates systematic *difference* among analysts, there will be no convergence under Bayesian information processing, and possibly even polarization, as the analysts consider new information.

    A stylized version of real world example:

    H1: “Voter id” laws will meaningfully reduce (i.e., reduce to an extent necessary to justify whatever burden the requirement places on prospective voters) the incidence of voter fraud.
    H2: “Voter id” laws won’t meaningfully reduce voter fraud, b/c there isn’t any meaningful amount of it.

    Judge D starts w/ prior odds of 100:1 in favor of H2 (i.e., 1:100 for H1).
    Judge R starts out w/ prior odds of 100:1 in favor of H1 (i.e., 1:100 for H2).

    Information: “In jurisdictions w/o voter id laws, there have been 3 reported instances of voters misrepresenting identity in 1000 elections over 50 yrs time.”

    Case 1: both judges assign the same “likelihood ratio” to the information: observing 3 reported instances of voters misrepresenting identity in jurisdictions w/o voter-id laws is 25x more probable under H2 than under H1.

    They update and reach two different conclusions: Judge D now believes the odds against H1 are 2500:1, while Judge R believes the odds are 4:1 in favor of H1.

    So what? So long as the process admits of additional information to be presented to the judges, they’ll eventually come to the same conclusion.

    Case 2: Judge D assigns the new information (only 3 instances in 1000 elections) a likelihood ratio of 25 in favor of H2 and thus, again, becomes even more confident (to the tune of 2500:1) that H1 is false.

    But Judge R draws a different inference: the low number of reported instances shows how difficult it is to *detect* voter impersonation; the low probability of detection supports the inference that the incidence is high, since those who wish to perpetrate the fraud will take advantage of the low probability of apprehension. He assigns the information a likelihood ratio of 25 in favor of H1. He thus becomes even more confident that H2 is false, to the tune of 2500:1.

    This is a very bad situation. Because they assign different likelihoods to the information, the judges are not converging but in fact polarizing in the face of new information.

    The differences could be rooted in different understandings about how the world works. They could be based on *motivated reasoning* of a sort that induces individuals to assign evidence the weight that supports conclusions they are predisposed, ideologically or otherwise, to accept.

    But either way, the judges’ common commitment to Bayesianism won’t fix the problem, b/c the likelihood ratio, like the priors, is presupposed by Bayesian information processing. In theory, the judges could avail themselves of more information bearing on the likelihood ratio to be assigned the new information on the frequency of reported instances of voter fraud, but that will work only if in making that determination they share, on grounds exogenous to the Bayesian framework for weighing evidence, understandings of the probative value (likelihood) of the additional information. Ultimately, there is no escaping the need to appraise the likelihood of evidence on the basis of theories about how the world works that Bayesianism can’t supply.

    So I view the possibility of disagreement among Bayesians based on diverse likelihood ratios as much more problematic than disagreements based on priors. I think that means I likely disagree a bit w/ Andrew, since he ranked disagreement about priors as “more important” than disagreement about likelihoods.

    3. *But* I don’t see how any of this has to do w/ a problem distinctive of Bayesianism. Probative value of information w/r/t a hypothesis is exogenous to *any* statistical method for assessing the significance of the information. They’ll run into the same problems, e.g., if they want to do NHT here, since the issue is whether the evidence is properly construed as supporting or negating Hypothesis 1.

    (BTW, Judge R might decide just a couple of yrs later that the argument he credited before was “goofy”. But how to evaluate the probative value of what that information says about Judge R’s decisionmaking style is also something that Bayesianism won’t by itself do, but likely can be done better by anyone who thinks in Bayesian terms about how to process information or weigh evidence.)
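
The arithmetic in the two cases can be checked with a short sketch of odds-form Bayesian updating (posterior odds = prior odds × likelihood ratio). It assumes the numbers from the example, expresses all odds as H2:H1, and reads Case 2 as Judge D weighting the evidence 25:1 toward H2 and Judge R weighting it 25:1 toward H1. The function and variable names are illustrative.

```python
from fractions import Fraction

def update(prior_odds, likelihood_ratio):
    """Odds-form Bayes rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# All odds are expressed as H2 : H1.
prior_d = Fraction(100, 1)   # Judge D: 100:1 in favor of H2
prior_r = Fraction(1, 100)   # Judge R: 100:1 in favor of H1
prior_gap = prior_d / prior_r  # how far apart the judges start

# Case 1: both judges agree the evidence favors H2 by a factor of 25.
lr_shared = Fraction(25, 1)
case1_d = update(prior_d, lr_shared)   # 2500:1 for H2
case1_r = update(prior_r, lr_shared)   # 1:4, i.e. 4:1 still for H1
case1_gap = case1_d / case1_r

# Case 2: the judges read the same evidence in opposite directions.
case2_d = update(prior_d, Fraction(25, 1))   # D: evidence favors H2
case2_r = update(prior_r, Fraction(1, 25))   # R: evidence favors H1
case2_gap = case2_d / case2_r

print("Case 1 odds (H2:H1):", case1_d, case1_r, "gap:", case1_gap)
print("Case 2 odds (H2:H1):", case2_d, case2_r, "gap:", case2_gap)
```

With a shared likelihood ratio, the ratio between the two judges' odds stays at its starting value, but both sets of odds move in the same direction, so repeated evidence of the same kind eventually puts both judges on the same side. With opposed likelihood ratios, the gap grows with every new piece of evidence.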

  11. Utkonos says:

    If two frequentists are given the same data, use the same model, and do the same test with different significance levels, they will also come to different conclusions. I don’t think anybody who slavishly adheres to 0.05 because Fisher once said so, which is everybody, can use this argument to criticise Bayesian statistics.

    And in any case, I actually think it would be suspicious if two statisticians given the same data *didn’t* come to different conclusions. Otherwise, why would we need more than one statistician?
