Philosophy of Bayesian statistics: my reactions to Cox and Mayo

The journal Rationality, Markets and Morals has finally posted all the articles in their special issue on the philosophy of Bayesian statistics.

My contribution is called Induction and Deduction in Bayesian Data Analysis. I’ll also post my reactions to the other articles. I wrote these notes a few weeks ago and could post them all at once, but I think it will be easier if I post my reactions to each article separately.

To start with my best material, here’s my reaction to David Cox and Deborah Mayo, “A Statistical Scientist Meets a Philosopher of Science.” I recommend you read all the way through my long note below; there’s good stuff throughout:

1. Cox: “[Philosophy] forces us to say what it is that we really want to know when we analyze a situation statistically.”

This reminds me of a standard question that Don Rubin (who, unlike me, has little use for philosophy in his research) asks in virtually any situation: “What would you do if you had all the data?” For me, that “what would you do” question is one of the universal solvents of statistics.

2. Mayo defines scientific objectivity as concerning “the goal of using data to distinguish correct from incorrect claims about the world” and contrasts this with so-called objective Bayesian statistics. All I can say here is that the terms “subjective” and “objective” seem way overloaded at this point. To me, science is objective in that it aims for reproducible findings that exist independent of the observer, and it’s subjective in that the process of science involves many individual choices. And I think the statistics I do (mostly, but not always, using Bayesian methods) is both objective and subjective in that way.

3. Cox discusses Fisher’s rule that it’s ok to use prior information in the design of data collection but not in data analysis. Like a lot of hundred-year-old ideas, this rule makes sense in some contexts but not in others. Consider the notorious study in which a random sample of a few thousand people was analyzed, and it was found that the most beautiful parents were 8 percentage points more likely to have girls, compared to less attractive parents. The result was statistically significant (p<.05) and published in a reputable journal. But in this case we have good prior information suggesting that the difference in sex ratios in the population, comparing beautiful to less-beautiful parents, is less than 1 percentage point. A classical design analysis reveals that, with this level of true difference, any statistically significant observed difference in the sample is likely to be noise. (Even conditional on statistical significance, the observed difference has an over 40% chance of being in the wrong direction and will overestimate the population difference by an order of magnitude; see the simulation sketch at the end of this post.) At this point, you might well say that the original analysis should never have been done at all---but, given that it has been done, it is essential to use prior information to interpret the data and generalize from sample to population.

Where did Fisher's principle go wrong here? The answer is simple---and I think Cox would agree with me here. We're in a setting where the prior information is much stronger than the data. If one's only goal is to summarize the data, then taking the difference of 8% (along with a confidence interval and even a p-value) is fine. But if you want to generalize to the population---which was indeed the goal of the researcher in this example---then it makes no sense to stop there.

Cox illustrates the difficulty in a later quote: "[Bayesians'] conceptual theories are trying to do two entirely different things. One is trying to extract information from the data, while the other, personalistic theory, is trying to indicate what you should believe, with regard to information from the data and other, prior, information treated equally seriously. These are two very different things." Yes, but Cox is missing something important! He defines two goals: (a) extracting information from the data, and (b) a "personalistic theory" of "what you should believe." I'm talking about something in between, which is inference for the population. I think Laplace would understand what I'm talking about here. The sample is (typically) of no interest in itself; it's just a means to learning about the population. But my inferences about the population aren't "personalistic"---at least, no more than the dudes at CERN are personalistic when they're trying to learn about particle theory from cyclotron experiments, and no more than the Census and the Bureau of Labor Statistics are personalistic when they're trying to learn about the U.S. economy from sample data.

4. Cox: "There are situations where it is very clear that whatever a scientist or statistician might do privately in looking at data, when they present their information to the public or government department or whatever, they should absolutely not use prior information, because the prior opinions on some of these prickly issues of public policy can often be highly contentious with different people with strong and very conflicting views."

Maybe. But I don't think Cox even believes this statement himself if it were taken literally.
For example, right now I'm working on the politically controversial problem of reconstructing historical climate from tree rings. We have a lot of prior information on the processes under which tree rings grow and how they are measured. I don't think anyone would want to just take raw numbers from core samples as a climate estimate! All the tools from Statistical Methods for Research Workers won't take you from tree rings to temperature estimates. You need some scientific knowledge and prior information on where these measurements came from.

So let me interpret what I think Cox was saying. I take him to be dividing any scientific inference into two parts, inside and outside. Priors are allowed in the inside work of scientific modeling, which uses lots of external information, from the basic assumptions that the data correspond to your scientific goals, through the mathematical form of the transfer function, down to details such as an assumption of normally-distributed measurement errors, which might be supported based on prior experimental evidence. But Cox would prefer to avoid priors in the outside problem. In my example, I assume he'd allow prior information on the tree-ring measurement process---I don't see how you can get anywhere otherwise---but he'd rather not combine with external estimates of the temperature series.

That's a tenable position. It doesn't avoid all the controversy---manipulations of the data model can map in predictable ways to changes in the final inferences---but it could make sense. I've followed this approach in much of my own applied work, using noninformative priors and carefully avoiding the use of prior information in the final stages of a statistical analysis. But that can't always be the right choice. Sometimes (as in the sex-ratio example above), the data are just too weak---and a classical textbook data analysis can be misleading. Imagine a Venn diagram, where one circle is "Topics that are so controversial that we want to avoid using prior information in the statistical analysis" and the other circle is "Problems where the data are weak compared to prior information." If you're in the intersection of these circles, you have to make some tough choices!

More generally, there is a Bayesian solution to the problem of sensitivity to prior assumptions. That solution is sensitivity analysis: perform several analyses using different reasonable priors. Make more explicit the mapping from prior and data to conclusions. Be open about sensitivity, don't try to sweep the problem under the rug, etc. And, if you're going that route, I'd also like to see some analysis of sensitivity to assumptions that are not conventionally classified as "prior." You know, those assumptions that get thrown in because they're what everybody does. For example, Cox regression is great, but additivity is a prior assumption too! (One might argue that assumptions such as additivity, logistic links, etc., are exempt from Fisher's strictures by virtue of being default assumptions rather than being based on prior information---but I certainly don't think Mayo would take that position, given her strong feelings on Bayesian default priors.)

My point here is that all statistical methods require choices---assumptions, if you will. Not all your choices can be determined or even validated from the data at hand. If you don't want your choices to be based on prior information, what other options do you have?
You can rely on convention---using methods that appear in major textbooks and have stood the test of time---or maybe on theory. Both of these meta-foundational approaches have their virtues, but neither is perfect: conventional methods are not necessarily good (as can be seen by noting that for many problems there are multiple conventional methods that give different results), and theory often doesn't help (for example, classical confidence intervals and hypothesis tests are insufficient in the simple sex-ratio problem noted above).
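
Here is a minimal simulation sketch of the design analysis described in point 3 above. The assumed true difference (0.3 percentage points) and standard error of the estimated difference (3.3 percentage points) are illustrative numbers, not figures from the original study; the point is only to show how the wrong-sign probability and the exaggeration of significant estimates fall out of a few lines of simulation.

    # Design analysis for the sex-ratio example: simulate replicated estimates
    # of the difference in percent girls, keep the "statistically significant"
    # ones, and look at their sign and magnitude. The true difference and the
    # standard error are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    true_diff = 0.3   # assumed true difference, in percentage points
    se = 3.3          # assumed standard error of the estimated difference
    n_sims = 1_000_000

    est = rng.normal(true_diff, se, n_sims)   # replicated estimates
    significant = np.abs(est) > 1.96 * se     # two-sided p < .05

    type_s = np.mean(est[significant] < 0)    # wrong sign, given significance
    exaggeration = np.mean(np.abs(est[significant])) / true_diff

    print(f"P(statistically significant): {significant.mean():.3f}")
    print(f"Wrong-sign rate given significance: {type_s:.2f}")
    print(f"Exaggeration ratio: {exaggeration:.0f}")

With these assumed numbers, a significant result has roughly a 40% chance of pointing in the wrong direction and overstates the true difference by more than a factor of ten, which is the sense in which an observed 8-point difference is essentially noise.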

17 thoughts on “Philosophy of Bayesian statistics: my reactions to Cox and Mayo”

  1. Pingback: The universal solvent of statistics — The Endeavour

    • I think the hypothetical question, “What would you do if you had all the data?” is specious. In cases where some of the variables are quantitative one can never have all the data. If all the variables are categorical and the population is finite and small enough to take a reliable census then no inference is required. One just describes the population using traditional methods.

      Why is Rubin’s question deep or is the emperor missing his clothes?

      • Bradley:

The “what would you do if you had all the data” question has been helpful to me in many statistical applications I’ve worked on over the years. It may not be helpful to you because perhaps you have already internalized the principle. But for me it is an always-helpful reminder. In regard to your point above, it is not always clear how to “describe the population using traditional methods.” First, even if one had all the possible data there is often a goal of generalizing to new settings. Second, description itself can be a challenge in high dimensions. Even in two dimensions: consider Red State Blue State.

  2. I love the discussions and just thought I would chime in with something.

    I was recently reading “Objectivity” by Daston and Galison after Galison gave a talk at my University. I only had a couple days and was busy otherwise, so I didn’t get through it, but it did have some interesting points about the history of objectivity as an epistemic virtue that is relevant to this discussion.

    They write that objectivity is a fairly recent virtue, developing from the ability of scientists to take photographic images and use other instruments of the sort in which the observer plays a passive role in the observation. Previously, the guiding ethic was “Truth-to-nature”, in which the scientist would try to best replicate the symmetries and common properties underlying the observed – without the flaws and imperfections of any particular specimen. The authors then discuss the emergence of objectivity and the desire of scientists to remove the impact of the observer on the observed. They also discuss the limitations of objectivity, however: data contain unwarranted artifacts and outliers for which judgement must be used.

    I bring this up because it seems connected to your statement:
    “If one’s only goal is to summarize the data, then taking the difference of 8% (along with a confidence interval and even a p-value) is fine. But if you want to generalize to the population—which was indeed the goal of the researcher in this example—then it makes no sense to stop there.”

    The idea is that one must prioritize one’s epistemic virtues: being objective and being true to the underlying nature are sometimes in conflict, and trained judgement is needed to bridge the gap between them.

  3. Thank you for taking the time to write this. I appreciate you stating clearly the point that “inference about a population” is “between” the two goals of (a) Extracting information from the data and (b) A “personalistic theory” of “what you should believe.”

    Also, your stating clearly the need for tough choices when working on thorny and thrilling problems where only weak data are available and necessarily informative priors could irrationally (or rationally) prejudice decisions about the objectivity of the analysis – and your solution of sensitivity analysis against different priors. My guess is that using a prior different from the researcher’s preferred prior will just show that no inference at all can be done – dispensing with even the tiniest amount of information will prevent any work from being done, such is the skill, sound judgement, and good faith of the researcher in his problem space, typically. Infinite sensitivity to a prior that is even minimally deficient. The benefit of your stance is a total commitment to openness, regardless.
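
    A minimal sketch of the kind of prior-sensitivity analysis being discussed, for the sex-ratio setting: combine one estimate and standard error with a handful of normal priors and see how far the posterior mean moves. The estimate (8 percentage points), the standard error (3.3), and the priors themselves are all illustrative assumptions, not numbers from the original analysis.

        # Normal-normal conjugate update under several priors for the true
        # difference in percent girls; the estimate and priors are illustrative.
        import numpy as np

        y, se = 8.0, 3.3  # assumed estimate and standard error (percentage points)

        priors = {  # prior mean and prior sd on the true difference
            "flat-ish (sd=100)": (0.0, 100.0),
            "weak (sd=10)": (0.0, 10.0),
            "informative (sd=1)": (0.0, 1.0),
            "very informative (sd=0.3)": (0.0, 0.3),
        }

        for label, (m0, s0) in priors.items():
            post_var = 1.0 / (1.0 / s0**2 + 1.0 / se**2)
            post_mean = post_var * (m0 / s0**2 + y / se**2)
            print(f"{label:26s} posterior mean {post_mean:5.2f}, sd {np.sqrt(post_var):4.2f}")

    Under the weak priors the posterior essentially reproduces the noisy 8-point estimate; under priors consistent with the sex-ratio literature it shrinks to a fraction of a percentage point. Laying that mapping out explicitly is the openness the sensitivity analysis is after, rather than a demonstration that no inference can be done.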

  4. For what little it’s worth, I agree with what you’ve written!

    When you talk about sensitivity analysis, might this move things in the direction of Wald’s Statistical Decision Theory?

  5. Insofar as humans conduct science and draw inferences, it is obvious that “human judgments” are involved in science. This is true enough, but far too trivial an observation to distinguish between accounts that manage to afford reliable ways to learn about the world and those that don’t. An objective account of inference is deliberately designed to avoid being misled, despite limitations, biases, and limited information. By being deliberately conscious of the ways biases and prejudices lead us astray, an adequate account of inference develops procedures to set up stringent probes of errors while teaching us new things. I am not saying this is true for Gelman (it isn’t); I am saying that I have constantly heard, over the years, “we cannot get rid of human judgments” as a prelude to (a) throwing up one’s hands and saying that all science/inference is subjective and/or (b) regarding as false or self-deceived any claim that a method is objective in this scientifically relevant sense – in accomplishing reliable learning about the world. This is dangerous nonsense that has too often passed as a deep insight as to why scientific objectivity is impossible. In the trivial sense (science is done by humans), it is true. The idea of objectivity as the disinterested scientist (as Mike mentions, citing Galison) is indeed absurd. Objectivity demands a lot of interest – interest in not being fooled and deceived.
    Methods and procedures for objective learning exist; we should be developing more of them rather than looking for cop-outs. An account is inadequately objective, not because it contains human judgments (that it does is a tautology), but because it does not connect up with the real world adequately, readily allows ad hoc saves of our pet theories in the face of failed predictions, and/or commits any number of the errors humans know all too well they will fall into without stringent checks. You can search “objectivity” on my blog, errorstatistics.com, or, even better, in my work.

    • I’m all for holding up objectivity as something to strive towards, but the fact that the choice of hypotheses we even consider is intertwined with “biases and prejudices” means that true objectivity is an inherently unattainable fantasy.

      Maybe this is what you mean by “it is obvious that “human judgments” are involved in science”.

      • We humans would not desire an “objectivity” that was irrelevant to humans in their desire to find things out, avoid being misled, and block dogmatic, authoritarian positions where relevant criticism is barred, or claimed to be irrelevant or impossible. Who wants “true objectivity” (whatever that might mean, though perhaps “ask the robot” would do) when it hasn’t got a chance of producing the results we want and need to intelligently relate to the world? I really don’t get it….

  6. For both issues in the discussion, it seems to me that we need to avoid the extremes.

    In terms of objective/subjective, we can’t throw in the towel and say that “subjective humans are in the loop at every step, so there can be no objectivity”, but we also cannot say, “I used a technique that does not use the word ‘subjective’, so therefore my investigation is objective”. Every decision from what topic to investigate to what and how much data to gather, from what techniques are applicable to whether answers are physically significant and plausible, involves decisions that are made based on a researcher’s experience and agreed-upon conventions (which can vary between disciplines) rather than some objective standard. We attempt to be as objective as possible by developing and following conventions and by being open with our methodology so that others can decide for themselves if we’ve let our subjective decisions twist the result. (I wonder if subjectivity, in the “you let your personal views twist your science” sense, is somehow analogous to overfitting, and objectivity is our attempt at avoiding overfitting.)

    In terms of prior information, it seems obvious that all of those decisions I’ve mentioned are made based on a researcher’s prior information. It’s called “knowledge”. The key is making prior information (assumptions) explicit and visible so that they may also be examined by others. Trying to claim that methodology X does not use prior information is ridiculous: everything else about the investigation uses prior information in one form or another, and methodology X also depends on many assumptions which are judged “met” based on prior/snooped information.

    In both cases, explicitly expressing prior information, assumptions, and decisions made in the investigative process is the key. It allows others to examine them for personal biases or oversights. People don’t replicate research, in general, to find arithmetic errors but rather to investigate what prior information, assumptions, and decisions might have been in error. The easier that process is, the more objective the investigation can ultimately be.

    It seems to me that the strength of the Bayesian approach is that it attempts to make key assumptions (priors) clearer, and to allow for different assumptions to be more easily tested. That’s the difference between, say, a political discussion and a scientific one: in the political discussion most participants do not make their assumptions clear and often are not aware of them, so the discussion can go round and round with no resolution because it all occurs too high up the reasoning ladder. A scientific discussion can’t occur at the first-principles level, of course, but various proposals must have their roots exposed and available for inspection.

  7. Pingback: Philosophy of Bayesian statistics: my reactions to Senn « Statistical Modeling, Causal Inference, and Social Science

  8. Pingback: Links of 2012-02-04 | Bad Simplicity

  9. Pingback: Philosophy of Bayesian statistics: my reactions to Hendry « Statistical Modeling, Causal Inference, and Social Science

  10. Not to presume to speak for David, but my take on his apparent lack of enthusiasm for formally modelling prior information as probabilities is a bit different – it seems to me to be based on an assessed benefit/cost of doing so (the gain of information versus the risk of getting misinformation, and all of the work that is involved by all parties to do this well).

    This may be evinced by his writing on partial Bayes (OK to go ahead with priors just for unimportant nuisance parameters) and his verbal criticisms of full probability modeling of multiple biases à la Greenland. He seems to prefer to allow all those who are informed to have input into the formulation of the design and the structuring of the analysis – so that (further!) prior information becomes unnecessary or at least unimportant. This is perhaps most fully achieved in those convenient cases where higher-order asymptotics allows the nuisance parameters to be made independent (and then “any” prior on these gives the “same” answer [quotes to evade technicalities]).

    That fits in with my favourite definition of frequentist statistics, which is a fervent attempt to avoid formal prior probability models at (almost) any cost. Of course, one of the motivations for the avoidance is a discomfort with things that cannot be empirically checked (and the reluctance of many Bayesians to re-engage with such discomforts perhaps explains all that reluctance to check priors). Newer work on prior checking may help here.

    But then there is what happens in practice, how people in the field implement Bayes, and my first-hand experience of most of that has been really horrible. I have difficulty explaining this to Bayesian colleagues, who seem to think I am referring to subtle technical mistakes. No, it is someone taking a researcher’s data set, running WinBUGS with default priors (none of which, nor any other prior, is discussed with the researcher), and then the percentage of the marginal credible interval for a guessed parameter of interest that is positive is conjured into _the_ relevant probability of that parameter being positive for that researcher _and_ all of their colleagues.

    They usually have a PhD in statistics, were sometimes even supervised by a very capable Bayesian, and they get personally upset when I call them on this sort of thing. As an instance, there was a biologist who was thrilled and impressed that the prior contained all the background knowledge on protein structure interactions and how the human biology processed and eliminated the substance. When I asked to see that prior and pointed out that the data-equivalent prior that was actually used – exponential survival with one observed survival of 16 months – was unlikely to capture all of that (a quick prior predictive check along these lines is sketched below), they were very embarrassed (but they had already written up the findings and discussion for their paper).

    Anyways, no one wants their work evaluated by the worst it’s ever been applied, but in choosing whether to be enthusiastic about an approach, it is worth considering.
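
    A minimal prior-predictive-check sketch of what a “data-equivalent” prior like the one described above might imply, treating it (as an assumption) as a Gamma(shape=1, rate=16) prior on the exponential rate, i.e., equivalent to a single observed survival of 16 months:

        # Prior predictive check: draw a rate from the assumed Gamma(1, 16) prior,
        # then draw a survival time (in months) from the exponential model.
        import numpy as np

        rng = np.random.default_rng(0)
        n_draws = 100_000

        rate = rng.gamma(shape=1.0, scale=1.0 / 16.0, size=n_draws)  # prior on the rate
        survival = rng.exponential(scale=1.0 / rate)                 # prior predictive draws

        lo, med, hi = np.percentile(survival, [5, 50, 95])
        print(f"5%: {lo:.1f}   median: {med:.1f}   95%: {hi:.1f} months")

    The implied survival times run from under a month to roughly 25 years, which is one quick way to see that such a prior encodes very little of the biological background knowledge the researcher believed it did.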

    • O’Rourke: a couple of people asked what I thought of your remark, and I hadn’t seen it, so I came back to it. I actually am not sure I understand it though. But I do think it is good to let people speak for themselves and assume they mean what they say. Cox is very wise.

      On your last points, it’s good that you call out the Bayesians you mention on their presumed priors. I have increasingly been hearing stories like this, and it helps me to understand what J. Berger and others have been speaking out against. Perhaps, in an ironic twist, some Bayesian ways are being discredited as a product of their own success (in being integrated into easy-to-run programs that make them automatic). But here’s a scary thought: will the Stat Ph.D.s, with all their technical wizardry, be able to go back (to what underlies frequentist and other methods) and think things through? I think it might be forgotten knowledge, some of it… at least going by some (many?) textbooks nowadays.

  11. Pingback: Philosophy of Bayesian statistics: my reactions to Wasserman « Statistical Modeling, Causal Inference, and Social Science
