Why waste time philosophizing?

I’ll answer the above question after first sharing some background and history on the philosophy of Bayesian statistics, which appeared at the end of our rejoinder to the discussion to which I linked the other day:

When we were beginning our statistical educations, the word ‘Bayesian’ conveyed membership in an obscure cult. Statisticians who were outside the charmed circle could ignore the Bayesian subfield, while Bayesians themselves tended to be either apologetic or brazenly defiant. These two extremes manifested themselves in ever more elaborate proposals for non-informative priors, on the one hand, and declarations of the purity of subjective probability, on the other.

Much has changed in the past 30 years. ‘Bayesian’ is now often used in casual scientific parlance as a synonym for ‘rational’, the anti-Bayesians have mostly disappeared, and non-Bayesian statisticians feel the need to keep up with developments in Bayesian modelling and computation. Bayesians themselves feel more comfortable than ever constructing models based on prior information without feeling an obligation to be non-parametric or a need for priors to fully represent a subjective state of knowledge.

In short, Bayesian data analysis has become normalized. Our paper is an attempt to construct a philosophical framework that captures applied Bayesian inference as we see it, recognizing that Bayesian methods are highly assumption-driven (compared to other statistical methods) but that such assumptions allow more opportunities for a model to be checked, for its discrepancies with data to be explored.

We felt that a combination of the ideas of Popper, Kuhn, Lakatos, and Mayo covered much of what we were looking for – a philosophy that combined model building with constructive falsification – but we recognize that we are, at best, amateur philosophers. Thus we feel our main contribution is to consider Bayesian data analysis worth philosophizing about.

Bayesian methods have seen huge advances in the past few decades. It is time for Bayesian philosophy to catch up, and we see our paper as the beginning, not the end, of this process.

OK, now to the question of the day: Who cares? A natural question or response to all this is to declare it a waste of time. Every moment spent philosophizing is a moment not spent doing real research. Why philosophize? Why not just do?

The philosophy of statistics is interesting to me. But, beyond this, my reason for writing about it is that the philosophy of statistics can affect the practice of statistics. The connection is clearest to me in the area of model comparison and checking. Bayesians have been going around with models that don’t fit the data, not even looking for anything better, out of a belief that Bayesian models are (a) completely subjective and thus (b) to be trusted completely. This combination never made sense to me—I’d think that the more subjective a model is, the less you’d want to trust it—but it was central to the inferential philosophy that was dominant among Bayesians when I was starting out. I think that this unfortunate philosophy restricted what people were actually doing in practice. It was making them worse data analysts and worse scientists. Conversely, the Popperian philosophy of falsification encouraged me in my efforts to bring model checking and exploratory data analysis into the fold of Bayesian inference (see, for example, here and here).

So, yes, conditional on not changing your methods, philosophy is at best a diversion from the real work of science. But I think philosophy is more important than that. Good philosophical work can free us from our ruts and point us toward new and better statistical methods.

And I’m not just talking about the past; this isn’t just about confusions that we’ve already dispelled.

For example, one open question now is: How can an Artificial Intelligence do statistics? In the old-fashioned view of Bayesian data analysis as inference-within-a-supermodel, it’s simple enough: an AI (or a brain) just runs a Stan-like program to learn from the data and make predictions as necessary. But in a modern view of Bayesian data analysis—iterating the steps of model-building, inference-within-a-model, and model-checking—it’s not quite clear how the AI works. It needs not just an inference engine, but also a way to construct new models and a way to check models. Currently, those steps are performed by humans, but the AI would have to do them itself, without the aid of a “homunculus” to come up with new models or check the fit of existing ones. This philosophical quandary points to new statistical methods: for example, a language-like approach to recursively creating new models from a specified list of distributions and transformations, and an automatic approach to checking model fit, based on some way of constructing quantities of interest and evaluating their discrepancies from simulated replications.
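
To make that last step concrete, here is a minimal sketch of an automated posterior predictive check, the kind of discrepancy evaluation described above. The toy normal model, the heavy-tailed fake data, and the particular test quantity are assumptions I’ve made up for illustration, not a prescribed method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data set: heavier-tailed than the model below assumes.
y = rng.standard_t(df=3, size=100)
n, ybar = len(y), y.mean()

# Deliberately simple model: y_i ~ Normal(mu, 1) with a flat prior on mu,
# so the posterior is mu | y ~ Normal(ybar, 1/sqrt(n)).

def test_quantity(data):
    """Discrepancy measure: size of the most extreme deviation from the mean."""
    return np.max(np.abs(data - data.mean()))

# Posterior predictive simulation: draw mu, then a replicated data set.
T_rep = np.empty(1000)
for s in range(T_rep.size):
    mu_draw = rng.normal(ybar, 1 / np.sqrt(n))
    y_rep = rng.normal(mu_draw, 1.0, size=n)
    T_rep[s] = test_quantity(y_rep)

T_obs = test_quantity(y)
p_ppc = np.mean(T_rep >= T_obs)
print(f"posterior predictive p-value: {p_ppc:.3f}")
# Values near 0 or 1 flag a model-data discrepancy; here the heavy-tailed data
# should make the observed maximum deviation look extreme under the normal model.
```

The catch, of course, is that in this sketch a human chose both the test quantity and the candidate model; an AI would have to generate and evaluate such choices on its own.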

I don’t know how to do all this—it’s research!—but my point is that philosophy, even if not strictly necessary, can help, both in the negative sense of clearing away bad and confusing ideas and in the positive sense of suggesting new ways forward.

32 thoughts on “Why waste time philosophizing?”

  1. I liked a comment I once read about John Dewey – roughly – “He brought the best of philosophy to bear on everyday problems”

    Perhaps this has something to do with why Peirce wanted to change the idea of _practical_ in pragmatic philosophy to _purposeful_ – it is the purposefulness that matters most.

  2. I’ve long wondered about building tools to do automatic data analysis and modeling, what you call “AI statistics”. This seems like a fascinating project. Are many folks working hard on this or is it still out of reach?

  3. I think possibly the best reason is, as you said, ‘the philosophy of statistics is interesting to me’.

    It might not be strictly necessary, but where is the fun in life if you boil everything down to the things that are ‘strictly necessary’? Sometimes there is no need to justify something beyond its being interesting.

  4. On this topic, E. T. Jaynes said:

    “It would be very nice to have a formal apparatus that gives us some ‘optimal’ way of recognizing unusual phenomena and inventing new classes of hypotheses that are most likely to contain the true one; but this remains an art for the creative human mind. In trying to practice this art, the Bayesian has the advantage because his formal apparatus already developed gives him a clearer picture of what to expect, and therefore a sharper perception for recognizing the unexpected.”

    This is from the last paragraph (p. 351) of his 1985 paper, “Highly Informative Priors.”

  5. I hope Gelman’s lack of squeamishness as regards the possible relevance of the philosophy of statistics catches on. I know David Cox declared in a long-ago article that a statistician without philosophical foundations is a mere technician–something to that effect. These issues are newly in flux, it seems to me. A book I am writing is in this spirit.

  6. Weird — I tend to think of statistics as applied epistemology, or applied philosophy of science, or something. As such it seems totally crucial to me to think about the philosophy part! (Not to the exclusion of actually building models, of course…)

  7. Pingback: Toward a framework for automatic model building « Statistical Modeling, Causal Inference, and Social Science

  8. >So, yes, conditional on not changing your methods, philosophy is at best a diversion from the real work of science. But I think philosophy is more important than that. Good philosophical work can free us from our ruts and point us toward new and better statistical methods.

    How can one engage in philosophical thought or discussion without being open to the possibility of seeing something in a new way? Can you even call it philosophy if you’re not? As you say, good philosophical work frees us from our ruts and helps us identify new ways of doing things. My view is that good philosophical work helps us understand how we relate with the world. Better understanding isn’t necessarily motivation to action but it should at least be motivation to consider the possibilities.

  9. Robust statistics. A subject near and dear to my heart. (Peter Huber wrote a book on the topic, http://www.amazon.com/Robust-Statistics-Wiley-Probability/dp/0470129905) In a nutshell, statistically-robust estimation methods provide estimates of parameter values which are minimally sensitive to modest deviations between the data and the underlying presumptions of the model. For example, a statistically-robust regression method yields estimated parameter values which aren’t affected by the presence of a few outliers in the data, i.e., so long as the noise in the data is ‘mostly normal’, the details of the outliers shouldn’t affect the estimated parameter values. This is in stark contrast to least-squares-based estimation, which is highly sensitive to outliers. There are – in my mind at least – philosophical considerations as to what constitutes an outlier and how to deal with it. (How many sigma is anomalous? Should you give outliers zero weight? Equal non-zero weights? Something else?) The philosophical considerations have practical consequences. (What functional form should the weighting function take? Does it need to be continuously differentiable?) Sometimes it’s vice versa. (A small sketch of one such weighting scheme appears below, after the PS.)

    (How) do considerations in developing statistically-robust estimation methods fit with philosophical considerations you mention above?

    PS A ‘methods’ paper I particularly like on the subject of robust estimation:
    “Unmasking Multivariate Outliers and Leverage Points,” P. J. Rousseeuw and B. C. van Zomeren, J. Am. Stat. Assoc., vol. 85, no. 411, (Sept. 1990), pp. 633-639.
    http://www.jstor.org/discover/10.2307/2289995?uid=3739696&uid=2&uid=4&uid=3739256&sid=21101796591887
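
    For concreteness, here is a minimal sketch of a Huber-type M-estimate of location fit by iteratively reweighted least squares; the tuning constant, the MAD-based scale estimate, and the toy contaminated data are my own choices for illustration, not anything taken from Huber’s book:

```python
import numpy as np

def huber_location(y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    mu = np.median(y)                              # robust starting point
    scale = np.median(np.abs(y - mu)) / 0.6745     # MAD-based scale estimate
    for _ in range(max_iter):
        r = (y - mu) / scale                       # standardized residuals
        # Huber weights: full weight inside [-c, c], downweighted (not zeroed) outside.
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * y) / np.sum(w)         # weighted least-squares update
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 95), rng.normal(20, 1, 5)])  # 5% gross outliers
print("sample mean:", round(y.mean(), 3), "  Huber estimate:", round(huber_location(y), 3))
# The sample mean is dragged toward the outliers; the Huber estimate stays near 0.
```

    The choice of c and of the weighting function is exactly where the philosophical questions above (how many sigma is anomalous, zero weight versus reduced weight) turn into concrete design decisions.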

    • Chris:

      We have a chapter on robust models in Bayesian Data Analysis. We follow the usual approach of modeling so-called outliers using continuous mixtures.
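
      As a minimal illustration of the continuous-mixture idea (a sketch of the general approach, not code from the book): a Student-t error model can be written as a normal distribution whose variance is inflated by a latent per-observation scale factor, so apparent outliers are absorbed by large latent scales rather than being discarded.

```python
import numpy as np

rng = np.random.default_rng(2)

# Student-t_nu(mu, sigma) as a continuous scale mixture of normals:
#   V_i ~ Inv-Gamma(nu/2, nu/2),   y_i | V_i ~ Normal(mu, sigma^2 * V_i)
nu, mu, sigma, n = 4.0, 0.0, 1.0, 200_000

V = 1.0 / rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # latent variance factors
y_mix = rng.normal(mu, sigma * np.sqrt(V))                # scale-mixture draws
y_t = mu + sigma * rng.standard_t(df=nu, size=n)          # direct Student-t draws

# The two samples agree in distribution; compare a few quantiles as a sanity check.
for q in (0.05, 0.25, 0.50, 0.75, 0.95):
    print(q, round(float(np.quantile(y_mix, q)), 3), round(float(np.quantile(y_t, q)), 3))
```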

      • That’s a nice article. Thanks for the reference. (The only other ‘history’ paper I can recall reading was one by Roger Koenker. I should read more.)

        >My take is that, if taken too far, robustness becomes way too wrong an approach.

        I don’t think that’s fair. Stigler concludes with a discussion about potential consequences of trimming. The potential pitfall he cites is legit but it doesn’t strike me that you’d have to exercise too much care to avoid it. Look at your data. What does it tell you? Does it provide guidance on where to set limits for your method? (Rhetorical question. If it doesn’t suggest limits then that suggests a bigger problem.)

        Stigler notes at one point: “The robust estimates could only help with variations from assumptions that the scientists had foreseen… As models grew more complicated, so too did the question of just what “robust” meant. The potential model failures in a multivariate time series model are huge, with no consensus upon where to start.” That’s a key takeaway. A robust estimation method isn’t going to compensate for a fundamental modeling error. Robust methods are more resilient than least-squares-based methods but they’re not magic.

        Stigler comments on the utility, power, and popularity of least-squares. He’s right. It’s hard for me to imagine lsq will ever be replaced. The imperfections of least-squares-based methods are generally more than made up for by their utility. That said, what’s an M-estimator but iteratively-reweighted least-squares? With that in mind, what’s not to love about M-estimators?

        Stigler also comments on how robust estimation hasn’t caught on with multivariate models. Probably 7-8 years ago I was full of enthusiasm for applying robust methods to multivariate problems. Classic case of newbie-ism: I had a new tool in hand but not the benefit of Stigler’s (or his peers’) wisdom. I’d tried it on a few problems and it worked well, so, obviously, I thought it was the solution to all problems. I now understand from experience the strengths and limitations of the methods – to some degree at least. In my previous position I worked on applications where I needed to process on the order of 10^5 to 10^6 samples per second, so processing had to be fully automated. In addition, there was a low tolerance for false positives and the data was contaminated by time-varying non-Gaussian noise. For that application, M-estimators performed quite favorably relative to lsq-based methods. They didn’t compensate for modeling error but they did drive down the false positive rate by a meaningful amount with only a modest increase in computational burden. Bottom line: Robust estimation methods can be a valuable tool – just how valuable will depend upon the problem you’re working on.

        Anyhow, thanks again for the reference to Stigler’s paper. It was a good (and instructive) read.

  10. Philosophy can have another impact apart from “changing the methods”, namely understanding what their results mean. This includes the long discussion about the meaning of p-values as well as the insight that posterior probabilities, if they are indeed interpreted as probabilities, inherit meaning from the prior, i.e., if you can’t give a proper probability interpretation for your prior, neither can you interpret your posterior (although it still can be used for regularization).

    For me, the communication of science is an integral part of science, so interpreting results is as important as getting them.

    • Christian:

      I agree completely. We put huge efforts into understanding what our models are doing, but it’s not always clear to people that this is an important part of statistical practice and research; it’s not merely what we do to explain models to students.

      To put it another way: scientific communication is important, not just for me to communicate to others but also for me to communicate to myself.

  11. “How can an Artificial Intelligence do statistics? […] in a modern view of Bayesian data analysis—iterating the steps of model-building, inference-within-a-model, and model-checking—here, it’s not quite clear how the AI works.”

    My sense is you are only changing the learning tempo. E.g., the robot makes an inference the old-fashioned way. If the model does not fit well, it will learn that eventually. Model checking is just bringing forth that eventuality by testing within sample. So yes, model checking makes sense, but I’m not sure how much of a radical departure it is. (Not even sure you want it to be a radical departure.)

    A more pedestrian but important issue is that with model checking I never know whether it is me using the computer or the other way around. It’s so manual and clunky (this is especially true in Box–Jenkins stuff where you have to whiten residuals, look at residual plots, etc.). I am sure much of this can be automated. If so, is that AI? Maybe, but there are other possibilities that have nothing to do with Bayes or stats, like random evolution and natural selection of robots. Maybe Bayes is an end product.

    • CS Peirce often argued that creativity necessarily requires confusion first.

      Not sure how to confuse a machine.

      But with (smart) people “sharper perception for recognizing the unexpected” is a good way to get them confused.

  12. Since you’re on the topic of philosophy what do you think of this new claim?
    http://arxiv.org/abs/1212.0953
    “We argue using simple models that all successful practical uses of probabilities originate in quantum fluctuations in the microscopic physical world around us, often propagated to macroscopic scales. Thus we claim there is no physically verified fully classical theory of probability. We comment on the general implications of this view, and specifically question the application of classical probability theory to cosmology in cases where key questions are known to have no quantum answer.”

    I have a knee-jerk reaction to it, but I don’t trust my knees. Anyway, it reminded me of this: http://math.ucr.edu/home/baez/bayes.html

    • Dear Jim,

      A few comments on John Baez’s text and on some Bayesian claims that probability is the only way of modeling uncertainty:

      When we choose a number in (0,1) to represent our uncertainty about the occurrence of an uncertain event, it DOES NOT imply that this number has anything to do with probability.

      There is no theorem that guarantees that every coherent believer should use the probability rules to handle uncertain events, although there are many papers in the literature trying to show this:

      1. de Finetti (1931, 1937) offered one way of defining coherence based on the theory of games: he created a very specific game and defined (in)coherence as follows: if one loses a bet no matter what the result may be, then (s)he is being incoherent. The Dutch book argument (DBA) then reads: a coherent believer should ultimately use a measure that respects the rules of probability. Other authors follow the same line of thought (the games they construct are a very special type of game). (A small numeric illustration of the classic Dutch book appears after this list.)

      Of course this argument (DBA) is fallacious, since it is quite trivial to define another type of game in such a way that it is possible to write a Dutch book against players who use the probability rules (see for instance Waidacher, 1997).

      2. Richard Cox (1946) justified the probability axioms via desiderata: he starts from some mild desiderata and, by a theorem, arrives at the probability rules; however, he does not make all of his assumptions explicit. It should be said that de Finetti (1931) anticipated the proof of Cox’s theorem and, moreover, explicitly states the strictly increasing assumption; see the fourth condition at the top of page 321. DeGroot (2004) made similar developments under similar assumptions.

      Here, to be coherent it is not really necessary to restrict our uncertainty measure to respect the strictly increasing assumption (coherence can be achieved by imposing only non-decreasingness). Many different measures that do not satisfy the strictly increasing condition can be considered coherent. (Un)fortunately, there is no unique way of defining coherence.
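
      For concreteness, here is the classic Dutch-book arithmetic that the DBA appeals to (the betting prices are made up for illustration; this shows the standard argument itself, not the counterexamples mentioned above):

```python
# A hypothetical agent prices a bet "pays 1 if A" at 0.6 and a bet
# "pays 1 if not-A" at 0.6, so the two prices sum to 1.2 > 1.
price_A, price_not_A = 0.6, 0.6

# A bookie sells the agent both bets. In every state of the world exactly one
# of the two bets pays off, so the agent's net result is the same either way.
for A_occurs in (True, False):
    payoff = 1.0  # one and only one of the two bets pays 1
    net = payoff - (price_A + price_not_A)
    print(f"A occurs: {A_occurs}, agent's net gain: {net:+.1f}")
# The agent loses 0.2 no matter what happens: a Dutch book, which additivity
# of the betting prices would have prevented.
```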

      References:

      de Finetti, B. (1931). Sul significato soggettivo della probabilità, Fundamenta Mathematicae, 17, 298–329.

      de Finetti, B. (1937). Foresight: Its logical laws, its subjective sources, in Kyburg and Smokler (1980), 53–118.

      DeGroot, M.H. (2004). Optimal Statistical Decisions, Wiley Classics Library.

      Waidacher, C. (1997). Hidden assumptions in the Dutch book argument, Theory and Decision, 43, 293–312.

  13. Pingback: Friday links: new small-school ecology blog, job skills advice for grad students, #sciencepickup lines, Dawkins for Pope odds, and more | Dynamic Ecology

  14. Perhaps relatedly, there’s Kurt Lewin: “There is nothing so practical as a good theory.” (Lewin, K. (1951) Field theory in social science; selected theoretical papers. D. Cartwright (ed.). New York: Harper & Row. p. 169)

  15. The position that thinking philosophically is a waste of time is itself a (self-defeating) philosophical position. Also, philosophical research *is* real research :)
