David Hendry, “Empirical Economic Model Discovery and Theory Evaluation”:
Hendry presents a wide-ranging overview of scientific learning, with an interesting comparison of the physical and social sciences. (For some reason, he discusses many physical sciences but restricts his social-science examples to economics and psychology.)
The only part of Hendry’s long and interesting article that I will discuss, however, is the part where he decides to take a gratuitous swing at Bayes. I don’t know why he did this, but maybe it’s part of some fraternity initiation thing, like TP-ing the dean’s house on Halloween.
Here’s the story. Hendry writes:
‘Prior distributions’ widely used in Bayesian analyses, whether subjective or ‘objective’, cannot be formed in such a setting either, absent a falsely assumed crystal ball. Rather, imposing a prior distribution that is consistent with an assumed model when breaks are not included is a recipe for a bad analysis in macroeconomics. Fortunately, priors are neither necessary nor sufficient in the context of discovery.
I could just laugh this off—but as someone who has published two books and hundreds of articles on applied Bayesian statistics, I think I’ll take Hendry seriously.
Let me start with the tone. I generally don’t like it when people take words or phrases they disagree with and put them in scare quotes. If you’re going to put “prior distributions” and “objective” in quotes, then please show the same disrespect to your other terms: “falsely” . . . “crystal ball” . . . “breaks” . . . “recipe” . . . “macroeconomics” . . . “discovery.”
But let me get to the substance. First, Hendry’s right. No statistical method is necessary. With sufficient effort, I think you can solve all statistical problems with Bayesian methods, or with robust methods, or with bootstrapping, or with any number of alternative approaches. Fuzzy sets would probably work too. Different approaches have different advantages, but I’m sure that if Hendry adopts a self-denying ordinance and decides to never use priors, he can solve all sorts of data analysis problems. He’ll just have to work really hard sometimes. But, to be fair, there are some problems that I have to work really hard on too. In short: econometric methods tend to require more effort in complicated settings, but they often have appealing robustness properties. It’s fair enough that Hendry and I place different values on robustness vs. modeling flexibility.
My most serious criticism of Hendry’s paragraph above is the old, old story: he’s singling out Bayesian methods and priors as being particularly bad. Meanwhile, all those likelihood functions and assumptions of additivity, symmetry, etc., just sneak in. Hendry’s standing at the back window with a shotgun, scanning for priors coming over the hill, while a million assumptions just walk right into his house through the front door.
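To see what I mean by assumptions walking in the front door, here’s a minimal sketch (with made-up numbers, not from Hendry’s article): even the humble sample mean quietly assumes a symmetric, thin-tailed error model, and that hidden assumption matters far more than any weak prior once an outlier shows up.

```python
import numpy as np

# Hypothetical data: a handful of measurements plus one gross outlier.
y = np.array([1.1, 0.9, 1.0, 1.2, 0.8, 15.0])

# Using the sample mean quietly assumes a symmetric, thin-tailed
# (e.g. normal) likelihood -- an assumption that walks in the front door.
mean_estimate = y.mean()

# The median corresponds to a heavier-tailed (Laplace-like) error model
# and barely notices the outlier.
median_estimate = np.median(y)

print(f"mean:   {mean_estimate:.2f}")   # prints 3.33, dragged toward the outlier
print(f"median: {median_estimate:.2f}") # prints 1.05, near the bulk of the data
```

Neither estimator is assumption-free; they just encode different error models, and nobody puts “sample mean” in scare quotes.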
Here’s Hendry’s summary:
The pre-existing framework of ideas is bound to structure any analysis for better or worse, but being neither necessary nor sufficient, often blocking, and unhelpful in a changing world, prior distributions should play a minimal role in data analyses that seek to discover useful knowledge.
I’m going to have to disagree. I could give a million examples of useful knowledge that can be discovered with the aid of prior distributions. For example, where are the houses in the U.S. that have high radon levels? What are the effects of redistricting? How much perchloroethylene does the body metabolize? What is public opinion on gay rights by state? Or, for a classic from Mosteller and Wallace in the 1960s, classify the authorship of the Federalist Papers using 1960s technology.
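The radon example illustrates the basic mechanism: a prior lets sparsely sampled counties borrow strength from the national distribution. A minimal sketch, using synthetic data and a conjugate normal-normal model (all numbers and county names here are illustrative, not actual radon figures):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical log-radon measurements: one county with many sampled
# homes, one with only two.
counties = {
    "County A": rng.normal(loc=1.2, scale=0.8, size=50),
    "County B": rng.normal(loc=1.2, scale=0.8, size=2),
}

# Prior on each county's mean log-radon level, e.g. from the national
# distribution (values made up for illustration).
prior_mean, prior_sd = 1.0, 0.5
sigma = 0.8  # assumed known measurement standard deviation

for county, y in counties.items():
    n = len(y)
    # Conjugate normal-normal posterior: a precision-weighted average
    # of the prior mean and the sample mean.
    post_precision = 1 / prior_sd**2 + n / sigma**2
    post_mean = (prior_mean / prior_sd**2 + y.sum() / sigma**2) / post_precision
    post_sd = post_precision ** -0.5
    print(f"{county}: raw mean {y.mean():.2f}, "
          f"posterior {post_mean:.2f} +/- {post_sd:.2f}")
```

The point: the two-house county’s estimate gets pulled toward the prior, while the fifty-house county’s estimate is dominated by its own data. That shrinkage is exactly the “useful knowledge” a prior buys you when individual units are measured noisily.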
I’m not saying that Hendry and his colleagues need to be using Bayesian methods in his applied research. I’m not even saying that Bayesian methods are needed to solve the problems listed in the above paragraph. In practice these problems were indeed solved using Bayesian inference, but I think other approaches could get there too. What I am saying is, why is Hendry so sure that “prior distributions should play a minimal role” etc.? I’m really bothered when people go beyond the simple and direct, “I have no personal experience with Bayesian inference solving a useful problem” to prescriptive (and wrong) statements such as “prior distributions should play a minimal role.” And it’s just silly to say that priors are “unhelpful in a changing world.” I’d think an econometrician would know about time series models!
Hendry also pulls the no-true-Scotsman trick:
Fortunately, priors are neither necessary nor sufficient in the context of discovery. For example, children learn whatever native tongue is prevalent around them, be it Chinese, Arabic or English, for none of which could they have a ‘prior’. Rather, trial-and-error learning seems a child’s main approach to language acquisition: see Clark and Clark (1977). Certainly, a general language system seems to be hard wired in the human brain (see Pinker 1994; 2002) but that hardly constitutes a prior. Thus, in one of the most complicated tasks imaginable, which computers still struggle to emulate, priors are not needed.
This is a no-true-Scotsman argument because, when confronted with an example in which our brains figure things out using a pre-existing structure (not for Chinese, Arabic, or English, but for human language in general), Hendry simply says that this system that is “hard wired in the human brain . . . hardly constitutes a prior.” Huh? It’s definitely a prior. That’s the whole point: our brains are tuned to decode human language.
Why does this bug me so much about a few throwaway paragraphs in an otherwise-pretty-good article? Hendry’s anti-Bayesian sentiments are no more clueless than those earlier expressed by, say, John DiNardo. The difference is that DiNardo was just venting his opinions and was pretty open about this, whereas Hendry’s presenting his prejudices with an air of expertise. If Hendry wants to work on “replacing unrestricted non-linear functions by an encompassing theory-derived form, such as an ogive,” then fine. His theoretical models of model selection seem interesting and could perhaps be useful. I just wish he’d cut out the part where he implicitly disparages the work of Mosteller and Wallace, Lax and Phillips, and a few zillion other researchers who’ve used Bayesian methods to solve problems.
It’s not too late for Hendry to reform (I hope). All he needs to do is retreat to presenting the positive virtues of his preferred inferential approach, along with his explanations of why Bayesian methods have not seemed useful to him. He’s an econometrician, not a toxicologist, and that’s fine. I think both his positive and his negative statements would be stronger if he were more aware of the limits of his own experience. Just as, in mathematics, a theorem is clearer if you understand the range of its applicability and the areas where there are counterexamples.