## Plain old everyday Bayesianism!

Sam Behseta writes:

There is a report by Martin Tingley and Peter Huybers in Nature on the unprecedented high temperatures at northern latitudes (Russia, Greenland, etc). What is more interesting is that the authors have used a straightforward hierarchical Bayes model, and for the first time (as far as I can remember) the results are reported with a probability attached to them (P>0.99), as opposed to the usual p-value<0.01 business. This might be a sign that editors of big-time science journals are welcoming Bayesian approaches.

I agree. This is a good sign for statistical communication. Here are the key sentences from the abstract:

Here, using a hierarchical Bayesian analysis of instrumental, tree-ring, ice-core and lake-sediment records, we show that the magnitude and frequency of recent warm temperature extremes at high northern latitudes are unprecedented in the past 600 years. The summers of 2005, 2007, 2010 and 2011 were warmer than those of all prior years back to 1400 (probability P > 0.95), in terms of the spatial average. The summer of 2010 was the warmest in the previous 600 years in western Russia (P > 0.99) and probably the warmest in western Greenland and the Canadian Arctic as well (P > 0.90). These and other recent extremes greatly exceed those expected from a stationary climate, but can be understood as resulting from constant space–time variability about an increased mean temperature.

As with classical p-values, these probability statements depend on an assumed model, but I agree with Sam that the expression of direct probabilities is a huge step forward from traditional practice.

1. James says:

Does the fact that they turned some of the proxies upside down concern you? See recent climate audit blog post.

• PI says:

Are you confusing the Climate Audit critique of PAGES2K with the critique of Tingley and Huybers? I don’t recall this particular criticism being leveled at the latter.

2. Entsophy says:

The essence of statistics is this: if our limited knowledge implies that for every way A can happen there are X>>1 ways B can happen, then our best guess is B. The bigger X is, the greater credence our guess has. There’s no guarantee B is true or will happen, but this is the best we can do with that limited knowledge. To do better requires that we know more.

For example, if for every sequence of a hundred coin flips which results in mostly heads or mostly tails, there are 10^10 sequences which result in a mix of heads/tails, then our best guess is that we’ll get about as many tails as heads if we flip a coin 100 times. Doing better requires physics and additional measurements.

Similarly, if polls and analysis in mid 2012 indicated that for every reasonable way in which Obama loses the election, there were 3 ways in which he wins, then our best guess is that Obama is going to win, and we can say the odds of Obama losing are 1 in 4.

The probability calculus is really just a way of getting the counts X and propagating them through the calculations correctly and consistently. So the probability calculus is just as relevant for the dice example as for the election example or the historical temperature example in this post. And it’s just as meaningful to speak of the probability of a singular event or hypothesis as it is to a repetitive one.

These are really simple ideas and it’s hard to believe they’re controversial at all, but many in the Mayo et al. crowd are going to be horrified by using the probability calculus to analyze a question like are temperatures higher now than in the past and reporting the results with a simple probability. To them a statement like “odds are at least 100 to 1 that the summer of 2010 was the warmest in the previous 600 years in western Russia” will be forever meaningless because it can’t be made objective in their minds by an always factitious “infinite repetition” thought experiment.

• phayes says:

“To them a statement like “odds are at least 100 to 1 that the summer of 2010 was the warmest in the previous 600 years in western Russia” will be forever meaningless because it can’t be made objective in their minds by an always factitious “infinite repetition” thought experiment.”

Yes… bizarre. I commented on the errorstatistics.com blog recently hoping to find out what motivates ‘philosopher-statisticians’ – especially the ‘frequentist’ types – to think such things. (I somewhat obliquely asked if they know something about our cosmos’s global topology and/or physical foundational matters which the rest of us don’t.)

• Fran says:

There is nothing wrong with using Bayes’ for deductive reasoning. I tried to check the paper to see if that was the case, but it is paywalled, so for all I know I could agree with the analysis. The problem comes when using a prior when no prior information is available… or using data to inform the prior, as Objective Bayesians do, making the name “prior” nonsensical in that context and the whole process a futile tautological exercise.

• We always have *some* prior information. So if you believe that Bayesian reasoning is fine when the prior can be justified, then you’re just a Bayesian with a fairly strict requirement for model checking.

Also, if you have 25 samples of a quantity, and you take 3 of them at random, and use them to construct some kind of diffuse prior for a value, and then you take 22 of them to fit a specific model, are you or are you not committing a tautology? If not, and if the difference between this analysis and an “Objective Bayesian” analysis is small, then can’t we just say that in these contexts “Objective Bayesian” methods are simply useful approximations to a more rigorous methodology?
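The 3-versus-22 split can be tried numerically. The sketch below is only an illustration under strong assumptions that are not in the comment: the data are normal with a known variance, and the "prior from 3 samples" is taken to be the flat-prior posterior from those 3 points (tighter than the "diffuse" prior the comment describes). The function name `normal_update` and all numbers are hypothetical.

```python
import random
from statistics import fmean

def normal_update(prior_mean, prior_var, data, sigma2):
    """Conjugate update for a normal mean with known variance sigma2."""
    n = len(data)
    post_prec = 1.0 / prior_var + n / sigma2
    post_mean = (prior_mean / prior_var + sum(data) / sigma2) / post_prec
    return post_mean, 1.0 / post_prec

random.seed(0)
sigma2 = 4.0  # assumed known measurement variance (hypothetical)
samples = [random.gauss(10.0, sigma2 ** 0.5) for _ in range(25)]
random.shuffle(samples)
prior_part, fit_part = samples[:3], samples[3:]

# Prior built from 3 samples: N(mean of the 3, sigma2 / 3).
prior_mean, prior_var = fmean(prior_part), sigma2 / len(prior_part)
split_mean, split_var = normal_update(prior_mean, prior_var, fit_part, sigma2)

# Flat prior with all 25 samples: N(mean of the 25, sigma2 / 25).
full_mean, full_var = fmean(samples), sigma2 / len(samples)

print(split_mean, full_mean)  # posterior means of the two analyses
print(split_var, full_var)    # posterior variances of the two analyses
```

In this conjugate special case the two analyses coincide up to floating-point rounding, whichever 3 points are set aside. With a genuinely more diffuse prior, or a non-conjugate model, they would differ somewhat.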

Sure, I can see that problems could occur when people over-specify their prior, or use their data to generate a fairly informative prior and then use their prior to generate a posterior using the same data and therefore under-estimate the appropriate uncertainty. But I see plenty of room for similar errors in any form of statistical inference, such as overfitting a curve using polynomials with too many terms or splines with too many degrees of freedom by least squares methods.

• Fran says:

“we always have *some* prior information. So if you believe that Bayesian reasoning is fine when the prior can be justified, then you’re just a Bayesian with a fairly strict requirement for model checking.”

I didn’t say that and, no, sorry but no, understanding the correctness of Bayes’ Theorem does not make me a Bayesian any more than understanding the correctness of Calculus makes me a Newtonian.

Also, we don’t always have *some* information but, for the sake of discussion, let’s say you’re right and we do. How then will you construe a prior with such information? Let me guess… with another prior? And where would you get this time the information to construe this prior for the prior? See?

At some point in your inference process you will have to use a “non-informative” prior and, since no such creature exists, the whole process turns into wishful thinking. And no, assuming one of the so-called “non-informative” priors is not the same as assuming a distribution for the likelihood of the data. Priors not only assume a distribution (normal, uniform…) but also numerical information (limits for the support of the prior, moments…).

“Also, if you have 25 samples of a quantity, and you take 3 of them at random, and use them to construct some kind of diffuse prior for a value, and then you take 22 of them to fit a specific model, are you or are you not committing a tautology? If not, and if the difference between this analysis and an “Objective Bayesian” analysis is small, then can’t we just say that in these contexts “Objective Bayesian” methods are simply useful approximations to a more rigorous methodology”

No, it is not in this case, but then I will ask you: why don’t you use 4 samples instead of 3 for a better prior? And then why not 5 instead of 4 for the same reason… And hey, why don’t you just use all your data to specify the prior? At that moment you should realize you do not need Bayes’ theorem to do any inductive inference and that once you use all your data to specify the prior you are done.

The realm of Bayes’ theorem is deductive reasoning, that is, inferences about the data you already have.

• Entsophy says:

“How then will you construe a prior with such information?”

Let me ask you a different question. What’s the probability that if you flip a coin 100 times in a row you’ll get a specific sequence of heads/tails (say all heads for example)? If you answer 1/2^100, which is the answer everyone always gives, then how did you know this? You couldn’t possibly verify this answer empirically even if you had the entire life span of the universe to do it. Whatever method we used to determine this probability, it couldn’t possibly involve real frequencies and we must have arrived at it by some other process.

So if this “other process” can be used to successfully derive sample distributions, then why can’t this other process be also used to derive prior distributions?

• Fran says:

Well, I’d first estimate the MLE probability p of getting heads with that coin by tossing the coin n times before the universe ends, and my estimate for the probability you ask for would be $p^{100}$.

But anyhow, let me ride on the spirit of your question and imagine that we are asked for the probability of getting 1 trillion different coins all heads, where each coin might have a different p… Then my answer would be that it can’t be done, whereas a Bayesian answer would be “1/2^(10^12) and let me update this with data”, an estimate which, given the size of the experiment, will basically remain there forever.

So what is best: to simply state you just can’t do it and it can’t be known, or to return an everlasting estimate of 1/2^(10^12) when you have absolutely nothing to support this statement? Well, I’d rather know what I don’t know instead of imagining things.

• Fran says:

edit: The “it can’t be done” actually is “it will stay p=0 forever”, just as the Bayesian p=1/2^(10^12) will stay there as well. But again, I’d rather stick with what the data tells me than with what someone else imagines.

• Entsophy says:

Uh Fran, you don’t need to go to a trillion. Even n=100 leads to numbers like 2^100 which are completely impossible to verify. There is not a Statistician alive who would hesitate to assign a probability of 1/2^100 for each sequence and then proceed to derive confidence intervals and whatnot. This isn’t hypothetical either. When a Casino hires Statisticians to create statistical methods for testing when someone is cheating they assume exactly these kinds of probability distributions even though they’re impossible to verify.

In reality 2^100 is so large the true frequency for most n=100 sequences in the history of universe will be zero, so to the extent that we have any “data” at all it contradicts the probability assignment. Nevertheless everyone just assigns 1/2^100 and proceeds with the calculations.
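To get a feel for the scale involved, here is a rough back-of-the-envelope sketch. The flip rate and the age of the universe used below are illustrative assumptions chosen to be generous; they are not figures from the comment.

```python
N_SEQUENCES = 2 ** 100  # distinct head/tail sequences of length 100

# Hypothetical, deliberately generous assumptions (not from the thread):
SEQUENCES_PER_SECOND = 10 ** 9      # one full 100-flip sequence every nanosecond
AGE_OF_UNIVERSE_YEARS = 13.8e9      # rough cosmological estimate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def max_fraction_observed(rate, years):
    """Upper bound on the share of all sequences that could ever be generated."""
    total_generated = rate * years * SECONDS_PER_YEAR
    return total_generated / N_SEQUENCES

fraction = max_fraction_observed(SEQUENCES_PER_SECOND, AGE_OF_UNIVERSE_YEARS)
print(f"{N_SEQUENCES:.3e} possible sequences")
print(f"at most {fraction:.1e} of them could ever have occurred")
```

Even under these assumptions, only a small fraction of the possible sequences could ever have been produced, so most of them necessarily have an observed frequency of zero.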

The truth is we assume these sampling distributions because our state of knowledge is symmetric with respect to interchanges of sequences and we get away with it because we only ask “easy questions” whose answer is not sensitive to various assumptions (for example, “what fraction are heads?”). If we can do this so successfully for sampling distributions then why can’t we do exactly the same thing for priors?

• Entsophy says:

Also two other points:

(1) I’m not asking for the probability of heads, I’m asking for the probability of a sequence of length 100.

(2) Say you collect some data and conduct a test and decide Pr(heads)=.5. There is absolutely nothing about that “test” which implies, or even hints, that every sequence of length n=100 in the future will occur equally often. Indeed we have strong reason for believing this definitely won’t happen.

• Fran says:

Ent:

Oh, but they assume p=1/2 in a casino because they have information about the physical properties of a coin, mainly its symmetry, so it is not only your state of knowledge that is symmetric in this case. I have found some examples where Bayesians boast about how much better the Bayesian way does at estimating the p of a coin, ignoring the little fact that the Bayesian kick-off is right at the correct value 1/2. So yeah, it works great when you happen to hit the bull’s-eye in the prior.

But I disagree that I cannot make proper inferences about your example. Since you use the same coin all the time, once I know how that coin behaves when I toss it once, I can make inferences about tossing it twice… or n=100 times.

That is why I gave you an example with a trillion different coins, so that you cannot even have a complete set in a lifetime (by the way, ignore the edit I did: since I cannot even finish the first trillion series, the only thing I can say is ‘nothing’ and the only thing the Bayesian can say is his 1/2 initial guess).

I also disagree with your point (2); estimating p=0.5 does hint that every sequence might occur with the same frequency, in the sense that this is the most likely scenario of all possible ones given the data shown by our only coin (always keeping in mind that any value of p in a continuous support has an infinitesimal probability).

• Entsophy says:

Fran,

In reality most flips of length n=100 will never occur in the history of the universe. Their relative frequency is 0 in almost all cases. However our knowledge doesn’t tell us which ones will occur and which ones won’t. Our knowledge about the sequences has a symmetry property which knowing the coin is “fair” only exacerbates (i.e. examining the coin doesn’t break the symmetry – it adds to it).

So assigning 1/2^100 to each sequence is a result of this symmetry in our state of knowledge. It’s the best we can do, even though we know this couldn’t possibly equal the real relative frequencies. We can use it to make a “best guess” about the next 100 flips. That “best guess” may not be correct, but it is all we can do.

You would like an “objective” method which guarantees true results, but that’s impossible given our ignorance of almost everything in the universe. What can be achieved is a method which objectively determines what our “best guess” would be given a limited state of knowledge. And that’s all we’re trying to do.

So why would such a seemingly flawed method ever work? Well, because we only ask certain kinds of questions. Using the 1/2^100 probability assignment, we work out that there is a very high probability that the fraction of heads is in the interval (.25, .75). When we go to perform the experiment, we generate a sequence of 100 flips and count the number of heads. Since from the probability calculation we know that almost every sequence, no matter what specifically caused it (whether it was generated by a coin or a random number generator, or simply picked by a human, or influenced by the phases of the moon), has the property that 25 < # heads < 75, that’s usually what we observe and our prediction is usually confirmed.
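The counting claim can be checked exactly. The sketch below counts the length-100 sequences whose number of heads lies strictly between 25 and 75 and divides by 2^100. It is pure counting, with no assumption about what generates the sequence; the function name is just an illustrative choice.

```python
from fractions import Fraction
from math import comb

def fraction_with_heads_between(n, low, high):
    """Exact share of the 2**n sequences with low < #heads < high."""
    favorable = sum(comb(n, k) for k in range(low + 1, high))
    return Fraction(favorable, 2 ** n)

share = fraction_with_heads_between(100, 25, 75)
print(float(share))      # share of sequences with 25 < #heads < 75
print(float(1 - share))  # share of sequences outside that range
```

The excluded share is tiny, which is why a prediction about the fraction of heads is insensitive to which particular sequences actually occur.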

The key point is that this successful prediction in no way requires that the initial probability distribution 1/2^100 correctly represent the actual relative frequencies, which is a good thing because there is absolutely no way for us to get that information. We never know the true relative frequencies of occurrence beforehand.

Also it's useful to turn the logic around. The probability calculations show that we should be able to predict frequencies of heads for large n while still being ignorant of the physics involved and without knowing which sequences will never occur. If that turns out to be true, then great: we can make some accurate predictions without being omniscient. If it turns out to be false, then that's great too. We just discovered strong evidence for new physics.

• I’m not sure what “construe” means here, perhaps you mean construct? How will I construct the prior? I will approximate aspects of it, the same way I use Newtonian mechanics to approximate aspects of physics.

When I say that I give a length x a prior that is normal around 20cm with a stddev of 5cm, I don’t necessarily mean that exactly. I mean it’s very likely somewhere between about 10cm and 30cm and more likely near the center of this distribution than the edges. Occasionally I might re-do my analysis with, say, some kind of gamma prior with a similar high-probability region but with skewness to the right, to make sure that my analysis is not unduly influenced by the form of my prior. I have to do this because I acknowledge that the prior I specify is not an exact fact about the universe, it’s a stylized fact. That doesn’t bother me: f=ma is a stylized fact as well, weight = mg is a stylized fact as well, g is not truly a constant.
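As a rough illustration of what a "similar high-probability region" could mean, the sketch below compares the mass between 10cm and 30cm under the normal(20, 5) prior and under a gamma prior. Matching the gamma to the same mean and standard deviation (shape 16, scale 1.25) is an assumption made here for illustration, not something the comment specifies. Because that shape is an integer, the gamma CDF has a closed form (the Erlang case), so only the standard library is needed.

```python
from math import erf, exp, sqrt

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def erlang_cdf(x, shape, scale):
    """Gamma CDF for integer shape: 1 - sum_{i<shape} e^{-t} t^i / i!, t = x/scale."""
    t = x / scale
    term, total = exp(-t), 0.0
    for i in range(shape):
        total += term        # adds e^{-t} t^i / i!
        term *= t / (i + 1)  # advance to the next term of the sum
    return 1.0 - total

mean, sd = 20.0, 5.0
shape = round((mean / sd) ** 2)  # 16, chosen so the gamma mean and sd match
scale = sd ** 2 / mean           # 1.25

normal_mass = normal_cdf(30, mean, sd) - normal_cdf(10, mean, sd)
gamma_mass = erlang_cdf(30, shape, scale) - erlang_cdf(10, shape, scale)
print(normal_mass, gamma_mass)
```

Both priors put most of their mass on the 10cm to 30cm range, while the gamma is skewed to the right. Re-running an analysis under each is one simple form of the sensitivity check described above.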

Sure it’s more complicated with things like logistic regression coefficients, but a logistic regression is just a stylized model anyway, stylized facts about a stylized model can still be useful.

• Fran says:

construe: to deduce by inference or interpretation; infer.

As you yourself explain you need to try this, then try that, then check everything holds and if it does not then recheck again… And eventually you reach a result that might or might not be similar to the Bayesian next door.

Come on Daniel, just do the MLE for the parameters of the likelihood distribution; you’ll be done in a blink of an eye and all your colleagues will agree with your results.

• Andrew says:

Fran:

Maximum likelihood estimation can be fine and it solves a lot of problems. But there are many other problems where it does not do the job. There’s clearly a demand for Bayesian methods—otherwise my books would not have sold so many copies! Just because you have a method that works for you, don’t be so sure it works for everyone. (I am not claiming that my methods, as they currently exist, will solve everyone’s problem; I’m merely claiming that they solve some problems.)

Anti-Bayesianism is pretty silly. It was silly 20 years ago when I was encountering it all the time, and I think it’s even sillier now that it’s an extreme minority position.

• Fran says:

Andrew:

I am not sure I would equate the demand for something to its correctness; I just checked the New-York bestsellers section in Paperback Nonfiction and the winner is PROOF OF HEAVEN, by Eben Alexander… Lot of demand for that too.

Also there are other ways to do estimation other than MLE, the moments method for instance. I still have to find a situation where I have no choice left but Bayes’ when it comes to inductive inference.

And about Anti-Bayesians being silly or not trendy, well, I believe in freedom, do you want to go Bayes? God bless your heart, but when I repeatedly see unfair attacks to the p-value, NHST or anything non-Bayesian, Blog posts attacking NHST again, and again, and again, Wikipedia being peppered with Bayesian ads in articles that sometimes hardly relate to anything Bayesian… I wonder who is being silly here.

Bayesians seem to be making a great marketing campaign everywhere about their product, presenting it not as an alternative but as a solution. So I would not be surprised if eventually most people go in that direction, the same way I am not surprised PROOF OF HEAVEN is the best-seller in the non-fiction section. Whether we like it or not, marketing is a powerful force… even in Science.

• Andrew says:

Fran:

Demand for Bayes does not imply the correctness of the methods. What it implies is some dissatisfaction with the alternatives. You wrote, “just do the MLE for the parameters of the likelihood distribution; you’ll be done in a blink of an eye and all your colleagues will agree with your results.” My point is, that might be fine for the problems you work on, but there are lots of problems out there where MLE does not do the job. If it did, there’d be essentially no demand for my books. It’s not about marketing, it’s about the previously existing alternatives not doing the job.

• Fran says:

Andrew:

Could you please provide us a link or give us a list of all the inductive problems that can only be properly tackled with Bayes because the previously existing alternatives don’t do the job? I would be very thankful, maybe I’m missing something.

• Entsophy says:

Ok I’ll bite on that one:

Bayesian Spectrum Analysis was such a big improvement over previous Frequentist methods it was thought to be a fraud initially. Subsequently it revolutionized Nuclear Magnetic Resonance imaging:

http://bayes.wustl.edu/glb/book.pdf

Also I know a few statisticians who made a few hundred million dollars in the 80’s and early 90’s replacing older Frequentist methods for radar target discrimination with Bayesian ones:

http://www.amazon.com/Bayesian-Multiple-Target-Tracking-Library/dp/1580530249/ref=sr_1_1?ie=UTF8&qid=1367434373&sr=8-1&keywords=stone+bayesian+target

How about any situation that has highly relevant, highly informative priors, such as image reconstruction? Then MLE will be dramatically worse than the full Bayesian approach using the informative prior:

Consult almost any literature in the last 20 years on image reconstruction (it’s too big to list)

No doubt you can do just as well with some hypothesis tests, confidence intervals and unbiased estimators. Have fun with that!

• Fran says:

Ent:

Bayesian Spectrum Analysis was such a big improvement over previous Frequentist methods it was thought to be a fraud initially. Subsequently it revolutionized Nuclear Magnetic Resonance imaging: http://bayes.wustl.edu/glb/book.pdf

Ok, this book mentions this in its conclusion: “This is not to say that the actual estimates will be very different from those obtained from maximum likelihood or least squares, indeed, when little prior information is available the estimates of the parameters are the maximum likelihood estimates.” And then goes on about better computing performance. A couple of things:

1- With no information he admits he basically gets MLE but then he seems to imply that only Bayesians can use prior information for their calculations and when doing so he gets his “amazing” improvement. So, if this is the reason for the “amazing” improvement then I am not amazed at all since Bayesians are not the only ones that can use prior information for doing inference.

2- Computational performance is important, but if you get the same results with MLE haven’t you just done little else than a more efficient code for the same thing?

Also I know a few statisticians who made a few hundred million dollars in the 80’s and early 90’s replacing older Frequentist methods for radar target discrimination with Bayesian ones: http://www.amazon.com/Bayesian-Multiple-Target-Tracking-Library/dp/1580530249/ref=sr_1_1?ie=UTF8&qid=1367434373&sr=8-1&keywords=stone+bayesian+target

I don’t have the book so I cannot comment on it but getting rich with something does not necessarily mean that something is good and, in any case, maybe the previous points 1 and 2 could be applied in this case as well? Thanks for the tip though.

“How about any situation that has highly relevant, highly informative priors, such as image reconstruction? Then MLE will be dramatically worse than the full Bayesian approach using the informative prior: Consult almost any literature in the last 20 years on image reconstruction (it’s too big to list)”

From what I have read here http://en.wikipedia.org/wiki/Image_reconstruction they might use Markov Random Fields to describe probability dependencies among nodes. As long as the inductive part of the process is done properly for such fields (or for Bayesian Networks), the field/network itself simply engages in a deductive process, which is the appropriate use for Bayes.

“No doubt you can do just as well with some hypothesis tests, confidence intervals and unbiased estimators. Have fun with that!”

I can deduce from this comment you play tennis with a baseball bat.

• Entsophy says:

Fran,

You may know a great deal about standard Frequentist methods, but you consistently display a very superficial knowledge of Bayesian Statistics combined with serious ignorance about major applications of it. To take just one example: the result that a uniform prior will reduce Bayesian estimates to the MLE is a triviality which everyone knows. You don’t need to quote someone saying this. Obviously, the methods were outperforming Frequentist methods by an order of magnitude because they were using informative priors (although there were some other technical problems which were eliminated, like nuisance parameters). You can claim the same could be done with Frequentist methods, but in fact it wasn’t and as far as I know hasn’t been done even today. The reason people feared it was a fraud initially is that it performed better than anyone thought any method could perform, so I doubt you’d have as easy a time duplicating it with frequentist methods as you think.

So your chief reason for rejecting these methods is because you don’t understand philosophically why they work and you think maybe/possibly/hopefully given enough time that you could come up with results just as good. Wow. At this level of play I’m sure I could win at tennis even with a baseball bat.

• Fran says:

Ent,

“you consistently display a very superficial knowledge of Bayesian Statistics combined with serious ignorance about major applications of it. To take just one example: the result that a uniform prior will reduce Bayesian estimates to the MLE is a triviality which everyone knows. You don’t need to quote someone saying this.”

Well, as you say I was quoting the author so, if it is something so obvious will you call ignorant the author of the book as well for saying it in his conclusion? But let’s see how this “a uniform prior will reduce to MLE” actually happens.

Take a coin with two tails (being the tail 0 and heads 1) and let’s estimate the p for n=1,2,3,…

p MLE: 0, 0, 0, 0, 0….. infinity… 0
p Bayes: 0.5, … slow pig dance towards zero…. infinity… 0

So the uniform prior never gives you the MLE!! oh wait, yeah, after an infinity of tosses, so, basically, never, awesome.
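The two rows above can be reproduced under one possible reading, sketched here as an assumption rather than as what either commenter computed: a uniform Beta(1, 1) prior on p, with "p Bayes" taken to be the posterior mean. The same posterior also has a mode (the MAP estimate), shown alongside for comparison.

```python
from fractions import Fraction

def posterior_summaries(heads, n):
    """Uniform Beta(1, 1) prior + binomial data -> Beta(heads + 1, n - heads + 1)."""
    a, b = heads + 1, n - heads + 1
    mean = Fraction(a, a + b)          # posterior mean, (heads + 1) / (n + 2)
    mode = Fraction(a - 1, a + b - 2)  # posterior mode (MAP), heads / n, for n >= 1
    return mean, mode

for n in (1, 2, 3, 10, 100):
    mean, mode = posterior_summaries(heads=0, n=n)  # two-tailed coin: never heads
    mle = Fraction(0, n)
    print(n, mle, mode, mean)  # columns: n, MLE, posterior mode, posterior mean
```

Under this reading the posterior mean is 1/(n+2), which approaches zero only gradually, while the posterior mode equals the MLE for every n >= 1. Which summary counts as "the Bayesian estimate" is exactly what is being argued over here.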

“So your chief reason for rejecting these methods is because you don’t understand philosophically why they work and you think maybe/possibly/hopefully given enough time that you could come up with results just as good.”

You just are making up things, I guess this is a Bayesian side-effect.

“Wow. At this level of play I’m sure I could win at tennis even with a baseball bat.”

I am sure you can do many things with a baseball bat.

• Entsophy says:

It’s a folk theorem Fran, I didn’t spell out every nuance needed to get strict necessary and sufficient conditions over measurable blabbity blah or how precisely it holds with a given diffuse prior. It’s a blog comment not a treatise.

Next time a doctor recommends a life saving NMRI for you, please refuse the procedure on the grounds that the Bayesian Spectrum Analysis used to create the image is philosophically fallacious, full of priors that amount to unproven opinions and researcher’s hopes, without the verifications and guarantees that make Frequentist methods so reliable in psychology and pharmacology. Then tell them you’ll wait until a better Frequentist NMRI algorithm comes along.

That’ll show us Bayesians!

• Also the “Correctness of Bayes Theorem” is more or less a tautological statement about a purely mathematical device, namely the theory of probability, which has exactly nothing to do with anything real, just as the formula for the size of the “surface region” of an 85 dimensional sphere is a purely abstract notion, we can’t verify it using surveying instruments on a convenient sample of 85 dimensional spheres delivered to our house via Amazon Prime.

Bayes’ theorem’s “correctness” as a means of discovering information about the world is entirely an empirical question. Do we frequently get approximately correct answers when we use methods based on Bayes’ theorem? If so, it would seem to be an epistemically valid method of discovering information about the world. Anyone who claims to have a pure and 100% correct method for discovering information about the real world is a charlatan. Information about the real world is always messy and approximate.

• Entsophy says:

There are just so many huge misconceptions here.

First, a prior P(m) will “work” if the true value of m is in the high probability manifold of P(). If you really don’t know much about the true value of m, then you can always spread P() out so much that every possible value of m is in the high probability manifold. That’s what an uninformative prior is and they definitely do exist. This not only can be done, but is done all the time. To the extent to which standard Frequentist procedures are equivalent mathematically to assuming uniform priors, they rely on this effect more than Bayesians do.

Second, there is nothing inherently bad about two different statisticians getting different answers to the same question. If the true value of m=10 and one statistician estimates "9 < m < 11" while another estimates "9.9 < m < 10.1" then everything is perfectly ok. I understand that Frequentist philosophy balks at the idea of having two right answers to a statistical question, but in practice there are trivially lots of right answers.

• Fran says:

These two results are “perfectly okay” for you?????

Q: Mr. Statistician, should I take this very dangerous medicine?

Bayesian One: NO! the m toxicity levels might reach the dangerous 11 value!!
Bayesian Two: Sure! No problem, at most you’re getting an m of 10.1 far away from the dangerous 11 value.

So “perfectly okay”? all right.

• Entsophy says:

I didn’t say they were equally useful, but both are literally correct statements about the true value of m. Really you’re just being ridiculous at this point if you’re going to interpret very clear statements like that.

• Fran says:

You said, and I quote

I showed you something inherently bad (death usually is) about two different statisticians getting different answers for the same question and, somehow, that makes me ridiculous? Well, explain that to the patient left out with the decision whether to take the medicine or not based on Bayesian One or Two advice.

• Entsophy says:

Seriously? read the whole paragraph again and think about the context. If m=10 then both “9< m < 11" and "9.9< m < 10.1" are correct statements. "Inherently" here means "automatically bad as a matter of principle".

My point was that Frequentists don't like to think that way because they imagine interval estimates as some kind of physical property of the universe (i.e. they think they're modeling "randomness" of the errors) and there is only one unique answer.

If you think of interval estimates as representing knowledge about the true value of m, then there can be lots of correct answers. Obviously, they are not equally useful in practice. Shorter interval estimates are better, assuming the true value is in the interval.

• I wish Andrew allowed about 2 more levels of nesting. I know nesting comments forever is pretty annoying, but we frequently run into these comment nesting issues on this blog.

That being said, in reply to the issue of medicine toxicity values, any decisions based purely on a threshold are almost always improperly decided. Suppose you have a disease which will most likely kill you in the next three days unless you take some medicine whose m toxicity has according to statistician 1 some nontrivial chance of reaching a dangerous level of 11. But, if you do take the medicine it has a very decent chance to completely cure your disease. This person makes a very different decision than the person who wants to take the medicine for a headache likely to go away on its own in a few hours.

But in the absence of a cost/benefit decision issue, what about the issue of why did the two statisticians get different results? It could be due to several factors:

1) They used different models for the world (likelihood is different)
2) They used the same likelihood model but different prior models
a) the more informative prior is based on an inspection of generally related valid scientific information
b) the more informative prior is not well justified by any offered argument
3) They used different data with or without the same likelihood/prior model
4) They used different computational approximations for the same data and method

Analyzing these cases:

4) I think we can all get behind the idea that in important analyses, 4) shouldn’t be the problem. If we need more accurate computation, we should try to achieve it. Unfortunately this is sometimes not achievable; predicting the climate in 100 years is an example.

3) is an issue for everyone everywhere. In general though, if one of the statisticians had “better” data (larger n, more efficient experimental design, higher quality measurements, etc.) then that statistician is the one to listen to. I don’t think this is very controversial, except in so far as what constitutes “better” can be controversial.

2 a) If we choose our prior because it approximately summarizes valid external scientific information, I don’t see that this is a problem, except in so far as we may need to investigate the validity of this approximation. If a good dataset is available, we may wish to pull it into the analysis and create a hierarchical model. Often, though, a good dataset isn’t available, only some summary information from a variety of previous analyses. This is still valid information, and analysis of the sensitivity of results to the prior specification is something I think most modern Bayesians are perfectly happy with, since it’s easy to acknowledge that a prior is an approximation of what we know.

2 b) If a person uses a highly informative prior and does not offer valid justification for that prior, I think it’s fair to say that this person’s results are useful only for their own purposes. You might argue that there is no such thing as an “uninformative prior”, but your suggested use of maximum likelihood, which is equivalent to using a nonstandard (usually called “improper”) prior, suggests that in some limit you don’t actually believe this.
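The “in some limit” remark can be illustrated with a small sketch, assuming a normal mean with known spread: as the prior is made flatter, the posterior mean approaches the sample mean, which is the maximum likelihood estimate. The numbers and the function name are hypothetical.

```python
def posterior_mean(prior_mean, prior_sd, data_mean, data_sd, n):
    """Posterior mean of a normal mean under a Normal(prior_mean, prior_sd) prior."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / data_sd ** 2
    return (prior_prec * prior_mean + data_prec * data_mean) / (prior_prec + data_prec)

# Hypothetical data: n = 5 observations with known sd 1 and sample mean 10.4.
# For a normal mean, the sample mean is the maximum likelihood estimate.
mle = 10.4

# Widen the prior step by step and record how far the posterior mean is from the MLE.
prior_sds = (1.0, 10.0, 100.0, 1000.0)
gaps = [abs(posterior_mean(0.0, sd, mle, 1.0, 5) - mle) for sd in prior_sds]

for sd, gap in zip(prior_sds, gaps):
    print(f"prior sd {sd:>7}: |posterior mean - MLE| = {gap:.6f}")
```

The gap shrinks toward zero as the prior sd grows. The flat prior reached in the limit does not integrate to one, which is what “improper” refers to.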

1) This camel often passes through the eye of the needle. In many cases the “deterministic” portion of the model has potentially several alternatives. Fortunately the scientific method gives us a long-term limiting procedure for discovering approximately true models. Unfortunately in the long term we’re all dead so we sometimes need those results before they’re available.

• Entsophy says:

“Objective Bayesian” means something very different than “using data to inform the prior”. The latter is just a technique which goes under the name Empirical Bayes or possibly Multilevel/Hierarchical Bayes nowadays. Since the owner of this blog, among others, has had enormous practical success using Multilevel/Hierarchical models, I don’t think you’re going to convince them very easily that “the whole process is a futile tautological exercise”.

• Fran says:

When I said “futile” I meant frivolous rather than useless. Though I believe Bayesianism, objective or otherwise, is a philosophically flawed approach to handling uncertainty, that does not mean that using it amounts to destroying scientific research or aiding terrorists.

There are very few situations where, with the same data, a Bayesian would take a different decision than I would, so, for the most part and in a practical sense, this approach gets the job done. But just as my grandma getting rid of a headache by praying did not make me believe in Baby Jesus, Bayesianism getting the job done does not make me believe it is right.

• Andrew says:

Fran:

Indeed, different methods work well for different problems. Bayesian methods have worked for me on the problems I’ve worked on, but other people have had different experiences. Much of my research involves making Bayesian methods more general and useful in (I hope) a wider range of applications.

• Entsophy says:

Right. These methods work because they accidentally correspond to valid Frequentist procedures. Just out of curiosity, can you find any Frequentist justification or explanations for why Multilevel/Hierarchical modeling works in the real examples where it’s been applied?

If not, then maybe you should consider the possibility that there is a valid Bayesian explanation for these methods, but that you just don’t understand what it is.

• Anonymous says:

From the frequentists’ view, don’t they view random effects models as being equivalent to Bayesian multilevel models? Part of why they “work” may also be that they work similarly to James-Stein estimation.

It’s funny that this discussion of correctness is becoming detached from what a method does.
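For readers unfamiliar with the James-Stein estimation mentioned above, here is a minimal sketch of the positive-part version that shrinks toward zero, assuming independent observations with known, equal variance. The observed values are made up, and this is not taken from any analysis discussed in the thread.

```python
def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimate, shrinking each coordinate toward zero.

    Assumes x[i] ~ Normal(theta[i], sigma2) independently, with sigma2 known.
    """
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 coordinates")
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        return list(x)
    # One shared shrinkage factor, clipped at zero (the "positive part").
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return [factor * v for v in x]

observed = [1.0, -0.5, 2.0, 0.3, -1.2]  # hypothetical noisy group estimates
shrunk = james_stein(observed)

for o, s in zip(observed, shrunk):
    print(f"{o:+.2f} -> {s:+.3f}")
```

Every estimate is pulled toward a common point by the same data-determined factor, which resembles the partial pooling a multilevel model produces.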

• Entsophy says:

I thought the whole point of multilevel/hierarchical modeling was to go beyond random effects modeling. Certainly such methods are successfully applied to things which couldn’t plausibly be considered random variables.

It is funny to discuss correctness detached from what a method does, but since the methods are attacked on philosophical grounds then what can you do?

• K? O'Rourke says:

The controversy _should_ be about how wrong the prior and data models were.

> just as relevant for the dice example as for the election example
Not if the representation (model), being considerably more challenging to construct, is likely to be importantly wrong.

In most applied Bayesian analyses I have seen, I would only take the posterior probabilities as a rough analogy or meta-for …

Did they check for prior-data model conflict or calculate the prior probabilities (i.e. use no data) of their parameters of interest?

My sense is that little of this is taught in grad school, as all the time is taken up learning MCMC.

• Seems like some of your comment maybe got cut off? In any case I really strongly agree with you about the need for discussion of the quality of the model. Posterior checking, comparison of prior and posterior, and fitting alternative models that are plausible as well are all important aspects of good statistical analysis.

• K? O'Rourke says:

Just busier than usual.

Some different views on model checking out there – I like Mike Evans’s work on this.

You might like his views on continuous models – http://www.utstat.utoronto.ca/mikevans/papers/techrepineq.pdf

• Thanks for that link. I have been looking for info on Bayes Factors and model comparison these days anyway so it’s right up my alley. I usually find your links to papers quite useful. I know you’re an occasional contributor of blog stories here, but that writing up proper blog posts can take a lot of time. Maybe you’d consider posting just a handful of interesting links to papers once a month or something?

3. Ag says:

@james: you or Mr. Audit can repeat the analysis with the particular proxies flipped. If you can demonstrate that this changes the conclusions then you should submit a comment to Nature. If not then please at least blog about how robust you find the results to be…

4. John Mashey says:

For sure, a welcome thing.

5. […] reporting probabilities instead of p values is very Bayesian […]

6. Christian Hennig says:

“As with classical p-values, these probability statements depend on an assumed model”
Classical p-values compare the data to a thought construct. That’s not exactly an “assumption”, it’s a choice. I don’t need to claim at all that the model is true in order to compute and make sense of a p-value (although many people who use p-values unfortunately do that). Subjective probability is a choice, too. If the probability statements above are made without clear reference to subjective choices, that’s more serious, though.

• Andrew says:

Christian:

Model, thought construct, assumption, whatever. If you want to label a Bayesian model as “subjective,” that’s fine. Just then please also label as “subjective” the model or thought construct or assumption used to create a classical p-value. You don’t need to claim a model is true to interpret a p-value or to interpret a posterior probability. But, in either case, if the model makes no sense, you’re not learning much from the probability statement being produced.

7. it seems to me that stating that the probability of an event is > 0.x is a step backwards, or a step sideways at best, not forwards. in particular, as i believe christian was saying, p-values are claims not about the probability of an event, rather, they are explicitly in reference to a null and alternative. the sentences you extracted made no reference to a probability model, rather, they simply stated the probability of an event, as if it were an objective claim. obviously, all claims are subjective, insofar as there is a ‘subject’ making them; objective claims are not even possible. so, this seems like going backwards to me.

also, “My point is, that might be fine for the problems you work on, but there are lots of problems out there where MLE does not do the job. If it did, there’d be essentially no demand for my books.”

that seems like the kind of sloppy causal claim that you ding people for all the time. i mean, how much evidence is there for that causal claim (MLE insufficient causes demand for your book) vs. some alternative explanation/model (eg, demand is because you are really sexy)? my guess is, it is not clear what the various causal factors for interest in your book are, and any evidence in favor of one causal explanation would be in reference to alternatives anyway, so an ‘objective’ claim about the causal forces behind your book’s popularity seems unfounded.

• Andrew says:

Joshua:

If you want to think that Bayesian Data Analysis has sold something like 20 or 30,000 books since 1995 because I’m really sexy, that’s fine with me. I mean, no, it’s not really fine with me at all. But, then again, I don’t think you really believe that. I’ve been around and seen enough to know that there really are a lot of problems that maximum likelihood doesn’t solve. Consider also the success of lasso and other penalized likelihood approaches. Surely you can’t attribute those successes to Rob T.’s sexy form? I honestly believe that if maximum likelihood could solve all problems, that Bayes and lasso would not have taken off the way they did.

• oh yeah, i totally agree with you: neither MLE nor Bayes is a panacea, that seems a priori obvious to me, and you taught me much about the problems with both through your various writings.

nonetheless, i was pointing out that you run the blog called “Statistical Modeling, Causal Inference, and Social Science”, and it seems like you made a causal claim in the absence of sound statistical evidence, or even a statistical model, about social science (why people buy a particular book).
it seemed noteworthy and somewhat ironic.