Jason Yamada-Hanff writes:

I’m a Neuroscience PhD reforming my statistics education.

I am a little confused about how you treat confidence intervals in the book and was hoping you could clear things up for me. Through your blog, I found Richard Morey’s paper (and further readings) about confidence interval interpretations. If I understand correctly, the main idea is that it is not clear that anything can be said about any given CI—that is, interpreting an individual CI is a mistake because all of the relevant properties are in the long-range behavior of the CI-generating procedure, not the CIs themselves. Andrew has written various posts addressing the common misinterpretations.

What I don’t quite get is that there are various recommendations in your regression book [with Jennifer Hill] that do seem to advocate interpreting individual CIs—indeed, what else is possible if you decide to calculate one? For example, you outline how to generate CIs in the early part of the book and routinely judge whether a given regression coefficient indicates an effect by looking at whether +/-2*SE crosses 0.

How do I align these interpretations with the scolding from Morey and you to not interpret CIs that way? I imagine I’m just missing some subtlety, but I can’t see it. It wouldn’t be the first time in stats! Perhaps the coefficient estimate/standard errors can be thought of as Bayesian posteriors, even though they aren’t calculated that way?

I replied: Jennifer and I are working on the 2nd edition of our book (volume 1 scheduled to come out at end of 2017, volume 2 in 2018) and we are removing all references to statistical significance. Also we’re discussing a bit the role of prior distributions and the ways in which conf intervals can be misinterpreted. Our thinking has changed a lot in the past 10 yrs (and I take a lot of the blame for the stat signif stuff in our book, as Jennifer was always skeptical).

Jason then asked:

Just so I understand you correctly, you are basically disclaiming the interpretation of CIs in this first edition, yes?

Without giving away the farm on the second edition, is there a quick version of a good way to interpret the standard errors? As the standard deviation of the distribution for the coefficient estimate, the standard error does seem to be some sort of a measure of precision… although what sort is not so clear.

And I replied: s.e. represents variation in the point estimate, but conf interval has usual Bayesian interpretation only with flat prior.

“it is not clear that anything can be said about any given CI”

But for “standard problems” (see Mueller-Norets 2016, Econometrica 2016, p. 2185), a realized confidence interval does have an interpretation. The concept is “bet-proof”. Here is my own attempt to summarize, in a comment on an earlier entry on Andrew’s blog:

http://andrewgelman.com/2016/11/26/reminder-instead-confidence-interval-lets-say-uncertainty-interval/#comment-353769

The real puzzle for me is why this isn’t more widely known. Getting people to stop using the wrong interpretation is easier if they have a legit interpretation they can use instead.

Link to Mueller-Norets:

https://www.princeton.edu/~umueller/cred.pdf

Mark:

I haven’t read that particular paper but, yes, it’s well known that for the normal distribution with flat prior (or problems that are close to that, in some way), confidence intervals can be interpreted Bayesianly. But in many many settings the flat prior makes no sense (consider that notorious estimate that early childhood intervention raises adult earnings by 42% with conf interval something like [2%, 82%]). One trouble with confidence intervals is that many theoretically-trained statisticians consider the confidence interval to be a general principle rather than a useful idea that works in some settings.

Well, there is a link to a Bayesian interpretation – see the paper, the discussion is only 1 page long – but for the “bet-proof” interpretation of realized frequentist CIs there is no need to bring in flat priors or any other Bayesian apparatus.

I think you could even teach it to undergraduate students working from a frequentist textbook and who haven’t even seen Bayes’ Theorem.

Mark:

You don’t need Bayes’ theorem to prove these things, but the point is that the “bet-proof” thing only works with a flat prior or something mathematically equivalent to it. Thinking in terms of Bayesian prior distributions is just a way to understand this. In the early-childhood-intervention example, a flat prior doesn’t make sense, and the classical confidence intervals are not bet-proof. For example it would be foolish for someone to bet that a future replication study would show a 50/50 chance of showing a difference of more than 42%. It would definitely be possible to show this sort of thing in a frequentist manner without Bayes’ theorem—that’s pretty much what John Carlin and I did in our article on type M and type S errors. To make progress in more complicated problems, though, I find the Bayesian formulation to be very helpful.

I think you may be wrong about bet-proofness failing for the early childhood example. As I read Mueller-Norets, this example satisfies their definition of a “standard problem”, and if so, then the realized CI will be bet-proof.

That’s not to say that the frequentist setup is or isn’t a sensible one to use in that application. But the frequentist realized CI will be bet-proof, I think. Happy to be corrected if I’m wrong on this.

> but for the “bet-proof” interpretation of realized frequentist CIs there is no need to bring in flat priors

To define bet-proofness you need to assume a distribution of true parameter values to calculate an expectation: “Is it possible for the inspector to be right on average with her objections no matter what the true parameter is, that is, can she generate positive expected payoffs uniformly over the parameter space?”

Please disregard the previous comment.

If I understand it right, this property doesn’t mean that one cannot make money betting against the confidence interval. It means that one cannot make money betting against the CI conditional on the true value of the parameter for each and every possible value.

But given the “distribution of true values” (which of course make no sense for a frequentist), we may bet and have a positive expectation (even if we would lose on average for some parameter values). In the simple estimation of a location parameter mu, we would only accept the confidence interval [x-2*sigma x+2*sigma] at face value if our prior for mu was flat.
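To make the flat-prior point concrete, here is a minimal sketch, assuming a single observation x ~ N(mu, sigma^2) and a conjugate normal prior on mu. The function names and the numbers (0.42, 0.20, a prior sd of 0.10) are illustrative assumptions, not anything taken from the paper or the book. With a nearly flat prior the credible interval coincides with the classical one; with an informative prior it does not.

```python
import math
from statistics import NormalDist


def classical_interval(x, sigma, level=0.95):
    """Classical interval x +/- z * sigma for a normal mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return x - z * sigma, x + z * sigma


def posterior_interval(x, sigma, prior_mean, prior_sd, level=0.95):
    """Central credible interval for mu, given x ~ N(mu, sigma^2)
    and the conjugate prior mu ~ N(prior_mean, prior_sd^2)."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / sigma ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * x)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half = z * math.sqrt(post_var)
    return post_mean - half, post_mean + half


# Illustrative numbers only (hypothetical estimate and standard error).
x, sigma = 0.42, 0.20
print(classical_interval(x, sigma))
print(posterior_interval(x, sigma, prior_mean=0.0, prior_sd=1e6))   # nearly flat prior
print(posterior_interval(x, sigma, prior_mean=0.0, prior_sd=0.10))  # informative prior
```

The second line of output should match the first to many decimal places, while the third is pulled strongly toward the prior mean.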

I was at a machine learning conference yesterday, and one of the speakers said [roughly] “it’s nice to have a theory, but everybody uses CART all the time and there’s no theory that shows it’s optimal for anything. It’s just useful.”

1. Could the same thing be said about confidence intervals?

2. Bless your hearts, I know you and Hill will come up with something equivalently short (as p-values and confidence intervals are short).

CART = Classification and Regression Trees, AKA decision trees, AKA C4.5, C5.0, AID, CHAID, etc.

Zbicyclist:

“Useful” depends on what problem you’re working on. I don’t think those methods would be very useful for estimating public opinion in subgroups of the population, or for modeling in pharmacology, or for estimating student abilities from noisy testing data, or . . .

Certainly CART methods would not be useful for these problems. I wasn’t clear.

My point was that using a method atheoretically (or with abuse of theory) may be one of the milder sins (more useful than not in getting to a useful conclusion) and is likely to survive until pushed out by something else relatively simple and roughly as useful. If a theory is muddled / nonexistent (CART) or misunderstood (meaning of confidence intervals) that may not matter that much in actual use.

To a certain extent, confidence interval use has pushed out some uses of p-values and significance tests (although I can’t cite data, and this is just an impression).

I’ve never quite understood why effect sizes have made so few inroads into common practice, so you can’t find them in most of the stat or analytics books. Surely, a good number of the type M errors made are because nobody bothers to think about whether the size of the effect makes sense. And comparing effect sizes makes more sense than saying “A is important (p=.048), but B is not (p=.14).”

Theory does matter for reasons not being addressed in your comments. It is a truism that over time numbers as test outcomes become reified in a way that is logically unwarranted, but widespread nonetheless. So much so that the professional psychometricians who actually knew better at one time become convinced of the fallacy as well and develop convoluted methodological theories to justify their positions. Reification of these results becomes harmful to groups for which systematic prediction error, not properly researched and modeled, deflates scores and inflates predictive magnitude.

Edit:

“Reification of these results becomes harmful to groups for which systematic measurement error, not properly researched and modeled, deflates scores while concomitantly systematic prediction error inflates the magnitude of prediction.”

The easiest accurate CI interpretation I’ve heard came from Kruschke (who got it from Cox’s Principles of Statistical Inference, if I’m remembering correctly), namely that a particular CI indicates the range of values that you cannot reject with respect to the associated observed statistic. Or, as Cox puts it on p. 40, “Essentially confidence intervals, or more generally confidence sets, can be produced by testing consistency with every possible values in Omega_psi and taking all those values not ‘rejected’ at level c, say to produce a 1-c level interval or region.”
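The test-inversion reading can be sketched in a few lines. This is only an illustration under simplifying assumptions: a normal mean with known sigma, a two-sided z-test, and a brute-force grid of candidate values. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def ci_by_test_inversion(xbar, sigma, n, level=0.95, step=0.001, max_z=6.0):
    """Collect every candidate mean mu0 that a two-sided z-test does not
    reject at level 1 - level, and return the smallest and largest of them."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    n_steps = int(round(max_z / step))
    kept = []
    for k in range(-n_steps, n_steps + 1):
        mu0 = xbar + k * step * se      # candidate parameter value
        z = (xbar - mu0) / se           # test statistic for H0: mu = mu0
        if abs(z) <= z_crit:            # not rejected, so it stays in the set
            kept.append(mu0)
    return min(kept), max(kept)


lo, hi = ci_by_test_inversion(xbar=10.0, sigma=2.0, n=25)
print(lo, hi)
```

Up to the grid resolution, the endpoints agree with the usual xbar +/- 1.96 * sigma / sqrt(n).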

Noah:

Indeed, that definition is just fine and it also makes it clear why it is impossible in general to draw any conclusions from any particular confidence interval!

Sorry to keep hammering on about this, but what makes your statement correct is the “in general”.

If I replace that with “for standard problems” – and in practice, I think the great majority of problems are “standard” in the Mueller-Norets sense – then it is very possible indeed to draw conclusions from a realized confidence interval, namely by using the bet-proofness interpretation.

Still happy to be corrected if I’m wrong about any of this. I’ve started using it (“bet-proofness”) in my teaching and I really do want to get it right.

Mark:

“Standard” includes normal distribution (or something close to normal, some “location estimation problem”) with uniform prior (or something close to uniform; prior information weak compared to data). These models are indeed the vast majority of statistical models that anyone ever fits.

Problems for which the standard model does not apply include

– that notorious estimate that early childhood intervention raises adult earnings by 42% with conf interval something like [2%, 82%]

– all that “Psychological Science” or “PPNAS”-style research (ovulation and voting, ovulation and clothing, fat arms and political attitudes, subliminal smiley faces and voting, shark attacks, himmicanes, power pose, etc.)

– all the pharmacology and toxicology problems I’ve ever worked on

– estimating state-level opinions from national polls (Mister P)

– etc.

Here is the context for “standard”, from Mueller-Norets:

“The main result of this literature is that a set is “reasonable” or bet-proof (uniformly positive expected winnings are impossible) if and only if it is a superset of a Bayesian credible set with respect to some prior.”

“In the standard problem of inference about an unrestricted mean of a normal variate with known variance, which arises as the limiting problem in well behaved parametric models, the usual interval can hence be shown to be bet-proof.”

This definition of “standard” covers a lot, including at least some of the examples in your list as well, no?

Mark:

Rather than trying to untangle that definition, let me just say that classical confidence intervals won’t be bet-proof (in any sense that makes sense to me) unless the prior distribution is essentially flat. In all the examples I gave, classical confidence intervals can give obviously wrong answers and hence can be “busted” with an appropriate choice of bets.

Another way to get a feeling for this is to remember that confidence limits (or bands or whatever) have to be calculated using sample data, not population data. Whether you use +/- x sigma or some other manner of calculation (most of which are in essence equivalent), you don’t actually know sigma. You only know its sample estimate. And you also don’t know (in the general case) the distribution of the sample estimate of sigma. You only know its sample distribution.

It’s surprising how far off those sample estimates of sigma can be. And in the worst case, for a really bad distribution, you might be looking at (for example) Chebyscheff’s limit rather than the normal factor [word play intended] of 2 sigma.
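For a sense of scale, here is a small sketch comparing the normal multiplier with the one implied by Chebyshev's inequality at the same nominal level. It assumes a known sigma and a finite variance, so it illustrates only the distribution-free worst case, not the extra uncertainty from estimating sigma. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_multiplier(level=0.95):
    """k such that a normal variable falls within k sigma with probability level."""
    return NormalDist().inv_cdf(0.5 + level / 2.0)


def chebyshev_multiplier(level=0.95):
    """Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k^2 for any finite-variance
    distribution, so k = 1 / sqrt(1 - level) guarantees at least that level."""
    return 1.0 / math.sqrt(1.0 - level)


print(round(normal_multiplier(), 2))     # 1.96
print(round(chebyshev_multiplier(), 2))  # 4.47
```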

The prior distribution that Andrew has been reminding us about plays the role (in this formulation) of biasing the estimates of sigma, or at least of its distribution.

And that’s even if there is no systematic bias in the underlying measurements.

The definition of bet-proof in Mueller-Norets doesn’t require any reference to a prior. It’s very simple (I think) – can you make money (in an expected value sense) by betting on whether a realized frequentist CI contains the true value of the parameter?

But if you’re saying that using prior information enables you to “beat the House”, then sure. A bet-proof CI isn’t going to be bet-proof against a gambler who is also armed with inside information.

The point, though, is that for “standard problems”, there really is an intuitive and theoretically-sound interpretation we can give to realized frequentist CIs. Why this isn’t in the textbooks is a mystery to me.

Isn’t this coverage property the very definition of a confidence interval? For any given true value of the parameter, the 95% CIs generated will contain the true value 95% of the time. And for “standard problems” they don’t misbehave, so there is no way to distinguish the hits from the misses (assuming a non-informative prior!).
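The coverage property described here can be checked by simulation. This is a minimal sketch, assuming one normal observation per interval with known sigma; the function name, seed, and the particular values of mu are arbitrary choices for illustration.

```python
import random


def coverage(mu, sigma=1.0, z=1.96, n_sims=20000, seed=0):
    """Fraction of intervals x +/- z * sigma that contain the fixed true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        x = rng.gauss(mu, sigma)              # one new sample per replication
        if x - z * sigma <= mu <= x + z * sigma:
            hits += 1
    return hits / n_sims


# The long-run hit rate is about 95% whatever the fixed true value is.
for mu in (-3.0, 0.0, 42.0):
    print(mu, coverage(mu))
```

Note that the simulation conditions on each mu separately; no distribution over mu is needed to state or check coverage.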

Mark:

You refer to “inside information” but I’m referring to publicly available information. In the early childhood intervention example, this includes (a) theoretical arguments from economics or sociology that easy 42% improvements are not out there for the picking; (b) the history of preschool experiments in which there was no good evidence of huge effects; and (c) statistical theory in the form of the statistical significance filter or type M errors which tells us that published estimates are biased upwards. All this information suggests that I could make good money by accepting a 50/50 bet on “true effect is less than 42%.” No inside information here at all. I’ve never even been to Jamaica (which is where the experiment was done), nor do I know any of the experimenters, nor have I seen any of the data.
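The upward bias from the statistical significance filter can be illustrated with a short simulation. This is a sketch under loudly hypothetical assumptions: a true effect of 0.10 and a standard error of 0.20 are made-up values chosen for illustration, not estimates from the study, and the function name is invented here.

```python
import random


def significance_filter(true_effect, se, n_sims=100000, z=1.96, seed=1):
    """Simulate replications and keep only the 'statistically significant' ones.
    Returns (power, average exaggeration ratio, wrong-sign rate among significant)."""
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        est = rng.gauss(true_effect, se)
        if abs(est) > z * se:
            significant.append(est)
    power = len(significant) / n_sims
    # Type M: how much significant estimates overstate the true magnitude.
    exaggeration = sum(abs(e) for e in significant) / len(significant) / abs(true_effect)
    # Type S: how often a significant estimate has the wrong sign.
    wrong_sign = sum(1 for e in significant if e * true_effect < 0) / len(significant)
    return power, exaggeration, wrong_sign


# Hypothetical inputs: true effect 0.10, standard error 0.20.
print(significance_filter(true_effect=0.10, se=0.20))
```

Under these assumed inputs every significant estimate exceeds 0.39 in absolute value, so the published magnitude is necessarily several times the true one.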

Andrew:

I was using “inside information” as a metaphor, that’s all. I meant only that a statistician with more information could make money in a betting game from a less-informed statistician who simply constructed these bet-proof CIs (I’m not being precise but I think the idea is clear). Which is perfectly reasonable, but isn’t what the claim that “realized CIs have a property called bet-proofness” is about.

Carlos:

No, coverage and bet-proofness are different. Coverage is a claim about repeated realizations etc., just as you describe it. Bet-proofness is a claim about a single realization – what is the expected payoff from betting against it?

Expected payoff is by definition about repeated realizations, isn’t it?

In the paper you cited, bet-proofness (for a generator of confidence intervals, not for a particular realization) is defined looking at the payoff of all the potential betting schemes, calculating the expectation over the distribution of observed values of x conditional on a fixed value of the parameter.

If there’s a set of values that you can’t reject, doesn’t this then imply that there’s a set of (complementary) values that you can reject (keeping in mind the problems with probabilistic modus tollens, which, granted, are serious problems)?

More to the point, I think for a fair amount of social science research, delimiting sets of parameter values that are consistent or inconsistent with an observed data set is a substantive, scientific step to take. It’s (very) limited in scope, but small steps can be useful and appropriate.

You can reject the values outside the 95% confidence interval exactly in the same sense that you can reject the null hypothesis when p<0.05. Which may or may not be a substantive, scientific step to take.

Noah:

Sure, combine this information with a prior distribution and you can get somewhere. I wrote a whole book about this!

From that bet proofness paper:

> The literature on betting and confidence sets showed that a set is bet-proof at level 1−α if and only if it is a superset of a Bayesian 1−α credible set. This characterization suggests that in a search of bet-proof confidence sets, one may restrict attention to supersets of Bayesian credible sets.

Then there’s a footnote attached to that paragraph saying:

> One might also question the appeal of the frequentist coverage requirement. We find Robinson’s (1977) argument fairly compelling: In a many-person setting, frequentist coverage guarantees that the description of uncertainty cannot be highly objectionable a priori to any individual, as the prior weighted expected coverage is no smaller than 1−α under all priors

Interesting. I’ve never heard this line of reasoning before.

Peter Finch offers a descriptive interpretation of the standard error in terms of perturbations.

Finch, Peter D. “A descriptive interpretation of standard error.” Australian & New Zealand Journal of Statistics 23, no. 3 (1981): 296-299.

Unfortunately that article is behind a paywall, and I am sitting in a large research university’s library!

Thanks for the post, it raised two questions:

1. You were referring to the property that a 95% confidence interval is the same as a credible interval with flat prior if the parameter is normally distributed. Thus, you kind of converted the CI into a credible interval to derive this interpretation. But if you stay strictly at frequentism, a single CI does not say anything about the parameter. Did I get this correctly?

2. CI and p-value can be directly converted (see Kruschke’s CI interpretation). If a single CI has no info about a parameter, then a single p-value does not either. That would mean that if the null hypothesis is true and p = .05, the chance of a type 1 error is not = 5%? It is just 5% in the long run, kind of like the 95% only works in the long run (infinite repetition).

(I really am overstaying my welcome here – sorry. Last one for today on bet-proofness.)

“But if you stay strictly at frequentism, a single CI does not say anything about the parameter.”

Not so – see the discussion and links above. A single realized frequentist CI is bet-proof (for “standard problems” – the caveat is important).

This bet proofedness argument is way too abstract. And reasoning about bets in this manner only works if the bettors are robots who have been programmed to seek certain expectation values.

I’m not a robot, and I don’t think that most people bet mainly based on classical expectation values. So these kinds of arguments, which are so abstract and don’t mesh with how people usually function, don’t help many real people get a strong, visceral understanding of the subject (or maybe I’m just extrapolating too far based on my reaction).

I don’t know – I’ve tried it out on MSc-level students a couple of times, and it seemed to go down OK. But I don’t have enough experience yet to say whether it really does work well in a classroom setting.

Still, at the very least it’s helpful to be able to say “There’s a way to interpret a realized CI, here’s how, but you need to be careful about X, Y and Z.” It’s easier to get people to avoid using the wrong interpretation if you have something positive to offer instead.

A single realized CI is not bet-proof: you could win or lose (the CI contains the true value or it doesn’t). The expected outcome, averaging over realizations, conditional on the true value of the parameter, is bet-proof (in the long run, you get the promised coverage).

Bet-proof CIs are “reasonable” (there are methods of constructing CIs with the correct coverage probability but clearly “unreasonable”, like returning the empty set with probability 5% and the whole parameter space with probability 95%). But I don’t think this allows us to say anything about the parameter. We can only say things about the interval.

(I know I said no more – sorry.)

Carlos:

Sorry, no. See the Mueller-Norets paper and discussion. A single realized CI is bet-proof in expected value terms. That is how “bet-proof” is defined, as they explain in detail (and better than I can). Different from coverage.

And indeed their paper is actually all about what to do when the usual CIs are not bet-proof. Your example of a CI that is empty 5% of the time and the whole parameter space 95% of the time is an example of a CI that has the right coverage but most definitely is not bet-proof, because you can make money betting against a realized empty CI – you know for sure the true parameter value isn’t in there. Mueller-Norets have similar examples. This is what they mean by “nonstandard problems”. It’s in the title of the paper. Worth reading!
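The empty-set example is easy to simulate. The sketch below is only an illustration with hypothetical function names: the procedure ignores the data entirely, attains 95% coverage, and yet an inspector who bets against every empty interval never loses.

```python
import random


def trivial_ci(rng):
    """Return None (the empty set) with probability 5%,
    otherwise the whole real line. The data are never used."""
    if rng.random() < 0.05:
        return None
    return (float("-inf"), float("inf"))


def simulate(mu, n_sims=20000, seed=0):
    """Return (coverage, win rate of betting against empty intervals)."""
    rng = random.Random(seed)
    covered = bets = wins = 0
    for _ in range(n_sims):
        ci = trivial_ci(rng)
        hit = ci is not None and ci[0] < mu < ci[1]
        covered += hit
        if ci is None:          # the inspector objects only to empty intervals
            bets += 1
            wins += not hit     # an empty interval can never contain mu
    return covered / n_sims, wins / bets


print(simulate(mu=1.7))
```

The result is the same for any value of mu, which is what makes the winning strategy uniform over the parameter space.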

Yes, I acknowledged that bet-proofness is better than just coverage (although in “standard” problems you get both without effort). But I don’t think it makes any sense to talk about an *expected* value for a single *realized* CI.

(My apologies – I responded above before I saw your comment here.)

What’s wrong with the expected value of a bet on a realized CI? Seems perfectly OK to me. The CI is random so a bet based on it can have an expected value (if you set out the model and assumptions etc etc) – why not?

Maybe we don’t really disagree… I appreciate that bet-proofness is some kind of “stronger”, “local” coverage guarantee that discards the “unreasonable” CIs. But what you get is “reasonable” CIs, nothing more.

In a frequentist setting there is a real, unknown value of the parameter and if you produce a 50% confidence interval it will either contain or not the true value. If I bet against you, I will either win or lose, but there is no way to calculate an expectation (the true value of the parameter is fixed, the realized value of x is now fixed as well).

In the long run (i.e. for different realizations of x) the CI will contain the true value 50% of the time and there is no betting scheme guaranteed to win (or at least get even) for every potential true value of the parameter (which is a somewhat stronger claim, but not as strong as it seems at first sight). These are frequentist properties, not about the realized observation and the corresponding CI.

Carlos:

I think I see your point now about expectations and a realized CI. That’s why we need the bet interpretation.

It’s like betting on a coin flip, right? If you bet before the flip, you can treat the flip as a random event. If you bet after the flip but before it’s revealed, in the frequentist world you can’t call the flip random – it’s either a head or a tail. But it still makes sense to talk about the expected value of the bet whether or not the bet is placed before or after the actual flip … doesn’t it?

Not sure I agree with your other point. The bet-proof interpretation refers to a specific realized CI. Can I make money (in expectation) by observing a single realized CI and then betting on it? But perhaps I am not getting your point (and/or, as you say, maybe we don’t actually disagree).

Mark:

I think that in a frequentist setting once the coin is flipped it doesn’t make sense to talk about the expected value of that particular realization (but you can still talk about the expected value of the betting scheme over realizations). What if someone else has already seen the outcome and knows already if you won or not? Does your expected value collapse to the realized value at that point? Or are we talking about a subjective expected value, unaffected as long as you don’t know the outcome yourself?

For the bet-proof interpretation, I think a more explicit formulation would be:

Could I make money (in expectation, averaging over many realizations) by observing single realized CIs and then betting on them, for any possible true value of the parameter?

This doesn’t mean that there is no betting scheme that makes money in the real world (in a frequentist setting the true value of the parameter is fixed); it means that there is no betting scheme that would make money in each and every alternative world where the parameter takes a different value. If we assume that there is a probability distribution for the parameter, it may be possible to find betting schemes with a positive expected value even if they don’t have a positive expected value conditional on all the parameter values. (At least this is how I interpret the paper you cited, I could be wrong).
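Here is a rough simulation of that distinction, offered as a sketch rather than as the paper's formal setup. The assumptions are mine: sigma = 1, the usual interval x +/- 1.96, an inspector who bets against the interval only when |x| > 3, payoffs that are fair when the miss probability is 5%, and a hypothetical N(0, 1) distribution for the parameter.

```python
import random


def bet_payoff(x, mu, z=1.96, threshold=3.0, level=0.95):
    """Inspector's payoff for one round (sigma = 1). She bets that the
    interval x +/- z misses mu only when |x| > threshold; the stakes are
    fair if the miss probability is exactly 1 - level."""
    if abs(x) <= threshold:
        return 0.0                      # no bet this round
    missed = abs(x - mu) > z
    return level if missed else -(1.0 - level)


def expected_payoff_given_mu(mu, n_sims=100000, seed=2):
    """Average payoff over repeated samples with the parameter held fixed."""
    rng = random.Random(seed)
    return sum(bet_payoff(rng.gauss(mu, 1.0), mu) for _ in range(n_sims)) / n_sims


def expected_payoff_under_prior(prior_sd=1.0, n_sims=100000, seed=3):
    """Average payoff when mu itself is drawn from N(0, prior_sd^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        mu = rng.gauss(0.0, prior_sd)
        total += bet_payoff(rng.gauss(mu, 1.0), mu)
    return total / n_sims


print(expected_payoff_given_mu(0.0))   # the scheme wins if mu happens to be 0
print(expected_payoff_given_mu(5.0))   # and loses if mu happens to be 5
print(expected_payoff_under_prior())   # averaged over the assumed prior
```

The scheme makes money for some parameter values and loses for others, so it does not defeat the interval uniformly, yet it has positive expected value once the parameter is averaged over the assumed distribution.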

Carlos:

Yes, that’s right, I think we are on the same page here. Though I am not sure about putting it in terms of “alternative worlds”, at least in terms of teaching. What I’m interested in, more than anything else, is finding a formulation for students and applied researchers that lets them interpret a realized CI legitimately. I think the betting-scheme formulation is a minimum, but I’m not so sure whether putting things in terms of “alternative worlds”, or making explicit the link to the Bayesian interpretation, is really essential. But maybe it is.

Still, it’s great imho to find a legit interpretation of realized frequentist CIs that is, at least in principle, suitable for core stats or econometrics courses.

Having gotten to where you are….

Isn’t this just Wald’s Theorem? That non-dominated decision rules (betting rules) are essentially all inside the class of bayesian decision rules, and so defining a “bet proof” interval as one which includes the interval you’d use for a bayesian decision rule kind of “trades on” this fact?

Also isn’t “for any possible true value of the parameter” exactly the Bayesian uniform prior on [a,b] where a and b are the range of “possible”. So assuming that the set of possible values is finite… this all collapses to “A confidence interval in a ‘reasonable’ case is just a bayesian interval for a flat prior”

There may be “reasonable” models in which the finiteness of the prior is not necessary, and then the nonstandard prior [-N,N] for N a nonstandard integer works fine too (in other words a kind of [-infinity, infinity] interval, where the likelihood is sufficiently regular to cause convergence)

I don’t see this as “a legit interpretation of realized frequentist CIs that is, at least in principle, suitable for core stats or econometrics courses.” I see this as an attempt to subsume Bayesian intervals into Frequentist statistics… a kind of “we were wrong about Frequentism, but now we’ll pretend we’re just Bayesians with flat priors without mentioning the fact”

I think it’s better to just “teach the controversy” ;-)

I’m sympathetic to finding a positive interpretation to help people avoid wrong interpretations. But it seems to me that the bet-proofness interpretation is just as likely to slip into the common-sense Bayesian interpretation in actual use.

> Also isn’t “for any possible true value of the parameter” exactly the Bayesian uniform prior on [a,b] where a and b are the range of “possible”

Daniel: I don’t see the connection, could you be more explicit?

Carlos: “Could I make money (in expectation, averaging over many realizations) by observing single realized CIs and then betting on them, for any possible true value of the parameter?”

The scheme goes like this right? Someone observes some data, generates a 95% confidence interval. You see the interval and get to bet whether the true value is in *this interval*, if you bet against it and win you win 20x and if you bet against it and lose you lose 1.

Wald’s theorem says more or less that the betting rule you should use is of the form “pick a prior and a loss function and bet if the expected value is positive”

The loss function is pre-defined 1 loss if you lose, -20 loss if you win. The expected value is the posterior over some prior. If you have to treat all possible parameters as symmetrically possible, this is the uniform prior over the “true value” So “bet proofness” sounds like it’s just “wald’s theorem applied when you force the uniform prior on the whole space”

Let’s assume your betting scheme yields a positive expected value when you integrate out the parameter using the uniform prior (or any other prior, in fact).

This won’t make the confidence interval generating function (the process that generates CIs from observations) fail the “bet proofness” condition. For that, you need to show that your betting scheme always has non-negative expected value conditional on the parameter value.

I’m confused as to what the expectation is taken with respect to when it’s “conditional on the parameter value”

If you tell me the parameter value my betting scheme is “bet the house when the parameter value is outside the confidence interval, and bet 0 otherwise” and I win 100% of the time.

So, the expectation has to be taken with respect to a choice of a prior, since there’s nothing else that is a probability distribution. The confidence interval is observed.

If you force the choice of a flat prior… that is, you force us to consider all parameter values as symmetrically possible, then there is no choice of prior, so the only meaning I can make for “expected value” is “expectation for the bet over the posterior with the forced flat prior.”

Am I missing something?

It seems there is confusion as to whether the *interval itself* is bet proof, or the *interval construction process* is bet proof. Mark seems to be saying that it’s the interval itself that is bet proof, and that to me seems like the same thing as saying “a Bayesian with flat prior would construct this interval, and thanks to Wald’s theorem no one else who doesn’t have a more specific prior can do better”.

But maybe I’m missing something.

Daniel, the expectation is taken over the observations. If the parameter value is fixed (mu) we get a distribution of observations (x is a random variable, it has a probability distribution conditional on mu). For each x, you construct a confidence interval CI(x) which may or may not contain mu. The fact that you don’t know the value of mu doesn’t make it a random variable. You don’t have to consider all the parameter values as symmetrically possible. You have to consider all the parameter values as possible, and the question is whether there is a betting scheme that wins regardless of what is the true parameter value (conditioning on each of them separately, not averaging over all of them according to any particular distribution).

@Carlos: You wrote:

“Daniel, the expectation is taken over the observations. If the parameter value is fixed (mu) we get a distribution of observations (x is a random variable, it has a probability distribution conditional on mu). For each x, you construct a confidence interval CI(x) which may or may not contain mu. The fact that you don’t know the value of mu doesn’t make it a random variable. ”

This appears to have a fair amount of confusion in it (or at least is written in a confusing way.)

Here’s how I would describe the situation: You are using a model that involves a parameter mu, whose value you don’t know. You take a sample (i.e., a set of observations) that are consistent with the model (e.g., models typically involve independence assumptions as well as distributions of observations.) So the sample is a sample from the distribution (with parameter mu) in the model.

The confidence interval is constructed based on the sample that has been collected. The construction of the CI involves one or more “statistics” that are calculated from the sample (for example, the sample mean, which is taken as an estimate of mu if mu is the mean of the distribution). The construction of the CI typically also depends on a sampling distribution (e.g., the distribution of sample means), which can be derived from the model.

So we know e.g., the sample mean that is calculated from our sample. But, if I understand correctly, the expectation involved in the betting characterization is the expectation over all possible samples that fit the model assumptions — not over the individual observations.

Martha, I’m not sure I understand your comment. I think that what you call a sample is what I called an observation (the data that is used to construct a confidence interval). The expectation involved in the bet-proof definition is over all possible samples for a fixed value of the parameter (repeated for every possible value of the parameter).

Carlos, Martha, can we start again at the bottom of the page? I’m posting something there.

Jason:

“I’m sympathetic to finding a positive interpretation to help people avoid wrong interpretations. But it seems to me that the bet-proofness interpretation is just as likely to slip into the common-sense Bayesian interpretation in actual use.”

That suits me (as a teacher) just fine. Slipping between legitimate interpretations would be a big improvement over the status quo, where people slip in great numbers into a misinterpretation of what they are doing when they use a frequentist CI procedure. And if it alerts the student to the existence of another way of thinking about what they are doing (Bayesianism), so much the better.

Here is a way that, I think, would get some of these concepts, especially for a realized data set, across in a less abstract way. You do still need to use some imagination, but it’s less abstract.

Start with a fairly concrete situation: a factory produces batches of parts. You measure them – say, their diameters – and build up a record of the statistics, lot by lot. You are interested in how many will be rejected because they are out of tolerance on the diameter, and also whether the mean or variability is increasing.

Ok, next, simulate this by computer. Run the little simulation say 100 times, graph each result, and observe the variations. By now you are pretty familiar with most of the features you will see in these curves, their wiggles and runs, their getting out of tolerance, etc.
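A minimal sketch of such a simulation, assuming normally distributed diameters and a symmetric tolerance band. The nominal diameter, spread, tolerance, lot size and number of lots are all made-up values for illustration, not taken from any real process.

```python
import random
import statistics

def simulate_run(rng, n_lots=20, lot_size=50, mean=10.0, sd=0.05, tol=0.1):
    """Simulate one run of the factory.

    Returns one (lot_mean, lot_sd, reject_fraction) tuple per lot.
    All default values are illustrative assumptions.
    """
    record = []
    for _ in range(n_lots):
        diameters = [rng.gauss(mean, sd) for _ in range(lot_size)]
        # A part is rejected when its diameter is out of tolerance.
        rejected = sum(abs(d - mean) > tol for d in diameters)
        record.append((statistics.mean(diameters),
                       statistics.stdev(diameters),
                       rejected / lot_size))
    return record

rng = random.Random(1)
runs = [simulate_run(rng) for _ in range(100)]  # "say 100 times"

# Average rejection rate per run: nothing in the process changes between
# runs, yet these numbers still vary from one run to the next.
reject_rates = [statistics.mean(lot[2] for lot in run) for run in runs]
print(min(reject_rates), max(reject_rates))
```

Plotting the per-lot means and rejection fractions for each run gives the curves to stare at. The spread between the smallest and largest rejection rate is the "variation with nothing changed" that later runs have to be judged against.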

Now make a few more runs. Can you answer the question “Has something changed?”. Of course, you are running the simulation and you know the answer, so you have to be as scrupulous as you can be.

Remember, your job is to say whether or not something in the process has changed, based on one or a few samples. By now, you should have seen a surprising number of results that stray farther than you would have thought (but that’s not uncommon) or otherwise have features that attract your notice. So you are in a good position to assess whether your new data is like any of the old ones or not.

In the end, you ought to realize that sometimes, you just can’t tell – about any specific instance – and that confidence limits or p-values can never be more than a guide.

With that as preparation, you can then start to consider what conclusions you could draw from a single sample (or experiment) when you *don’t* have that extensive statistical history behind it.

Really, invoking concepts such as the “usual case”, the central limit theorem, priors, and regression to the mean, are attempts to supply that missing history. If you are lucky, it will be good enough. If not, …

If it weren’t for the correspondence with credible intervals, I am convinced (almost) no one would use confidence intervals. I really do think of a confidence interval as a computationally cheap approximation that will usually (but is not guaranteed to) work for simple models.

In response to the thread with Carlos and Martha above: http://andrewgelman.com/2017/03/04/interpret-confidence-intervals/#comment-435894

Carlos said: “Daniel, the expectation is taken over the observations. If the parameter value is fixed (mu) we get a distribution of observations (x is a random variable, it has a probability distribution conditional on mu).”

Martha said: “But, if I understand correctly, the expectation involved in the betting characterization is the expectation over all possible samples that fit the model assumptions — not over the individual observations”

Here’s what I’m concerned with, there is really ONE observation: the Confidence Interval that was realized. The confidence interval comes about as (in a computational notation)

C(Sample(R(Theta)))

Where C is a confidence interval construction function that takes a fixed set of values, Sample is a sampling function that pulls a random sample from an RNG, R is the RNG and Theta is the input parameter to the RNG.
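The composition C(Sample(R(Theta))) can be written out as a small sketch. It assumes a normal RNG with known scale, a sample size of 25, and the usual mean ± 1.96 standard errors interval; none of these specifics are in the comment above, and the value of theta is a placeholder that only the judge would know.

```python
import math
import random

SIGMA = 1.0  # assumed known noise scale (illustrative)

def R(theta, seed=None):
    """RNG parameterised by theta: returns a function yielding one N(theta, SIGMA) draw."""
    rng = random.Random(seed)
    return lambda: rng.gauss(theta, SIGMA)

def Sample(draw, n=25):
    """Pull a random sample of size n from the RNG."""
    return [draw() for _ in range(n)]

def C(xs, z=1.96):
    """Confidence-interval construction: a deterministic function of a fixed set of values."""
    m = sum(xs) / len(xs)
    half = z * SIGMA / math.sqrt(len(xs))
    return (m - half, m + half)

theta = 2.0  # hypothetical value, known only to the independent judge
a, b = C(Sample(R(theta, seed=0)))

# Once the judge reveals theta, the statement "theta is in [a, b]" is simply
# true or false for this one realized pair of numbers.
print((a, b), a <= theta <= b)
```

All the randomness lives in R and Sample. Once [a,b] has been printed, C has nothing random left in it, which is the point of the question that follows.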

Now, I have one pair of numbers that came out of C that specifies an interval let’s call it [a,b]

Everyone agrees on the suitability of our model for how the RNG works, but only an independent judge knows what the value of Theta really was. After bets are placed, we’ll ask him or her to decide whether Theta was in or out of [a,b].

What is bet proof about [a,b], the actual pair of numbers we got when we ran this at Sun Mar 5 1:13:00 PM Pacific time and got [a,b] = [1,3]?

I’m not sure if that’s a direct question, but let me repeat what I discussed previously with Mark. What is bet-proof is the procedure used to construct confidence intervals. I guess one can stretch the concept a bit and say that the CI produced is bet-proof in the sense that it’s constructed using a bet-proof procedure, in the same sense that we can say that the output of an RNG is a random number, or that the result of an estimator is unbiased, or that a CI has some coverage guarantees, when these objects are realizations of random variables with some specific properties. But, strictly speaking, there are no expectations left at that point.

So my impression was that Mark had a different view, that the interval itself was somehow bet-proof with an expected value taken over something or other and I couldn’t work out from that conversation in what sense that might be true.

Here’s my take. The confidence-interval-building procedure can be said to have a property we call bet-proofness.

But the bet-proofness property refers to the expected value of a bet based on a realized CI.

So we can call the CI-building procedure bet-proof, because when it is applied to datasets it will yield realized CIs that are bet-proof.

But I think we can also call a specific realized CI bet-proof, because when faced with such a bet, we can’t expect to make money on it.

So, I think this gets at what Andrew said about “the point is that the “bet-proof” thing only works with a flat prior or something mathematically equivalent to it” and what Carlos said “But given the “distribution of true values” (which of course make no sense for a frequentist), we may bet and have a positive expectation (even if we would lose on average for some parameter values). In the simple estimation of a location parameter mu, we would only accept the confidence interval [x-2*sigma x+2*sigma] at face value if our prior for mu was flat.”

Bet proofness is nothing of the kind; it’s a misnomer like “statistical significance”. A Bayesian with a prior that places the true parameter higher in the probability distribution than a uniform prior will eat the Confidence Interval construction procedure’s lunch and dinner. The only scenario where we don’t have any prior knowledge is the case where we have a cryptographically strong random number generator. Although this is of interest to cryptographers and so forth, it is meaningless for social science, biology, chemistry, ecology, climatology, oceanography, linguistics, agriculture, teaching, automatic musical transcription, reconstruction of damaged photographs, orbital mechanics, dam engineering, structural vibration control, or any of the other stuff that actually studies the way the world works.

In all of these cases, the general knowledge we have about the world applies, and it should affect our priors, and it will make it so that a Bayesian Decision Theory method using *actual prior* information can often outperform the CI construction procedure.

But the misnomer “bet-proofness” makes it sound like that’s not the case, when in fact it’s just a very technical condition of no interest to the real world stuff we study.

> just a very technical condition of no interest to the real world stuff we study.

That’s my take – it is being taken as a good property without a proper (purposeful) assessment of good for what?

Daniel:

You’re commenting about the usefulness (or not) of frequentist CIs. And that’s fine, you are making sensible points. But the blog entry was about how to interpret CIs, not about how useful they are or how they compare to Bayesian methods. And bet-proofness, I think, gives us a way of interpreting realized frequentist CIs that is legit and intuitive.

You might even want to argue that the Bayesian link (the formal statement from Mueller-Norets is that “a set is “reasonable” or bet-proof (uniformly positive expected winnings are impossible) if and only if it is a superset of a Bayesian credible set with respect to some prior”) makes clear why you prefer the Bayesian approach in practice. Fine by me! But not what the blog entry was ostensibly about.

Intuitive to whom? I sincerely doubt that it would be intuitive to many physicians, school board members, politicians, dietitians, etc. who are “end users” of studies using statistics.

This is the problem. I’m struggling to see the value of replacing one correct-but-not-useful interpretation with another. In my experience, the main issue is that frequentist confidence intervals aren’t what people really want (which is a Bayesian credible interval or something very like it), so they are misinterpreted.

I’m also a bit unsure about the interpretation of confidence intervals as Bayesian intervals with a flat prior. They’re not really the same, are they? For one thing, one is derived from a probability distribution and the other isn’t. Just being the same size doesn’t make two things identical.

Simon:

It’s about interpreting realized CIs. Someone gets a CI that is [0.2,0.4] or whatever, and they want to know what it means. But what’s in the textbooks is about interpreting the CI procedure (coverage), and not about interpreting a realized CI. So people get into all kinds of trouble, laying interpretations on [0.2,0.4] that aren’t valid.

So it’s about replacing a wrong interpretation with one that’s legit, not “replacing one correct-but-not-useful interpretation with another”.

But I freely admit that whether it is “useful” or “intuitive” is an empirical question!

Mark, I’m not really disagreeing at all – betproofness seems a useful additional way to try to explain what CIs mean. But it seems a bit backwards – people are producing these things called confidence intervals, then wondering about what they mean, and finding that it’s not what they thought. Isn’t the appropriate response then to produce something different that IS what they want? Somehow that doesn’t seem to be happening.

Simon:

“Isn’t the appropriate response then to produce something different that IS what they want?”

Personally … I think the bet-proofness interpretation of realized CIs is useful whether you think the answer is yes or no.

If your objective is to teach the usual frequentist stuff, then it’s useful to offer an interpretation that’s legit. Definitely an improvement over the status quo, where the students either aren’t taught how to interpret realized CIs and then go on to do so themselves – and wrongly – or, even worse, they are taught a wrong interpretation.

If you are teaching this stuff in passing, on the way to or in conjunction with Bayesian statistics, then I think it’s also useful. It’s common to teach frequentist stats even if the main focus of the stats sequence is Bayesian. And the bet-proofness interpretation of frequentist CIs also has a Bayesian interpretation. (M-N: “a superset of a Bayesian credible set with respect to some prior”.)

Even if you are just teaching the usual frequentist stuff, this means it’s an opportunity to signal to the students that there’s another (Bayesian) way of doing things.

Put another way, if the only difference in stats curricula around the world were that this were taught, it’s hard for me to think of a big downside.

Mark: if someone gets a CI that is [0.2,0.4], could you explain what is the legit, intuitive interpretation that bet-proofness provides?

Does it refer to the expected value of a bet based on the realized confidence interval [0.2,0.4]? Is the bet a random variable? What is random about it?

Carlos:

I think you described the setup correctly above in an earlier comment. The distribution of the data X depends on theta. The researcher has a data realization x. The researcher constructs the CI based on the data. The bet is on whether the true theta is contained in the constructed CI.

The M-N paper is very well written with a motivating intro with multiple examples that is relatively lengthy for an econometric theory paper. They do a better job than I can of explaining what bet-proofness means. The paper appeared in Econometrica last year, the top journal for theory papers, so the technical quality is also high. Below is an extract from the intro describing the betting scheme. The formal definition is at the start of section 2.1.

http://www.princeton.edu/~umueller/cred.pdf

“Suppose an inspector does not know the true value of θ either, but sees the data and the confidence set of level 1 − α. For any realization, the inspector can choose to object to the confidence set by claiming that she does not believe that the true value of θ is contained in the set. Suppose a correct objection yields her a payoff of unity, while she loses α/(1 − α) for a mistaken objection, so that the odds correspond to the level of the confidence interval. Is it possible for the inspector to be right on average with her objections no matter what the true parameter is, that is, can she generate positive expected payoffs uniformly over the parameter space?”
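The payoff structure in that extract can be illustrated with a small simulation. This is a sketch under assumptions that are not in the paper extract: a normal mean with known variance, the usual 95% interval, and the simplest possible inspector, one who objects to every realized interval. The function name and defaults are hypothetical.

```python
import math
import random

def always_object_payoff(mu, alpha=0.05, z=1.96, sigma=1.0, n=10,
                         plays=100_000, seed=0):
    """Average payoff to an inspector who objects to every interval, for a fixed mu."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    total = 0.0
    for _ in range(plays):
        # Draw the sample mean directly from its sampling distribution N(mu, se).
        xbar = rng.gauss(mu, se)
        covered = abs(xbar - mu) <= z * se
        # Correct objection pays 1; mistaken objection costs alpha / (1 - alpha).
        total += -alpha / (1 - alpha) if covered else 1.0
    return total / plays

# The expectation is taken over data draws, separately for each fixed mu.
for i, mu in enumerate((-3.0, 0.0, 5.0)):
    print(mu, round(always_object_payoff(mu, seed=i), 4))
```

With these odds the always-object rule should average out near zero whatever mu is, since it wins 1 about 5% of the time and loses α/(1 − α) the rest. The question in the extract is whether any rule that looks at the data before objecting can do better for every value of the parameter.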

Mark, unless I’m missing something they define what bet-proof means for a confidence set, which is a function of the data (i.e. the confidence interval construction procedure and not a particular realization).

You say that there is a bet-proofness interpretation of realized CIs like [0.2 0.4], but it seems that you cannot define what it means for a confidence interval to be bet-proof. I don’t see how it can mean anything beyond “a confidence interval that was constructed using a bet-proof procedure”.

(To be clear, I already agreed that this is somewhat more satisfying than “a confidence interval that was constructed using a procedure with some coverage guarantees” because it excludes “unreasonable” solutions.)

Carlos: yes, I think you might be missing something (or might not, could just be I am missing your point).

First, M-N use “confidence set” because theta can be a parameter vector. In the case of a scalar theta, it’s a confidence interval. (Their “confidence set” is not the set of all CIs!) But probably you knew that too.

The main point is that the choice of whether or not to bet takes place only after the realization of the data and the calculation and observation of the CI.

So I can say [0.2,0.4] is bet-proof because if I am the M-N “inspector”, and someone has a dataset, calculates the CI [0.2,0.4] and shows it to me, then I can’t make money by betting on whether theta is inside it.

Mark, I stand corrected. But only in part, because I think they don’t use the term “confidence set” consistently. In some cases it’s clearly a particular realization (e.g. in the conclusion “at least % of data draws yield a confidence set that covers the true value”). But in other cases it clearly includes all the potential realizations.

Definition 1 states what it means to say that “phi is bet-proof”, where phi is a rejection probability function that defines a confidence set.

Theorem 1, however, says that phi (a function of gamma and x) *is* a confidence set and proceeds to integrate x out.

The extension “a realized phi(theta,x), for particular values of theta and x, is bet-proof if the function phi(theta,x) is bet-proof” seems to be used implicitly, and as I said it seems natural to do so.

> So I can say [0.2,0.4] is bet-proof because if I am the M-N “inspector”, and someone has a dataset, calculates the CI [0.2,0.4] and shows it to me, then I can’t make money by betting on whether theta is inside it.

This is where we can’t agree. Let’s say the parameter value is 0.5 (it’s fixed!) and let’s say your betting scheme is to always bet that the interval doesn’t cover the true value. In that scenario, when you are given the CI [0.2 0.4] and you bet against it you can make money. You will make money.

What does it mean that you can’t make money? Do you mean that you can’t make money because using this betting scheme (always against the CI) you would lose when the CI happens to include the point 0.5 and that more than offsets the gains in other cases? That seems to me a claim about the CI construction method, not about the realized CI [0.2 0.4].

And even that may not be correct. The fact that phi is bet-proof doesn’t guarantee that (given the true value 0.5) no winning betting scheme exists. As far as I can see from the definition (but maybe there is some non-obvious point that I’m missing), a betting scheme with positive expected value could exist and the CI construction method could still be bet-proof (the requirement being that no betting scheme works against *all* the possible values of the parameter).

Carlos:

Thanks for responding. And I do think we are getting somewhere. It may be that we actually don’t disagree on what “bet-proof” means, or if we do it’s only about how to use the term.

Let me try to describe the betting setup without the B or F words, and then we can agree (or not) on what we call bet-proof.

There’s a betting game based on a random number generator and some parameter theta. Theta changes with each play of the game. Every time I play the game, the casino chooses some value for theta, inserts the value of theta into the RNG, and uses it to create some data. The data won’t be secret, but I won’t know what the theta is.

The game is that every time I play, the casino will offer me a bet about theta. The bet is the output of an algorithm that I also know that takes the data as an input. The output of the algorithm is an interval and some odds. I will be offered a bet with these odds that the true theta is in the interval.

I think the only place we might disagree is whether the term “bet-proof” (or “reasonable”, the other term used in the literature) can be applied to the algorithm, a specific output of the algorithm (a bet = an interval with odds based on some realized data), or both.

And on reflection … I don’t mind being wrong about terminology here; it may be that the right way to do it and/or the standard usage in this literature is to refer only to the algorithm that generates the bets as “bet-proof”. The important thing for me is that – for “standard problems” – a realized frequentist CI has this property, namely that it can be seen as an output of a procedure that generates fair bets.

[I hope I got that right….]

Mark, this property (as defined in the paper, at least) may not be as interesting as you think.

Consider the following example:

Parameter of interest theta in the interval (-1,1)

x ~ uniform ( theta – 1 , theta + 1 )

You construct bet-safe 50% confidence intervals for me: CI(x) = [ x – 0.5 , x + 0.5 ]

My betting scheme: if x is below -1 or above +1, bet that the parameter is not in the CI.

For any value of theta different from 0, the expected payoff of my betting scheme is positive. For some values of theta I’m even sure that I won’t lose any money at all (when I bet, I can only win).

However, when theta=0 there is no betting scheme that wins in expected value and that makes your CIs bet-safe.

For illustration: http://imgur.com/a/2kJ6j
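The example can also be checked numerically. The sketch below estimates the average payoff per play of the betting rule for several fixed values of theta. It assumes the odds from the Mueller-Norets extract quoted earlier in the thread (win 1, lose α/(1 − α)), which for a 50% interval means win 1 or lose 1; the function name and the number of plays are illustrative.

```python
import random

def expected_payoff(theta, plays=200_000, seed=0):
    """Monte Carlo estimate of the average payoff per play for a fixed theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(plays):
        x = rng.uniform(theta - 1.0, theta + 1.0)
        if abs(x) > 1.0:                       # the rule: bet only when x is outside [-1, 1]
            covered = abs(x - theta) <= 0.5    # is theta inside [x - 0.5, x + 0.5]?
            total += -1.0 if covered else 1.0  # 50% interval: win 1, lose alpha/(1 - alpha) = 1
    return total / plays

for theta in (-0.9, -0.5, -0.2, 0.0, 0.2, 0.5, 0.9):
    print(theta, round(expected_payoff(theta), 3))
```

A direct calculation under this setup gives an expected payoff of |theta|/2 for |theta| ≤ 0.5 and (1 − |theta|)/2 for 0.5 < |theta| < 1, so the estimates should land near those values. At theta = 0 this particular rule never bets, so its payoff there is exactly zero.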

Mark, you said:

“The important thing for me is that – for “standard problems” – a realized frequentist CI has this property, namely that it can be seen as an output of a procedure that generates fair bets”

The thing that’s misleading is that there is nothing “fair” about bets placed on the *realized* interval. So the terminology seems to imply something about bettability of the realized interval, but in fact it just means “it came from a certain kind of procedure”.

I think I’ve finally wrapped my head around this (sorry for being obtuse earlier Carlos) and the way it works is this:

The casino TELLS YOU the value of Theta.

They then offer you to bet on whether the next data set X and the associated confidence interval will contain Theta.

The procedure is bet proof if you have no way to consistently make money off this *pre data* bet. Here, the uncertainty in the data drives the inability to make the bet.

However, if you are able to make the bet AFTER seeing X and the CI(X) then you win 100% of every bet that you make (just look at the CI and see if the Theta they told you is in CI(X))

In science, the usual situation is totally different: we all know the data X, and we don’t know the Theta. In that scenario, a little real prior information for a Bayesian makes them eat the CI procedure’s lunch even if it’s “bet proof”.

What I wrote before is not correct:

> However, when theta=0 there is no betting scheme that wins in expected value and that makes your CIs bet-safe.

Of course there are betting schemes that win when theta=0 (anything that bets mostly when 0 is not included in the CI).

I think that they won’t have positive expected value for every other value of theta, so the bet-proof property remains, but I have no proof.

Carlos:

I think your example means the constructed CI does not come from a bet-proof procedure. Bet-proof means expected winnings are not possible uniformly across the parameter space.

In your example, your betting strategy is profitable everywhere in the parameter space (-1,1) except where the parameter theta is zero. By definition, this CI procedure is therefore not bet-proof.

(So this is a “non-standard” problem using the Mueller-Norets terminology. From their paper: “In the standard problem of inference about an unrestricted mean of a normal variate with known variance, which arises as the limiting problem in well behaved parametric models, the usual interval can hence be shown to be bet-proof.” In fact their paper is about bet-proof CIs for non-standard problems. As they note, a lot of recent research in econometrics is about such non-standard problems, so this is an impt caveat.)

Carlos: I didn’t spot that either. And actually I am also not sure my claim at the end that this is a “non-standard problem” is correct – apologies. But my main point still holds: the CI procedure you describe is not bet-proof, exactly because you’ve shown a winning betting strategy exists.

Mark, isn’t “uniformly across the parameter space” different from “only in some regions of the parameter space”?

Unless there is a betting scheme that wins *for every value of theta*, then the procedure is bet-proof. By definition.

I have not shown that a winning strategy exists (because the one I proposed doesn’t work for theta=0). Unfortunately, I have not really proved that it doesn’t either. Do you think that a winning strategy exists?

Carlos, I think you’ve got the sense inverted. Uniformly bet proof across the parameter space is intended to mean “it doesn’t matter what the parameter is, you can’t win money for any parameter value at all, even if the casino tells you what the parameter is”

That’s why I say first the casino selects a parameter value and tells it to you. Then they offer to let you place a bet… THEN they generate the data and the CI.

Bet proofness is essentially “you can’t predict the X that will come out, so you don’t know what will happen about the CI”. Another way to think of it is “you’re missing the cryptographic random seed, which is the information you’d need to make money off the bets”.

This is why I think bet-proofness is misleading and irrelevant. In practice few of us here at this blog are running casinos and many of us at the blog work on problems where *THE DATA IS KNOWN* and it’s the THETA that is unknown.

A realized confidence interval CI(X) where X is known, even if it comes from “a bet proof procedure” does not guarantee anything about betting *ON THE THETA VALUE* which is 100% of what people doing science want to do, and 0% of what cryptographers and casinos want to do.

Cryptographers and casinos want to keep the random seed secret and keep you putting down sucker bets on whether the Theta they reveal to you will be in the interval (whether the known-winning triple cherry result will be in the slot machine window essentially)

Scientists want to collect some data that everyone should be able to verify, and then decide what it means about the theta (whether for example taking aspirin makes theta smaller than not taking aspirin, where theta is the average time to resolution of the headache across the population of adults in the US).

Carlos: Ah … so if theta=0, then it isn’t winning. But it also isn’t losing, right? (Sorry, getting late in my time zone and I am flagging.) Without going back to M-N I don’t know if there are definitions for strong bet-proofness (uniformly positive) and weak bet-proofness (uniformly nonnegative). But I see what you are getting at.

Daniel: apologies, I accidentally responded to your comment at the bottom of the comments. But maybe that’s a good thing.

Both: I do appreciate your engagement and interest in this!

Mark, yes, makes sense to start over a new thread. I replied to your post below.

As I said a few comments ago, I don’t see the connection to uniform priors. The bet-proof thing assumes that the true value of the parameter is fixed. If you do the Bayesian thing, because you don’t know what the true value of the parameter is, you may think that your bet has a positive expected value. But conditional on the value of the parameter, the expected value may be positive or negative (because no betting procedure has a positive payoff for every parameter value).

If you really think that the parameter is not fixed, but a random variable distributed according to some probability distribution, I don’t see the relevance of the bet-proof thing. The “frequentist bet-proof thing” doesn’t necessarily translate into a “Bayesian bet-proof thing” (or at least it’s not obvious to me). The fact that no betting scheme exists with positive payoff for every value of the parameter doesn’t even guarantee that under a uniform prior the average payoff over parameter values won’t be positive.

Practically speaking, confidence intervals aren’t used by repeatedly performing the same experiment. Imagine the following scenario:

Repeat N times:

Client comes to you with some scientific data about problem X_i, and wants to assess what the unknown fact about the world is (parameter Q_i).

Statistician F: I apply my confidence procedure C to the data and conclude the 95% confidence interval is [a,b]

Statistician B: I google up a bunch of stuff about what is known about problem X_i and construct a prior from that information, then apply a Likelihood which happens to be the same likelihood used by statistician F and generate a credible interval [c,d]

Betting Man: Mr B, I happen to have an oracle that tells us the true value, do you want to bet that the true value is in [a,b]?

Stat B: (looks at interval [c,d], sees that it’s completely outside [a,b]) Heck Yeah! (but when [c,d] is mostly within [a,b] then heck no!)

Now, taken over N Clients, Stat B is going to win big time because he uses all this googling to improve his (or her!) estimates.

The confidence procedure is being repeatedly performed in exactly the same way, so this is replication of the procedure. The Bayesian procedure is performed in exactly the same way as well… except part of the procedure *is to gather background information about the process and encode it in a prior, which changes each time*

Now, if we force the prior to be flat each time, the Bayesian can’t win because the Bayesian interval *is* [a,b]

that’s the connection to flat priors.
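A rough simulation of this scenario, under strong simplifying assumptions that are not in the comment itself: each true Q_i is drawn from N(0, 1), Stat B’s background knowledge is exactly that distribution, the data are a single N(Q_i, 1) observation, and Stat B bets against [a,b] whenever his posterior probability that Q_i lies in [a,b] is below 95%, at the odds from the Mueller-Norets extract (win 1, lose α/(1 − α)). This betting rule is a stand-in for the “Heck Yeah!” judgement, and all names are hypothetical.

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def average_winnings(prior_sd, clients=100_000, sigma=1.0, z=1.96,
                     level=0.95, seed=0):
    """Stat B's average winnings per client when betting against F's interval.

    prior_sd=None stands for a flat prior; otherwise B uses a N(0, prior_sd^2) prior.
    """
    rng = random.Random(seed)
    stake = (1.0 - level) / level          # lost when [a, b] does cover Q_i
    total = 0.0
    for _ in range(clients):
        q = rng.gauss(0.0, 1.0)            # true Q_i (ASSUMED to follow N(0, 1))
        x = rng.gauss(q, sigma)            # the client's data
        a, b = x - z * sigma, x + z * sigma    # Statistician F's interval
        if prior_sd is None:               # flat prior: posterior is N(x, sigma^2)
            post_mean, post_sd = x, sigma
        else:                              # conjugate normal-normal update
            w = prior_sd ** 2 / (prior_sd ** 2 + sigma ** 2)
            post_mean, post_sd = w * x, math.sqrt(w) * sigma
        p_inside = phi((b - post_mean) / post_sd) - phi((a - post_mean) / post_sd)
        if p_inside < level:               # bet against [a, b] only when the posterior favours it
            total += -stake if a <= q <= b else 1.0
    return total / clients

print("informative prior:", round(average_winnings(prior_sd=1.0), 4))
print("flat prior:       ", average_winnings(prior_sd=None))
```

With the flat prior the posterior probability of [a,b] never drops below the nominal level, so this rule never bets and the winnings are zero. With the informative prior the bets are placed mostly when the data land far from what the background knowledge suggests, and the average should come out positive under these assumptions.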

Note that there seems to be something wrong with the idea that the “bet proof interval” is a superset of *some* bayesian interval. The interval [a-epsilon,a+epsilon] is always a Bayesian interval for some extraordinarily concentrated prior isn’t it? and this is true for all a, so doesn’t that just say “the confidence interval contains some points”?

What I don’t see now is the connection between what you wrote and the “bet-proofness” property that Mark proposed. Does your betting scheme have positive expected payoff conditional on Q_i, for every possible Q_i?

And if the CI construction method used by F was not bet proof (in the sense proposed by Mark, i.e. there is a betting scheme that has positive expected payoff uniformly) then the expected value of the bet in your example would be positive for any prior, uniform prior included.

Who is conditioning on Qi? If it’s the Bayesian then yes. Conditional on Qi the Bayesian makes bets that he always wins.

Daniel: the definition of “bet-proof” that we were discussing. I think you’re talking about something unrelated, which explains why I don’t see the connection…

Carlos, I think the issue is that bet-proofness is a quirky and possibly poorly defined thing and I don’t see how it relates to anything or even if it’s actually well posed. It certainly doesn’t seem to relate to any betting scheme that anyone would ever actually carry out, and so it’s like calling “pancakes” “waffles” and then telling people you have the best waffles in town because everyone else gives you a wrinkly thing that no-one would ever really want. When in fact, it’s just the case that everyone who comes to your restaurant is pissed off that you don’t know what a waffle is.

Consider their equation 2 in the paper. x represents a sample of data, f(theta) represents a parameter to estimate, and b(x) is a betting strategy.

The expectation is taken over *the possible samples of data*. First off for a Bayesian, this is meaningless, because there is one sample of data and it’s known.

But, supposing pre-data, the logical expectation for a Bayesian means multiply p(x | theta) p(theta) and for a frequentist this means p(x | theta) full stop. Since 1 is the unit identity in multiplication we can rewrite “p(x | theta) full stop” as “p(x|theta) * 1”

and so the Bayesian is now being forced to consider his expectation with respect to a prior whose value is proportional to 1 (ie. uniform).

> the issue is that bet-proofness is a quirky and possibly poorly defined thing and I don’t see how it relates to anything

Maybe we agree then, but it was you who said “isn’t this just….”

> and so the Bayesian is now being forced to consider his expectation with respect to a prior whose value is proportional to 1 (ie. uniform).

I don’t want to force anyone to do anything, but if the Bayesian wants to get the same result (the expectation conditional on a fixed value of theta) he has to use a point prior (concentrated at that value of theta) and not a uniform prior.

Note that, post-data, p(X | anything) = 1, and even pre-data p(f(Theta)|Theta) = 1 too, so the whole thing is very muddle headed.

Let me implement b(x) as follows… First, you tell me X and Theta (since you’re conditioning on that too). Then I’ll calculate your confidence interval for you, and f(Theta) and I’ll look to see if f(Theta) really IS in your confidence interval, and then, I’ll bet the house when I know you’ll lose, and then I’ll drink your milkshake.

How is that not a valid b(x) conditional on theta?

I suppose I really, to be clear, should say p(X | anything,X) = 1 which is what “post data” was supposed to mean.

> How is that not a valid b(x) conditional on theta?

Because b(x) is a function of…. x.

> p(f(Theta)|Theta) = 1 too, so the whole thing is very muddle headed.

Clearly the whole thing is very muddled in your head… Where did you get this p(f(theta)|theta) from?

I think this conversation has derailed completely. I’m done.

So, I just think the whole thing is muddle headed, and it sounds like you kind of agree with me.

Here’s the situation as I see it: The Frequentist doesn’t know Theta, and he doesn’t know X but he knows that if you were to tell him Theta, and pretend that f(Theta) was not also known at that point, and then ask a random number generator for some X and then calculate a confidence interval for X and then tell X to a Bayesian, but not Theta, that if the Bayesian uses X strictly for the purpose of calculating b(x) and then the Frequentist pretends that the Bayesian never knew X and that X might actually be any of the other X values that might have come out of the RNG, that on average, in those other worlds where X was different, the Bayesian can’t drink his milkshake.

I mean, hunh?

To the Bayesian this whole scheme relies on being told Theta, and X but pretending that you don’t know Theta for the purpose of betting, or X for the purpose of evaluating how good your bet was, and what’s more, pretending that there’s an obvious reason why you’d be interested in this scenario.

“Because b(x) is a function of…. x.”

but once you know x, what’s the point of averaging over other possible x?

Wald’s theorem says more or less that the right way (an essentially complete class of non dominated ways) to implement b(x) is to take an expectation with respect to Theta for some prior p(Theta) and bet when the expectation is positive.

After the fact, when someone tells us Theta, we either did or did not win, so there’s no expectation left. So the Frequentist, in desperate need of a heroin fix, constructs a fictitious alternative world in which after he tells X to the bayesian, and someone comes along and tells everyone Theta, and he loses, he can claim at least in that alternative world where X might be different (but apparently b(x) has to be the same bet) he expected not to lose.

Carlos, I know you stepped out of this discussion, but anyway in case someone goes back and reads these I’ve been thinking about it and here’s what I see:

1) If you are talking about betting on a single realized interval, then the X is known, and so there’s no sense in discussing the average over x. This means “bet-proofness” just can’t be a property of an individual interval, because it averages over possible outcomes of X, which are no longer relevant. It doesn’t generalize from “we used a bet proof procedure” to “this realized interval is bet proof”.

2) If you talk about pre-data, then perhaps you can discuss confidence procedure bet-proofness. It seems to go like this: A bayesian and a frequentist are told they’ll be given some data from a cryptographic random number generator with a given exact distribution. The parameter Theta from the distribution will be cryptographically chosen to be any of a finite set of possible values (let’s say between 0 and 2^128 for example). The Frequentist declares what procedure he will use for constructing a confidence interval, and offers the Bayesian to bet on whether the interval he will eventually construct is going to contain the true f(Theta). Bet proofness is the property that the Bayesian can’t create a scheme that will let him win money in the long run from repeating this bet ahead of getting any data.

Note that if the Bayesian somehow obtains information about the cryptographic random number generator so that he knows what Theta is and what X is in each case ahead of time…. he WILL win, because he’ll calculate f(Theta) and X and CI(X) and see if the CI will contain f(Theta).

Even better, in the analogy, is if he can obtain *some* of the bits that the RNG will output… he can’t calculate Theta and X exactly, but he can reduce them to a smaller number of possibilities, and then his probabilities for the outcomes are different from the probabilities for the Frequentist.

The question is, whether when the Bayesian obtains information about Theta but not about X, he can win pre-data bets. (that is, he has real prior information about Theta).

I think the usefulness of bet-proofness, if it has any, comes in characterizing how the confidence interval procedure is independent of knowledge of Theta. Basically, if you don’t know X you won’t be able to figure out whether CI(X) contains f(Theta) in a way that allows you to make money on the bets.

This ultimately comes about precisely because CI(X) is bigger than the bayesian interval you’d construct if you had PERFECT information about Theta.

So now, at least, I can see what the point of it is… and I think this also makes it clear that there is not any kind of “bet-proofness of a single realized interval” property because necessarily to get the realized interval you also are assuming that the X is observed. And that’s what was confusing me above.

Carlos:

Don Rubin once argued that Neyman’s original motivation for CIs was to have intervals that would have good properties regardless of the prior, and that he was led to consider all point priors, so one could see the uniform prior as a continuous approximation to this.

So you may not _see_ that uniform prior being involved but _logically_ it is.

Keith: How does this logical connection resist reparametrization? As far as I can see the results based on point priors would still be valid. However, the uniform prior would no longer be uniform.

The idea that frequentist/confidence/likelihood etc forms of inference are just Bayes with uniform priors is popular but wrong. Even if that was the original motivation of Neyman it wasn’t the logical result. One can add additional assumptions to try to regain this result but I think this also misses the point.

Carlos:

It’s the intrigue of continuity ;-)

Results based on point priors would still be valid for inference if one thought inference was separate from decision theory but if one includes a decision theory perspective there is even an impact on point priors of reparametrizations (this I believe is David Draper’s take on it).

ojm:

You might be right, I’ll try to make my position clearer in a future post.

Sorry for longish post. Great thread here!

The problem with CI’s is that there are too many ways to make them and still satisfy the definition. There are numerous examples of intervals that satisfy the CI criteria yet are open to valid criticism. Such examples include intervals for a real-valued mean that are infinitely long 95% of the time and of length 0 5% of the time, or those that fail to condition on ancillary statistics (i.e., there are identifiable subsets of the data that alter the precision of the estimates).
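The first of those examples is easy to check by simulation. Here is a minimal Python sketch (the function names, and the choice of 0 as the location of the zero-length interval, are illustrative assumptions and not from the comment). The procedure ignores the data entirely, yet it still contains any fixed true mean at least 95% of the time.

```python
import math
import random

def silly_interval(rng):
    """Data-free 'confidence interval': the whole real line with probability 0.95,
    otherwise a zero-length interval at an arbitrary point (0.0, an assumption)."""
    if rng.random() < 0.95:
        return (-math.inf, math.inf)
    return (0.0, 0.0)

def coverage(theta, n_trials=100_000, seed=0):
    """Fraction of repeated intervals that contain the fixed true mean theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        lo, hi = silly_interval(rng)
        hits += lo <= theta <= hi
    return hits / n_trials

print(coverage(3.7))  # close to 0.95, although no single interval is informative
```

The long-run coverage is fine, but each realized interval is either trivially right or almost surely wrong, which is the sense in which such intervals are open to valid criticism.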

However, I feel that criticism of the concept of “confidence” is misplaced (or at least too broad). “Confidence” is an attempt to calibrate the uncertainty in an estimate. In particular, it gives an unconditional, repeated sampling interpretation. Now, there are “good” ways to go about this and “bad” ways (like my examples). Ideally, you’d make your interval from the sampling distribution of your point estimate relative to the true parameter value. If that is not possible, you can use the likelihood function and some well-known asymptotic results for them. These are just some examples. What I wanted to show is that in both of these cases, you are not merely satisfying the CI definition but are taking into account the distribution of your point estimates in a repeated sampling sense.

Now, how are we to interpret Bayesian intervals? Sure, we use the language of probability, but this “probability” is not the same as the one we think of when making guesses about coin flips — the intuition most people have is “if P(HEADS) = 0.9 then I’d expect about 90% of the coin flips to end up heads”. Or, if you’ve already flipped a coin and they have to guess what it shows, then saying “It will be heads with 90% probability” also has a repeated sampling interpretation (over many repeated flip + guess trials).

While CI’s certainly have their flaws, we can at least verify them to some degree (at least in principle). How do you assess the correctness of Bayesian credibility intervals except by appeal to their accuracy over repeated applications? How do you calculate your risk of finding out you are wrong? I’ve not seen a good answer to this question for Bayesian intervals…unless it’s simply a question Bayesian statisticians are not overly concerned with. As a concrete case: If you have generated thousands of Bayesian 90% intervals (based on numerous simulation experiments), yet only 30% actually contain the true parameter, would you be concerned? Should your client be concerned? Mine would.
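The kind of simulation check described here can be sketched in Python. This is a toy normal-normal setup with made-up numbers (every name and parameter value below is an assumption): draw the true parameter from a known distribution, compute a 90% posterior interval from simulated data, and count how often the interval contains the parameter.

```python
import math
import random
from statistics import NormalDist

def posterior_interval(xbar, n, sigma, prior_mean, prior_sd, level=0.90):
    """Central posterior interval for a normal mean with known sigma and a normal prior."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(post_var)
    return post_mean - half, post_mean + half

def simulated_coverage(true_mean, true_sd, prior_mean, prior_sd,
                       n=5, sigma=1.0, n_sims=20_000, seed=1):
    """Share of 90% posterior intervals containing theta, with theta ~ N(true_mean, true_sd)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        theta = rng.gauss(true_mean, true_sd)
        xbar = rng.gauss(theta, sigma / math.sqrt(n))  # sample mean of n observations
        lo, hi = posterior_interval(xbar, n, sigma, prior_mean, prior_sd)
        hits += lo <= theta <= hi
    return hits / n_sims

print(simulated_coverage(0.0, 1.0, prior_mean=0.0, prior_sd=1.0))  # prior matches: near 0.90
print(simulated_coverage(0.0, 1.0, prior_mean=5.0, prior_sd=0.5))  # badly wrong prior: far below
```

When the prior matches the distribution that actually generates the parameter, the intervals are calibrated in the repeated-sampling sense; a large shortfall in a check like this would point to a problem with the prior or the model.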

I’ve always thought of (95% or 99%) CI’s as “trusted advisers” — you know they are usually correct, so you trust them, even though you cannot know if they are correct for this particular question (in fact, you expect them to be wrong sometimes). In what sense can Bayesian CI’s play this role and how do you know your “Bayesian adviser” is not a charlatan?

Note: I use both Bayesian and non-Bayesian methods in my work, so this is not intended to start a flame war…I sincerely would like to hear about how you calibrate your risk of being wrong (if you care) without appeal to repeated sampling behavior at some level.

Not sure if you’re under the misapprehension that Bayesians are somehow forbidden to look at the accuracy of their procedures over repeated applications. (We aren’t.)

In answer to your specific question about calibrating risk of being “wrong” (presumably you mean something like “using a wrong prior”) in a setting where repeated sampling is not available or not relevant, Mike Evans for one has been working on this sort of thing for a while: Checking for prior-data conflict.

Hi Corey. Thanks for the response and link. I’m assuming that those who use Bayesian methods (including myself) are pragmatic about our assessment methods, so repeated-sampling properties are fair game.

What I was getting at was that criticizing confidence, per se, just because it gives weird results in some cases, seems too broad. Confidence is simply a reliability metric, and it makes sense at that level; however, I totally agree that specific methods of constructing them are susceptible to “relevant subsets problem” or are trivially true. But, you can utilize the likelihood function to help get more sensible intervals.

As for “risk of being wrong”, I wasn’t referring to a wrong prior, but about how you calibrate your assessment of how reliable a given credibility interval is without appeal to repeated sampling performance under a suitable reference set. For example, if I give you a 95% Credibility Interval for P(HEADS) of a coin, and it’s (0.6,0.7), are you saying you’d be very sure that the actual P(HEADS) is contained in this interval? If so, why and how does this 95% come into play?

Calibration in the Bayesian jargon sense of the word is a property of sets of probability assessments. I’m content to agree that there’s no sensible notion of calibration in any sense for just a single probability assessment. One way a particular probability assessment like 95% can come into play even in the context of a single interval is by seeing the interval as the output of a statistical-decision-theoretical procedure, which in the Bayesian context means posterior loss minimization. The 95% arises as a result of the trade-off between wide intervals and incorrect intervals as encoded in the loss function.

Thanks Corey. Thanks makes a lot of sense.

Oops, I meant “that makes a lot of sense”.

I actually read it as “that makes a lot of sense”; I wouldn’t even have noticed the typo if it hadn’t been pointed out. ;-)

Mike:

You write, “I’ve always thought of (95% or 99%) CI’s as “trusted advisers” — you know they are usually correct, so you trust them, even though you cannot know if they are correct for this particular question . . .”

The trouble is that this “usually” is conditional on the scenario. For example, take the set of 95% confidence intervals in PPNAS papers edited by Susan T. Fiske. I don’t think those are trustworthy at all!

But if you combine them with some prior on the effect sizes (for example, normal with mean 0 and sd 0.1), then the intervals will become much more trustworthy.

People have this idea that classical intervals are safe and Bayesian intervals are perhaps more informative but risky. In many settings, though, it’s the classical intervals that are risky and the Bayesian intervals that are conservative. See this paper with Aleks Jakulin.
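To make the effect of such a prior concrete, here is a small Python sketch. It assumes the classical estimate can be treated as normal with known standard error, and the estimate of 0.5 with standard error 0.25 is a hypothetical "just significant" result, not a number from any paper.

```python
import math
from statistics import NormalDist

def shrink_interval(estimate, se, prior_mean=0.0, prior_sd=0.1, level=0.95):
    """Combine a classical estimate and standard error with a normal prior (conjugate update)."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = 1.0 / se**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * estimate)
    half = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(post_var)
    return post_mean - half, post_mean + half

# Hypothetical result: estimate 0.5, standard error 0.25.
z = NormalDist().inv_cdf(0.975)
classical = (0.5 - z * 0.25, 0.5 + z * 0.25)
posterior = shrink_interval(0.5, 0.25)
print(classical)  # excludes zero
print(posterior)  # pulled strongly toward zero and now includes it
```

With a noisy estimate and a prior concentrated near zero, the posterior interval is both narrower and much closer to zero than the classical one, which is the sense in which the Bayesian interval is the conservative choice here.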

From the radical link “we should think carefully about the motivations underlying our principles.”

Agree, an assessment of good for what?

+1 to Andrew’s comment.

Thanks Andrew! Honestly, I don’t want to start an overly philosophical debate here that is already tired and worn.

I don’t necessarily think that a classical CI is in any sense “safer” than a Bayesian credibility interval. I use Bayesian methods in my own work using machine learning models. However, I see Bayesian methods as providing a form of regularized inference/estimation, as hinted at in your comment and linked paper.

I come from a machine learning background, so I appreciate “posterior predictive checks” on hold out data as a good assessment of a model (as suggested in your Bayesian Data Analysis book and linked paper). But with predictive models, we generally want to capture some set where a future value will occur (e.g., temperature reading, class membership) and to know how accurate that set is…hence, we need to know the conditional distribution of errors for our model. With that, we can calibrate our prediction sets to match our repeated sampling accuracy criterion (for example, as we do when calibrating classifiers).

I am sympathetic to the criticism that attempting to bracket the “true parameter value” is a misdirected goal, given that our model, by virtue of being a model, is wrong. At best, we can attempt to estimate “optimal” parameters under some loss function (which leads us back to posterior predictive checking and cross-validation) or find the most consistent parameter values under the assumption that our model is correct. We can then use resampling and simulation to get a sense of the variability of these “optimal” parameters under repeated sampling from the same population.

Frankly, I like the Bayesian model-based, get-assumptions-out-there, type approach to my own work. All I wanted to emphasize (verbosely, it appears ;-) is that at the end of the day, we want to be “close” to reality, and we need some way of objectively measuring how well we are doing. Hence, something like confidence is needed at some point in our evaluation process to ensure we are grounding our reliability estimates to reality.

Thanks again for responding. I really appreciate this blog and your contributions to our craft.

> we want to be “close” to reality, and we need some way of objectively measuring how well we are doing

Agree that we want to be “close” to reality but there will never be a way of objectively measuring how “close”.

For instance “Probability models make representations that try to get at some aspect of reality that cannot be directly assessed but do provide indirect checks on their adequacy” and further discussed here http://www.stat.columbia.edu/~gelman/research/unpublished/Amalgamating6.pdf

Thanks Keith. Looks like an interesting read!

Thanks to Andrew for always bringing back the real world!

Agree with Noah Motion above – I find the Cox/Fisher interpretation the most straightforward and relevant for scientists. A consistency interval obtained by hypothesis test inversion.

As mentioned by Mike the issue is usually adding additional ‘conditioning’ to make it more relevant to your particular problem. No need to do this via a prior distribution – eg you could use ancillaries. The basic point, made by Fisher and many others is that you need to define a relevant reference frame. So yes ‘prior info’ but no not necessarily ‘prior distribution’ (fine if you want to do this, but be aware it makes different assumptions).

I’d also like to see (in simple cases) more people just plot the pvalue function p(t(y)>t(y0);theta) as a function of theta. And if you really want an approximate (confidence) density on parameter space then differentiate this wrt theta.
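For the simplest case, the p-value function can be written down directly. The Python sketch below assumes a normal mean with known sigma and takes the sample mean as the test statistic t(y); the function names and the example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist

def p_value_function(theta, xbar_obs, sigma, n):
    """p(theta) = P(Xbar >= xbar_obs; theta) for a normal mean with known sigma."""
    se = sigma / math.sqrt(n)
    return 1.0 - NormalDist(mu=theta, sigma=se).cdf(xbar_obs)

def confidence_density(theta, xbar_obs, sigma, n, h=1e-5):
    """Central-difference derivative of the p-value function with respect to theta."""
    upper = p_value_function(theta + h, xbar_obs, sigma, n)
    lower = p_value_function(theta - h, xbar_obs, sigma, n)
    return (upper - lower) / (2 * h)

xbar_obs, sigma, n = 1.2, 2.0, 25
for theta in (0.4, 0.8, 1.2, 1.6, 2.0):
    print(theta, round(p_value_function(theta, xbar_obs, sigma, n), 3))
```

In this case the derivative is just the normal density centered at the observed mean with standard deviation sigma/sqrt(n), and the values of theta with p between 0.025 and 0.975 form the usual 95% interval.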

A problem faced by both repeated sampling and Bayesian interpretations is that they sneak in a ‘true model’ assumption. A consistency interpretation need not make such an assumption.

ojm: Isn’t the assumption just being kicked up to a higher level?

That is, approaches without a ‘true model’ assumption choose properties to be optimal under, for a given class of applications, with no direct justification for the goodness of the property and no guarantee that a particular application belongs to that class. That is, there is no way to assess the goodness of the property, or membership in the appropriate class, without making some representation of reality (an assumed range of possible models) to average or maximize over.

Hi Keith,

Yes something along those lines but I wouldn’t say it has anything to do with optimality.

The issue is how to calibrate or interpret your measure of consistency if not relative to the ‘truth’.

This requires additional assumptions or standards but these need not be in the form of assuming a true model.

At least one thing is clear. Richard McElreath has a good predictive model for comment behavior on this blog:

https://www.youtube.com/embed/BrchZeyo7cg?start=1712&end=1742&autoplay=1

I’m grateful they finally figured out what Confidence Intervals are, because now I can create a 95% CI for the number of angels fitting on the head of a pin ( 3923.32 +/- 59.54 ).

Is that for the composite of all orders of angels, or just for the lowest order? ;~)

113 comments from really clever people about what a confidence interval means (OK, need to subtract a few from me)

Fair to conclude there is some doubt about this!

Confidence intervals about confidence intervals are very wide?

I bet you can get the physicists talking at length over what quantum ‘measurement’ really amounts to, too

Yes, but isn’t the main value of a CI to support some sort of inference? If it can’t be interpreted, that’s sort of the whole game. The usefulness of quantum theory doesn’t rise and fall on our ability to understand why the equations work.

To me this is really the key insight that those people who pooh pooh the whole “confidence interval debate” miss… The problem is Frequentist inference is appropriate for repeated games of chance, and cryptography, and computing schemes for sampling from probability distributions, and the analysis of abstract mathematical properties of sequences of numbers……. and COMPLETELY falls flat when it comes to usefulness and interpretability for doing science. And that’s a big practical problem because we spend billions of dollars on “science” and we’re using the wrong technology… it’s like everyone is out there measuring the length of stuff with digital thermometers, and a few of us are here RANTING that you CAN’T MEASURE LENGTH by reading the temperature off a thermometer.

But what do credible intervals provide when the model is misspecified (most or all of the time in many fields)?

Passing some ad hoc PPC doesn’t give you probabilistic guarantees.

At least the CI mean something about real life.

Daniel:

“there is nothing “fair” about bets placed on the *realized* interval”

“the terminology seems to imply something about bettability of the realized interval, but in fact it just means “it came from a certain kind of procedure””

I don’t understand. Can you explain? I described a game where you bet on the realized interval. (Not my game! I just paraphrased M-N, who reference Buehler (1959) and Robinson (1977).) What’s unfair about it? (Not being sarcy; serious question.)

Mark, as far as I can see that’s not the game of interest. Bet proofness doesn’t rely on Theta being unknown, it says no matter what Theta is there isn’t a betting scheme that makes you money on average over multiple plays, when you have to make the bet before seeing the data.

So, we both agree we’ll use a cryptographic random number generator to generate normal random deviates using Theta = 0 and a bet-proof CI scheme. Can I make money off of you if I am not told the X? No. If I AM told the X? absolutely yes, either every interval contains 0, in which case I can’t make money, or I bet the house on those that don’t contain 0 and I win every time that I make a nonzero bet.

The bet-proofness averages over unknown repeated X, and is bet-proof for any Theta. It’s not averaging over Theta for fixed X. That’s Bayesian Decision Theory that does that.

In your game where Theta is unknown, if I choose a specific Theta, such as 0, then *if theta is 0* I will make money off you when you reveal the X to me before I bet. Therefore there does not exist any bet-proof Confidence procedure for revealed X because you can never get uniformity of bet-proofness, there is always at least 1 point where I can choose a parameter arbitrarily, and if the betting machine does in fact use that parameter, I will make money.

I suppose the one case that might prove tricky is *when the confidence interval procedure itself is nondeterministic*

So, if you draw the random normal X and then you construct a confidence interval using a separate random number generator and X together… and I don’t have access to that ability to construct CI(X)… then you could reveal X (but not the CI(X)) and I still couldn’t make money.

But once the CI(X) is revealed… if I get to bet after the fact, I can always choose Theta=0, and if Theta=0 isn’t in the interval and Theta=0 is the true parameter… I make money, so there doesn’t exist bet-proofness for revealed confidence intervals.

Another corner case is where your confidence interval is the whole real line. No matter what the parameter, it’s always in the interval, so I can’t make money. But short of choosing the whole real line every time, a revealed confidence interval means I can choose some parameter arbitrarily, and if that was the real parameter, then the scheme “bet when that parameter isn’t in the interval” always has positive expected winnings, so there always exists some parameter value for which my scheme has positive winnings, so there’s no bet-proofness for “normal” cases (normal meaning the confidence interval is finite, and isn’t guaranteed to always contain the true parameter).

Surely you should be allowed to bet on ‘in the interval’ or ‘not in the interval’ with the nominal alpha giving the payoffs, so if they claim it is only a 95% CI yet by construction it is always the whole real line then you can win by always betting ‘in the interval’ and getting the (1.053 * bet) payoff.

If so then I think you’re right that the only interpretation of bet-proofness can be where you have to bet before having seen the data (or the realised CI).

No, in this construction you only have the option to bet on “not in the interval” — that is, conservative intervals are permitted. (When models are even slightly complicated, often the only way not to violate the confidence constraint at some point in the parameter space is to allow intervals to be conservative at other points in parameter space.)

Ack, this bet proofness is really VERY unintuitive, worse than normal in my opinion. Anyway, re-reading the M-N paper I see they say (and I misinterpreted):

Bet proofness is the property that *for every betting scheme b(x)* there exists a parameter Theta such that b(x) doesn’t win. From definition 1.

Now I was previously misunderstanding that *for every Theta* there does not exist a betting scheme b(x) that will consistently win. (which I maintain is impossible for revealed CI and would be interesting for unrevealed CI.. but is NOT RELEVANT to this discussion, so sorry again).
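The difference between the two readings shows up in a quick simulation. The Python sketch below takes the scheme "object whenever 0 is outside the revealed interval" and evaluates it at two true values of Theta. The payoffs (win 1 if the interval misses, lose alpha/(1-alpha) if it covers, so that objecting every time breaks even) are an assumption made here for illustration, as are all names and numbers.

```python
import math
import random
from statistics import NormalDist

ALPHA = 0.05
Z = NormalDist().inv_cdf(1 - ALPHA / 2)

def average_winnings(true_theta, guess=0.0, sigma=1.0, n=10, n_rounds=50_000, seed=2):
    """Average payoff per round of the scheme 'object whenever guess is outside the interval'.

    Assumed payoffs: +1 if the interval misses true_theta, -ALPHA / (1 - ALPHA)
    if it covers, and 0 in rounds where no objection is made.
    """
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    total = 0.0
    for _ in range(n_rounds):
        xbar = rng.gauss(true_theta, se)        # revealed data summary
        lo, hi = xbar - Z * se, xbar + Z * se   # revealed 95% interval
        if lo <= guess <= hi:
            continue                            # no objection this round
        covered = lo <= true_theta <= hi
        total += -ALPHA / (1 - ALPHA) if covered else 1.0
    return total / n_rounds

print(average_winnings(true_theta=0.0))  # positive: every objection is correct
print(average_winnings(true_theta=1.0))  # negative: the same scheme loses here
```

So this scheme wins when the arbitrary guess happens to be the true parameter, but it does not win uniformly over the parameter space, which is all that Definition 1 asks of the interval.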

Now, with the above reworded Definition 1, I can see how the game is about revealing X and not revealing Theta. But, I don’t see how it is interesting. Let me give you an example:

Our bet is to sample a random sample of every adult who lives within 2 blocks of me. We will measure their heights X, then you’ll construct a confidence interval CI(X), and I will bet whether after we measure all the rest of them, the population mean Theta will be in the CI.

Now, being a good Bayesian, I use Wald’s theorem and realize that any strategy to decide what my bet winnings will be that doesn’t use a prior and a loss function will be dominated by one that does…. So I’ll go out to Wikipedia and google up info about the population of the US and their heights, and I’ll construct a real world prior and then I’ll place my bet.

Now bet proofness says that because it’s the case that if the actual height of people in a 2 block radius is 1 Million Feet (ie. THERE EXISTS A THETA), my prior will bias my bets in the wrong direction and I will not be able to make money…. that this CI is all good, it’s bet proof.

And how is that relevant to any bet we will actually place?

With Wald’s theorem telling us that all the non dominated betting schemes are in the class of Bayesian schemes, we might as well identify betting schemes with priors… So “bet proofness” is another way of saying “for every Bayesian prior on a parameter, there exists some point outside the high probability region of that prior”

Now, this is no doubt mathematically true for sufficiently good definitions of “high probability region” but the “there exists” is a pure mathematical fantasy. Whether there exists in reality a corresponding thing in the world… is totally disconnected from this mathematical existence question.

If there doesn’t exist in the world a 1 million foot tall person, then the existence of the number 1 million by itself isn’t very interesting for betting purposes.

Daniel:

Yes, I think we are on the same page now, and you are making a good point. I think it is basically the same point that Carlos was making above with his constructed example. From M-N (in the “inspector” formulation):

“Is it possible for the inspector to be right on average with her objections no matter what the true parameter is, that is, can she generate positive expected payoffs uniformly over the parameter space?”

And you and Carlos point out that a lot hangs on “uniformly over the parameter space”.

Different point: you say “we might as well identify betting schemes with priors”. There’s a theoretical connection between betting schemes, bet-proofness and Bayesian credible sets: “The analysis of set estimators via betting schemes, and the closely related notion of a relevant or recognizable subset, goes back to Fisher (1956), Buehler (1959), Wallace (1959), Cornfield (1969), Pierce (1973), and Robinson (1977). The main result of this literature is that a set is “reasonable” or bet-proof (uniformly positive expected winnings are impossible) if and only if it is a superset of a Bayesian credible set with respect to some prior.” (M-N again)

Right, so a bet proof confidence interval is any one that is bigger than a Bayesian interval with SOME prior. And that again is insane. Suppose you and I take on repeatedly sampling from random neighborhoods to see what the average height is; it’s a fun game… You set up a confidence procedure. It will be: you’ll calculate a Bayesian posterior using normal(1000000,10) as your prior on height in feet, and then widen it by 0.02 feet.

This is now a “bet proof” interval. And yet if you’ll kindly allow me to do this repeatedly with you throughout Los Angeles county, with a maximum bet of $1000 each time, I will happily purchase a Porsche by the end of the month.

Daniel: what you propose cannot be a bet-proof confidence interval because it isn’t a confidence interval (it doesn’t contain the parameter with probability higher than 1-alpha).

“The main result of this literature is that a set is “reasonable” or bet-proof (uniformly positive expected winnings are impossible) if and only if it is a superset of a Bayesian credible set with respect to some prior”

So, the only if should apply right? I showed my interval was a superset of a bayesian interval with respect to some prior.

Also note that the question of whether it contains the parameter with probability higher than 1-alpha is predicated on the correctness of the generating process… the Frequentist doesn’t believe in Bayesian probability, so implicitly using that “prior” actually means using a different “likelihood” in which the generating process is generate

m ~ normal(1000000,10);

and then generate

x ~ normal_rng(n,m,s)

let’s suppose that s is known and n is a number of data points pre-specified.

When that *is* the generating process, the confidence interval should be a confidence interval… And this is all that CI procedures EVER give you, an “if we know the generating process, then in the long run, we cover the parameter 1-a of the time”

so, yes, it is a CI procedure, it just doesn’t correspond to anything that exists *in the real world*, just like most CI procedures, which is why most CI procedures produce intervals that in the long run don’t contain the true value of the parameter anywhere near as often as they claim.

> Also note that the question of whether it contains the parameter with probability higher than 1-alpha

A 95% confidence interval has to verify a very simple property. For every value of theta, if you generate data according to the model and calculate the corresponding confidence interval then at least 95% of them will contain theta. It has nothing to do with the real world.
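That property can be checked one parameter value at a time. Here is a minimal Python sketch for the textbook z-interval for a normal mean with known sigma (the setup, names, and the particular theta values are assumptions chosen for illustration):

```python
import math
import random
from statistics import NormalDist

def pointwise_coverage(theta, sigma=1.0, n=10, level=0.95, n_sims=20_000, seed=3):
    """Share of z-intervals containing theta when the data are generated at that theta."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(n_sims):
        xbar = rng.gauss(theta, se)  # sample mean under the model at this theta
        hits += xbar - z * se <= theta <= xbar + z * se
    return hits / n_sims

for theta in (-5.0, 0.0, 1.0, 1e6):
    print(theta, pointwise_coverage(theta))  # each close to 0.95
```

The check is purely about the model: nothing in it says whether real data were generated that way.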

Right, so if you generate data according to

m ~ normal(1e6,10)

x ~ normal_rng(n,m,s)

Then the intervals generated by my Bayesian method expanded by 0.02 feet would be calibrated confidence intervals, would they not?

If the only possible value of theta is 1e6 then yes, those would be good CIs. But if theta is 1e6 you wouldn’t win (in expected value) by betting against theta being included in the CI. In fact, inference questions are not very interesting if we know from the start that theta=1e6.

On second thought, I don’t think my comment makes sense because I really don’t know how to interpret yours… Assuming the value theta=1 is possible. If you generate data for theta=1 does your CI cover theta=1 with probability 95% or not?

Again, this is my point. Though, it’s not that the only possible value is 1e6, but rather that we generate thetas from normal(1e6,10) but the basic point remains.

We seem to be coming at it from different perspectives but arriving at the same set of weird properties.

Frequentist confidence intervals want to have their cake and eat it too. Wald’s theorem says that if you want a non-dominated betting procedure you (essentially) *have to choose a prior* and then make a bayesian decision. If you don’t, there’s a Bayesian procedure that does as well or better everywhere in the parameter space.

The confidence procedure’s coverage guarantee is predicated on perfect knowledge of the generating process. To get bet-proofness, apparently (according to Mark, above), you have to cover some Bayesian interval. This means it’s sufficient to look at Bayesian intervals. When you create a Frequentist confidence set based on a Bayesian interval, though, you implicitly make STRONG ASSUMPTIONS about the FREQUENCY PROPERTIES of the generating process: you in essence alter your likelihood to be a mixture likelihood, as above, in which the prior is incorporated as part of the frequencies. The confidence coverage now only holds when your new generating process actually is correct.

But now, if you know that the parameter comes out of a frequency distribution with the shape of the prior you used, you don’t even need to look at the data to get confidence coverage: just give the 95% interval of the prior for the parameter (in my case 1e6 +- 20 or so).
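A rough sketch of that last claim, in Python using only the standard library. It assumes the parameter really is drawn from normal(1e6,10); the function name, the number of trials, and the seed are made up for illustration. The interval never looks at any data, yet it covers the drawn parameter about 95% of the time.

```python
import random

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def prior_interval_coverage(trials=20000, mu0=1e6, tau=10.0, seed=0):
    """Fraction of draws m ~ normal(mu0, tau) that land inside the fixed
    95% prior interval. No data x is generated or used at all."""
    rng = random.Random(seed)
    lo, hi = mu0 - Z95 * tau, mu0 + Z95 * tau  # roughly 1e6 +- 20
    hits = 0
    for _ in range(trials):
        m = rng.gauss(mu0, tau)  # assumed "true" frequency distribution
        hits += lo <= m <= hi
    return hits / trials


coverage = prior_interval_coverage()
print(coverage)
```

If the real generating process for m differs from the assumed normal(1e6,10), changing the `rng.gauss` line accordingly shows the coverage drifting away from 95%.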

Finally, let’s go back to the casino game. The casino uses a crypto rng to generate a parameter m from the interval 0,1000 using a distribution known to them, but NOT known to you. Then they use the parameter to generate

x ~ normal_rng(n,m,s)

and they repeat the procedure, regenerating m from an unknown distribution and regenerating x.

If a confidence interval procedure even exists in such a case, it must converge *pointwise*, so that whenever the casino generates m = m0, the coverage property holds in 95% of those cases… and this is true for all m0. Because otherwise it would seem that you’d also have to know the frequency distribution with which the casino generates m.

If you also require a Bayesian interval for some prior, and *pointwise* convergence, it seems that this symmetry property requires a uniform prior on the parameter. I can’t see how else that could make sense. It’s this need for pointwise convergence, because you don’t know the frequency properties of the casino’s RNG *for the parameter*, that induces a flat prior.

In summary:

1) Bet-proof confidence intervals, having been proven (by others Mark quotes) to come from Bayesian intervals, in essence come from mixture likelihoods where the prior is taken to be the frequency distribution of the parameter-generating process. Confidence coverage is not guaranteed if the prior isn’t the actual frequency distribution.

2) “Standard” Frequentist confidence intervals require pointwise confidence coverage across the whole parameter space, and so to be bet-proof they necessarily seem to correspond to uniform priors; if the space is not bounded, they may not even exist.

3) Bayesian probability intervals eat “bet proof” intervals for lunch on real-world bets whenever the Bayesian prior actually corresponds better to the real world generating process than whatever the “prior” was that the bet-proof interval used.

4) Bet proofness only requires a single point in the parameter space where each betting procedure loses… which isn’t all that impressive.

In the end, bet-proofness seems to evaporate in a puff of smoke.

Carlos: confidence intervals don’t *have* to cover the Theta with confidence uniformly pointwise for all theta. It’s just that if you do

for i in 1..ZillionsOfTimes {

m = normal_rng(1e6,10)

coverage[i] = contains(m,CI(normal_rng(n,m,s)))

}

The variable coverage should be true 95% of the time.

Since theta=1 comes out of normal(1e6,10) almost never, the goodness of the interval for that value of theta isn’t really important… unless of course the *real world* “generating process” doesn’t have the normal_rng(1e6,10) frequency distribution you thought it did.
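To make the distinction between the two kinds of coverage concrete, here is a hedged simulation sketch in Python (standard library only). It stands in a plain conjugate normal posterior interval for "my Bayesian method", which is an assumption, as are the values n=4 and s=20 and all function names. Since the sample mean is sufficient here, it draws the mean of n observations directly instead of n separate values.

```python
import math
import random

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def posterior_interval(xbar, n, s, mu0, tau):
    """95% posterior interval for m under prior normal(mu0, tau) and
    data x_i ~ normal(m, s), summarized by the sample mean xbar."""
    prec = 1.0 / tau**2 + n / s**2
    mean = (mu0 / tau**2 + n * xbar / s**2) / prec
    sd = math.sqrt(1.0 / prec)
    return mean - Z95 * sd, mean + Z95 * sd


def coverage(draw_m, trials, rng, n=4, s=20.0, mu0=1e6, tau=10.0):
    """Fraction of repetitions in which the interval contains the m
    that generated the data. draw_m decides how m is produced."""
    hits = 0
    for _ in range(trials):
        m = draw_m(rng)
        xbar = rng.gauss(m, s / math.sqrt(n))  # mean of n observations
        lo, hi = posterior_interval(xbar, n, s, mu0, tau)
        hits += lo <= m <= hi
    return hits / trials


rng = random.Random(1)
# m drawn from the same distribution as the prior: repeated-trials coverage
marginal = coverage(lambda r: r.gauss(1e6, 10.0), 20000, rng)
# m held fixed three prior standard deviations out: pointwise coverage
pointwise = coverage(lambda r: 1e6 + 30.0, 20000, rng)
print(marginal, pointwise)
```

Under these assumptions the first number should land near 0.95 and the second well below it, which is the gap between coverage averaged over the assumed generating process and coverage at every fixed theta.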

“Thus we can look for two functions \theta_{lower}(E) and \theta_{upper}(E), such that

P { \theta_{lower}(E) <= \theta_{1}^{0} <= \theta_{upper}(E) | \theta_{1}^{0} } = \alpha (20)

and require that the equation (20) holds good *whatever* the value \theta_{1}^{0} of \theta_{1} and *whatever* the values of the other parameters \theta_{2}, \theta_{3}, ..., \theta_{l}, involved in the probability law of the X's may be.”

[emphasis in the original]

J. Neyman (1937), Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability, II-CONFIDENCE INTERVALS, (a) Statement of the Problem

Raises some questions certainly. That may be the definition given by Neyman, but is it the definition used in the proof of the result about bet-proofness containing a Bayesian interval? Let’s suppose it is. I agree that a Bayesian interval seems unlikely to meet this condition. The Bayesian interval would only meet the more general repeated trials requirement I mentioned.

Elsewhere I’ve mentioned that in many problems, every interval is a Bayesian interval for some enormously concentrated prior… so the statement that a CI is bet-proof if and only if it contains a Bayesian interval for some prior is, for many problems, more or less the statement that a CI is bet-proof if it contains some points…

So I confess, I’m confused because our discussion seems to show that the whole “bet-proof iff contains a bayesian interval” is potentially meaningless or at least misleading and since we don’t know what went into that proof… hard to debug without digging in… and I’m not that motivated.

It seems that something like this might work:

Using a “standard” confidence interval procedure that is pointwise convergent for any Theta, calculate CI(X). Then, taking uniform over CI(X) as your prior, prove that the Bayesian interval for that prior is contained in CI(X) (it has to be), and call your CI(X) bet-proof.

If that’s what a bet-proof interval is… ok then.

Still doesn’t solve the issue that a betting procedure can win everywhere in the parameter space except a single point where it doesn’t lose, and be called a bet-proof interval.

All of it seems weird to me.

Mark: I think we will agree that this simplified setup is bet-proof (as defined in the paper).

Theta (which is also the parameter of interest) is one of -1, 0, 1

x is generated according to this distribution: 25% theta-1, 50% theta, 25% theta+1

CI(x)=x is a 50% coverage interval

b(x) is a function from {-2,-1,0,1,2} to {0,1}. There are 32 different betting schemes. It is easy to show that none of them wins uniformly. The only schemes that are expected to break even in the worst case are: never reject, always reject, reject when x is -2, reject when x is 2, reject when x is -2 or 2. None of them has positive expected payoff for theta=0.

Rejecting when x is -2 or 2 has positive payoff when theta is -1 or 1: we will bet (and win) 25% of the time. It will never bet if theta is 0, so the CIs are bet-proof.
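The enumeration is small enough to check by brute force. The Python sketch below assumes even-odds stakes, which is the fair bet against a 50% interval: a bet wins 1 when x differs from theta and loses 1 when x equals theta. That payoff scaling and the names are assumptions for illustration, not taken from the paper; any positive rescaling leaves the signs of the expected payoffs unchanged.

```python
from fractions import Fraction
from itertools import product

THETAS = (-1, 0, 1)
XS = (-2, -1, 0, 1, 2)
# x = theta + offset, with these probabilities
OFFSETS = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def expected_payoff(bet, theta):
    """Expected winnings at theta of betting against CI(x) = x whenever
    bet[x] == 1, at even odds (assumed): +1 if x != theta, -1 if x == theta."""
    total = Fraction(0)
    for offset, p in OFFSETS.items():
        x = theta + offset
        if bet[x]:
            total += p * (1 if x != theta else -1)
    return total


# All 2^5 = 32 functions b: {-2,...,2} -> {0,1}
schemes = [dict(zip(XS, bits)) for bits in product((0, 1), repeat=5)]
payoffs = [tuple(expected_payoff(b, t) for t in THETAS) for b in schemes]

# Schemes that never lose in expectation, and schemes that win everywhere
no_loss = [(b, p) for b, p in zip(schemes, payoffs) if min(p) >= 0]
uniform_winners = [b for b, p in zip(schemes, payoffs) if min(p) > 0]

for b, p in no_loss:
    rejected = [x for x in XS if b[x]]
    print("reject at", rejected, "payoffs (theta=-1,0,1):", [str(v) for v in p])
print("schemes winning at every theta:", len(uniform_winners))
```

The `no_loss` list is where to look for the five schemes named above, and `uniform_winners` for whether any scheme wins at every theta.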

Carlos:

I think you are onto something. Or I am missing something obvious. (Or both.) You have an example where there is a betting strategy that has positive expected winnings in some parts of the parameter space, and zero expected winnings in the rest of the parameter space. Everywhere in the parameter space the strategy can’t lose, and in some places it will win, in expected value terms. But because it doesn’t win outright everywhere in the parameter space, the confidence set CI(x) we are betting against is still called “bet-proof”. Which is odd, though I’m not sure of the importance or implications. (Did I get this right? It’s late in my time zone again….)

Happy to pursue this off-blog by email if you prefer.

Right, this is also my point. All that’s required for “bet proofness” is *the existence of a single parameter value where it can’t win*; it has nothing to do with whether that parameter value would ever actually be used during any actual betting procedure. In fact, a Bayesian decision theory based betting procedure could win big EVERYWHERE except one point and the interval would be called “bet proof”.

This is what I meant, and I don’t really have more to say. I was not familiar with this “bet-proof” reasoning, thanks for the pointer. I can’t say I understand the whole paper, my comments were based mostly on the definition. This is an improvement over vanilla confidence intervals, but it remains a frequentist concept without a clear translation into the real world. Maybe one could restrict the “bet-proof” label to confidence intervals where for any betting scheme the worst-case is a loss. But they included the break-even case in the definition, and there might be a reason for that. And my main point stands, a betting scheme could win everywhere except for some value of the parameter. The interpretation of this property in general doesn’t seem very intuitive (or interesting) to me. (It could also be that I got something wrong.)

How about the issue I brought up above… if I accept your “bet proof if and only if it contains a Bayesian interval for some prior”, then whenever the confidence procedure produces an interval, I can always take uniform over that interval as my prior and produce a posterior interval, which will always be completely contained in the CI… and so by this reasoning every CI procedure that produces an interval is bet proof.

I suppose the issue might be that you’re changing the prior each time you repeat the process… and the result may be that there has to be a fixed prior and that all the CIs from the procedure have to contain the bayesian intervals you get for the repeated data, but for this fixed prior.

I guess we’d have to see the conditions of the proof… But honestly, the thing about there has to exist one point where the betting procedure doesn’t lose… that says it all right there. There’s nothing really about “you can’t win” (which is what bet-proof sounds like) it’s all about “you must have a theoretical possibility of at least not winning”

ack typo: “there has to exist one point where the betting procedure doesn’t *WIN*”

How to interpret credible intervals (when the model is false/misspecified)?