Statistics is easy! part 2 – can we at least make it look easy?

Posted on March 30, 2010 7:28 AM by Keith O’Rourke

Well can we at least make it look easy?

For the model as given here, there are two parameters Pc and Pt – but the focus of interest will be on some parameter representing a treatment effect
– Andrew chose Pt – Pc.

But sticking for a while with Pt and Pc – the prior is a surface over Pt and Pc as is the data model (likelihood)

In particular, the prior is a flat surface (independent uniforms)
and the likelihood is Pt^1 (1 – Pt)^29 * Pc^3 (1 – Pc)^7 (the * is from independence)

(If I reversed the treatment and control groups – I should be blinded to that anyways)

Since the posterior is proportional to prior * likelihood we take logs and suggest plotting 3 surfaces LogPrior, LogLikelihood, and LogPosterior (i.e. LogPrior + LogLikelihood)
– along with a tracing out of a region of highest posterior probability or some simple approximation of that.
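As a rough sketch of those three surfaces, the following evaluates them on a grid, reading the counts off the likelihood above (1 of 30 in treatment, 3 of 10 in control). The grid resolution, the variable names and the grid-cell approximation to the highest posterior region are assumptions made for illustration only.

```python
import numpy as np

# Counts read off the likelihood in the post: 1 of 30 (treatment), 3 of 10 (control).
yt, nt = 1, 30
yc, nc = 3, 10

# Grid over the open unit square (endpoints left out to avoid log(0)).
p = np.linspace(0.001, 0.999, 999)
PT, PC = np.meshgrid(p, p, indexing="ij")

log_prior = np.zeros_like(PT)  # independent uniforms: log(1) = 0 everywhere
log_lik = (yt * np.log(PT) + (nt - yt) * np.log(1 - PT)
           + yc * np.log(PC) + (nc - yc) * np.log(1 - PC))
log_post = log_prior + log_lik  # unnormalized log posterior surface

# Location of the peak of the posterior surface on the grid.
i, j = np.unravel_index(np.argmax(log_post), log_post.shape)
mode_pt, mode_pc = PT[i, j], PC[i, j]

# Simple approximation to a 95% highest posterior region:
# keep the highest-density grid cells until they hold 95% of the mass.
post = np.exp(log_post - log_post.max())
post /= post.sum()
ordered = np.sort(post.ravel())[::-1]
threshold = ordered[np.searchsorted(np.cumsum(ordered), 0.95)]
in_region = post >= threshold  # boolean mask tracing out the region
```

Any contour plotting routine could then draw `log_prior`, `log_lik` and `log_post` side by side, with the boundary of `in_region` overlaid on the last one.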

This shows all inference pieces and their sum (summative inference) in this problem.
If researchers could think clearly in two dimensions we would be done.

Regardless the convention is to think in one dimension so …

Transform Pt and Pc into (Pt-Pc) and Pc; and then focus on just (Pt-Pc)
– now just a curve.

This is (formally) easy to do with the posterior (integrate out Pc from the surface to get a curve for just (Pt-Pc)).

Andrew’s simple method, I believe, depends on (knowing) that the quadratic curve centered at (Pt-Pc) with curvature = -1/(Pt * (1-Pt)/nt + Pc * (1-Pc)/nc) in the (Pt-Pc) axis but constant in the Pc axis
– approximates the posterior surface well, as does going down two units from the maximum to get an interval.
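A minimal sketch of that quadratic approximation, under the assumption that the observed proportions (1/30 and 3/10) are plugged in for Pt and Pc. This is one reading of the description above and not necessarily the exact calculation Andrew used; the function name `log_quadratic` is made up for the example.

```python
import math

# Counts read off the likelihood in the post.
yt, nt = 1, 30
yc, nc = 3, 10

pt_hat, pc_hat = yt / nt, yc / nc
diff_hat = pt_hat - pc_hat  # centre of the quadratic

# Variance of the difference, and the curvature quoted in the post.
var = pt_hat * (1 - pt_hat) / nt + pc_hat * (1 - pc_hat) / nc
curvature = -1.0 / var

def log_quadratic(d, top=2.0):
    """Quadratic log curve in (Pt-Pc), with its maximum set to `top`."""
    return top + 0.5 * curvature * (d - diff_hat) ** 2

# Going down two units from the maximum is the same as +/- 2 standard errors.
half_width = 2.0 * math.sqrt(var)
interval = (diff_hat - half_width, diff_hat + half_width)
```

With the maximum set to 2, the curve crosses the horizontal line at 0 exactly at the two ends of `interval`.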

Maybe not all statisticians will immediately get this – on first look.

But it would be nice to still show the pieces and how they add in one dimension.

Fortunately in most Bayesian analyses, this is (formally) possible with no loss (see this paper)

– any posterior curve for a parameter of focus (obtained by integrating out the other parameters from the surface) can be rewritten as

Integrated posterior ~ Integrated prior + Integrated likelihood

The technical problem that arises here is getting the integrated likelihood where the integration has to be done with respect to the prior assumed
(sometimes this does not exist, and even after modifying the prior so that it does – actually doing the integration to get a curve can be very difficult)

For this problem, using priors and elegant math from here and brute force numerical integration, we can show all inference pieces and their sum in one dimension for the log odds ratio parameterization for treatment effect.

The graph shows the LogPrior (red), LogIntegratedLikelihood (blue) and their sum the LogPosterior (purple) – just for the log odds ratio. Also a green curve for later. Their maximums have been arbitrarily set to 2 so that the horizontal line at 0 provides an approximate credible interval.

plot2.pdf

Sorry I have yet to try this for (Pt-Pc) – probably doable by brute force – but log odds is a very convenient parameterization.
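For readers who want to try the brute force route, here is a hedged sketch of the one-dimensional decomposition for the log odds ratio. It assumes the independent uniform priors from earlier in the post, which may not be the priors behind the plot, and it uses a plain midpoint rule over Pc. All names and grid sizes are illustrative.

```python
import numpy as np

# Counts read off the likelihood in the post.
yt, nt = 1, 30
yc, nc = 3, 10

psi = np.linspace(-8.0, 4.0, 241)        # grid for the log odds ratio
pc = (np.arange(2000) + 0.5) / 2000      # midpoints on (0, 1) for Pc
dpc = 1.0 / 2000

PSI, PC = np.meshgrid(psi, pc, indexing="ij")
PT = 1.0 / (1.0 + np.exp(-(PSI + np.log(PC / (1 - PC)))))

# Flat prior on (Pt, Pc) becomes the Jacobian |dPt/dpsi| = Pt(1-Pt) on (psi, Pc).
prior_joint = PT * (1 - PT)
lik = PT**yt * (1 - PT)**(nt - yt) * PC**yc * (1 - PC)**(nc - yc)

prior_psi = prior_joint.sum(axis=1) * dpc          # integrated prior
post_psi = (prior_joint * lik).sum(axis=1) * dpc   # integrated posterior (unnormalized)
int_lik = post_psi / prior_psi   # likelihood integrated against the prior of Pc given psi

log_prior = np.log(prior_psi)
log_int_lik = np.log(int_lik)
log_post = np.log(post_psi)

def top_at_two(curve):
    """Shift a log curve so its maximum sits at 2, as in the plot."""
    return curve - curve.max() + 2.0
```

By construction `log_post` equals `log_prior + log_int_lik` at every grid point, which is the one-dimensional version of adding the pieces.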

Now let's compare and contrast with the frequency approach.

In principle, the same integrated likelihood could be used – now just erase the LogPrior and LogPosterior. Then you go down about 2 units from the maximum of the LogLikelihood to get an approximate 95% confidence interval.
(Yes, getting this just right, going down just the right distance and perhaps deviating the horizontal line from 0 degrees – such that it would have 95% coverage and this coverage is a constant function across Pt and Pc – is mathematically impossible, but within any reasonable model uncertainty you can usually get close enough and always >= 95%)

The least wrong likelihood – that is just a function of log odds ratio – for this problem is the conditional likelihood (same math that gives the Fisher’s Exact test) and I should add that to the plot (it is not hard but not at hand right now)
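A sketch of that conditional likelihood, written as the noncentral hypergeometric probability of the observed treatment count given the total number of events. The counts are read off the likelihood earlier in the post; the function name and the grid are assumptions for illustration.

```python
import math

# Counts read off the likelihood in the post.
yt, nt = 1, 30
yc, nc = 3, 10
total = yt + yc  # total events, held fixed by the conditioning

def log_conditional_likelihood(psi):
    """Log probability of yt treatment events given `total`, at log odds ratio psi."""
    lo, hi = max(0, total - nc), min(nt, total)
    terms = [math.log(math.comb(nt, u)) + math.log(math.comb(nc, total - u)) + psi * u
             for u in range(lo, hi + 1)]
    m = max(terms)
    log_norm = m + math.log(sum(math.exp(t - m) for t in terms))  # log-sum-exp
    observed = (math.log(math.comb(nt, yt))
                + math.log(math.comb(nc, total - yt)) + psi * yt)
    return observed - log_norm

# Evaluate on a grid of log odds ratios and locate the maximum.
grid = [-8.0 + 0.05 * k for k in range(241)]
curve = [log_conditional_likelihood(s) for s in grid]
psi_hat_conditional = grid[curve.index(max(curve))]
```

At psi = 0 this reduces to the ordinary hypergeometric probability that Fisher's Exact test is built from.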

The more general though a bit wronger approximation to the least wrong likelihood is the profile likelihood
– for each value of log odds ratio replace the unknown Pc with the mle for it and treat it as known. This traces out the peak over the surface in the log odds direction and it is known to approximate the conditional likelihood quite well. It is what drives logistic regression software and is the _default_ in frequency based modelling.

That is added as the green curve in the plot above. It fails in the paired data case i.e. Neyman-Scott problems but otherwise works fairly generally.
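The profile likelihood just described can be sketched with a simple grid search: for each log odds ratio, take the maximum of the log likelihood over Pc. The grid search stands in for solving for the mle of Pc exactly, and the names and resolutions are assumptions.

```python
import numpy as np

# Counts read off the likelihood in the post.
yt, nt = 1, 30
yc, nc = 3, 10

psi = np.linspace(-8.0, 4.0, 241)      # grid for the log odds ratio
pc = (np.arange(2000) + 0.5) / 2000    # grid for the nuisance parameter Pc

PSI, PC = np.meshgrid(psi, pc, indexing="ij")
PT = 1.0 / (1.0 + np.exp(-(PSI + np.log(PC / (1 - PC)))))

loglik = (yt * np.log(PT) + (nt - yt) * np.log(1 - PT)
          + yc * np.log(PC) + (nc - yc) * np.log(1 - PC))

# Profile: for each psi, replace Pc by the value that maximizes the likelihood.
log_profile = loglik.max(axis=1)
psi_hat = psi[np.argmax(log_profile)]

# Set the maximum to 2 and read the interval off the horizontal line at 0.
shifted = log_profile - log_profile.max() + 2.0
inside = psi[shifted >= 0]
interval = (inside.min(), inside.max())
```

Here `interval` is the "go down about 2 units" interval for the log odds ratio, read off the profile curve alone.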

Hopefully this shows why credible and confidence intervals will be very similar – in this problem. Both intervals mostly come from the blue/green curve (where they intersect the horizontal line).

This is a real simple problem – binary outcomes, two groups and randomized. Explaining this to people with little training in statistics – something I need to do soon – will likely be challenging.

What's nice about the Bayesian approach here is that it can be displayed just using curves – for any parameter / parameterization one wants to focus on

– always using the same method. But there is actually no need to obtain the curves, one can grab a sample from the posterior surface and extract the posterior curve one wants to focus on.
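A small sketch of grabbing a sample from the posterior surface and extracting the curve of interest. It relies on the fact that with flat priors the posterior here is a pair of independent Beta distributions, Beta(2, 30) for Pt and Beta(4, 8) for Pc. The seed, the number of draws and the histogram settings are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Counts read off the likelihood in the post.
yt, nt = 1, 30
yc, nc = 3, 10
n_draws = 100_000

# Flat prior + binomial likelihood gives independent Beta posteriors.
pt = rng.beta(yt + 1, nt - yt + 1, size=n_draws)
pc = rng.beta(yc + 1, nc - yc + 1, size=n_draws)

# Marginalizing is just computing the parameter of focus for each draw.
diff = pt - pc
log_or = np.log(pt / (1 - pt)) - np.log(pc / (1 - pc))

diff_interval = np.percentile(diff, [2.5, 97.5])
log_or_interval = np.percentile(log_or, [2.5, 97.5])

# A histogram of the draws is a rough version of the posterior curve.
density, edges = np.histogram(log_or, bins=60, density=True)
```

The same draws serve for (Pt-Pc), the log odds ratio, or any other parameterization, which is the "always using the same method" point.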

On the other hand, one could use the profile likelihood in lieu of the integrated likelihood to get an approximate display – the error of this approximation would show up as the difference between the log of the curve extracted from the posterior surface and the (marginal) LogPrior + LogProfileLikelihood curve.

But it's also nice to show that the credible interval mostly comes from the pieces that also provide confidence intervals and hence the confidence coverage should be pretty good (or maybe even better as in this example – Mossman, D. and Berger, J. (2001). Intervals for post-test probabilities: a comparison of five methods. Medical Decision Making 21, 498-507.)

Summary of an easy stats problem
Bayes: Grab posterior sample and marginalize to parameter of focus
Frequency: Marginalize the likelihood surface to something that is just a function of the parameter of focus – and do extensive math or simulation to get and prove it's a confidence interval

Easy if both intervals mostly come from a log likelihood?

Question: Why don’t we give these picturesque descriptions of the workings of statistics to others?

2 thoughts on “Statistics is easy! part 2 – can we at least make it look easy?”

Dave Vanness on March 30, 2010 at 8:15 am said:

I am a co-author on a recent paper by Doug Mossman (whom you cite). He is one of the most imaginative and thoughtful colleagues with whom I have worked. We have come up with a particularly "picturesque" (and hopefully illustrative) description of Gibbs sampling. Would be curious to hear your thoughts on it (see p. 7 of the manuscript).

http://www.springerlink.com/content/t34063x7j18h6…

K? O'Rourke on March 30, 2010 at 12:38 pm said:

Dave: I like it – the drunk being a random walk more likely to seek lower rather than higher ground and never likely to stop anywhere in particular.

But it is hard to understand what others need to hear to understand something – one of those Rumsfeld's unknown unknowns.

My guess about Gibbs or any other MCMC is that it is something more basic.

Beginning with posterior ~ prior * likelihood, I think this needs to be replaced with more descriptive terms

"your model" ~ model for unknown * model for data (I'll take logs though and add)

To simulate from "your model" first draw the unknowns from their model and then given those particular unknowns draw potential data

This generates a sample from the joint model of unknowns and potential data – now just keep those where the potential data was equal to (or approximately) the actual data "you got".

That makes it "your model" and gives a sample from the posterior – albeit very inefficiently.
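The two-stage simulation described above can be sketched as follows for the problem in the post, assuming the flat priors and the counts 1 of 30 and 3 of 10. The seed and the number of simulations are arbitrary, and exact matching is only feasible because the data are so small.

```python
import numpy as np

rng = np.random.default_rng(1)

# Actual data "you got", read off the likelihood in the post.
yt, nt = 1, 30
yc, nc = 3, 10
n_sims = 500_000

# Step 1: draw the unknowns from their model (independent uniform priors).
pt = rng.uniform(size=n_sims)
pc = rng.uniform(size=n_sims)

# Step 2: given those unknowns, draw potential data.
sim_yt = rng.binomial(nt, pt)
sim_yc = rng.binomial(nc, pc)

# Step 3: keep only the draws whose potential data equal the actual data.
keep = (sim_yt == yt) & (sim_yc == yc)
post_pt, post_pc = pt[keep], pc[keep]

acceptance_rate = keep.mean()  # shows how inefficient this is
```

Only about one draw in a few hundred survives here, which already hints at why this stops working once the data are richer.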

This will work – today – for simple problems like the one given here (or at least I have done it for similar ones).

But as we all know it will not work for most realistic problems as potential data will almost never be close to actual data in the simulations.

You simply then point this out and say that MCMC generates samples from the joint distribution that always have potential outcomes exactly equal to your actual outcomes.

But then I think you will want to point out it does not always work (converge) and then you can use your drunkard adding a local depression where he can get stuck too long and provide the wrong posterior.

Hope this was not too long – it is important to me!

K
