My talk at the upcoming conference on Inference from Non Probability Samples, 16-17 Mar in Paris:

Looking for rigor in all the wrong places

What do the following ideas and practices have in common: unbiased estimation, statistical significance, insistence on random sampling, and avoidance of prior information? All have been embraced as ways of enforcing rigor but all have backfired and led to sloppy analyses and erroneous inferences. We discuss these problems and some potential solutions in the context of problems in applied survey research, and we consider ways in which future statistical theory can be better aligned with practice.

The talk should reflect how my thinking has changed from this talk a couple years ago.

So where should we find it?

In careful statistical and theoretical analysis that is constantly related to and compared against other knowledge we have of the world, and through the steady accumulation of statistical evidence from multiple perspectives and sources of variation? Which I guess Andrew would describe as the reasonable and judicious combination of prior knowledge and new empirical evidence.

But either way, not just by running some regressions and over-interpreting the combination of point estimates and p-values to tell whatever story is palatable to our field and consistent with the pattern of stars in our results tables. Even if those point estimates and p-values come from randomized trials or some other “rigorous” method.

Probably as good as one could get for a two paragraph summary.

But I doubt that a two paragraph summary will do any good — since the devil is often in the details. Some partially-baked thoughts:

Some central ideas that need to be conveyed (probably not a complete list):

1. “One size does not fit all”, so it is impossible to give a list of *specific* points that will ensure rigor. Case in point: “randomized controlled trials” have been called “the gold standard” and given the handy abbreviation RCT’s, but as jrc mentions, you can get nonsense out of a good RCT by using an inappropriate statistical analysis. Analogously, you can have an RCT that is so poorly designed and carried out that no statistical analysis can get any meaningful information from it.

2. Every statistical technique depends on assumptions. So one needs to consider carefully how well those assumptions fit the situation where they are used. And they will rarely (if ever) fit exactly, so virtually any statistical analysis leaves some uncertainty in its appropriateness — in addition to all the other sources of uncertainty.

3. Statistical concepts are widely misunderstood — usually in the form of oversimplifying and expressing more certainty than is realistic.

All of these are difficult to convey to most people who use statistics (perhaps in part because of a common desire to have a list of “If I do these, it will be OK.”)

Some tangential comments inspired by your points:

#1 Both of your points are absolutely factually true: There are tons of crappy RCTs.

But then again, we shouldn’t judge a tool or procedure by corner cases. Sure, you can abuse an RCT. But then again, what technique is immune to misuse?

#2 Aren’t what you call assumptions very close or analogous to what constitutes priors? If yes, then why do we get people claiming that it doesn’t really matter much what prior you choose in most cases?

If you agree that “one needs to consider carefully how well those assumptions fit the situation,” how can one get away without carefully considering and choosing the right prior?

I sense a contradiction.

#3 On the other hand, to be useful, a technique must solve the problem at hand.

The problem at hand is very often a binary or at least a multi-option but discrete decision. Call it simplistic or what you will, the fact is that the world is full of binary choices and people are looking for models / tools that will make these choices. Shying away from including decision analysis in your statistical toolkit is not the solution.

There’s an uptick in the number of structurally elegant tools that purportedly add to our “understanding” of the phenomenon but are predictively useless.

Rahul, I may be missing your point a little, but in my modest experience the likelihood is usually where the more important assumptions are made. In many cases, the difference in posterior inference between a strong prior and an “uninformed” prior is quite small; it all depends on the model and the strength of the data. So I don’t think there’s a contradiction afoot in what Martha is saying. The point is to do model checking :)

@Chris

No, you may be right. I’m just trying to understand this better.

To restate: If the validity of conventional statistical techniques is strongly contingent on the correctness of the assumptions, then shouldn’t the validity of a Bayesian analysis also be contingent on the prior, which, in a way, is our “assumption” about the problem?

If not, why not?

I think it depends on how much data one has. The way I teach the intuition is by considering the simple conjugate Normal-Normal case (the prior is normal, the posterior too). The posterior mean m* is a weighted mean of the prior mean, m, and the maximum likelihood estimate ($\bar{x}$), and the posterior variance v* is the inverse of the combined precision. The weights are proportional to the precision of the prior on m and the precision coming from the data (the inverse of $SE^2$). So, if you have little data, the w1 term in equation 2 is larger and will dominate in determining the posterior. But if you have a lot of data, the w2 term will dominate, so the maximum likelihood estimate will dominate, and it won’t much matter what the prior was.

I hope this renders correctly.

I meant to embed this image: here.
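The weighting described above can be sketched numerically. This is only an illustration under stated assumptions: the data standard deviation `sigma` is treated as known, w1 is taken to be the weight on the prior mean and w2 the weight on the sample mean (matching the description above), and the function name and example numbers are hypothetical.

```python
def normal_normal_posterior(m, v, xbar, sigma, n):
    """Posterior for a normal mean with known data sd `sigma` (an assumption).

    m, v  : prior mean and prior variance
    xbar  : sample mean (the maximum likelihood estimate)
    n     : sample size, so SE^2 = sigma**2 / n
    """
    prior_precision = 1.0 / v
    data_precision = n / sigma**2          # inverse of SE^2
    total = prior_precision + data_precision
    w1 = prior_precision / total           # weight on the prior mean
    w2 = data_precision / total            # weight on the MLE
    m_star = w1 * m + w2 * xbar            # posterior mean
    v_star = 1.0 / total                   # posterior variance
    return m_star, v_star, w1, w2


if __name__ == "__main__":
    # Hypothetical numbers: prior N(0, 1), sample mean 2, data sd 2.
    for n in (1, 10, 1000):
        m_star, v_star, w1, w2 = normal_normal_posterior(0.0, 1.0, 2.0, 2.0, n)
        print(f"n={n:5d}  w1={w1:.3f}  w2={w2:.3f}  m*={m_star:.3f}  v*={v_star:.4f}")
```

With small n the prior weight w1 is the larger of the two; as n grows, w2 approaches 1 and the posterior mean moves toward the sample mean.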

@Shravan

I think what you write makes a lot of sense. So, I’d add that anecdotally in at least 50% of the applied problems, if not more, one does not have so much data that the role of the prior is insignificant.

And I think Bayesian proponents often downplay the importance of good quality priors to getting good model output.

Yes, I think a good prior can be important, and should be subject to examination and checking. I think what I would push back against is the idea that the priors are a uniquely assumption-laden part of modeling. Specifying the likelihood embodies many more assumptions in most cases, i.e. choice of distribution, how to condition on covariates (linearity or, if not linear, then some non-linear model or family of models, or using a logit link versus something else), what covariates to include or exclude, etc. I actually find specifying priors to be a stabilizing part of Bayesian inference because it forces me to think about what else I know and try to be clear on what I’m doing, and it is an opportunity to provide some regularization.

Rahul: of course, priors can be important, and mis-specified priors can cause problems, and well specified priors are better….

But what really goes awry, often without anyone even blinking an eye, is the likelihood. Consider the difference between a likelihood in which you assume that “doing x causes y to have value z plus some random error of unknown magnitude” vs “doing x forces a change in a dynamic equilibrium so that y changes from whatever its value was, through time, via oscillation, until it eventually at the long-time equilibrium achieves value z to within 1%”.

Those are two really different views of how the world works. Plenty of times people analyze a situation where the second model is the appropriate one using the first model, and arrive at totally meaningless results, and if you focus on the prior for say the unknown magnitude in the random error, you’re missing the point entirely.
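A toy simulation may make the contrast concrete. It is a sketch only: the damped-cosine form of the response, the parameter values, and the function names are all hypothetical choices for illustration, not anything specified above. Data are generated from the second (dynamic) view, and z is then estimated under the first (static) view by simply averaging the observations.

```python
import math
import random


def damped_response(t, y0, z, decay, omega):
    """y relaxes from y0 to z through a damped oscillation (hypothetical form)."""
    return z + (y0 - z) * math.exp(-decay * t) * math.cos(omega * t)


def static_estimate(times, y0, z, decay, omega, noise_sd, rng):
    """Estimate z as the plain average of noisy observations, i.e. under
    the 'y = z + random error' model, ignoring the dynamics."""
    obs = [damped_response(t, y0, z, decay, omega) + rng.gauss(0.0, noise_sd)
           for t in times]
    return sum(obs) / len(obs)


rng = random.Random(0)
y0, z = 0.0, 10.0                                # hypothetical start and equilibrium
early = [0.1 * k for k in range(1, 21)]          # observed soon after doing x
late = [20.0 + 0.1 * k for k in range(1, 21)]    # observed near equilibrium

est_early = static_estimate(early, y0, z, 0.2, 1.0, 0.5, rng)
est_late = static_estimate(late, y0, z, 0.2, 1.0, 0.5, rng)
print(f"true z = {z}, static estimate (early) = {est_early:.2f}, (late) = {est_late:.2f}")
```

In this setup the static estimate is close to z only when the observations happen to fall near equilibrium; no choice of prior on the error magnitude repairs the early-window estimate, because the mismatch is in the likelihood.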

@Daniel: Nice example, thanks.

@Daniel: and for that, we can be glad Stan can do ODEs.

Rahul, what I wrote *has* to make sense. It is just math :). More seriously though, I suspect your 50% estimate is a lower bound. Nowadays we run such complex models that there is rarely (in my field never) enough data for likelihood estimates to dominate for all params.

Daniel:

> different views of how the world works

I see something like this as a major issue here: using assumptions as a way to represent a communal view of what the reality we are attempting to deal with is approximately, versus making (minimal) assumptions that guarantee _good_ properties (e.g. unbiasedness) regardless of what reality is like, as long as it’s in this sub-domain (e.g. approximately additive effects).

When you question “good for what?”, in the fuller scientific process as well as in different ways of evaluating the procedure’s repeated-use properties, what was formerly taken as good ain’t necessarily so: https://andrewgelman.com/2016/08/22/bayesian-inference-completely-solves-the-multiple-comparisons-problem/

It is also opposed to just representing one’s subjective view of the reality and being self consistent.

@Rahul: Some comments on your tangential comments.

#1 RCT’s were just one example to illustrate my point that one can’t say “Do this to get rigor”. Also, (in the example of RCT’s) there are lots of published abuses of the techniques — calling them “corner cases” misses the point that there are lots of misunderstandings of what makes a good RCT.

#2 I consider choice of prior to be an example of what I was talking about, so no contradiction in what I said. But a little elaboration:

a. In many cases, the prior can (and therefore should) provide information that cannot be incorporated in the model giving the likelihood.

b. In many cases, there is empirical evidence that choice of prior has less effect on results than model assumptions that affect the likelihood, but this does need to be provided on a case-by-case basis; one also needs to explicitly state what prior information is taken into account in choosing the prior.

#3 If the problem at hand is indeed a binary or multi-choice discrete problem, then a method resulting in a binary or multi-choice decision is appropriate. But one does need to be careful that the real problem is indeed binary or multi-choice discrete, not just that that is what people want.

@Rahul, Chris, and perhaps others:

It sounds like there is some confusion in what I meant by my item #3. I intend it to include assumptions that go into a prior as well as assumptions in the model that gives the likelihood.

Of course. I was just curious as to how this was to be a change from 2 years ago; I may have misunderstood.