Winston Lin wrote in a blog comment earlier this year:

Paul Rosenbaum’s 1999 paper “Choice as an Alternative to Control in Observational Studies” is really thoughtful and well-written. The comments and rejoinder include an interesting exchange between Manski and Rosenbaum on external validity and the role of theories.

And here it is. Rosenbaum begins:

In a randomized experiment, the investigator creates a clear and relatively unambiguous comparison of treatment groups by exerting tight control over the assignment of treatments to experimental subjects, ensuring that comparable subjects receive alternative treatments. In an observational study, the investigator lacks control of treatment assignments and must seek a clear comparison in other ways. Care in the choice of circumstances in which the study is conducted can greatly influence the quality of the evidence about treatment effects. This is illustrated in detail using three observational studies that use choice effectively, one each from economics, clinical psychology and epidemiology. Other studies are discussed more briefly to illustrate specific points. The design choices include (i) the choice of research hypothesis, (ii) the choice of treated and control groups, (iii) the explicit use of competing theories, rather than merely null and alternative hypotheses, (iv) the use of internal replication in the form of multiple manipulations of a single dose of treatment, (v) the use of undelivered doses in control groups, (vi) design choices to minimize the need for stability analyses, (vii) the duration of treatment and (viii) the use of natural blocks.

Good stuff. Someone should translate all of Rosenbaum into Bayes at some point.

A lot of Rosenbaum’s research revolves around matching and propensity scores. I have read that these concepts don’t mix well with Bayesian analyses.

What did you have in mind on how to translate:

1. Unification of methods?

2. Or taking Rosenbaum’s rationale (i.e., why you should match or use propensity scores, the emphasis on design, etc.) and coming up with a Bayesian solution?

“I have read that these concepts don’t mix well with Bayesian analyses.”

I don’t know who’s saying that, but what is the general idea there? I don’t see anything non-Bayesian about matching. Matching, in essence, is looking for pairs of observed cases that are similar in most ways other than what you consider the “treatment”, right?

Matching is about investigating cases that differ mostly in a single dimension (treatment):

p(Outcome | A,B,C,D,E,T) becomes p(Outcome | T) because A,B,C,D,E are all “the same”
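As a rough illustration of that idea, here is a minimal sketch of exact 1:1 matching. The records, the covariates (age band and sex), and the function name are all made up for illustration; real matching usually has to tolerate near-matches rather than demand identical profiles.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (age_band, sex, treated, outcome).
units = [
    ("30-39", "F", 1, 12.0),
    ("30-39", "F", 0, 10.0),
    ("30-39", "M", 1, 9.0),
    ("30-39", "M", 0, 8.0),
    ("40-49", "F", 1, 15.0),
    ("40-49", "F", 0, 11.0),
    ("40-49", "M", 1, 7.0),  # no control shares this profile, so it is dropped
]


def matched_pair_differences(records):
    """Pair treated and control units with identical covariates, 1:1."""
    pools = defaultdict(lambda: {0: [], 1: []})
    for age_band, sex, treated, outcome in records:
        pools[(age_band, sex)][treated].append(outcome)

    diffs = []
    for groups in pools.values():
        # zip stops at the shorter list, so unmatched units are discarded.
        for y_treated, y_control in zip(groups[1], groups[0]):
            diffs.append(y_treated - y_control)
    return diffs


diffs = matched_pair_differences(units)
effect_estimate = mean(diffs)  # average treated-minus-control difference
```

Within each matched pair the covariates are “the same”, so the only thing left to condition on is the treatment.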

Propensity scoring is just collapsing a high dimensional vector of values down to a low dimensional summary which is highly predictive.

Instead of writing your model as:

p(Outcome | A,B,C,D,E)

you write it as:

p(Outcome | f(A,B,C,D,E))

Perhaps the justifications or interpretations are different, but I don’t see the techniques being at all “non Bayesian”.

In fact, having skimmed some Google search results on propensity score matching (PSM) techniques, I think they are Frequentist as usually presented, but there is a closely related idea which is very Bayesian.

In a Frequentist model, the goal of PSM is to somehow balance the frequency with which different sub-groups appear in each treatment group. If you had a reasonable sample size and randomized assignment, you’d have roughly the same proportion of, say, coffee drinkers and smokers in each group. But in the absence of this randomized assignment, perhaps substantially more coffee drinkers and fewer smokers appear in the T=1 group than in the T=0 group, for example.

A Frequentist analysis summarizes exposure to these two things in such a way that a sub-sample with balanced exposure can be created so that frequencies are similar to those you’d find in randomized assignment. Then, you ignore modeling the effect of these things, and just hope they cancel out as if they were random in the sampling distribution for outcome given treatment, as they would have been if you’d randomly assigned treatment.

But in a Bayesian model, it’s not so much that we need frequencies to be similar so that our sampling distribution isn’t distorted, because our model isn’t about the frequency of things, it’s about the plausibility of different predictions coming true… *conditional* on known stuff about the treated unit. Conditioning is easier when those exposures are summarized into a low dimensional summary. We still build a model:

p(Outcome | score, treatment)

but it’s an easier model than

p(Outcome | A,B,C,D,…,Treatment)

where A, B, C, D, etc. is high dimensional.

So perhaps this kind of thing shouldn’t be called “propensity score matching”, just “score matching”.
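To make the dimension-reduction point concrete, here is a small sketch that collapses a covariate vector into a one-dimensional score and then summarizes outcomes conditional on (score bin, treatment). The weights, intercept, covariate names, and data are all hypothetical; in practice f would be fitted rather than written down, and the per-cell mean is only a crude stand-in for a real model of p(Outcome | score, treatment).

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical weights; in a real analysis f would be estimated from data.
WEIGHTS = {"coffee": 0.8, "smoker": -1.2, "age_decades": 0.3}
INTERCEPT = -1.0


def score(covariates):
    """Collapse the covariate vector into a single number in (0, 1)."""
    z = INTERCEPT + sum(WEIGHTS[k] * covariates[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def score_bin(s, n_bins=5):
    """Assign a score to one of n_bins equal-width bins on [0, 1]."""
    return min(int(s * n_bins), n_bins - 1)


def mean_outcome_by_cell(rows):
    """Mean outcome within each (score bin, treatment) cell."""
    cells = defaultdict(list)
    for covariates, treatment, outcome in rows:
        cells[(score_bin(score(covariates)), treatment)].append(outcome)
    return {cell: mean(ys) for cell, ys in cells.items()}


profile_a = {"coffee": 1, "smoker": 0, "age_decades": 3}
profile_b = {"coffee": 0, "smoker": 1, "age_decades": 4}
rows = [
    (profile_a, 1, 6.0), (profile_a, 1, 8.0), (profile_a, 0, 5.0),
    (profile_b, 1, 3.0), (profile_b, 0, 2.0), (profile_b, 0, 4.0),
]
cell_means = mean_outcome_by_cell(rows)
```

The model only ever sees the score and the treatment, not the full covariate vector, which is the whole point of the summary.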

Maybe I was grouping propensity scores with matching unfairly. What would your model look like if you, say, matched 1:1 on age and sex? Would you just make your prior more informative toward a null effect?

If I recall correctly, the problem with Bayesian propensity scores arises when you are jointly estimating the exposure and the outcome. With a Bayesian analysis you tend to pool that information into one joint model, which might in practice go against the advice given by Rubin.

The idea is that you should estimate the propensity score without looking at the outcome data (keeping the “design” and the analysis separate).

But I do remember Jamie Robins writing about this recently.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4667748/

I am guessing you could estimate the propensity score, treat that as a covariate and then turn the Bayesian crank. Would that be full-Bayes then?

Not sure.

There are, I guess, two processes. One is the process by which some people in the population are more likely to be in one vs. the other group (i.e., they have a “propensity”). The other is the process by which some people who have certain characteristics necessarily have different outcomes due to causal connections with those characteristics.

In a Bayesian analysis I don’t think you need to be concerned necessarily with the “propensity” for certain covariates (that is, the frequency with which people naturally assort themselves into groups); what you need is to come up with a model for the causality behind how some “score” makes people have different outcomes. The “score” part is just dimension reduction and applies to Bayesian analysis as much as anywhere else. The “propensity” part is about frequency of occurrence and need not be part of your Bayesian model. The calculation of frequencies would typically come from post-stratification-type calculations: the Bayesian model gives you conditional inference based on the observed covariates or a dimension-reduced form of them, and the post-stratification gives you inference about frequencies based on observed population frequencies and the conditional predictive outcomes.

f(outcome | Treatment) = Σ_score p(outcome | score, Treatment) f(score)

where here f is frequency and p is Bayesian probability.
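A minimal sketch of that post-stratification step, reading the expression as a frequency-weighted sum over score strata. The three strata, the conditional probabilities, and the population frequencies are invented numbers; the conditional probabilities stand in for whatever the Bayesian model would actually produce (in a full analysis you would repeat this over posterior draws).

```python
# Hypothetical model output: p(outcome = 1 | score stratum, treatment).
p_outcome = {
    ("low", 0): 0.10, ("low", 1): 0.20,
    ("mid", 0): 0.30, ("mid", 1): 0.50,
    ("high", 0): 0.60, ("high", 1): 0.70,
}

# Hypothetical observed population frequencies f(score).
f_score = {"low": 0.5, "mid": 0.3, "high": 0.2}


def poststratify(p_outcome, f_score, treatment):
    """Weight the conditional predictions by the population frequency of each stratum."""
    total = sum(f_score.values())  # normalize in case frequencies are raw counts
    return sum(
        p_outcome[(stratum, treatment)] * freq / total
        for stratum, freq in f_score.items()
    )


freq_treated = poststratify(p_outcome, f_score, treatment=1)
freq_control = poststratify(p_outcome, f_score, treatment=0)
```

The conditional probabilities come from the model; the frequencies come from the population. Neither needs to know how the other was obtained.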

I think keeping Bayesian probability and frequency conceptually separate is key to Bayesian analysis of observational or survey-type data.

Of course, having said all this, in a recent analysis of a telephone survey on alcohol usage I simply sub-sampled the survey sample using sampling weights that reweighted the full sample to be representative by state population and sex. Since I was only using a random sample of less than 1 percent of the total number of survey answers, it was a good opportunity to simply correct for frequency biases in the answering population by reweighting. If you have data to spare this can simplify your analysis considerably.
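For what it’s worth, that kind of weighted sub-sampling can be sketched in a few lines. Everything here is hypothetical: the respondent counts, the single stratifying variable (sex only, no state), and the choice to draw with replacement, which is a simplification that matters little when the draw is well under 1 percent of the responses.

```python
import random
from collections import Counter

# Hypothetical survey: women are over-represented relative to a 50/50 population.
respondents = ["F"] * 80_000 + ["M"] * 20_000
sample_share = {"F": 0.8, "M": 0.2}
population_share = {"F": 0.5, "M": 0.5}

# Sampling weight = population share / sample share for the respondent's group.
weights = [population_share[r] / sample_share[r] for r in respondents]


def weighted_subsample(items, weights, k, seed=0):
    """Draw k items with probability proportional to weight (with replacement)."""
    rng = random.Random(seed)
    return rng.choices(items, weights=weights, k=k)


# 500 draws is 0.5 percent of the 100,000 responses.
subsample = weighted_subsample(respondents, weights, k=500)
shares = {group: n / len(subsample) for group, n in Counter(subsample).items()}
```

The sub-sample’s group shares should land near the population shares rather than the lopsided shares among respondents, so downstream analysis can treat it as roughly representative on the weighted dimension.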