I received the following note from someone who’d like to remain anonymous:
I read your post on ethics and statistics, and the comments therein, with much interest.
I did notice, however, that most of the dialogue was about the ethical behavior of scientists. Herein I'd like to suggest a different take, one that focuses on the statistical methods of scientists.
For example, fitting a line to a scatter plot of data using OLS [linear regression] gives more weight to outliers. If each data point represents a person, we are weighting people differently. And surely the ethical implications are different if we use a least absolute deviation estimator.
Recently I reviewed a paper where the authors claimed one advantage of non-parametric rank-based tests is their robustness to outliers. Again, maybe that outlier is the 10th person who dies from an otherwise beneficial medicine. Should we ignore him in assessing the effect of the medicine?
I guess this gets me partly into loss functions and how we evaluate models. If I remember correctly, you were not very appreciative of loss functions in one of your blog entries. As a side note, I'd be interested to know of a paper where you explain your rationale for this.
The general point I would make, however, is that there is no “ethically” neutral method. When we adopt a method we, along with it, adopt an ethical stance, whether we know it or not. Ideally scientists ought to be aware of the stance they are taking and be able to offer justifications for it.
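To make the point about estimators concrete, here's a minimal sketch, with simulated data of my own construction, of how a single outlier pulls an OLS line much harder than a least-absolute-deviations line. (The LAD fit below is a generic numerical minimization, just one way to do it, not any particular package's routine.)

```python
# A minimal sketch (simulated data, not from the original note) of the
# correspondent's point: OLS is pulled toward an outlier more than least
# absolute deviations (LAD), because squaring residuals up-weights
# extreme points.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=x.size)
y[-1] += 40.0  # one extreme outlier, e.g. a single unusual patient

# OLS: minimizes the sum of squared residuals (closed form via polyfit).
ols_slope, ols_intercept = np.polyfit(x, y, 1)

# LAD: minimizes the sum of absolute residuals; no closed form, so we
# optimize numerically (Nelder-Mead handles the non-smooth objective).
def lad_loss(params):
    slope, intercept = params
    return np.sum(np.abs(y - (slope * x + intercept)))

lad_slope, lad_intercept = minimize(lad_loss, x0=[0.0, 0.0],
                                    method="Nelder-Mead").x

print(f"OLS fit: slope {ols_slope:.2f}, intercept {ols_intercept:.2f}")
print(f"LAD fit: slope {lad_slope:.2f}, intercept {lad_intercept:.2f}")
# The OLS line is dragged noticeably toward the outlier;
# the LAD line barely moves. Which behavior is "right" depends on
# whether that outlying person should count more, the same, or less.
```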
I think the resolution to this problem is to consider varying treatment effects (see, for example, here and here). If the treatment effect is constant, then the issues discussed above don't arise: there is one parameter being estimated, and the ethical thing to do is to estimate it as accurately as possible. (In which case it could be considered unethical to use a non-Bayesian approach if good prior information is available, but that's another story.)
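To spell out the algebra behind that claim: if the effect is the same for everyone, any legitimate weighting scheme returns the same number, so the weights carry no ethical content.

```latex
% If individual treatment effects are constant, the weights drop out:
\[
\tau_i = \tau \ \text{for all } i
\;\Longrightarrow\;
\sum_i w_i \tau_i \,=\, \tau \sum_i w_i \,=\, \tau
\qquad \text{for any } w_i \ge 0,\ \sum_i w_i = 1.
\]
% The choice of weights only starts to matter once the tau_i vary.
```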
My correspondent wrote back:
Some reactions to the readings:
1. Strictly speaking, the problem does not go away. So long as (heterogeneous) estimates still involve some (locally) weighted average, we are still using weights. Any choice of weights is, in my view, of moral significance.
2. The problem becomes less salient, at least for the statistician. When estimating conditional average treatment effects (CATEs), the statistician is responsible only for within-stratum weighting choices. As more strata are added, these choices become less consequential. In the extreme, individuals within a stratum are identical on all relevant covariates, so within-stratum weights don't matter. The statistician reports CATEs and passes the buck to policy makers.
3. The problem may remain for the policy maker, and bounce back to the statistician. Often treatments cannot be finely targeted to sub-populations, for logistical, technological, economic, and other reasons. In such situations policy makers will insist on an estimate of the average treatment effect (ATE); that is the quantity of interest. If so, how will the statistician estimate and make inferences about the ATE? Once again he faces an aggregation problem. Alternatively, he may push back and say to the politician: "I give you CATEs; you compute ATEs (or provide me with your loss function so I can do it for you)."
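To make that aggregation step concrete, here's a toy sketch with made-up strata and numbers: the same set of CATEs yields different ATEs under different stratum weights, which is exactly where the weighting question resurfaces.

```python
# A toy sketch (hypothetical numbers) of aggregating stratum-level CATEs
# into an ATE. The shares are the "weights" the correspondent is worried
# about: change them and the headline number changes.
import numpy as np

# Stratum-level conditional average treatment effects (made up).
cate = np.array([0.5, 0.1, -0.3])       # e.g. young, middle-aged, elderly
pop_share = np.array([0.3, 0.5, 0.2])   # population share of each stratum

# ATE as the population-weighted average of the CATEs.
ate = np.sum(pop_share * cate)
print(f"ATE under population weights: {ate:+.3f}")

# A policy maker who weights strata differently (say, by the share of
# each stratum among those actually treated) gets a different summary
# from the exact same CATEs; here the sign even flips.
treated_share = np.array([0.1, 0.3, 0.6])
ate_treated = np.sum(treated_share * cate)
print(f"ATE under treated-population weights: {ate_treated:+.3f}")
```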
My general response is that if this sort of thing is a concern, it would be good to formally model the decision problem and the costs and benefits of different options.
Statisticians typically focus on inference rather than on decisions. In a decision analysis, the issues you mention will arise. Most of the work I've seen in statistical decision analysis tends to choose utility or loss functions based on mathematical principles rather than applied considerations. We give some examples of more applied decision analysis in chapter 22 of Bayesian Data Analysis.
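And to connect this back to the loss-function question above: the standard decision-theoretic result is that the loss function determines which summary gets reported, which is one way the outlier question re-enters.

```latex
% Standard result: the optimal point estimate depends on the loss.
\[
\hat\theta_{\mathrm{sq}}
  = \arg\min_a \mathrm{E}\!\left[(\theta - a)^2 \mid y\right]
  = \mathrm{E}[\theta \mid y]
  \qquad \text{(posterior mean; sensitive to outliers)},
\]
\[
\hat\theta_{\mathrm{abs}}
  = \arg\min_a \mathrm{E}\!\left[\lvert\theta - a\rvert \mid y\right]
  = \mathrm{median}(\theta \mid y)
  \qquad \text{(posterior median; resistant to outliers)}.
\]
```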