Is linear regression unethical in that it gives more weight to cases that are far from the average?

I received the following note from someone who’d like to remain anonymous:

I read your post on ethics and statistics, and the comments therein, with much interest.

I did notice, however, that most of the dialogue was about ethical behavior of scientists. Herein I’d like to suggest a different take, one that focuses on the statistical methods of scientists.

For example, fitting a line to a scatter plot of data using OLS [linear regression] gives more weight to outliers. If each data point represents a person, we are weighting people differently. And surely the ethical implications are different if we use a least absolute deviation estimator.

Recently I reviewed a paper where the authors claimed one advantage of non-parametric rank-based tests is their robustness to outliers. Again, maybe that outlier is the 10th person who dies from an otherwise beneficial medicine. Should we ignore him in assessing the effect of the medicine?

I guess this gets me partly into loss functions and how we evaluate models. If I remember correctly you were not very appreciative of loss functions in one of your blog entries. As a side note I’d be interested to know of a paper where you explain your rationale for this.

The general point I would make, however, is that there is no “ethically” neutral method. When we adopt a method we, along with it, adopt an ethical stance, whether we know it or not. Ideally scientists ought to be aware of the stance they are taking and be able to offer justifications for it.
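To make the correspondent's weighting point concrete, here is a minimal sketch in Python (simulated, illustrative data, not from the correspondence): a single far-out observation pulls a squared-error fit much harder than an absolute-error fit.

```python
# Minimal sketch: OLS (squared error) vs. LAD (absolute error) on
# simulated data with one far-out observation. Illustrative only.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)
y[-1] += 20.0                      # one person far from the average

X = np.column_stack([np.ones_like(x), x])

# OLS: a residual's influence grows with its square, so the
# far-out point gets much more say in where the line goes.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# LAD: each unit of residual counts the same, so the far-out
# point gets one "vote" like everyone else.
beta_lad = minimize(lambda b: np.abs(y - X @ b).sum(),
                    x0=beta_ols, method="Nelder-Mead").x

print("OLS (intercept, slope):", beta_ols)
print("LAD (intercept, slope):", beta_lad)
```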

My reply:

I think the resolution to this problem is to consider varying treatment effects (see, for example, here and here). If the treatment effect is constant, then the issues discussed above don't arise: there is one parameter being estimated, and the ethical thing to do is to estimate it as accurately as possible. (In which case it could be considered unethical to use a non-Bayesian approach if good prior information is available, but that's another story.)
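A minimal simulation of that distinction (hypothetical numbers, echoing the correspondent's medicine example): with a constant effect, mean- and median-based comparisons recover the same parameter; with a few badly harmed people, they target genuinely different quantities, and choosing between them is no longer purely technical.

```python
# Minimal sketch: with a constant treatment effect there is one
# estimand; with varying effects, different estimators target
# genuinely different quantities. Numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.integers(0, 2, n)                 # randomized treatment

# Constant effect: everyone gains exactly 2.0.
y = 1.0 + 2.0 * z + rng.normal(0, 1, n)
print(y[z == 1].mean() - y[z == 0].mean())            # ~2.0
print(np.median(y[z == 1]) - np.median(y[z == 0]))    # ~2.0 as well

# Varying effects: 95% gain a little, 5% lose a lot
# (the "10th person who dies" from the note above).
effect = np.where(rng.random(n) < 0.95, 0.5, -10.0)
y = 1.0 + effect * z + rng.normal(0, 1, n)
print(y[z == 1].mean() - y[z == 0].mean())            # slightly negative: rare harms dominate the mean
print(np.median(y[z == 1]) - np.median(y[z == 0]))    # ~0.43: the median barely notices the harmed few
```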

My correspondent wrote back:

Some reactions to the readings:

1. Strictly speaking, the problem does not go away. So long as (heterogeneous) estimates still involve some (locally) weighted average, we are still using weights. The choice of any weights is, in my view, of moral significance.

2. The problem becomes less salient, at least for the statistician. When estimating CATEs [conditional average treatment effects] the statistician is only responsible for within-stratum weighting choices. As more strata are added, these choices become less consequential. In the extreme, individuals within strata are identical on relevant covariates, so within-stratum weights don't matter. The statistician reports CATEs and passes the buck to policy makers.

3. The problem may remain for the policy maker, and bounce back to the statistician. Often treatments cannot be finely targeted to sub-populations, for logistical, technological, economic, and other reasons. In such situations policy makers will insist on an ATE [average treatment effect] estimate; that is the quantity of interest. If so, how will the statistician estimate and make inferences about the ATE? Once again he faces an aggregation problem. Alternatively, he may push back and say to the policy maker: “I give you CATEs, you compute ATEs (or provide me with your loss function so I can do it for you)”.
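Point 3 is easy to make concrete: the ATE is just a weighted average of the CATEs, and the weights are a substantive choice rather than a statistical one. A minimal sketch with hypothetical numbers:

```python
# Minimal sketch: the ATE as a weighted average of per-stratum
# CATEs. All numbers hypothetical; only the weighting logic matters.
import numpy as np

cate = np.array([1.2, 0.4, -0.8])            # estimated effect per stratum
target_shares = np.array([0.5, 0.3, 0.2])    # stratum shares in the target population
sample_shares = np.array([0.2, 0.3, 0.5])    # stratum shares in the study sample

print(np.dot(cate, target_shares))   # 0.56: the policy maker's ATE
print(np.dot(cate, sample_shares))   # -0.04: what naive pooling of the sample delivers
```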

My general response is that if this sort of thing is a concern, it would be good to formally model the decision problem and the costs and benefits of different options.

Statisticians typically focus on inference rather than on decisions. In decision analysis, the issues you mention will arise. Most work I've seen in statistical decision analysis chooses utility or loss functions based on mathematical principles rather than applied considerations. We have some examples of more applied decision analysis in chapter 22 of Bayesian Data Analysis.
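As one hedged illustration of what formally modeling the decision problem might look like (all numbers, including the cost placed on harms, are hypothetical), the dispute over an estimator's implicit weights can be moved into an explicit objective:

```python
# Minimal sketch of an explicit decision analysis: compare actions
# by expected net benefit under a stated cost for harms. All
# numbers, including the harm multiplier, are hypothetical.
import numpy as np

effect = np.array([1.2, 0.4, -0.8])   # per-stratum treatment effects
share = np.array([0.5, 0.3, 0.2])     # stratum shares of the population
harm_multiplier = 3.0                 # a harm counts three times a same-sized benefit

def net_benefit(treat):
    # Value each stratum's effect, penalizing harms more heavily,
    # and count only the strata the action actually treats.
    value = np.where(effect > 0, effect, harm_multiplier * effect)
    return np.dot(value * treat, share)

print(net_benefit(np.array([1, 1, 1])))   # treat everyone: 0.24
print(net_benefit(np.array([1, 1, 0])))   # spare the harmed stratum: 0.72
```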

17 thoughts on “Is linear regression unethical in that it gives more weight to cases that are far from the average?”

  1. Plotting a loess curve seems helpful if one is trying to determine whether the same inference/relationship can be applied across the full range of your sample. (A small sketch follows this thread.)

    • jme6F4 – these plots directly get at what each individual observation contributed to the overall analysis – http://statmodeling.stat.columbia.edu/2011/05/missed_friday_t/

      (In fact I once labeled the right side of figure 7 as dictator versus the left side as democratic in figure13.pdf.)

      Thinking about this post a bit more: a common parameter is always false, and some assessment of non-commonness in the analysis is always prudent … but I had not thought about arguing for this as being _of moral significance_.

      Maybe I should – as the last journal rejection seems to have been mostly due to the reviewer claiming it was unclear what could be seen or appreciated in the plots, not what they were really about.

      In any case, if you want to discern what individual observations contribute to an analysis, I believe you will need to use techniques like those these plots are based on.
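    A minimal sketch of the loess suggestion above (simulated data whose slope genuinely changes partway through the range):

    ```python
    # Minimal sketch: a lowess smoother vs. a single straight line on
    # simulated data whose slope flattens partway through the range.
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(0, 10, 200))
    y = np.where(x < 6, 0.5 * x, 3.0) + rng.normal(0, 0.5, x.size)

    smoothed = lowess(y, x, frac=0.3)          # columns: x (sorted), fitted value
    slope, intercept = np.polyfit(x, y, 1)     # the one-line-for-everyone fit

    # Where the smoother departs from the line, a common-slope model
    # is averaging over a relationship that is not actually common.
    gap = smoothed[:, 1] - (intercept + slope * smoothed[:, 0])
    print("largest departures:", gap.min(), gap.max())
    ```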

  2. > The choice of any weights is, in my view, of moral significance.

    First, for the common parameter for all:

    Would you not avoid spitting into the wind, so as to avoid offending in that direction?

    Actually, I made the same argument (in particular, why some survival times get more weight in survival analysis than others) to my biostatistics advisor on a Friday afternoon when I was a master's student: they made it quite clear that agreeing to be my advisor was a huge mistake because I had yet to even grasp likelihood (I actually learned a lot that weekend).

    This is most easily seen in a Bayesian analysis: prior probabilities are conditioned on the data you observed, and the likelihood (what the data have to say about the unknown, and where formulas like least squares come from) is proportional to posterior/prior.

    Some observations have more of an impact on the conversion of prior to posterior than others – this is (or should be) no more mysterious than a feather falling slower than a needle of the same weight. (A small sketch follows this comment.)

    For non-common parameters, I agree with Andrew that you need decision theory, and the ethical dimension needs to be one of the ingredients (i.e., costs, which might be infinite).
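    A minimal sketch of the prior-to-posterior point above (conjugate normal model, made-up numbers): the weight each observation receives is right there in the update.

    ```python
    # Minimal sketch: sequential conjugate-normal updating, printing how
    # far each observation pulls the posterior mean. Numbers made up.
    import numpy as np

    prior_mean, prior_var = 0.0, 1.0
    sigma2 = 1.0                             # known observation variance
    data = np.array([0.2, -0.1, 0.3, 5.0])   # one far-out observation

    post_mean, post_var = prior_mean, prior_var
    for y in data:
        w = post_var / (post_var + sigma2)   # weight this datum receives
        shift = w * (y - post_mean)          # its pull on the estimate
        print(f"y = {y:5.1f} moves the posterior mean by {shift:+.3f}")
        post_mean += shift
        post_var = 1.0 / (1.0 / post_var + 1.0 / sigma2)
    ```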

      • Referring to when one (and the same) slope parameter will not suffice for all observations.

        For instance, if y = b*x does not fit males as well as females (an interaction), y = Bm*x or y = Bf*x might be needed, and now there is not a common slope parameter for all; a sketch follows this reply. (But almost everyone will still go with a common sigma parameter for all – perhaps too unthinkingly.)

        Strange to me that “non-common” seems so strange in this setting – it is what is meant by an interaction – the effect was not common or the same.

        I understand why a “common in distribution” parameter (my preferred term for a random-effect parameter) seems strange, though if random parameters were sampled from arbitrarily different distributions they would not form a sensible statistical model.

        Many seemed to prefer “exchangeable” but I think that refers to observations that can be represented as draws from a common distribution.
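        A minimal sketch of the separate-slopes point (simulated data; no-intercept least squares for brevity):

        ```python
        # Minimal sketch: one common slope vs. separate slopes per group
        # (Bm, Bf) when the groups genuinely differ. Simulated data.
        import numpy as np

        rng = np.random.default_rng(3)
        n = 200
        male = rng.integers(0, 2, n).astype(bool)
        x = rng.uniform(0, 5, n)
        true_slope = np.where(male, 1.0, 2.0)        # Bm = 1, Bf = 2
        y = true_slope * x + rng.normal(0, 0.5, n)   # common sigma for all

        def ls_slope(xx, yy):
            # Least-squares slope for a no-intercept line y = b*x.
            return np.sum(xx * yy) / np.sum(xx * xx)

        print("common slope:", ls_slope(x, y))        # a compromise between the groups
        print("Bm:", ls_slope(x[male], y[male]))      # ~1.0
        print("Bf:", ls_slope(x[~male], y[~male]))    # ~2.0
        ```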

    • Though I am no statistician, I would argue the issue here is that the choice of a Bayesian approach, as well as of the functional form for the likelihood, has moral consequences.

      There is nothing wrong with “posterior is proportional to likelihood times prior.”

      Fisher might have applied randomization-based inference to problems others might have approached within a Bayesian framework. In small samples, inferences will likely differ. The issue is not which approach is right or wrong, but recognizing that methodological choices have a bearing on inference, and presumably on policy.

  3. > “In which case it could be considered unethical to use a non-Bayesian approach if good prior information is available, but that’s another story.”

    I would really like to read this story; I hope you'll cover the subject.

  4. I can see what the writer is getting at, though I think that this depends heavily on the application and can only reasonably be discussed case by case. I don't think one can make a general case against giving outliers the weight they have in LS analysis, particularly because this is not an absolute weight but one relative to where the other points are (every point has the chance to be an outlier if the others allow it). Weighting outliers down may be considered unethical in some applications, too. So it's probably better to characterise what the methods do in a comprehensive and technical way (weight of outliers etc.) and to worry about “methodological ethics” once it's clear what the data mean and what is done with the results.
    The same holds for “it's unethical not to be Bayesian where there is prior information” – I'd still want to see the information in question and be convinced by the Bayesian that it's properly formalised in his or her analysis (and it's still a myth that frequentists cannot use prior information; of course it can be used for model building).

  5. My grad school Multivariate Statistics professor continually emphasized that all statistical measures and techniques are merely tools, and that it would be foolish to expect a hammer to know which nail to strike without a human to guide it. The existence of outliers may be irrelevant, or the mere fact that there are outliers may be a clue that guides us to a better explanation. With specific reference to econometric models, the lesson I learned was that econometric models are most useful at the explanatory level, may be useful for prediction, and should never be used to prescribe.

  6. Of some relevance:

    The illusion of predictability: How regression statistics mislead experts

    Does the manner in which results are presented in empirical studies affect perceptions of the predictability of the outcomes? Noting the predominant role of linear regression analysis in empirical economics, we asked 257 academic economists to make probabilistic inferences based on different presentations of the outputs of this statistical tool. The questions concerned the distribution of the dependent variable, conditional on known values of the independent variable. The answers based on the presentation mode that is standard in the literature demonstrated an illusion of predictability; the outcomes were perceived to be more predictable than could be justified by the model. In particular, many respondents failed to take the error term into account. Adding graphs did not improve the inference. Paradoxically, the respondents were more accurate when only graphs were provided (i.e., no regression statistics). The implications of our study suggest, inter alia, the need to reconsider the way in which empirical results are presented, and the possible provision of easy-to-use simulation tools that would enable readers of empirical papers to make accurate inferences.

    http://www.sciencedirect.com/science/article/pii/S0169207012000258
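    The abstract's point about neglecting the error term can be checked in a couple of lines (illustrative R² value): a model can “explain” a quarter of the variance while leaving predictions nearly as spread out as the raw outcome.

    ```python
    # Minimal sketch: even with R^2 = 0.25, the residual spread is
    # sqrt(1 - R^2) ~ 87% of the outcome's total spread.
    import numpy as np

    r2 = 0.25
    sd_y = 1.0
    sd_resid = np.sqrt(1 - r2) * sd_y
    print(sd_resid)   # ~0.866: conditioning on x barely narrows predictions
    ```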

  7. Please: more on the ethics of Wikipedia copying for $.
    I have a PhD in molecular biology, and have stopped contributing because of this issue (and having to re-re-re-correct really stupid errors – that I can sort of live with, but not someone taking my work and making money off of it).

  8. Rather than think of this as a purely methodological problem, I think it makes more sense to think of this as a representational problem.

    If one intends to describe a characteristic with an estimator, and the application of the estimator to the measured values is somehow unethical or incorrect in the context of the problem, then this suggests that one hasn't obtained the correct measurement to represent that characteristic, that the measurement needs to be transformed so that applying the method is consistent with one's intent, or that one needs to choose a method appropriate for the current representation of the data.

    Of course, there is no ethically “neutral” way of deciding on a representation either – at some point, it comes down to a definitional problem. In my opinion, people who expect to find a purely neutral and objective way of understanding the world are delusional.

  9. It seems to me that there's an unstated assumption in the note. OLS weights outliers more heavily, but those errors could arise from the measurement process rather than from a property of the individuals (variance in response to treatment). In that case, it seems odd to make an ethical attribution, particularly if the purpose is to measure some common parameter of the population.
