
On summarizing a noisy scatterplot with a single comparison of two points

John Sides discusses how his scatterplot of unionization rates and budget deficits made it onto cable TV news:

It’s also interesting to see how he [journalist Chris Hayes] chooses to explain a scatterplot — especially given the evidence that people don’t always understand scatterplots. He compares pairs of cases that don’t illustrate the basic hypothesis of Brooks, Scott Walker, et al. Obviously, such comparisons could be misleading, but given that there was no systematic relationship depicted in that graph, these particular comparisons are not.

This idea – summarizing a bivariate pattern by comparing pairs of points – reminds me of a well-known statistical identity that I refer to in a paper with David Park:


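The identity in question (the displayed equation did not survive extraction; the restatement and notation here are mine) expresses the least-squares slope as a weighted average of all pairwise slopes:

```latex
\hat{\beta}_{\mathrm{OLS}}
  \;=\; \sum_{i<j} w_{ij}\,\frac{y_j - y_i}{x_j - x_i},
\qquad
w_{ij} \;=\; \frac{(x_j - x_i)^2}{\sum_{k<l} (x_k - x_l)^2}.
```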
John Sides is certainly correct that if you can pick your pair of points, you can make extremely misleading comparisons. But if you pick every pair of points, and average over them appropriately, you end up with the least-squares regression slope.

Pretty cool, and it helps develop our intuition about the big-picture relevance of special-case comparisons.
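That averaging can be checked numerically. Here is a minimal sketch (variable names are mine; the weights, proportional to squared x-distances, are the standard choice that makes the average come out to least squares):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

# Every pair i < j, its slope, and a weight proportional to (x_j - x_i)^2
i, j = np.triu_indices(len(x), k=1)
dx, dy = x[j] - x[i], y[j] - y[i]
weights = dx ** 2

# Weighted average of all pairwise slopes
pairwise_average = np.sum(weights * (dy / dx)) / np.sum(weights)

# Ordinary least-squares slope for comparison
ols_slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(pairwise_average, ols_slope)  # identical up to floating-point error
```

The agreement is exact, not approximate: the weighted numerator reduces to the cross-product sum and the weights to the x sum of squares.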


  1. anon says:

    Mark Berman wrote a nice 1988 paper giving the general case of the averaging-pairs result – and describing how it had been rediscovered over the years, multiple times.

  2. Ken Williams says:

    "Misleading" is such an ugly word. Let's call it "robust".

  3. Sebastian says:

    that's a really neat little proof – I often find it worthwhile to think about more "qualitative" comparisons in terms of their statistical counterparts. Gerring and Seawright have a nice piece on analogies between statistical techniques and case selection strategies in small n research in Gerring's case study book.

  4. K? O'Rourke says:

Actually, the _well known_ generalized inverse (and a special case of the trivial likelihood factorization).

    Pena's paper being a nice accessible source –

I asked him once why, being a Bayesian, he had not considered using likelihood – he said he just did not think of it.

    For some detail of the trivial likelihood factorization see

And hopefully soon in a preprint I'll post here.


  5. K? O'Rourke says:

Oops, Pena's weighted sum is expressed slightly differently.

His was a weighted sum of (yi – ȳ)/(xi – x̄), i from 1 to n.

I used his version once in teaching linear regression and it confused the students so much I never used it again.

Hiding the ȳ and x̄ by writing it as a double sum might help – it makes it look like each weighted piece depends only on yi and xi and not on the others.

I do, though, find likelihoods easier to understand and fully explicit about what parameters are involved (intercept and slope).


  6. derek says:

    When I clicked on the "evidence that people don't always understand scatterplots," I expected to find something other than a repetition of the NYT staff's belief that their readers don't understand scatterplots. But it was just the same single data point again. The evidence amounts to no more than the prejudice of one office full of people, repeated over the internet again and again.

  7. Justin Smith says:

    Cool. This reminds me of the Theil estimator from nonparametric statistics, where betahat_1 is the median of all possible pairwise slopes.
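For contrast, a quick sketch of both estimators built from the same pairwise slopes (the simulated data and the planted outlier are my own choices): the Theil estimator takes the median of the pairwise slopes where least squares takes their weighted mean, which is what buys its robustness.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 1.5 * x + rng.normal(scale=0.1, size=40)
y[0] += 20.0  # one gross outlier

# All pairwise slopes (y_j - y_i) / (x_j - x_i) for i < j
i, j = np.triu_indices(len(x), k=1)
slopes = (y[j] - y[i]) / (x[j] - x[i])

theil = np.median(slopes)  # Theil estimator: median of pairwise slopes
ols = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # least squares: their weighted mean

print(theil, ols)
```

The median ignores the handful of pair slopes contaminated by the outlier, so the Theil estimate stays near the true slope of 1.5.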