Further evidence of a longstanding principle of statistics

The principle is, Whatever you do, somebody in psychometrics already did it long before.

The new evidence comes from an article by Lawrence Hubert and Howard Wainer:

There are several issues with the use of ecological correlations: They tend to be a lot higher than individual-level correlations, and assuming what is seen at the group level also holds at the level of the individual is so pernicious, it has been labeled the “ecological fallacy” by Selvin (1958). The term ecological correlation was popularized from a 1950 article by William Robinson (Robinson, 1950), but the idea has been around for some time (e.g., see the 1939 article by E. L. Thorndike, On the Fallacy of Imputing Correlations Found for Groups to the Individuals or Smaller Groups Composing Them).

8 thoughts on “Further evidence of a longstanding principle of statistics”

  1. As a psychologist with psychometric tendencies, this is so very, very true.

    I really need to go back and read Thorndike, Thurstone and all of the others, as they appear to have been insanely ahead of their time.

    It also makes you wonder what happened to psychology? Personally, I blame SPSS, but I may be somewhat biased.

    • I kind of agree. Sometimes I review a paper, and the author’s response says “This can’t be done in SPSS, so we didn’t.”

  2. What’s the title of the Hubert & Wainer article? And what journal is it in? A text search for ‘wainer’ on Hubert’s publications page returns nothing, and a search for ‘hubert’ on Wainer’s page gets one hit in reference to a 2012 book they’re publishing together.

  3. This showed up on my ecological screen. Lupia and McCue (1990, Law and Policy) derive formulas relating the correlation coefficient to ecological regression estimates, under the assumption that there are two groups in the electorate and that responses (votes) are available only in aggregate form. In particular, regarding Andrew’s comment:

    There are several issues with the use of ecological correlations: They tend to be a lot higher than individual-level correlations,

    this is a function of the number of voters in an electoral unit (under some assumptions on how the decisions of individual voters aggregate).

    In particular, if p1 is the probability that a member of group 1 supports a candidate, x1 is the number in group 1 in an electoral unit, p2 is the probability that a member of group 2 supports the candidate, and x2 is the number in group 2 in the electoral unit, then letting

    V = p1 x1 + p2 x2 + u,

    the correlation of V and x1 (as related in the above) is simply

    (p1 – p2)/[(p1 – p2)^2 + Var(u)/Var(x1)]^.5.

    This is equation (4) in Lupia and McCue. To obtain an idea of when this correlation is lower or higher than p1 – p2, it is necessary to make some assumptions about the error term. Skip and I followed Hawkes (1969, Journal of the Royal Statistical Society) in assuming a central limiting process. When this is done (details in the paper), orders of magnitude for the correlation coefficient relative to the proportions p1 and p2 can be derived.

    Results for different electorates and different distributions of x1 and x2 are given in Figure 2 in the paper. Our conclusion (page 358) was that “once the number of voters in an electoral unit is over two hundred, the correlation coefficient will almost always be greater than p1 – p2”. Since most electoral units in the United States are above that size, one would expect the correlation coefficient to be greater than p1 – p2. I’m not certain what statistic Andrew is referring to regarding individual correlations, but p1 – p2 (the difference in voting behavior in the two groups) is the usual concern in Voting Rights Act analyses of racially polarized voting.
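
The correlation formula quoted in comment 3 can be checked numerically. What follows is only a rough sketch, not the procedure in Lupia and McCue: it assumes each electoral unit has a fixed total size (so x2 = unit_size - x1, which is what makes the slope on x1 equal p1 - p2), that x1 is uniform across units, and that u is independent zero-mean Gaussian noise. The function names and all parameter values are hypothetical, chosen for illustration.

```python
import math
import random
import statistics


def predicted_correlation(p1, p2, var_u, var_x1):
    """Closed form: (p1 - p2) / sqrt((p1 - p2)^2 + Var(u)/Var(x1))."""
    d = p1 - p2
    return d / math.sqrt(d * d + var_u / var_x1)


def pearson(xs, ys):
    """Plain Pearson correlation, written out by hand."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_units(p1, p2, unit_size, sd_u, n_units, seed=0):
    """Draw (x1, V) pairs with V = p1*x1 + p2*x2 + u and x2 = unit_size - x1."""
    rng = random.Random(seed)
    x1s, vs = [], []
    for _ in range(n_units):
        x1 = rng.randint(0, unit_size)  # assumed: group-1 count uniform on 0..unit_size
        x2 = unit_size - x1             # assumed: fixed total per unit
        u = rng.gauss(0.0, sd_u)        # assumed: independent zero-mean Gaussian error
        x1s.append(x1)
        vs.append(p1 * x1 + p2 * x2 + u)
    return x1s, vs


if __name__ == "__main__":
    p1, p2, sd_u = 0.8, 0.3, 60.0  # illustrative values only
    x1s, vs = simulate_units(p1, p2, unit_size=500, sd_u=sd_u, n_units=20000, seed=1)
    empirical = pearson(x1s, vs)
    predicted = predicted_correlation(p1, p2, sd_u ** 2, statistics.pvariance(x1s))
    print(f"empirical={empirical:.3f} predicted={predicted:.3f} p1-p2={p1 - p2:.3f}")
```

Under these assumptions the simulated correlation should track the closed form closely, and whenever Var(u)/Var(x1) is small relative to 1 - (p1 - p2)^2 it will exceed p1 - p2. That is the direction of the result the commenter reports for larger electoral units.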
