
Unalphabetize!

I dream of a day when a journalist such as Ezra Klein, when seeing a graph such as this from Rob Goodspeed,

[Image: wordcounts.jpg]

will immediately say, Hey! Why are these items in alphabetical order? That just confuses things. (It’s not like they need to be in alphabetical order so that we can look up “faith” in the index or whatever.)

I have no substantive comment on the graph except that it seems unfair to McCain: his page has fewer total words, which, as displayed in the graph, makes him look less substantive overall. I mean, maybe he simply chose to focus on just a few issues.

P.S. I’m not knocking Goodspeed, who put in the work to make the graph, or Klein, who went to the trouble of finding it. I’m just saying that in the ideal world, an irrelevantly alphabetized graph would JUMP OUT OF THE PAGE as something not quite right, in the way that a typo or grammatical error does now. But, hey, my job is education, right? So here’s my try.

P.P.S. Howard Wainer has called this the Alabama First error and wrote an article on the topic in Chance in 2001.

8 Comments

  1. Radford Neal says:

    I think this is about as bad as it gets for alphabetization. Putting states in alphabetical order does often make sense – someone may indeed wonder what the numbers for Utah are. But nobody is going to wonder what the numbers are for "Human dignity" or "Urban policy", because nobody could guess a priori that these categories exist. Which is perhaps another problem with the data…

  2. Peter says:

    There's a nice article on this topic by Michael Friendly…

    Friendly & Kwan (2003)
    Effect ordering for data display
    Computational Statistics and Data Analysis
    v. 43, pages 509-539

    I'm guessing that ordering by the ratio of Obama/McCain would be good…. or maybe change it to 'proportion of words' and order by the ratio of proportions

  3. Do you suggest I sort it by McCain, Obama, or perhaps in descending order of a new index combining both?

  4. Steve Kass says:

    Hm. A few journalists have reported, if not asked why it is, that the issues on Obama's issues page are listed in alphabetical order.

    Which they aren't.

    When something is not in alphabetical order, asking why not might yield an interesting answer.

    Why aren't the issues [on Obama's issues page] in alphabetical order?

    Here's one hypothesis: the "Seniors & Social Security" item on Obama's page might once have been titled "Social Security". Other thoughts?

    Steve

  5. Andrew says:

    Radford: Yes, perhaps a tree structuring would be better.

    Peter: Yes, I'm familiar with the Friendly and Kwan paper and should've mentioned it.

    Rob: I'm not sure. When writing the blog entry I was thinking that there's more than one way to do it. One approach is simple decreasing order in total mentions, another approach (which I think I favor) is in decreasing order of Obama – McCain difference, so that the graph starts with Obama's issues on top, goes through the mixed issues, and ends with McCain's at the bottom. I would first divide each candidate's numbers by his total number of mentions.

    Steve: Interesting example of data exploration.

  6. I like your idea to do it by the total difference.

    I didn't want to normalize the data by dividing by total because I think there is meaning in the absolute variation itself. Normalizing could be interesting as a secondary analysis, but then again any meaning here is already pretty subjective. On the other hand, one could do a word frequency analysis of their speeches …

  7. Hadley says:

    In my experience, ordering by means tends to give the best visual effect. I've ordered by totals and differences in the past and spent 5 minutes wondering if the code worked or not because the appearance did not match what I expected (my expectations were wrong).

    In R, this is very easy to experiment with using the reorder function. Perhaps Rob could post his code and we could provide some other options? (A parallel coordinates plot would be another option, which would increase the data-ink ratio.)

  8. Lord says:

    I disagree that this is unfair to McCain. A willingness to say less may be less substantive, or it may be more principle-based, offering fewer specifics. Fewer issues may signify more focus or an absence of anything to say on some subjects. Not favorable or unfavorable in itself.
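
For anyone who wants to try the orderings suggested in the comments, here is a minimal Python sketch of two of them: decreasing total mentions, and decreasing Obama minus McCain difference after dividing each candidate's counts by his own total. The word counts are invented purely for illustration and are not the numbers behind Goodspeed's graph; the function names are hypothetical too.

```python
# Hypothetical word counts, made up for illustration only.
# They are NOT the values from the graph discussed in the post.
counts = {
    "Economy":      {"obama": 400, "mccain": 150},
    "Faith":        {"obama": 100, "mccain": 25},
    "Iraq":         {"obama": 200, "mccain": 300},
    "Urban policy": {"obama": 300, "mccain": 25},
}


def candidate_totals(counts):
    """Sum each candidate's words across all issues."""
    totals = {}
    for row in counts.values():
        for candidate, n in row.items():
            totals[candidate] = totals.get(candidate, 0) + n
    return totals


def order_by_total(counts):
    """Issues in decreasing order of total mentions (both candidates combined)."""
    return sorted(counts, key=lambda issue: sum(counts[issue].values()), reverse=True)


def order_by_share_difference(counts, first="obama", second="mccain"):
    """Issues in decreasing order of (first's share - second's share).

    Each candidate's count is divided by his own total first, so a shorter
    issues page is not penalized for having fewer words overall.
    """
    totals = candidate_totals(counts)

    def share_difference(issue):
        row = counts[issue]
        return row[first] / totals[first] - row[second] / totals[second]

    return sorted(counts, key=share_difference, reverse=True)


print(sorted(counts))                     # alphabetical, as in the original graph
print(order_by_total(counts))             # most-discussed issues first
print(order_by_share_difference(counts))  # first candidate's issues on top
```

With these made-up numbers, the share-difference ordering puts "Urban policy" at the top and "Iraq" at the bottom, which is the "Obama's issues on top, McCain's at the bottom" layout described in comment 5. Ordering by the mean of the two candidates' counts gives the same sequence as ordering by total when there are exactly two candidates.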