Statistical distribution of incomes in different countries, and a great plot

This post is by Phil Price.

This article in the New York Times is pretty good, and the graphics are excellent…especially the interactive graphic halfway down, entitled “American Incomes Are Losing Their Edge, Except at the Top” (try mousing over the gray lines and see what happens).

The plot attempts to display the statistical distribution of incomes in about 10 different countries. That alone is not so easy; one natural idea is to display a bunch of histograms in a small multiples plot. But the plot also tries to show how each of the distributions has changed since 1980. I can think of other approaches to this plot that might be worth trying, but I’m not sure any of them would be better. Nicely done, NYT graphics team.

If I wanted to make interactive graphics like this myself, I could presumably figure it out. But suppose I want to do it routinely, as part of exploratory data analysis. I don’t necessarily need polish, just the basics. I’d like to work within R but am open to other possibilities. What are my options? GGobi? iPlots? Anything else worth considering?


14 thoughts on “Statistical distribution of incomes in different countries, and a great plot”

  1. There’s a lot of cool stuff you can do just with shiny, but you can only manipulate the chart through checkboxes, dropdown menus, and the like, not by interacting directly with the chart. rCharts and ggvis seem to only let you hover over a data point to get extra info. The best thing about the NYT chart is selecting a data series by clicking on one of its members! You can do that with the googleVis package, but that only really does prepackaged Google charts. Some of those are pretty cool (the Hans Rosling “MotionChart”), but they are essentially uneditable: you have no control over most graphical parameters. I’d love to hear of an R option for this!

    • The brush function in ggvis can be used to select individual points or groups of points, but it takes a bit of effort to do so. If you search for “selecting points with linked brush in ggvis” you’ll probably come up with a few good links to help you out.

  2. These seem like good suggestions. Google Fusion Tables comes to mind, and I’ve heard of but not used plot.ly.

    There are ways of interacting with graphics by typing instead of mousing that might be worth considering, too. That opens the field up to J’s plot routines (http://www.jsoftware.com/jwiki/Plot) and Gnuplot. You can certainly program interactivity into J’s graphics, and you can probably do that in other languages.

  3. I like the plot, but I have a problem with the statistic they’re using: median per capita income? The only way that makes sense is if you take household income, divide by household person count, and then take the median of those numbers. But fertility varies a lot across income levels (I think), so is this telling us about how income is decreasing or how family size is increasing? I don’t know.

    Consider the stereotypical Google or Microsoft employee: single, white, male, computer programmer, spends 12 hours a day writing code, makes a $150k salary, and has some kind of stock options which, when exercised, produce a big income boost. Household of 1, per capita family income $150k.

    Compare that to a successful university professor: married with 2 or 3 kids, spouse works or doesn’t work, family income between $100,000 and $200,000, per capita family income between, say, $20k and $70k depending on family size… that’s a big difference from the other employee.
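
    To make the arithmetic concrete, here is a minimal sketch of the definition guessed at above (household income divided by household size, then the median). The household numbers are taken loosely from the examples in this thread and are purely illustrative, not real data. The person-weighted variant is my own assumption about one other way the statistic could be defined; I don't know which definition the NYT analysis actually used.

```python
from statistics import median

# Illustrative households only: (total household income in dollars, people).
households = [
    (150_000, 1),  # single programmer
    (20_000, 1),   # single gas station attendant
    (100_000, 5),  # professor, adjunct spouse, 3 kids
    (200_000, 4),  # two parents, two children
]


def median_per_capita_by_household(households):
    """Median of income/size, with each household counted once."""
    return median(income / size for income, size in households)


def median_per_capita_by_person(households):
    """Median of income/size, with each person counted once (an assumed variant)."""
    values = []
    for income, size in households:
        # Every member of the household gets the same per-capita figure.
        values.extend([income / size] * size)
    return median(values)
```

    With these four made-up households the household-weighted median is 35000.0 and the person-weighted median is 20000.0, so the choice of weighting alone moves the answer, before any change in anyone's income.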

    • In case it’s not clear, on paper the Googler is getting a lot more income, but by every measure I would care about, the second family is “wealthier”.

      Perhaps this doesn’t affect the median, but I have to think that it does. This measure lumps together a single gas station attendant making $20k in a household of 1 and a family of a tenured university professor, an adjunct spouse, and 3 kids making $100k per year. It just doesn’t make sense to me.

      • I don’t know how “median per capita income” is defined but I’d guess it’s something like you suggest. I agree there are problems with it, but you have to start somewhere, and you can only use data that you can get your hands on. I think that if they do per capita income in some way similar to the way you posit, that’s not bad.

        There is indeed a relationship between fertility and income but I don’t think it’s all that big anymore. I could be wrong. There are other funny things about the statistic too. But as an overall picture, I think looking at per capita income is pretty good. Sure, a single guy earning $50K after taxes and a household with 2 parents and 2 children and a total household income of $200K after taxes are not exactly equally wealthy, in spite of their equal after-tax income per capita, but they’re not wildly different either. It’s certainly better to look at incomes per capita than per household. I could imagine another parameter that counts a child as 1/2 an adult or something, which might better reflect the difference in true wealth, but I’d call that a minor improvement. I don’t think it would change the overall picture very much.

        I think a much bigger flaw is looking at after-tax income without looking at what the tax money is doing; for comparing country to country this is a big problem. Those tax dollars are not just removed from people’s paychecks and discarded; they are used to pay for things. Americans get to keep more of our paychecks, but we get less in the way of government services. The methodology of the article implicitly puts the value of government services at 0.
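
        As a rough sketch of the "count a child as 1/2 an adult" idea mentioned above: the function names and the `child_weight` parameter are hypothetical, and the 0.5 weight is just the figure floated in this comment, not an established equivalence scale.

```python
def per_capita_income(income, adults, children):
    """Plain per-capita income: everyone counts as one person."""
    return income / (adults + children)


def equivalized_income(income, adults, children, child_weight=0.5):
    """Income per 'adult equivalent', counting each child as child_weight adults."""
    return income / (adults + child_weight * children)


# The two households from the comment above (after-tax incomes).
single = {"income": 50_000, "adults": 1, "children": 0}
family = {"income": 200_000, "adults": 2, "children": 2}

single_pc = per_capita_income(**single)
family_pc = per_capita_income(**family)
single_eq = equivalized_income(**single)
family_eq = equivalized_income(**family)
```

        Both households come out at $50K per capita. With children weighted at 1/2, the single earner stays at $50K while the family rises to roughly $66.7K per adult equivalent, which is a difference but not a wild one.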

      • Well, to be fair with regard to your wealthiness assessment, they are trying to measure income, not wealth. I don’t find your example too objectionable overall in this light; I’m sure the 150K a year constant coder with stock options is a lot more free-spending and makes decisions with a far more financially unconstrained mindset.

        Seems very American to mistake income for quality of life ;)

        • Nah, you say “they are trying to measure income, not wealth” and that’s true, but I think people only care about income as a proxy for wealth.

    • Yeah, examples of bad plots are fun, and can be instructive if you think through what it is about a plot that makes it bad, but having good examples to follow is even better.

      Andrew has previously mentioned to me that he prefers Cleveland’s “The Elements of Graphing Data” to the Tufte books, because the former tries to establish explicit principles you can follow to make a good plot. Tufte does a bit of that too, but (a) not as much, and (b) I disagree with him on some of his principles, in fact I find some of them sort of ridiculous. (For instance, I think Chernoff faces are a bad idea in the first place, but Tufte’s “improvements” are ludicrous and make a bad idea much much worse in application).

      Anyway, yes, it’s nice to see a graphical approach that works and that I would like to emulate.
