As Kaiser writes, “from a purely graphical perspective, the chart is well executed . . . they have 54 points, and the chart still doesn’t look too crammed . . .” But he also points out that the graph’s implicit claims (that tax rates can explain happiness or cause more happiness) are not supported.
Kaiser and I are not being picky-picky-picky here. Taken literally, the graph title says nothing about causation, but I think the phrasing implies it. Also, from a purely descriptive perspective, the graph is somewhat at war with its caption. The caption announces a relationship, but in the graph, the x and y variables have only a very weak correlation. The caption says that happiness and progressive tax rates go together, but the graph uses the U.S. as a baseline, and when you move from the U.S. point on the graph to the right-hand side (more progressive taxes), you see a lot more points below the line than above the line. Thus the visual impression of the graph is that more progressive taxes will lead to lower happiness—the opposite of the message from the caption.
What can be done here?
I don’t exactly think the graph is “bad data,” and, although the graph says little directly about causation, the data have some relevance to our understanding of policy debates over taxes. If nothing else, we learn that tax progressivity and average happiness show some variation among countries. I think a start would be to reframe and put happiness on the x-axis and the tax system on the y-axis, which would allow us to see that, at any happiness level, there is a range of tax systems, with none of the very happiest countries having flat taxes.
Better still might be to make a line plot with three columns: first, a list of country names, in decreasing order from richest to poorest (using, for example, per-capita GDP; yes, I know, such data aren’t perfect!), then a column showing tax progressivity (if that’s the measure they want to use), then a column showing average happiness.
The advantage of this pair of dotplots is that you get to see the spread in each of these variables with respect to a natural measure (how rich the country is), and there’s no implicit causal story getting in the way.
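To make the layout concrete, here is a rough text-only sketch of the three-column display in Python. Everything in it is an assumption for illustration: the country names and numbers are made-up placeholders (not the data from the original chart), and `dot_strip` and `paired_dotplot` are hypothetical helper names. A real version would use a plotting library and the actual measures.

```python
# Hypothetical placeholder values, NOT real data:
# (country, per-capita GDP, tax progressivity score, average happiness score)
rows = [
    ("Country A", 52000, 0.30, 7.1),
    ("Country B", 18000, 0.10, 5.9),
    ("Country C", 41000, 0.45, 7.4),
    ("Country D", 9000, 0.25, 5.2),
]


def dot_strip(value, lo, hi, width=21):
    """Return a text strip of length `width` with one 'o' at value's position in [lo, hi]."""
    span = hi - lo
    pos = 0 if span == 0 else round((value - lo) / span * (width - 1))
    return "." * pos + "o" + "." * (width - 1 - pos)


def paired_dotplot(rows, width=21):
    """Build the display: names sorted richest to poorest, then two dot columns."""
    # Order countries by per-capita GDP, richest first.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    prog = [r[2] for r in ordered]
    happy = [r[3] for r in ordered]
    header = "country".ljust(12) + "progressivity".ljust(width + 2) + "happiness"
    lines = [header]
    for name, _gdp, p, h in ordered:
        left = dot_strip(p, min(prog), max(prog), width)
        right = dot_strip(h, min(happy), max(happy), width)
        lines.append(name.ljust(12) + left + "  " + right)
    return lines


if __name__ == "__main__":
    print("\n".join(paired_dotplot(rows)))
```

Each row is one country, the rows run from richest to poorest, and each of the two strips puts a single dot at the country’s position within the range of that variable. Neither variable sits on the x-axis as the presumed cause of the other, which is the point of the design.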