Where the fat people at?

Posted on February 3, 2016 9:16 AM by Andrew

Pearly Dhingra points me to this article, “The Geographic Distribution of Obesity in the US and the Potential Regional Differences in Misreporting of Obesity,” by Anh Le, Suzanne Judd, David Allison, Reena Oza-Frank, Olivia Affuso, Monika Safford, Virginia Howard, and George Howard, who write:

Data from BRFSS [the behavioral risk factor surveillance system] suggest that the highest prevalence of obesity is in the East South Central Census division; however, direct measures suggest higher prevalence in the West North Central and East North Central Census divisions. The regions relative ranking of obesity prevalence differs substantially between self-reported and directly measured height and weight.

And they conclude:

Geographic patterns in the prevalence of obesity based on self-reported height and weight may be misleading, and have implications for current policy proposals.

Interesting. Measurement error is important.

But, hey, what’s with this graph:

Screen Shot 2015-07-31 at 10.45.29 AM

Who made this monstrosity? Ed Wegman?

I can’t imagine a clearer case for a scatterplot. Ummmm, OK, here it is:

fatties

Hmmm, I don’t see the claimed pattern between region of the country and discrepancy between the measures.

Maybe things will be clearer if we remove outlying Massachusetts:

fatties_no_ma

Maryland’s a judgment call; I count my home state as northeastern but the cited report places it in the south. In any case, I think the scatterplot is about a zillion times clearer than the parallel coordinates plot (which, among other things, throws away information by reducing all the numbers to ranks).

P.S. Chris in comments suggests redoing the graphs with same scale on the two axes. Here they are:

fatties_square fatties_no_ma_square

It’s a tough call. These new graphs make the differences between the two assessments more clear, but then it’s harder to compare the regions. It’s fine to show both, I guess.

26 thoughts on “Where the fat people at?”

Chris on February 3, 2016 9:24 AM at 9:24 am said:

your x and y axis should be the same scale or maybe add the y=x diagonal

Reply ↓
- Rahul on February 3, 2016 10:41 AM at 10:41 am said:
  
  Here’s one point that’s always bugged me:
  
  Are there scatter plot tools that can automatically jitter labels to prevent overlap? Neither Excel nor Matlab nor Gnuplot had any easy way of doing this when I had tried.
  
  Reply ↓
  - Jacob on February 3, 2016 12:06 PM at 12:06 pm said:
    
    ggplot (in R) can do that. In this case the labels ARE the points so that wouldn’t be ideal.
    
    Reply ↓
    - Rahul on February 3, 2016 12:44 PM at 12:44 pm said:
      
      Well I don’t know if some jitter in a point is worse than not being able to read a point at all?
  - Ed Hagen on February 3, 2016 12:09 PM at 12:09 pm said:
    
    There’s a new ggplot extension geom_label_repel:
    
    http://ggplot2-exts.github.io/ggrepel.html
    
    Haven’t tried it yet, though.
    
    Reply ↓
    - Rahul on February 3, 2016 12:19 PM at 12:19 pm said:
      
      Awesome. Looks like this solves the problem.
David Blankley on February 3, 2016 9:28 AM at 9:28 am said:

Why don’t you have the x and y scales and range the same?
That way the reader can quickly tell which states are over or under-reporting vs. a 45 degree line.

Reply ↓
Mitch on February 3, 2016 9:47 AM at 9:47 am said:

– it’s obvious (by inspection) that their conclusion was made (by inspection) using the top two in each list.
– Hi Massachusetts, this is your mom. Is everything OK? Are you eating enough? I worry about you.

Reply ↓
Dale on February 3, 2016 9:49 AM at 9:49 am said:

The first two comments beat me to it. Certainly the trellis chart is not as useful as the scatter, but the scatter does not directly address the claims about whether self-reporting is biased from the actual data by region – without including the 45 degree line. I have plotted it that way as well as just comparing ranks with a 45 degree line. In either case, there is no apparent regional difference. I would note that virtually all states under-report obesity in terms of the raw measures, while they are more or less equally split (between under- and over-reporting) by rank.

Reply ↓
Raghuveer Parthasarathy on February 3, 2016 9:58 AM at 9:58 am said:

I was also going to suggest equal axis ranges and a y=x diagonal. Looking at the paper, I noticed that the authors do exactly this, but for the regionally averaged values rather than individual states (Figure 4).

Reply ↓
Andy W on February 3, 2016 10:06 AM at 10:06 am said:

Yes I get annoyed by these slopegraphs to – a scatterplot is more appropriate in the vast majority of situations. I wrote a (not short enough) paper on the topic, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2410875. The thing that prompted me was a similar slopegraph on the cover of Albert Cairo’s book.

Reply ↓
Terrence Vance on February 3, 2016 12:06 PM at 12:06 pm said:

If one method of measurement is of better quality (i.e. a “Gold Standard), a Bland-Altman plot might be useful.

Reply ↓
David Einstein on February 3, 2016 12:41 PM at 12:41 pm said:

The self reported numbers, particularly ignoring MA, seem uncorrelated with the measured numbers. I’m too lazy to copy the numbers out of the data though. If anything the reported numbers seem to indicate whether or not the state borders an ocean.

Reply ↓
Rahul on February 3, 2016 1:03 PM at 1:03 pm said:

How about something like this plot:

http://i.imgur.com/d90yxKq.png

Reply ↓
- Dale Lehman on February 3, 2016 1:09 PM at 1:09 pm said:
  
  That is much better. The regional differences can easily done with color, but this display is far cleaner.
  
  Reply ↓
  - Rahul on February 3, 2016 1:34 PM at 1:34 pm said:
    
    Good point. How about this?
    
    http://i.imgur.com/VegUBYY.png
    
    Reply ↓
    - Dale Lehman on February 3, 2016 2:09 PM at 2:09 pm said:
      
      Yes, and I think the upshot is that there is no apparent regional pattern. The only thing left to do (which I have not done yet) is to go back to the scatterplot and see if there is a regional pattern with respect to a regression line (not the 45 degree line). This would tell us if some regions misperceive more than others – but, admittedly, this is a second order effect that I would not find terribly interesting.
    - Rahul on February 3, 2016 2:39 PM at 2:39 pm said:
      
      This?
      
      http://i.imgur.com/kRpZ9eH.png
      
      http://i.imgur.com/BBNK95Z.png
Rahul on February 3, 2016 2:11 PM at 2:11 pm said:

Just for fun I plotted the rank differences too:

http://i.imgur.com/LXGQy6X.png

The ordering by rank difference mismatch seems a lot different than if I just plot the raw percent differences. Not sure if this is just a calculation bug.

Reply ↓
- Dale Lehman on February 3, 2016 2:25 PM at 2:25 pm said:
  
  I also looked at the rank differences a bit, and as you say, they appear quite different. I think the rank differences speak to perceptions about relative positions while the absolute differences just speak to the obesity perceptions vs. reality. However, we are really digging deep here into, what seems to me to be, fairly subtle and uninteresting “findings.”
  
  Reply ↓
  - Erf on February 3, 2016 3:46 PM at 3:46 pm said:
    
    Garden of forking paths? Maybe the researchers found they could see a “clearer pattern” in the rank differences and ran with that?
    
    Reply ↓
  - Rahul on February 3, 2016 3:59 PM at 3:59 pm said:
    
    I haven’t really been able to intuitively grasp the difference between what the “ranking by raw difference” means vs what “ranking by difference of ranking” means.
    
    All I can intuite is that the second approach ought to amplify noise. Akin to how taking numerical derivatives of noisy data does.
    
    But if the relative position of states changes based on the approach I don’t know how to explain that to myself.
    
    Reply ↓
Bob Carpenter on February 3, 2016 3:40 PM at 3:40 pm said:

Those scatterplots are horrible! What’s with the tiny ticks and labels so far away from the ticks? I would also prefer a grey background with some nice lighter grey rules so the contrast isn’t so glaring. And maybe drop the primary colors—they’re tacky. Add a standard legend indicating the color coding rather than sneaking it into the subtitle. Why label all the states rather than just color coding—you don’t want everyone to read all the state names, do you?

And isn’t the whole issue much more socio-economic than geographic? But what I really want is a red-state/blue-state analysis (of income vs. Republican vote share), but instead with income versus obesity. There’s a huge difference (pun intended) in size of people moving from 12th St (where I get on the subway downtown) up to 125th St (where I get off the subway uptown). Similar difference to moving from the neighborhood where my sister now lives (Birmingham, MI) to where I went to high school (Livonia, MI), two very different socio-economic communities in the Detroit suburbs.

Reply ↓
- Martha (Smith) on February 3, 2016 11:05 PM at 11:05 pm said:
  
  Since I grew up in Detroit proper, I couldn’t help but look up whether the differences between Birmingham and Livonia are similar to what I remember — and also to compare with Detroit proper. The differences between Birmingham and Livonia seem a little greater than when I was a kid (with Birmingham still the ritzier), but both are still predominantly white, whereas Detroit is about 83% black. Per capita incomes compare as follows:
  
  Birmingham$68,435
  
  Livonia $29,536
  
  Michigan overall $25,681
  
  Detroit $14,870
  
  Reply ↓
Dalton on February 3, 2016 4:21 PM at 4:21 pm said:

Maryland ain’t the Northeast or the South! Maryland is Mid-Atlantic, where Mid-Atlantic is defined as any state where you can find scrapple on a diner menu.

Reply ↓
Karen Harker on February 8, 2016 8:27 PM at 8:27 pm said:

The original chart was focused on the differences in *ranks*, and the scatterplots are of the actual obesity rates. Hence the authors’ misguided attempt to illustrate the differences in rank with that chart. However, I do believe that focusing on the differences in *rates* is more informative than the ranks.

Regarding what the (definitely improved) graphs reveal – it appears that there is a line that runs parallel to the “no difference” line, only starting at about 33% on the REGARDS axis (FL/NY to LA/MS). Most of these points are in the Southeast, while the blue points are more varied above & below this line. Not sure if this means anything…just an observation.

Reply ↓

Statistical Modeling, Causal Inference, and Social Science

Where the fat people at?

26 thoughts on “Where the fat people at?”

Leave a Reply to Rahul Cancel reply