Your conclusion is only as good as your data

Posted on June 6, 2012 9:51 AM by Andrew

Jay Livingston points to an excellent rant from Peter Moskos, trashing a study about “food deserts” (which I kept reading as “food desserts”) in inner-city neighborhoods.

Here’s Moskos:

From the Times:

There is no relationship between the type of food being sold in a neighborhood and obesity among its children and adolescents.

Within a couple of miles of almost any urban neighborhood, “you can get basically any type of food,” said Roland Sturm of the RAND Corporation, lead author of one of the studies. “Maybe we should call it a food swamp rather than a desert,” he said.

Sure thing, Sturm. But I suspect you wouldn’t think certain neighborhoods are swamped with good food if you actually got out of your office and went to one of the neighborhoods. After all, what are you going to believe: A nice data set or your lying eyes?

“Food outlet data … are classified using the North American Industry Classification System (NAICS)” (p. 130). Assuming validity and reliability of NAICS occupational categories is quite a red flag. It means that if something is coded “445110,” then — poof — it’s a grocery store! What could make for easier analysis? But your 445110 may not be like my 445110. . . . A cigarette and lottery seller behind bullet-proof glass is not a purveyor of fine foodstuffs, and if your data doesn’t make that distinction, you need to do more than list it as a “limitation.” You need to stop and start over.

Here’s one way to do it: a fine 2010 Johns Hopkins study edited by Stephen Haering and Manuel Franco. They actually care about their data. Read the first page in particular for the problems of food-store categorization. It matters. And notice the sections titled “residents personal reflections on their local food environment” and “food store owners’ attitudes regarding stocking healthy food.” What a concept for researchers to actually talk to people!

Uh oh. I better be careful here, as I don’t talk to people myself! But I think that’s ok in my case; I’m just a political scientist . . .

Moskos continues:

I find this so frustrating because so much quantitative analysis is so predictably problematic, over and over, again and again, in exactly the same way. Here’s the mandatory (and then ignored) disclaimer (p. 134, emphasis added):

Possibly even more of a limitation is the quality of the … business listings, although this is a criticism that applies to all similar studies, including those reporting significant findings…. More generally, categorizing food outlets by type tends to be insufficient to reflect the heterogeneity of outlets, and it is possible that more detailed measures, such as store inventories, ratings of food quality, and measuring shelf space, would be more predictive for health outcomes. Unfortunately, such data are very costly and time consuming to collect and may never exist on a national scale.

So let me get this right, because “all similar studies” use this flawed data, it’s OK? And because getting good data may be “very costly and time consuming to collect,” we’ll simply settle for what we have at hand? Bullshit!

You know, perhaps we never will have good data on a national level about what produce is sold in each and every store in America. I can live with that. But it is neither very costly nor time consuming to simply go into every store in any one neighborhood and see what is there. Do a spot check. Or at least read and learn from the Johns Hopkins study. I just found it on Google without even trying. They managed just fine. And if a corner store sells three moldy heads of iceberg lettuce and some rotting root vegetables, it is not the same as Whole Foods simply because they’re both coded 445! . . .

And he keeps going from there. Connoisseurs of multilevel models will appreciate this bit from Moskos:

And if you have bad data, it doesn’t matter what fancy quantitative methods you use. It’s putting lipstick on the damn pig of correlation. Garbage in, garbage out:

The primary dependent variables (i.e., counts of food consumption) are regressed on the explanatory variables using negative binomial regression models, a generalization of Poisson models that avoids the Poisson restriction on the mean-variance equality.

Wow! Negative binomial Poisson regression models to avoid the mean-variance equality restriction. I (to my shame) no longer have any idea what that means, even though Poisson regressions were all the rage when I was in graduate school. But I do remember the fatal flaw of non-random missing data.

I’m not against quantitative methods. I’m against bad research.

Yup. To put it another way, the researchers should just use the damn overdispersed Poisson regression and not make such a big deal about it. The quoted paragraph is a paragraph that wasn’t needed. You should never be doing a non-overdispersed Poisson regression anyway; it shouldn’t even be an option. Mean-variance equality, indeed.
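For anyone who, like Moskos, no longer remembers what the restriction is: a Poisson model forces the variance of a count to equal its mean, while a negative binomial (one common overdispersed alternative) lets the variance exceed the mean. Here is a minimal simulation sketch using only the Python standard library. The function names, the mean of 4, and the gamma shape of 2 are all illustrative assumptions and have nothing to do with the RAND data.

```python
import math
import random
from statistics import mean, pvariance


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) count by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_counts(n, mu, shape=None, seed=0):
    """Plain Poisson counts if shape is None; otherwise a gamma-mixed Poisson,
    which is one standard way to generate negative binomial counts with mean mu."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # Assumption for illustration: rate varies across units as Gamma(shape, mu/shape).
        lam = mu if shape is None else rng.gammavariate(shape, mu / shape)
        counts.append(poisson_draw(rng, lam))
    return counts


def dispersion(counts):
    """Variance-to-mean ratio: about 1 for Poisson, above 1 when overdispersed."""
    return pvariance(counts) / mean(counts)


poisson_counts = simulate_counts(20000, mu=4.0)
negbin_counts = simulate_counts(20000, mu=4.0, shape=2.0)

print("Poisson variance/mean:          ", round(dispersion(poisson_counts), 2))
print("Negative binomial variance/mean:", round(dispersion(negbin_counts), 2))
```

With these made-up settings the first ratio should come out near 1 and the second near 3 (the theoretical value is 1 + mu/shape). Real count data almost always look more like the second case, which is the whole point.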

P.S. Moskos also wrote a book called In Defense of Flogging which, as you might imagine from the title, recommends flogging or caning as an alternative to prison as a punishment for convicted criminals. I’m glad somebody wrote this book. I’ve thought for a long time that the flogging alternative is a good one, but when I mentioned it to my friends who were law professors, they said it would never fly. I agree with Moskos, who writes:

After all, who hasn’t committed a crime? Perhaps you’ve taken illegal drugs. Maybe you once got into a fight with a friend, stranger, or lover. Or you drove back from a bar drunk. Or you clicked on an online picture of somebody who turned out to be a bit young. Maybe you’re outdoorsy and were caught hunting without a permit.

Or maybe you’re a boss who knowingly hired illegal immigrants. Perhaps you accepted a “gift” from a family member and told the IRS it was a loan. Or did you go for the white-collar big leagues and embezzle millions of dollars? In truth, you may be committing some crimes you don’t even know about. If your luck runs out, you can end up in jail for almost anything, big or small. And even if you’re convinced that you’re the most straitlaced, law-abiding person in the world, imagine that through some horrific twist of fate, you were accused of a crime. It’s not inconceivable; it happens all the time.

27 thoughts on “Your conclusion is only as good as your data”

Mark on June 6, 2012 at 11:39 am said:

Wow, as a frequent ranter myself, I absolutely love this. Thanks for posting! And, Moskos gets at some of the things that I typically rant about, too, in particular the ubiquitous “limitations paragraph(s)”. I personally think these paragraphs, buried in the Discussion section, are mostly ridiculous. As Moskos suggests, big limitations should be on the first page, right up front (or at least in the Methods).

However, I have a quibble with the way you “put it another way.” I don’t think Moskos is at all talking about the type of model used. I don’t get the sense that he cares about that at all. What he seems to care about (and what I most care about) is paying attention to where your data came from (i.e., study design) and the quality of those data, since that should limit the inferences that you are entitled to make. I know that this will conflict with the modeling/Bayesian/likelihood approaches of many readers, but I think that study design is much more important than what model you use to model observed data.
Pingback: Garbage in, garbage out: what if your Big Dataset is lousy data? « Oikos Blog
zbicyclist on June 6, 2012 at 1:28 pm said:

“New York is filled with bodega “grocery stores” (PROBABLY coded 445120) that don’t sell groceries.”

PROBABLY? A long rant, based on PROBABLY being in a certain SIC code?

I’m skeptical of the whole “food desert” thing, perhaps because I’ve been working in the grocery marketing research industry for far too many years. These guys will build a store anywhere it’s profitable. I’ll try to generate some actual data relevant to this issue.
A. Zarkov on June 6, 2012 at 3:28 pm said:

Based on experience, I think we do have food deserts, especially in places like Detroit. Supermarkets tend to leave high crime areas because the cost of doing business is just too high. Nevertheless the root cause of the problem is the people who live in these deserts. They are too tolerant of the criminals who live amongst them.
Dimitriy Masterov on June 6, 2012 at 3:30 pm said:

I can’t access the study, but I think 445110 excludes convenience stores, so “cigarette and lottery seller behind bullet-proof glass” may be too strong:
http://www.census.gov/econ/industry/current/c445110.htm
zbicyclist on June 6, 2012 at 5:57 pm said:

I’ve looked at some data this afternoon, and what I’m seeing is in line with the RAND study.

First, I need to get a set of relevant points. I chose block group centroids in Cook County, IL (Chicago and near suburbs). I like to do exploratory work with an area I’m familiar with. There are 3992 block groups. I excluded 85 based on large size (e.g. O’Hare Airport) or low population density (<1000 people per square mile), leaving 3907 block groups. There are about 5 million people in Cook County. This screening was done after looking at the block group demos, but before merging with the store list.

Second, I need a set of relevant stores. I picked supermarkets, club stores and CPG mass (Walmart, KMart, Target). Because of the city of Chicago's hostility towards CPG mass zoning, there are very few in the city. I want to emphasize that convenience stores, small food stores (e.g. bodegas), dollar stores, drug stores, liquor stores, and defense commissaries (etc.) are in my parent list but specifically screened out here.

Third, I looked at a count of how many centroids had no store within 1.5 miles of the centroid, which is my operational definition of a food desert. There are very few of these (consisting of block groups with <4% of the population). Of these with no store within 1.5 miles, the largest distance to a store was 2.8 miles. I corrected for driving (versus crow-flies distance) and the largest drive distance is 3.7 miles (this corrects for "across the river" issues).
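The operational definition above can be sketched roughly as follows. This is only an illustration of the idea and not the code behind the numbers reported here: the coordinates and populations are made up, the function names are hypothetical, and it uses straight-line (haversine) distance with no drive-distance correction.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle ("crow flies") distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def stores_within(lat, lon, stores, radius_miles=1.5):
    """Number of stores within radius_miles of a block-group centroid."""
    return sum(
        1 for s_lat, s_lon in stores
        if haversine_miles(lat, lon, s_lat, s_lon) <= radius_miles
    )


def desert_population_share(block_groups, stores, radius_miles=1.5):
    """Share of total population living in block groups with zero stores in range."""
    total = sum(pop for _, _, pop in block_groups)
    no_store = sum(
        pop for lat, lon, pop in block_groups
        if stores_within(lat, lon, stores, radius_miles) == 0
    )
    return no_store / total


# Made-up example: (centroid latitude, centroid longitude, population)
block_groups = [
    (41.88, -87.63, 1200),
    (41.75, -87.60, 900),
    (41.95, -87.70, 1500),
]
# Made-up store locations: (latitude, longitude)
stores = [(41.885, -87.64), (41.94, -87.71)]

share = desert_population_share(block_groups, stores)
print(f"Share of population with no store within 1.5 miles: {share:.1%}")
```

In this toy example only the middle block group has no store in range, so the share is 900 out of 3,600 people. Weighting by population rather than counting block groups is what makes the result insensitive to a few large, empty tracts.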

While there is an overall income and minority difference, these occur only at high numbers of stores nearby, and are not really visible for block groups with 0 through 4 choices within a mile and a half of the centroid. Those groups are about half the population. 11% of the population has 10 or more choices within 1.5 miles.

My impression is clearly that if I look at low income areas versus high income areas the low income areas will have FEWER choices (a hardly surprising finding; high income areas have more of most everything) but I'm not finding much evidence for NO choices.

If I get the chance I'll go further, but this is a group of methodologists here and I'd like to get some reaction before I run more stuff.
- zbicyclist on June 7, 2012 at 8:03 am said:
  
  A bit more: I ran Wayne County, MI (Detroit, etc.) since it was mentioned here. 7.2% of the population had 0 choices, 19.4% had only one choice (27% total for no choice / highly restricted choice). The corresponding numbers for Cook County were 3.7% and 7.2% (11% total). That’s a pretty dramatic difference.
  
  I may drop the density and size filtering: since I’m evaluating by percent of population, these won’t likely make a difference in the numbers, and I don’t want the particular choice of filter to become a red herring.
Bob Carpenter on June 6, 2012 at 6:14 pm said:

My sister’s a defense attorney practicing in Detroit. Many of her clients have zero options to buy fruit or vegetables within walking distance of their homes and they don’t have cars and Detroit’s public transportation is a mess. She says they buy junk “food” at gas stations.

@A. Zarkov: What makes you believe the non-criminal residents of Detroit are too tolerant of criminals? What are they supposed to do? Take up vigilante-ism a la Clint Eastwood in Gran Torino? One huge aspect of the problem is that the city’s gone broke and the automobile factories have relocated for tax advantages elsewhere, so there is very little work in Michigan, much less in Detroit. Another large part of the problem is that anyone with the means has already moved out of the city of Detroit. My family moved out of our West side neighborhood in the early 70s as soon as my parents could afford to. But before that, our home and garage were broken into several times, our car windows were shot out with a shotgun, many houses were burned down and left as shells, etc. etc. We got barbed wire around our elementary school, bulletproof glass in all the shops — all the usual trappings of inner city decay. But I don’t think it was because my parents were “tolerant of criminals”.
- A. Zarkov on June 6, 2012 at 11:53 pm said:
  
  By “too tolerant” I mean community failure to cooperate with police who are sometimes looked upon as an occupying army. About a third of black men will get incarcerated sometime in their life. As such many families see the police as enemies, and won’t bear witness to crimes they see. Detroit never recovered from the 1967 riots. Compare Detroit to Pittsburgh PA. The steel industry took an even bigger hit than the automobile industry. But Pittsburgh adapted to economic reality and rebuilt the city. Today Pittsburgh is a livable prosperous city while Detroit is a hell hole. Don’t blame crime on poverty, blame poverty on crime.
  - krippendorf on June 7, 2012 at 10:06 am said:
    
    Blame the victim much?
    
    There are a lot more differences between Pittsburgh and Detroit than the attitudes of residents in hypersegregated neighborhoods to crime.
    - A. Zarkov on June 7, 2012 at 3:26 pm said:
      
      I blame the criminals and the members of the community who tolerate crime by not cooperating with the police. How can a city be prosperous when crime is rampant? If you want to understand why Detroit is dysfunctional and Pittsburgh is now prosperous look at who lives in those places, and their attitudes.
  - Popeye on June 7, 2012 at 11:40 am said:
    
    If a community views the police as an occupying army, this may be a moral failing on the part of the community but it is also a problem with the police and government. To put it mildly.
    - A. Zarkov on June 7, 2012 at 3:31 pm said:
      
      Over the last 35 years Detroit has had a black government and a black police force, so the “occupying army” excuse has worn a little thin.
      
      Cassius:
      “The fault, dear Brutus, is not in our stars,
      But in ourselves, that we are underlings.”
      Julius Caesar (I, ii, 140-141)
Steve Sailer on June 6, 2012 at 8:28 pm said:

Most urban dwellers live within convenient driving distance of a well-stocked supermarket. One issue, though, is that many don’t drive, and thus do most of their grocery shopping at smaller nearby outlets that don’t stock fresh arugula (or whatever magic green is supposed to prevent obesity).

Somebody could do an interesting study comparing urban poor without cars to small town poor with cars. My guess is that both would turn out pretty obese on average. Or, somebody with a lot of foundation funding could try an experiment: open a Whole Foods in the middle of a public housing project. Any guesses whether that would reduce obesity?
- Andrew on June 6, 2012 at 9:48 pm said:
  
  Steve,
  
  There is a Whole Foods across the street from a public housing project not far from where I live. So no funding needed; the natural experiment is already happening.
- Jared on June 7, 2012 at 5:12 pm said:
  
  Accessibility to healthy food is only one necessary but wholly insufficient part of the problem; cost and education and probably a host of other things matter too. But it doesn’t make sense trying out interventions to encourage healthy eating if people can’t actually get the good stuff. You know, cart and horse…
Jonathan on June 7, 2012 at 8:59 am said:

I am actually doing a study using a difference-in-differences design for a low income neighborhood (that is classified as a food desert). I present results next week, but essentially the store was given a host of financial incentives to build there. However there is no real difference in (self-reported) food consumption within the community for adults and children.
krippendorf on June 7, 2012 at 10:24 am said:

As I recall, one of the stronger findings of the quasi-experimental Moving to Opportunity (MTO) studies was that the people who moved out of high poverty neighborhoods and into low poverty neighborhoods had significantly lower incidence of obesity and obesity-related illnesses at the various follow-ups than the people in the Section 8 voucher-only (i.e., only enough to move to Section 8 HUD housing) and no-voucher groups. Granted, that’s not evidence of food deserts, per se — first, there was likely non-random compliance with the “treatment”, and second, any causal effect could be from some other source (e.g., safer neighborhoods = more opportunities for low-cost exercise).

zbicyclist: in parts of Wayne and Cook counties, 1.5 miles is a helluva long way to go to get to a “real” grocery store, particularly if you are a woman and/or have a job that requires that you shop at night. Which I think gets back to Moskos’ broader point: if you (general you, not you in particular) want to understand the lived experience of poverty, you need to walk a mile in a poor person’s shoes. Or a mile and a half, in this case.
- Jonathan on June 7, 2012 at 11:36 am said:
  
  Yeah I did my results looking at subgroups (individuals who live close to the new store). If anything the results remain the same, reinforcing the results.
Pingback: Blogging about flogging « Brandon Seah
Eli Rabett on June 7, 2012 at 11:33 am said:

There is a fairly simple way of estimating this. Pick the local supermarket chain and go to their store locator.

It should also be pointed out that cities have expended a great deal of energy and money trying to get supermarkets into inner city/high poverty neighborhoods with only limited success.

BTW, poor people can’t afford Whole Foods.
- zbicyclist on June 7, 2012 at 1:59 pm said:
  
  It’s not quite that simple. In general, the conventional chains have disinvested in low income neighborhoods over decades, and so if you just check these chains you won’t find much. Here, for example, is the Google Maps view of Jewel (the largest supermarket chain in Chicago). You won’t see much on either the west or south sides of the city proper.
  
  http://goo.gl/maps/jMYX
  
  So if you just focus on the large conventional chains you will come to the conclusion that there’s not much available.
  
  Most of the remaining stores (or re-opened Jewels) are independents. A major exception is the giant chain Aldi, which obviously has figured out a way to make money in these areas (Hint: it’s not by imitating Whole Foods — but they certainly do have fresh produce). Here’s the same map for them, which is much different.
  
  http://goo.gl/maps/0gQE
- A. Zarkov on June 7, 2012 at 3:42 pm said:
  
  Indeed. Even the middle class can no longer afford Whole Foods. I’m near a Whole Foods store, and I used to buy a lot there ten years ago. Today I buy little because their prices have gone through the roof. Today I shop at Trader Joe’s and Lucky’s. Lucky’s produce section is actually better than Whole Foods and much cheaper. I also use a local farmers market where I know some of the farmers personally, and I get excellent produce.
  
  One of the reasons Whole Foods is so expensive has to do with so-called “organic produce” which is a scam. There is nothing wrong with conventional produce. Putting a Whole Foods in a poor neighborhood does a disservice to the people who live there.
jrkrideau on June 7, 2012 at 3:42 pm said:

I have no problem thinking of a food desert if a person has to walk 1.5 miles to shop. It would seem to me that at 1.5 miles (3.0 miles round trip) to a grocery store we are asking the shopper to make a very considerable investment in time and effort to access healthy food.

An adult human in passably good condition can probably walk at 3.5 or 4 miles an hour for a distance of 1.5 miles, less if carrying loaded shopping bags [1]. The elderly or those with mobility impairment would of course take longer–this would include the morbidly obese.

(All time calcs below were done on the fly, so they are not exact by any means.)

I expect a backpack would provide the same 3.5 – 4 mph speed on the return trip but carrying heavy shopping bags would increase travel time as the weight is poorly distributed, hurts hands and puts a strain on arms and shoulders so may mean rest stops as well as a slower pace.

So we are talking about a round trip of roughly 45 to 50 minutes of walking plus shopping time (perhaps 15 minutes?) at best.

Now how much can a person reasonably carry? This is a question of both mass and volume.

I know that soldiers carry in excess of 50 lb but we are talking about healthy, highly trained people. Even with a knapsack I would not expect most healthy people to want to carry much more than 15-20 lb and probably less. As above elderly, mobility impaired, etc would likely carry much less.

I find it difficult to envisage a shopper walking 3 miles carrying a 10 lb bag of potatoes, a 3 lb bag of onions and a 10 lb bag of rice. A litre of milk weighs in the neighbourhood of 1 kg or 2.2 lb.

Of course the 10 lb bags are perhaps a bit extreme but if you go for a 2 lb bag then the frequency of your shopping trips goes up very quickly and typically the items are more expensive on a per unit cost.

For volume, again the bags of potatoes and rice, even the small ones, are unwieldy, and once you add a bag of healthy oranges, a couple of heads of lettuce and perhaps a 3 lb bag of carrots you are starting to hit some real volume constraints unless you have specialized carrying gear[2].

So we have a trade-off here: ability to carry the load versus the number of trips to the store. Are you willing to stagger back home with very large weights of groceries or vast numbers of carrying bags, or would you rather make multiple trips?

It strikes me that if one was feeding a family of, say four, one would have to plan on 3 or 4 trips to the grocery store per week at a minimum and I may be underestimating as I really have not worked out the food requirements, just making some wags.

So, at my very rough estimate, in order to eat a healthy diet we would expect a shopper with a 4 person household to spend (very roughly) 3 to 4 hours a week on shopping trips spread over 3 or 4 days although I suppose one could do several trips in one day.
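A quick check of that arithmetic, taking the figures above at face value (1.5 miles each way, 3.5 to 4 mph, about 15 minutes in the store, 3 or 4 trips a week). The function names are just for illustration, and the sketch ignores slower return trips, rest stops, and weather:

```python
def round_trip_minutes(one_way_miles, mph):
    """Walking time in minutes for there-and-back at a constant speed."""
    return 2 * one_way_miles / mph * 60


def weekly_hours(one_way_miles, mph, shop_minutes, trips_per_week):
    """Total weekly hours spent walking and shopping."""
    per_trip = round_trip_minutes(one_way_miles, mph) + shop_minutes
    return trips_per_week * per_trip / 60


# Best case: brisk 4 mph pace, 3 trips a week.
best = weekly_hours(1.5, 4.0, shop_minutes=15, trips_per_week=3)
# Slower case: 3.5 mph pace, 4 trips a week.
slow = weekly_hours(1.5, 3.5, shop_minutes=15, trips_per_week=4)

print(f"Walk per trip: {round_trip_minutes(1.5, 4.0):.0f} to {round_trip_minutes(1.5, 3.5):.0f} minutes")
print(f"Weekly total:  {best:.1f} to {slow:.1f} hours")
```

Under those assumptions the walk alone is about 45 to 51 minutes per trip, and the weekly total lands at roughly 3 to 4.4 hours, so the 3 to 4 hour ballpark holds up as a lower-end estimate.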

This does not include any other shopping such as paper towels, toilet paper, cleaning supplies, pop, or any of the other strange items I find my local Loblaw’s selling and both toilet paper and paper towels are volume consuming items so again we have volume constraints.

Besides krippendorf’s comments about walking in some neighbourhoods, which I agree with, don’t forget to factor in weather and traffic.

Rain makes it unpleasant; snow and ice can slow down the walker and pose a hazard to him or her; high heat can be dangerous; climbing and descending steep hills is a strain. Poor walking surfaces (broken sidewalks, pooled water, potholes) are both a hazard and slow down the walker. Lack of sidewalks forcing a person to walk on the roadway increases safety concerns.

I think the 1.5 mile distance is an overestimate of what one can realistically expect a person to travel regularly to shop if less nutritious but eatable foods are available nearer.

As an experiment, try it. Make up a reasonable, relatively small, shopping list and walk to your nearest decent grocery store that is at least 1 mile from your home, shop and walk home.

Then ask yourself: 1) how often in a week would I do this? 2) how often would I have to do this for me and my family to eat as healthily as we do now? If 1 != 2 we have a problem.

1. Based on some estimates I have made working from such things as military marching speeds, some sports studies (no idea where most of the references would be now), and informal observations of pedestrians.

2. Personal experience in that I have not owned a car for years and do my grocery shopping by bicycle. Volume constraints are a constant issue when buying things.
Evan on June 7, 2012 at 6:28 pm said:

Almost, but here’s the real kicker that the author of this blog probably buys into:

Results = Data + Assumptions (statistical model, lost to followup, exposure/outcome misclassification, etc.)

Who cares what one specific technique or model the authors used? How much error is the model selection introducing?

I’d prefer that several models are run and the results of all models, significant or not, attention-grabbing or not, are reported so that the reader can understand the potential error introduced by the model selection.

For this study type and all epidemiological studies.

As we know, in practice this almost never happens. Publish something publishable or perish. And comprehensive uncertainty analyses are almost never published. Are we trying to do good public health or not?
Steve Sailer on June 7, 2012 at 7:33 pm said:

One undiscussed issue is that groceries have gotten heavier over the years, making it harder to get them home without a car, especially for women (who do the bulk of shopping). Consider orange-flavored beverages. In the 1960s, my mother bought powdered Tang and mixed it with water from the sink. By 1980, she had switched to frozen condensed orange juice. By 1990, cartons of orange juice. With a car and a house with a driveway, this made perfect sense in terms of improved flavor, but rising standards of quality are a real problem for people without cars, and who live in walk-up apartments.
Helen DeWitt on June 9, 2012 at 3:57 pm said:

In “Better,” Atul Gawande talks about work done by Save the Children in Vietnam. Two aid workers talked to poor villagers, asking them to identify the families whose children were well-nourished. These families were then asked to share tips. These were seen as workable by other members of the village, since they had been put into practice by people with similar challenges and a similar background. It might make better sense to do something similar in the deprived neighborhoods the US government wants to improve. There are presumably some families that are successful in providing healthy diets for children, avoiding the obesity which is seen as such a concern. What are they doing? How are they circumventing the limitations of the environment? Is easy access to a store with a range of healthy foods in fact the thing that would be most useful in helping other families to follow their example?

I note that I live in an apartment which is a 5-minute walk to two small supermarkets, both with a decent range of fresh produce, and two delicatessens, both with an excellent range – and I’m afraid I am doing well if I eat even one piece of fruit or any kind of vegetable in the course of a week. (Years ago I thought it was very funny that a friend’s sister, a vegetarian, did not like vegetables. I’m a vegetarian, and I STILL don’t eat fruit and vegetables.) There may well be a solution to these appalling habits; if so, it can hardly be the introduction of a convenient supermarket, and is likely to be simpler and cheaper.
