From Discover:

Razib Khan asks:
But follow the gradient from El Paso to the Illinois-Missouri border. The differences are small across state lines, but the consistent differences along the borders really don’t make sense. Are there state-level policies or regulations causing this? Or are there state-level differences in measurement? This weird pattern shows up in other CDC data I’ve seen.
It turns out that the CDC isn’t providing data; it’s providing a model. Frank Howland answered:
I suspect the answer has to do with the manner in which the county estimates are produced. I went to the original data source, the CDC, and then to the relevant FAQ.
There they say that the diabetes prevalence estimates come from the “CDC’s Behavioral Risk Factor Surveillance System (BRFSS) and data from the U.S. Census Bureau’s Population Estimates Program. The BRFSS is an ongoing, monthly, state-based telephone survey of the adult population. The survey provides state-specific information”
So the CDC then uses a complicated statistical procedure (“indirect model-dependent estimates” using Bayesian techniques and multilevel Poisson regression models) to go from state to county prevalence estimates. My hunch is that the state level averages thereby affect the county estimates. The FAQ in fact says “State is included as a county-level covariate.”
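To see how a state-level term could produce steps at state lines, here is a toy sketch. It is not the CDC's procedure, which the FAQ describes as Bayesian multilevel Poisson regression. Instead it swaps in a much simpler assumption: each county's raw survey rate is shrunk toward its state's rate by a fixed number of pseudo-observations. The function name, the prior weight, and all of the numbers are hypothetical.

```python
def shrink_to_state(county_cases, county_n, state_rate, prior_weight):
    """Partially pool a county's raw prevalence toward its state's rate.

    Hypothetical stand-in for the CDC model: this is simple beta-binomial-style
    shrinkage with `prior_weight` pseudo-observations centered at `state_rate`.
    """
    return (county_cases + prior_weight * state_rate) / (county_n + prior_weight)


# Two hypothetical neighboring counties on opposite sides of a state line,
# each with identical raw survey data: 8 cases among 100 respondents.
county_a = shrink_to_state(8, 100, state_rate=0.10, prior_weight=200)
county_b = shrink_to_state(8, 100, state_rate=0.06, prior_weight=200)

print(f"{county_a:.4f} {county_b:.4f}")  # 0.0933 0.0667
```

Two counties with the same raw data end up on opposite sides of a visible step only because they sit in different states. In a real multilevel model, the amount of shrinkage would depend on sample sizes and estimated variances, but the pull toward the state-level value would typically work in the same way.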
I’d prefer to have real data, not a model. I’d build the model myself, thank you. Data itself is tricky enough, as J. Stamp said.
IHME in Seattle has a better set of estimates of diabetes prevalence here, although it does not provide the data on which its results are based.
The CDC provides both the raw data and the model. You can get the BRFSS data from the CDC: http://www.cdc.gov/brfss/technical_infodata/surve…
Andrew and I once published a paper about decision analysis, applied to home radon measurements and remediation, that included a map of indoor radon concentrations showing similar discontinuities across state boundaries (Utah and South Carolina, for those of you keeping score at home). The data were collected using separate surveys in each state, and there were some minor differences in the protocols used. On a continuous color scale or grayscale the effect didn't look very big, but when we used discrete colors for different ranges (to facilitate printed reproduction), the discontinuities were obvious. We said "Apparent discontinuities across the boundaries of Utah and South Carolina arise from irregularities in the radon measurements from the radon surveys conducted by those states…" and left it at that. Whaddyagonnado?
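A toy illustration of the color-scale point: with a discrete scale, two values that barely differ can land in different bins, while two values that differ much more can share a bin. The bin edges and color names below are made up for illustration.

```python
import bisect

# Hypothetical bin edges for a discrete color scale (values are fractions).
BIN_EDGES = [0.06, 0.08, 0.10]
COLORS = ["lightest", "light", "dark", "darkest"]


def color_for(value):
    """Map a value to a color class; a value exactly on an edge goes to the upper bin."""
    return COLORS[bisect.bisect_right(BIN_EDGES, value)]


print(color_for(0.079), color_for(0.081))  # light dark
print(color_for(0.081), color_for(0.099))  # dark dark
```

Here 0.079 and 0.081 differ by only 0.002 but get different colors, while 0.081 and 0.099 differ by 0.018 and get the same one. A small jump at a state line that happens to straddle a bin edge becomes a sharp color boundary.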
Aleks:
I pretty much agree with you, but it's not always so easy to just provide the data. For example, raw disease rates vary by age, so you want age-adjusted values, otherwise you'll just see that people in Florida seem to get sick a lot. Sure, you can define something called "raw age-adjusted data" but that's model-based too.
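Direct standardization is one common way to do the age adjustment mentioned here: apply each region's age-specific rates to a fixed standard population. The sketch below uses made-up age groups, rates, and populations. The choice of standard population is itself an assumption, which is one sense in which even "raw" age-adjusted numbers are model-based.

```python
def weighted_rate(rates_by_age, pop_by_age):
    """Average age-specific rates, weighting each age group by its population."""
    total = sum(pop_by_age.values())
    return sum(rates_by_age[g] * pop_by_age[g] for g in pop_by_age) / total


# Hypothetical age-specific prevalence in two regions.
young_rates = {"18-44": 0.02, "45-64": 0.10, "65+": 0.20}
old_rates = {"18-44": 0.02, "45-64": 0.09, "65+": 0.18}

# Hypothetical age structures: a younger region and an older (Florida-like) one.
young_pop = {"18-44": 800, "45-64": 150, "65+": 50}
old_pop = {"18-44": 400, "45-64": 300, "65+": 300}

# Hypothetical standard population shared by both regions for adjustment.
standard_pop = {"18-44": 500, "45-64": 300, "65+": 200}

# Crude rates weight by each region's own age structure.
crude_young = weighted_rate(young_rates, young_pop)
crude_old = weighted_rate(old_rates, old_pop)

# Direct standardization weights both regions by the same standard population.
adjusted_young = weighted_rate(young_rates, standard_pop)
adjusted_old = weighted_rate(old_rates, standard_pop)

print(f"crude: young {crude_young:.3f}, old {crude_old:.3f}")
print(f"age-adjusted: young {adjusted_young:.3f}, old {adjusted_old:.3f}")
```

With these numbers the crude rates are 0.041 (young) and 0.089 (old), so the older region looks more than twice as bad. After adjustment the rates are 0.080 and 0.073, and the older region actually comes out slightly better: its higher crude rate was driven by its age structure.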
Beyond this, I like model-based estimates: When they show a problem (as in the map above), that's a useful sign that the model can be improved.
This reminds me of an internet investigation I did into how herpes prevalence varies by nation. I was surprised to find on a website that every country had an infection rate of about 20%. It turned out the website seemed to be crudely multiplying each nation's population by 0.2 or something similar.
Interesting correspondences are apparent when you compare this CDC map with a map of U.S. federal and state Indian reservations. Of course, the shapes differ because the CDC map displays data within county boundaries.
This is it: as "Hopefully Anonymous" says, information like this is not always exact and can change depending on the variables. That said, comparing a study on diabetes with herpes made me chuckle!