Crime data bonanza!!!

Mike Maltz writes,

A New Data Set Available through Ohio State University’s Criminal Justice Research Center

So you think you know how to analyze time series! Well, how would you like to test your mettle on over 400,000 time series, each with up to 540 data points? The time series in question are monthly data from 1960-2004, for over 17,000 police departments, for seven crime types (murder, rape, robbery, aggravated assault, burglary, larceny, vehicle theft), as well as their sum (the so-called Crime Index), and an additional 19 subcategories – e.g., robbery with a gun, knife, personal weapons (hands, feet, etc.), or other; attempted rape; auto, truck or bus, or other vehicle theft. Or you can just view the data in different cities over time and see whether crime rises and falls with various tides (unemployment, immigration, poverty, age or ethnicity distribution, etc., whatever your pet theory is). I [Maltz] have put all of the files and a plotting utility (so you can see each agency’s crime history) in a zipped file. Download it from http://sociology.osu.edu/mdm/UCR1960-2004.zip.
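As a rough sanity check on the scale described above, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative, and the sketch assumes that every agency contributes all 27 series, which the announcement does not claim; treat the total as a ballpark figure only.

```python
# Rough scale check for the figures quoted in the announcement.
# Assumption: every agency has every series (the real data have gaps).
agencies = 17_000            # "over 17,000 police departments"
index_crimes = 7             # murder, rape, robbery, aggravated assault,
                             # burglary, larceny, vehicle theft
crime_index_total = 1        # the sum of the seven (the Crime Index)
subcategories = 19           # e.g. robbery with a gun, attempted rape

series_per_agency = index_crimes + crime_index_total + subcategories
total_series = agencies * series_per_agency

years = 2004 - 1960 + 1      # both endpoint years are included
months_per_series = years * 12

print(series_per_agency, total_series, months_per_series)
```

Under that assumption there are 27 series per agency and 459,000 series overall, consistent with "over 400,000", and 45 years of monthly data give the quoted 540 points.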

The data consist of monthly counts of these crimes reported by police departments throughout the country to the FBI as part of its Uniform Crime Reporting (UCR) Program. Since reporting to the UCR Program is entirely voluntary, some agencies are less than diligent in doing so, but for the most part they comply. However, major gaps still remain; for a discussion of these gaps, see “Bridging Gaps in Police Crime Data,” published by the US Bureau of Justice Statistics. Under a series of grants from the US National Institute of Justice, Harry Weiss, a graduate student here at OSU, and I cleaned the data as best we could.

Some of the gaps are just inadvertent (or, as statisticians would say, MCAR, missing completely at random). These can usually be filled in using relatively simple algorithms. The more significant problems, however, are those that are not gaps but “underestimates,” as when the City of Atlanta was bidding (successfully) for the Olympics and lowered its crime statistics in a more, shall we say, “hands-on” way (see http://www.cnn.com/2004/US/South/02/20/atlanta.police.audit.ap/index.html); New York, Philadelphia and Boca Raton also have had their own reporting scandals (http://query.nytimes.com/gst/fullpage.html?res=9F06E2D91F38F930A3575BC0A96E958260); and according to the creator of HBO’s “The Wire,” Baltimore is even better at it (http://www.huffingtonpost.com/david-simon/the-wires-final-s_b_91926.html):

“In Baltimore, where over the last twenty years Times Mirror and the Tribune Company have combined to reduce the newsroom by forty percent, all of the above stories pretty much happened. A mayor was elected governor while his police commanders made aggravated assaults and robberies disappear.

“… It would not have been easy for a veteran police reporter to pull all the police reports in the Southwestern District and find out just how robberies fell so dramatically, to track each individual report through staff review and find out how many were unfounded and for what reason, or to develop a stationhouse source who could tell you about how many reports went unwritten on the major’s orders, or even further — to talk to people in that district who tried to report armed robberies and instead found themselves threatened with warrant checks or accused of drug involvement or otherwise intimidated into dropping the matter.”
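For the inadvertent gaps mentioned above, one relatively simple fill-in algorithm is linear interpolation across short runs of missing months. The sketch below is only an illustration, not the procedure Maltz and Weiss used to clean the data; the function name `fill_short_gaps`, the `max_gap` cutoff, and the use of `None` to mark a missing month are all assumptions. It deliberately leaves long gaps untouched, and it does nothing about deliberate underreporting, which looks like valid data.

```python
from typing import List, Optional


def fill_short_gaps(
    counts: List[Optional[float]], max_gap: int = 2
) -> List[Optional[float]]:
    """Linearly interpolate interior runs of None no longer than max_gap.

    Hypothetical helper: None marks a month with no report. Runs longer
    than max_gap, and runs at either end of the series, are left as None.
    """
    filled = list(counts)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing months.
        j = i
        while j < n and filled[j] is None:
            j += 1
        run_length = j - i
        has_both_neighbours = i > 0 and j < n
        if has_both_neighbours and run_length <= max_gap:
            left, right = filled[i - 1], filled[j]
            step = (right - left) / (run_length + 1)
            for k in range(run_length):
                filled[i + k] = left + step * (k + 1)
        i = j  # Continue scanning after the run.
    return filled


monthly = [10, None, 14, None, None, None, 22, None]
print(fill_short_gaps(monthly))
# [10, 12.0, 14, None, None, None, 22, None]
```

Only the single-month gap is filled here; the three-month gap exceeds `max_gap`, and the trailing gap has no right-hand neighbour to interpolate toward. For strongly seasonal crimes, borrowing the same month from adjacent years might be a better choice than a straight line.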

Not all cities manipulate crime statistics. Even so, you might want to get rid of all of your preconceptions of how to deal with these data. It’s for that reason that a plotting utility is the centerpiece of the data set. You have to look at the data, not just throw them into the computerized maw and let Stata or SAS or SPSS give you some p values. By visually inspecting the data, you might see what effect a new policy, or police chief, or law has on crime. You might compare different cities with different characteristics. Whatever you do, it’s a relatively new data set that hasn’t yet been used much at all, so you’re getting in on the ground floor.

5 Comments

  1. ZBicyclist says:

    I haven't looked at crime data in decades; the old thought was that murder is best reported because it's serious and RELATIVELY unambiguous.

    Therefore, it might be possible to use relative rates of murder and other crimes as some sort of under-reporting indicator.

  2. scott cunningham says:

    Very interesting. You know, this is one of the main sub-plots in all five seasons of The Wire, too.

  3. skh.pcola says:

    That dataset is 150 MB, if anybody is interested in downloading it. I think I'll wait to use the campus high-speed lines instead.

  4. Hadley says:

    I've been working with Mike to get the data into a format that's more amenable to analysis with a statistics package (i.e., one CSV for each state). Let me know if you're interested – my contact details are available at http://had.co.nz

  5. This is fantastic. I'd like to add it to http://infochimps.org — may I?