
Spatial patterns in crime: Where’s he gonna strike next?

Wouter Steenbeek writes:

I am a criminologist and mostly do spatial analyses of crime patterns: where does crime occur and why in these neighborhoods / at these locations, and so on. Currently, I am thinking about offender decision-making behavior, specifically his ‘location choice’ of where to offend.

Hey, how about criminologists instead of looking to someone else to solve their problem, do something about maybe taking CPR classes?

But I digress.

Steenbeek continues:

You may be surprised that I contact you as you don’t work on this substantive subject, but the models used to analyze such behaviors are familiar to you as they are used in (voting) choice behavior: multinomial logit. More specifically, most criminological studies use McFadden’s conditional logit model. Just like a voter can choose from 4 political parties, or a traveler can choose between 5 modes of travel behavior (choice between car, bus, taxi, walk, or bike), a criminal can choose where to offend. Characteristics of the offender, characteristics of each choice alternative (e.g. deterrence level at each location), and offender-specific characteristics of each choice alternative (e.g. distance from each offender to each location) are then used to model the choice of crime location.
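To make the conditional logit concrete, here is a minimal sketch in Python of the choice probabilities and log-likelihood. The covariates (a location-level `deterrence` score and an offender-to-location `distance`), the coefficient values, and all function names are hypothetical illustrations, not taken from any study or package.

```python
import numpy as np

def choice_probabilities(X, beta):
    """Conditional logit: P(offender i picks location j) = softmax_j(X[i, j] @ beta).

    X    : (n_offenders, n_locations, n_features) array of covariates
    beta : (n_features,) coefficient vector
    """
    utilities = X @ beta                                   # (n_offenders, n_locations)
    utilities = utilities - utilities.max(axis=1, keepdims=True)  # numerical stability
    expu = np.exp(utilities)
    return expu / expu.sum(axis=1, keepdims=True)

def log_likelihood(X, beta, chosen):
    """Sum of log-probabilities of the observed location choices."""
    probs = choice_probabilities(X, beta)
    return np.log(probs[np.arange(len(chosen)), chosen]).sum()

# Hypothetical toy data: 5 offenders choosing among 4 neighborhoods.
rng = np.random.default_rng(0)
n_offenders, n_locations = 5, 4
deterrence = rng.normal(size=n_locations)                         # varies by location only
distance = rng.uniform(0, 10, size=(n_offenders, n_locations))    # varies by offender and location
X = np.stack([np.broadcast_to(deterrence, distance.shape), distance], axis=2)

beta = np.array([-0.5, -0.3])        # assumed values, for illustration only
probs = choice_probabilities(X, beta)
chosen = probs.argmax(axis=1)        # pretend each offender picked his most likely location
ll = log_likelihood(X, beta, chosen)
```

Note that a covariate that varies only by offender (age, say) adds the same constant to every location's utility and cancels out of the softmax, so in a sketch like this it would have to enter through an interaction with a location characteristic.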

The main difference is that the offender chooses the *location* where crime is committed, and therefore the choice set is usually much larger, depending on the definition of ‘location’. Often, location refers to “neighborhood” (census tract). When studying offenders within one city, each offender is modeled to choose which neighborhood he commits crime in. This can easily be a choice set of 50-100 neighborhoods. But location can also refer to smaller spatial units of analysis such as street segments (the part of a street between two intersections), leading to a choice set of a few thousand (!) alternatives.

A disadvantage of the conditional logit model is the assumption of the independence of irrelevant alternatives (IIA). This assumption is especially likely to be violated in spatial analyses, where nearby locations are very similar to each other. Exactly *two* studies have used the ‘mixed logit’ model that does not suffer from IIA. The R package RSGHB can be used to estimate these models using a Hierarchical Bayesian framework.
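A small numerical sketch of what IIA means here, and of how a random coefficient relaxes it. Everything in it is a hypothetical toy: three locations, where locations 1 and 2 share an attribute (think of two adjacent, similar neighborhoods), and a single normally distributed taste coefficient standing in for a mixed logit.

```python
import numpy as np

def softmax(u):
    e = np.exp(u - np.max(u))
    return e / e.sum()

# Conditional logit with fixed (hypothetical) utilities.
u = np.array([1.0, 0.5, 0.5])
p_full = softmax(u)
p_drop = softmax(u[:2])                      # remove location 2 from the choice set
cl_ratio_full = p_full[0] / p_full[1]
cl_ratio_drop = p_drop[0] / p_drop[1]        # IIA: identical to cl_ratio_full

# Mixed logit: the coefficient on the shared attribute varies across offenders.
rng = np.random.default_rng(1)
b = rng.normal(0.0, 2.0, size=200_000)       # assumed taste distribution
x = np.array([0.0, 1.0, 1.0])                # locations 1 and 2 share the attribute

def mixed_probs(x, b):
    """Average the logit probabilities over draws of the random coefficient."""
    util = b[:, None] * x[None, :]
    util = util - util.max(axis=1, keepdims=True)
    e = np.exp(util)
    return (e / e.sum(axis=1, keepdims=True)).mean(axis=0)

m_full = mixed_probs(x, b)
m_drop = mixed_probs(x[:2], b)
ml_ratio_full = m_full[0] / m_full[1]
ml_ratio_drop = m_drop[0] / m_drop[1]        # differs from ml_ratio_full
```

In the fixed-coefficient case the odds of location 0 versus location 1 do not change when location 2 is removed. In the simulated mixed logit they do, because offenders who like the shared attribute shift from location 2 to its near-duplicate rather than proportionally to all remaining alternatives.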

(1) I would prefer to work with. But: can such mixed logit models be programmed relatively easily in Stan? Or would you suggest to keep using RSGHB? (RSGHB uses Metropolis-Hastings)

A second question is with regard to the use of an informative prior / knowledge about each offender. One can only commit crimes in locations that one has knowledge of. (Let’s assume for now that an offender only has knowledge of locations that he visits himself, and that the study area is limited to one city and the locations chosen are neighborhoods). In the ideal data situation, we would know exactly how familiar each offender is with each neighborhood. Then one could use multinomial logits with varying choice sets. (An equivalent example from voting behavior would be that a voter in district A can choose from candidates of political parties A, B, C, but a voter in district B can only choose from candidates of political parties A and B: in that case voter[B] should not be modeled as if he can choose C, because he simply *cannot* choose C by definition).
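The varying-choice-set idea can be sketched by masking unavailable alternatives before normalizing. The utilities and the availability matrix below are made-up numbers that mirror the voting example; the function name is hypothetical.

```python
import numpy as np

def masked_choice_probabilities(utilities, available):
    """Logit probabilities where each decision-maker has his own choice set.

    utilities : (n_people, n_alternatives) array
    available : boolean array of the same shape; False means "cannot choose".
                Each row is assumed to contain at least one True.
    """
    u = np.where(available, utilities, -np.inf)   # unavailable -> exp(-inf) = 0
    u = u - u.max(axis=1, keepdims=True)
    expu = np.exp(u)
    return expu / expu.sum(axis=1, keepdims=True)

# Voter in district A can pick parties A, B, C; voter in district B only A, B.
utilities = np.array([[0.2, 0.1, 0.4],
                      [0.2, 0.1, 0.4]])
available = np.array([[True, True, True],
                      [True, True, False]])
probs = masked_choice_probabilities(utilities, available)
```

The second row assigns exactly zero probability to party C and renormalizes over A and B, which is the hard version of the constraint: it requires knowing the choice set with certainty.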

In practice however, we only have (at best) a “likely” familiarity of each offender with each neighborhood, predicted using other sources (such as smartphone travel data of the population). I cannot quite wrap my head around how to incorporate such offender-specific best guesses/proxy of the familiarity with each neighborhood into the model. I suppose I can simply add it as an offender-specific covariate to the model (but there is a lot of uncertainty in the prediction, so this variable will need to incorporate measurement error).

Theory suggests that if the offender is unfamiliar with a neighborhood, then the chance that he offends there is essentially 0. So perhaps I should include an interaction between the offender-specific location-familiarity variable and all other variables to capture this?

But actually, my feeling is that the offender-specific location-familiarity is some kind of “prior”, i.e. our individual-specific prior knowledge of an offender’s choice set. This prior would be on the Y’s, similar to a multinomial model with varying choice sets, but without removing some choice alternative completely (as we cannot be 100% sure that an offender is really totally unfamiliar with a neighborhood). I have no idea if that is a feasible approach, however.

(2) What do you recommend for the situation described above?

My reply:

1. Yes, you should definitely be able to fit those RSGHB models in Stan; this should be no problem at all. If difficulties arise, you can post questions to the Stan users group and you’re likely to get a polite and helpful answer (we’re a Ripley-free zone).

2. If you have information on how “likely familiar” an offender is with each area, I think you should just put this in as a continuous predictor. The model should work fine, and it shouldn’t really matter how many nonzero cells there happen to be.
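One way to picture familiarity as a continuous predictor is sketched below. Entering it on the log scale is an assumption made for this illustration, not something specified above: it has the property that the choice probability goes to zero as familiarity goes to zero, without ever deleting an alternative from the choice set. The function name, `gamma`, and the numbers are all hypothetical.

```python
import numpy as np

def probs_with_familiarity(base_utility, familiarity, gamma=1.0, eps=1e-12):
    """Logit probabilities with log-familiarity as an extra predictor.

    base_utility : (n_offenders, n_locations) utilities from the other covariates
    familiarity  : same shape, values in (0, 1]; a predicted familiarity score
    gamma        : coefficient on log-familiarity (would be estimated in practice)
    """
    u = base_utility + gamma * np.log(familiarity + eps)
    u = u - u.max(axis=1, keepdims=True)
    e = np.exp(u)
    return e / e.sum(axis=1, keepdims=True)

# One offender, three neighborhoods that are otherwise equally attractive.
base_utility = np.zeros((1, 3))
familiarity = np.array([[0.9, 0.09, 0.01]])
p = probs_with_familiarity(base_utility, familiarity)
```

With `gamma = 1` this is the same as weighting each location's exp-utility by its familiarity score, so the barely known neighborhood keeps a small but nonzero probability. The sketch treats familiarity as known; it does not address the measurement error in the predicted scores.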

P.S. I wonder if we could set up this sort of model for junk science: we’d have a grid with research topics in one direction, and journals in the other. The challenge would be to predict where this stuff would be published next.


  1. Pointeroutguy says:

    Ripley-free zone?

  2. OK, it’s fine to fit models like this as a descriptive model, but when using characteristics of the locations and the criminals, almost always the bias present in the data set generated by actual processes means that your results are invalid for decision making. It’s REALLY BAD to make decisions on the basis of a non-causal model, especially when the models are often so directly applied. It’s even really bad to make decisions on the basis of non-applied studies that have implications that get directly applied. At the very least, I suggest that an extremely careful look at the model biases and the structure of how the dataset was generated is needed. If you have an incorrect generative model, especially one where you ignore bias in the data generation correlated to the variables you used, you can bias it in really nasty and hard-to-see ways.

    In my experience, criminology models and papers exploring dynamics based on those models are applied much too directly for anything less than the utmost caution. For this reason, despite being a very big believer in Stan, I think it’s the wrong choice for this type of work. The people who develop Stan often have a hard time characterizing exactly why complex models work exactly the way they do, and most researchers simply don’t have the expertise needed to understand how their model is creating bias, and when these biases are damaging. The researchers should choose simple tools that they fully understand, and then exercise extreme care in interpreting the results.

    • You need to model the process that created the data set. Only if the data is an unbiased subsample of reality can you ignore the difference between what is going on in the world and what is going on in the data set.

      On the other hand Stan is perfect for doing this. Using simple but wrong models isn’t a solution imho

      • I would argue that using a simple but well understood and incorrect model for directing policy decisions is almost always better than using a complex and less well understood model when correctness is difficult to verify. (Chapter 2 of my dissertation, which is being typeset now, discusses this point in far, far too much detail.)

        Per the below response, I misunderstood the purpose of the model here, but to respond to your point, I’m very wary of using generative models *in cases where the generative process is not already understood* to direct decision-making. On the other hand, I certainly agree that ignoring the generative process isn’t a better solution.

    • Thanks for your reply, David. I understand your remark (that you need to model the process that created the dataset) but I don’t see the connection to my question. You emphasize results being invalid for decision making / it’s really bad to make decisions on the basis of a non-causal model. But to be clear, it was my intention to try to study ‘decision-making of offenders’, not ‘making (law enforcement?) decisions’ based on the statistical model or anything like that. Perhaps I shouldn’t have used the term decision-making (which perhaps implies that offenders make a priori rational decisions on where to commit their next offense)? What I mean is to model the location *where* an offender commits crime as a revealed preference choice model, i.e., similar to a revealed preference design on mode of transportation (bus, metro, car, bike) but then with a whole lot more decision alternatives (namely the locations / neighborhoods of offense).

      • Thank you for clarifying, and apologies – modeling the offender’s decision-making process is definitely different than using the model for decision-making, and I misunderstood. In my (limited) experience with crime policy, I have usually seen criminology research used to make conclusions about crime-enforcement policy, such as how to decrease propensity for crime, and how to allocate resources to prevent it, and so I assumed that this research was similar.

        • digithead says:

          Which is why there is a delineation between studying what causes crime (i.e., criminology) and studying social response to crime (i.e., criminal justice). Moreover, no behavior is criminal until laws are made prohibiting such behavior so we also have to delineate what causes lawmaking (i.e., political science) and then studying how laws are implemented (i.e., public policy).

          Which is why I share your initial skepticism of these efforts as they have limited utility because the underlying data generating process is much more complicated. I think the more interesting aspect of these endeavors is not how accurate they are in prediction but in looking at the instances that they failed to predict. It’s what we don’t understand that helps us to learn new things.

  3. Dan Simpson says:

    This seems to call for a point process model, right? (Although you can see a binomial model as a pseudolikelihood for a clustered point process, so it’s not that far off). See, for example: this paper

    Although I’d typically use a log-Gaussian Cox process because they’re straightforward to fit using something like the INLA or INLABru or geostatsp or lgcp packages in R and the coefficients are fairly easy to interpret.

    • The difference is that as far as I know the (spatio-temporal) point process model you referred to only uses the locations of previous offenses: the locations of all (reported) crimes are used to predict the likely location of the next crime. With this model I’m skeptical whether the model actually matches the data-generating process (as mentioned by David and Daniel). Also, this model assumes a continuous space in which offenses can occur (I think), even though that’s not always very realistic. But I digress. My question is different because we know not only the locations of crimes but also who committed (some of) them. So it is then a natural step to try to model to what extent (for offenders that were caught) offender characteristics (such as age, home location, and so on) are related to the locations of (known) crimes they committed.

  4. Hence says:


    You may have committed a misdemeanor to the spelling of your correspondent’s surname. My guess is that it happened on the second instance, since his given name is of Dutch origin. I speculate that this might have been due to your thinking of John Steinbeck, whose novels, as it happens, also have crimes.

  5. Mike Maltz says:

    Kim Rossmo, who then was a Detective Inspector for the Vancouver (Canada) PD, wrote a book “Geographic Profiling,” which Steenbeek might want to check. He bases his analyses on actual cases that he and others have worked, rather than logit, RSGHB, or other canned routines. He’s now at Texas State University.
