What is spatial epidemiology, anyway?

Every time I talk or teach about spatial epidemiology, I find myself confronted with the difficulty of defining what it is. More specifically, I have a hard time defining what my version of it is, why I do research in this area, teach about it, and just think about it a lot of the time. I also worry that students who came for maps, GIS, fancy statistical models, and all that good stuff will be a bit disappointed when they get my version, which has some of that but is also more eclectic and navel-gazey.

Sometimes, I think about changing the name of the class to something like “relational epidemiology”, “spatial and contextual epidemiology” or just health geography – but I’m not a geographer and I’m not totally sure what health geography is, either.

At the end of the day, spatial epidemiology is interesting and important to me because it is relational in nature. Maybe this just reflects the way my brain has been poisoned by training in the social sciences and infectious disease dynamics, which are all about relationships and interpersonal dependence. But if we called it relational epidemiology, what would the most important relationships be?

  • Relationships between individuals, e.g. in a classic social network.
  • Relationships between people and the environment, e.g. climate change and other types of human-driven ecological change.
  • Relationships between areas of the physical environment, e.g. dispersal of dust and other pollutants through the air, movement of bacterial and viral pathogens via water sources.
  • Hierarchical relationships between social units, e.g. neighborhoods within cities.
  • Within-individual change over time, e.g. the progression of chronic illness, natural aging.

This defines the problem space for what I think of as the super-group of “relational epidemiology” topics. Then we have a set of ideas or approaches that act as useful frames through which to view these problems: Spatial analysis clearly falls under this heading, but so do network analysis, time-series analyses, non-spatial hierarchical models, individual-based models, and on and on. These also touch on other well-established fields like ecology, social epidemiology, environmental health, sociology, economics, and political science.

Making Choices

One of the early lectures in my online spatial epidemiology course is titled “Making maps means making choices”. I like this one because it gives me the opportunity to feel smart by reiterating a point that has been made many times before: Spatial approaches to public health are powerful because they are decidedly non-neutral. Maps have the pleasing appearance of something settled and clear, but we know they obscure more than they show. A disease map includes the information on risk and relationships we want to highlight. The stuff that is left out is implicitly understood to be less important than what is left in. This makes it just like any other model, statistical, mathematical or otherwise.

I guess this is why I keep calling the class spatial epidemiology rather than something more expansive that could allay some of the mildly guilty feeling I get about teaching a version of this class that is heavier on ‘spatial thinking’ (whatever that is) than ‘spatial methods’. (Honestly I’m not even sure what exactly belongs in that set or doesn’t – but that’s for another day).

When I say it’s a course about spatial epidemiology, to me that ultimately means that space is the starting point rather than the destination. In other words, if we put things on a map or estimate a model of the distances between individuals with different attributes or outcomes, then we have to ask why the patterns we see are the way they are. We get to tangle with all the wooly questions about relationships and interdependence, but we start from a place that most people grok on at least some level.

This can be done as effectively through other lenses: social network analysis, ethnography, agent-based modeling and others. But to me the reason space is particularly powerful for building a relational perspective in epidemiology and public health is that you can put anything on a map: Everything that is within the concern of public health can be pinned down to some location on a map. Whether or not that location is meaningful is another question, but at least it gives us some place to start.

Ok, so what?

I don’t know – up here in Michigan we’re on spring break (it’s above freezing!), and I’m taking a few minutes to think about why I do the things that I do. But more than that, spatial analysis feels like one more slippery set of tools or concepts among the ones I care about. Asking why I care about spatial epidemiology is not that different from asking why I think Bayesian statistics, transmission models, hierarchical analysis, and many other things (which all sound kind of well-defined but aren’t) are good and important things other people should care about.

Teaching about these things, but also publishing on them and writing grants to get people to pay for the work, forces us to articulate what they are all about. But it might be helpful sometimes to zoom out and admit to ourselves and everyone else that these are all fuzzy concepts, more like questions we have to continually ask and answer than terms with a fixed meaning.

And maybe you already knew that – but I wrote this to remind myself for the next time I forget.

(Thanks to Krzysztof Sakrejda and Joey Dickens for ideas & feedback! h/t to Justin Lessler et al. for their great paper “What is a hotspot anyway?” that got me thinking about this.)

Stan Weekly Roundup, 28 July 2017

Here’s the roundup for this past week.

  • Michael Betancourt added case studies for methodology in both Python and R, based on the work he did getting the ML meetup together:

  • Michael Betancourt, along with Mitzi Morris, Sean Talts, and Jonah Gabry, taught the women in ML workshop at Viacom in NYC. There were 60 attendees working their way up from simple linear regression, through Poisson regression, to GPs.

  • Ben Goodrich has been working on new R^2 analyses and priors, as well as the usual maintenance on RStan and RStanArm.

  • Aki Vehtari was at the summer school in Valencia teaching Stan.

  • Aki has also been kicking off planning for StanCon in Helsinki 2019. Can’t believe we’re planning that far ahead!

  • Sebastian Weber was in Helsinki giving a talk on Stan, but there weren’t many Bayesians there to get excited about Stan; he’s otherwise been working with Aki on variable selection.

  • Imad Ali is finishing up the spatial models in RStanArm and moving on to new classes of models (we all know his goal is to model basketball, which is a very spatially continuous game!).

  • Ben Bales has been working on generic append array functions and vectorizing random number generators. We learned his day job was teaching robotics with Lego to mechanical engineering students!

  • Charles Margossian is finishing up the algebraic solvers (very involved autodiff issues there, as with the ODE solvers) and wrapping up a final release of Torsten before he moves to Columbia to start the Ph.D. program in stats. He’s also writing the mixed solver paper with feedback from Michael Betancourt and Bill Gillespie.

  • Mitzi Morris added runtime warning messages for problems arising in declarations, which inadvertently fixed another bug arising for declarations with sizes for which constraints couldn’t be satisfied (as in size zero simplexes).

  • Miguel Benito, along with Mitzi Morris and Dan Simpson, with input from Michael Betancourt and Andrew Gelman, now have spatial models with matching results across GeoBUGS, INLA, and Stan. They further worked on better priors for Stan so that it’s now competitive in fitting; turns out the sum-to-zero constraint on the spatial random effects had a greater negative effect on the geometry than a positive effect on identifiability.

  • Michael Andreae resubmitted papers with Ben Goodrich and Jonah Gabry and is working on some funding prospects.

  • Sean Talts (with help from Daniel Lee) has most of the C++11/C++14 dev ops in place so we’ll be able to start using all those cool toys.

  • Sean Talts and Michael Betancourt, with some help from Mitzi Morris, have been doing large-scale Cook-Gelman-Rubin evaluations for simple and hierarchical models and finding some surprising results (being discussed on Discourse). My money’s on them getting to the bottom of what’s going on soon; Dan Simpson’s jumping in to help out on diagnostics, in the same thread on Discourse.

  • Aki Vehtari reports that Amazon UK (with Neil Lawrence and crew) are using Stan, so we expect to see some more GP activity at some point.

  • We spent a long time discussing how to solve the multiple translation unit problems. It looks at first glance like Eigen just inlines every function and that may also work for us (if a function is declared inline, it may be defined in multiple translation units).

  • Solène Desmée, along with France Mentré and others, have been fitting time-to-event models in Stan and have a new open-access publication, “Nonlinear joint models for individual dynamic prediction of risk of death using Hamiltonian Monte Carlo: application to metastatic prostate cancer”. You may remember France as the host of last year’s PK/PD Stan conference in Paris.
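
For anyone wondering what the Cook-Gelman-Rubin evaluations mentioned above look like in practice, here is a minimal sketch of the general idea, not the code Sean and Michael are running. It assumes a toy conjugate model (a normal mean with a standard normal prior and known unit variance) so that exact posterior draws can stand in for a sampler; the function name `cgr_quantiles` and all the settings are made up for illustration. The recipe: draw a parameter from the prior, simulate data from it, draw from the posterior, and record where the true parameter falls among the posterior draws. If the fitting software is doing its job, those quantiles should look roughly uniform.

```python
import numpy as np


def cgr_quantiles(n_reps=1000, n_obs=10, n_draws=200, seed=0):
    """Posterior quantiles of the true parameter over repeated simulations.

    Toy model (an assumption for this sketch):
        mu ~ Normal(0, 1),  y_i ~ Normal(mu, 1) for i = 1..n_obs
    The posterior is Normal(sum(y) / (n_obs + 1), sd = sqrt(1 / (n_obs + 1))),
    so exact draws stand in for the output of an MCMC sampler.
    """
    rng = np.random.default_rng(seed)
    quantiles = np.empty(n_reps)
    for r in range(n_reps):
        mu_true = rng.normal(0.0, 1.0)              # 1. draw parameter from the prior
        y = rng.normal(mu_true, 1.0, size=n_obs)    # 2. simulate data given that parameter
        post_mean = y.sum() / (n_obs + 1)           # 3. "fit" the model (exact posterior here)
        post_sd = np.sqrt(1.0 / (n_obs + 1))
        draws = rng.normal(post_mean, post_sd, size=n_draws)
        quantiles[r] = np.mean(draws < mu_true)     # 4. quantile of the truth in the posterior
    return quantiles


quantiles = cgr_quantiles()
# With a correct "sampler", the ten bin counts should all be in the same ballpark.
counts, _ = np.histogram(quantiles, bins=10, range=(0.0, 1.0))
print(counts)
```

Swapping the exact posterior draws for draws from a real sampler is the interesting part: a pile-up of quantiles near 0 or 1, or a hump in the middle, suggests the posterior being computed is too narrow, too wide, or shifted.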