Soil Scientists Seeking Super Model

I (Bob) spent last weekend at Biosphere 2, collaborating with soil carbon biogeochemists on a “super model.”

Model combination and expansion

The biogeochemists (three sciences in one!) have developed hundreds of competing models, and the goal of the workshop was to kick off some projects that put some of them together into wholes that are greater than the sum of their parts. We’ll be doing some mixture (and perhaps change-point) modeling, which makes sense here because different biogeochemical processes are at work depending on system evolution and extrinsic conditions (some of which we have covariates for, and some of which can be modeled with random effects). We’re also going to do some of what Andrew likes to call “continuous model expansion.”
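To make the mixture idea concrete, here is a minimal sketch, not anything we wrote at the workshop. It assumes two normal densities standing in for two competing process models, and the names (theta, mu1, mu2, sigma) are hypothetical. The point is that the discrete "which model generated this observation" indicator is summed out, leaving only the continuous mixing weight theta to estimate.

```python
import math

def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def normal_lpdf(y, mu, sigma):
    """Log density of a normal distribution."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z

def mixture_log_lik(ys, theta, mu1, mu2, sigma):
    """Log likelihood of a two-component mixture, 0 < theta < 1.

    Each observation comes from component 1 with probability theta and
    from component 2 otherwise; the indicator is marginalized out.
    """
    total = 0.0
    for y in ys:
        total += log_sum_exp(
            math.log(theta) + normal_lpdf(y, mu1, sigma),
            math.log1p(-theta) + normal_lpdf(y, mu2, sigma),
        )
    return total
```

In a real application each component would be a full process model, and theta could itself depend on covariates or random effects.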

Others at the workshop expressed interest in Bayesian model averaging and in model comparison using Bayes factors, though I’d rather concentrate on mixture modeling and continuous model expansion, for reasons Andrew’s already discussed at length on the blog and in Bayesian Data Analysis (aka BDA3, aka “the red book”).

One of the three workshop organizers, Kiona Ogle, did a great job laying out the big picture during the opening dinner / lightning-talk session and then following it up by making sure we didn’t stray too far from our agenda. This is always a tricky balance with a bunch of world-class scientists, each with his or her own research agenda.

So far, so good

We got a surprising amount done over the weekend—it was really more hackathon than workshop, because there weren’t any formal talks.

GitHub repositories: Thanks to David LeBauer, another of the workshop organizers, we have a GitHub organization with repositories holding our work so far. David and I were really into pitching version control, and in particular GitHub, for managing our collaborations. Hopefully we’ve converted some Dropbox users to version control.

Stan “Hello World”: The soil-metamodel/stan repo includes a Stan implementation of a soil incubation model with two pools and feedback, which I translated from Carlos Sierra’s system SoilR, an R package implementing a vast variety of linear and non-linear differential-equation based soil-carbon models (the scope of which is explained in this paper).

Taking Michael Betancourt’s advice, I implemented a second version with lognormal noise and a proper measurement error model (see the repo), which fits much more cleanly (higher effective sample size, less residual noise, obeys scientific constraints on positivity).
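For readers who want to see the shape of such a model, here is a rough sketch in Python rather than Stan. It is not a translation of the program in the repo. It assumes the usual linear two-pool form with feedback (decay rates k1, k2 and transfer fractions a21, a12, all hypothetical names), a hand-rolled fixed-step Runge-Kutta integrator, and measurements of cumulative evolved CO2 with lognormal noise.

```python
import math

def two_pool_rhs(c, k1, k2, a21, a12):
    """Right-hand side of an assumed linear two-pool model with feedback."""
    c1, c2 = c
    dc1 = -k1 * c1 + a12 * k2 * c2  # pool 1 decays, gains a share of pool 2's outflow
    dc2 = a21 * k1 * c1 - k2 * c2   # pool 2 gains a share of pool 1's outflow, decays
    return (dc1, dc2)

def rk4_step(c, dt, params):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(slope, h):
        return tuple(ci + h * si for ci, si in zip(c, slope))
    s1 = two_pool_rhs(c, *params)
    s2 = two_pool_rhs(shifted(s1, 0.5 * dt), *params)
    s3 = two_pool_rhs(shifted(s2, 0.5 * dt), *params)
    s4 = two_pool_rhs(shifted(s3, dt), *params)
    return tuple(ci + dt / 6.0 * (a + 2 * b + 2 * d + e)
                 for ci, a, b, d, e in zip(c, s1, s2, s3, s4))

def simulate(c0, t_end, n_steps, params):
    """Pool sizes at time t_end, starting from c0 at time 0."""
    c = tuple(c0)
    dt = t_end / n_steps
    for _ in range(n_steps):
        c = rk4_step(c, dt, params)
    return c

def cumulative_co2(c0, t_end, n_steps, params):
    """Carbon respired by t_end: initial carbon minus what remains in the pools."""
    return sum(c0) - sum(simulate(c0, t_end, n_steps, params))

def lognormal_lpdf(y, mu_log, sigma):
    """Log density of a lognormal with log-scale location mu_log."""
    z = (math.log(y) - mu_log) / sigma
    return -math.log(y) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z

def log_lik(obs, c0, params, sigma, n_steps=200):
    """obs is a list of (time, measured CO2) pairs with time > 0 and CO2 > 0.

    Assumes transfer fractions below 1 so that predicted CO2 is positive.
    """
    total = 0.0
    for t, y in obs:
        mu = cumulative_co2(c0, t, n_steps, params)
        total += lognormal_lpdf(y, math.log(mu), sigma)
    return total
```

Because the noise is multiplicative on the log scale, both the predictions and the simulated measurements stay positive, which is the scientific constraint mentioned above.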

“Forward” and “Backward” Michaelis-Menten: Bonnie Waring, a post-doc, not only survived having a scorpion attached to her ankle during dinner one night, she’s also leading one of the subgroups I’m involved with on reimplementing and expanding these models in Stan. Apparently, Bonnie’s seen much worse (than little Arizona scorpions) working in Costa Rica at the lab of Jennifer Powers (the third workshop organizer), to which Bonnie’s returning to run some of the enzyme assays we need to complete the data.

I’m very excited about this particular model combination, which involves some state-of-the-art models taking into account biomass and enzyme behavior. There are two different forms of Michaelis-Menten dynamics under consideration, as they both make sense for different subsystems of the aggregate soil and organic matter biogeochemistry.
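As a rough illustration of the difference, here is a sketch of the two rate laws under one common convention, which may not match the exact parameterization in the repo. The "forward" form saturates in substrate and the "backward" (or reverse) form saturates in enzyme. The function and argument names are hypothetical.

```python
def forward_mm(vmax, km, substrate, enzyme):
    """Forward Michaelis-Menten: rate is linear in enzyme, saturates in substrate."""
    return vmax * enzyme * substrate / (km + substrate)

def backward_mm(vmax, km, substrate, enzyme):
    """Backward (reverse) Michaelis-Menten: linear in substrate, saturates in enzyme."""
    return vmax * substrate * enzyme / (km + enzyme)
```

With plentiful substrate the forward rate levels off near vmax * enzyme, whereas with plentiful enzyme the backward rate levels off near vmax * substrate, which is why each can be the better description for a different subsystem.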

The repo for this project is soil-metamodel/back-forth-mm, the readme for which has references to some papers, including one first-authored by another workshop participant, Steve Allison, and some colleagues, Soil-carbon response to warming dependent on microbial physiology (Nature Geoscience).

Global mapping: Steve’s actually involved with a separate group doing global mapping, using litter decomposition data. The GitHub repo is soil-metamodel/Litter-decomp-mapping.

They’ve got some stiff competition (ODE pun intended), given the recent fine-grained, animated global carbon map that NASA just put out.

Non-linear models: Kathe Todd-Brown, another post-doc, helped me (and everyone else) unpack and understand all of the models by breaking them down from narratives to differential equations. Kathe’s leading another subgroup looking at non-linear models, which I’m also involved with. I don’t see a public GitHub repo for that yet.

Science is awesome!

Right after Carlos, David, and I first arrived, we ran into a group of tourists, including some teenagers, who asked us, “Are you scientists?” We said, “Why yes, we are.” One of the teenagers replied, “That’s super awesome.” I happen to agree, but in nearly 30 years doing science, I can’t remember ever getting that reaction. So, if you’re a scientist and want to feel like a rock star, I’d highly recommend Biosphere 2.

It’s also a fun tour, what with the rain forest environment (i.e., a big greenhouse) and the 16-ton rubber-suspended “lung” for pressure equalization.

9 thoughts on “Soil Scientists Seeking Super Model”

  1. I’ve been following the Bayesian change-point model literature on-and-off for a while (although it’s only recently that I’ve actually had to face a real-world change-point problem, and I’m not yet at the point of formulating a Bayesian model for it). It seems to me that AG’s anti-Bayes-factor philosophy (if followed in a doctrinaire way that I’m not certain AG would endorse) would forbid the use of change-point models. The reason is that a change-point model is actually a discrete set of models; inference on the location of the change-point(s) is entirely equivalent to the kind of discrete model posterior calculation that AG hates when it arises in Bayesian “hypothesis testing”.

    The continuous-model-expansion finesse would be, I suppose, something like putting a latent Gaussian process with a neural net covariance function (see 4.2.3) in the prior.

    • Or if you think there’s a single change-point, a single “sigmoidal” mean function with unknown position and amplitude, plus a gaussian process around that mean function of some other form. But I have to say that neural net covariance function is cool beans!

      • Note that the neural net covariance function requires defining the origo. The prior states that the function changes fast near the origo and gets smoother further away from the origo. You can see this also in Fig 4.5 in the GPML book. Fig 5.10 in the GPML book is super misleading, as the origo happens to be exactly where the change point is. Try repeating the experiment with the change point further away from the origo and you’ll be disappointed. I used to think that the neural net covariance function is cool, but nowadays I recommend not using it (unless you really really know what you are doing).

        • Huh. Now that I inspect the actual form of the NN covariance function(s), I see how it expects changes near the origin. I can think of an approach that might get around that — assume that the origin is unknown and put a prior on it — but I’d worry about inducing some kind of degeneracy in this parameter-expanded model (something akin to how parameter-expanded Gibbs sampling is a null-recurrent Markov chain). Also, this means that a NN covariance function can account for at most one change-point, so this isn’t actually a continuous model expansion. For that, we want a stochastic process that is mostly smooth but has relatively rare localized regions with large-magnitude derivatives.

  2. I think you will find Michaelis-Menten has limited predictive power at a fine scale, and use of the equation at higher levels of ecosystem organization leads to excessive interpretation of efficiency effects, such as Carbon Use Efficiency cited in the Nature Geoscience paper. Nature tolerates inefficiencies in ways Allison’s simple models don’t account for. Just because a curve fits MM dynamics does not necessarily mean that efficiency is a driver for soil carbon dynamics. I think spatially explicit hierarchical models linking soil carbon dynamics with microbial activity, including enzymes, and soil properties will be more fruitful and illuminating of the mechanisms involved.

  3. Pingback: Bonnie and Jennifer Warm Up in Biosphere 2 | tropicaldryforest

  4. The successful “herding of cats” seems to depend on quickly getting them communicating and working together and most critically continuing to work together to at least a reasonable first draft of something completed.

    The GitHub repositories and your blog post here likely are helping in various ways – hopefully you will give us an update post at some point.
