Stabilizing feedback as a bad thing in scientific inference

Someone writes:

I am working on a dynamic vegetation model – one that can be coupled to climate predictions in the next generation of coupled carbon-climate models.

I am new to modeling, and would like to know your opinion on how I (and others) can avoid making an error. Specifically, how would you handle the following:

If the prediction of a climate model is very much outside the consensus predictions, it is not likely to be published. As I am developing a model, if a model prediction seems unlikely, I look for (and usually find) some error in the inputs. A plausible prediction will receive less scrutiny.

Is it possible that this feedback has helped to keep climate change forecasts (since Arrhenius' 1-D model, presumably solved by hand, in 1896) so consistent over the past century and in such close agreement today? Would it be possible to compare the probability and observed frequency of outlier model predictions?

More importantly, how can I (and the modeling community as a whole) avoid such a trap? Would it be necessary to keep track of model inputs, outputs, and structural changes over time?

My reply: I know people have thought about this issue for a long time – the file-drawer problem and all that. In my own work, I've never thought it really made sense to try to formally analyze the inference/publication process; rather, I just try to do a good job with the study at hand. In the modern era in which we have more and more access to data, I'm hoping the solution will just be to incorporate old data directly into our analyses so that we don't have to worry so much about what people did earlier. But perhaps others have more systematic thoughts on this.
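
To make that a bit more concrete, here is a toy sketch (invented study names and numbers, not any particular analysis) of what pooling the raw data from earlier studies directly, rather than leaning on their published summaries, could look like:

```python
# A toy sketch of pooling raw data from several earlier studies plus a new one
# (study names and numbers are invented).  Instead of taking each published
# estimate at face value, compute a random-effects (partial-pooling) summary
# from the per-study measurements themselves.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw measurements of the same quantity from four studies.
studies = {
    "study_1990": rng.normal(2.8, 1.0, size=12),
    "study_2001": rng.normal(3.1, 1.2, size=20),
    "study_2010": rng.normal(2.5, 0.9, size=15),
    "new_study":  rng.normal(3.4, 1.1, size=18),
}

means = np.array([d.mean() for d in studies.values()])
ses = np.array([d.std(ddof=1) / np.sqrt(len(d)) for d in studies.values()])

# Method-of-moments estimate of the between-study variance (DerSimonian-Laird).
w = 1.0 / ses**2
fixed = np.sum(w * means) / np.sum(w)
Q = np.sum(w * (means - fixed) ** 2)
tau2 = max(0.0, (Q - (len(means) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooled estimate, then shrink each study toward it.
w_star = 1.0 / (ses**2 + tau2)
pooled = np.sum(w_star * means) / np.sum(w_star)
if tau2 > 0:
    shrunk = (means / ses**2 + pooled / tau2) / (1.0 / ses**2 + 1.0 / tau2)
else:
    shrunk = np.full_like(means, pooled)

print(f"pooled estimate: {pooled:.2f}, between-study sd: {np.sqrt(tau2):.2f}")
for name, raw, shr in zip(studies, means, shrunk):
    print(f"{name}: raw mean {raw:.2f} -> partially pooled {shr:.2f}")
```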

9 thoughts on “Stabilizing feedback as a bad thing in scientific inference”

  1. I agree there's a certain pressure to make your results agree with previous results, but if you do an analysis that *doesn't* agree, and after hard work you can't find an error to explain it, then there's a HUGE incentive to publish those results, because if they hold up to the scrutiny of others too, they may represent a breakthrough. How those two forces balance out is surely different for every researcher. But I feel confident that there are enough qualified people out there who would love to debunk (or significantly improve, or however you want to think of it) the standard models in such a hot area as climate research that consistency pressure couldn't keep results consistent over time all by itself.

  2. > If the prediction of a climate model is very much outside the consensus predictions, it is not likely to be published.

    More arguing that climate science is a nonesuch science. Taken to the logical extreme, we can argue that Einstein's papers on Special and General Relativity are not likely to be published (and that is why we still use epicycles today). Taken to the logical extreme, we can posit that Alex Rodriguez is not likely to swing for the fences. The blockbuster behavior of the players in the 99.999th percentile is poorly predicted by the tentative behavior of the average player.

    Secondly, science publishing is not the only market for climate modeling. Commodities traders and the reinsurance market for hedging risk on multi-year massive construction projects have a need for accurate climate modeling, because on those time scales a long-range weather report would be worthless. Are those players willing to leave millions on the table just so their hired-gun scientists can parrot safe results that are unlikely to rattle teacups at the next faculty function? Unlikely.

    I beg your forgiveness for the following snarkiness. Can your anonymous concern troll name a single branch of science that has remained on a strictly linear trajectory since 1896? Besides phrenology.

  3. Sander Greenland coined the term "conformity bias" for this fairly recently (Rothman et al., Modern Epidemiology, 2008).

    Selection of results/data is very nasty, as it can lead to large biases and even reversals of _pooled_ effects relative to the true ones.

    It is more easily appreciated when an investigator chooses which observations to leave out of their analysis – with no trace of those left out (and no full disclosure of their preferences).

    As for selection modeling, Greenland is less pessimistic than I am, but more pessimistic than John Copas (probably "the" current authority on this). You have to avoid mis-specifying the selection process too badly, and I believe the publication selection process is very poorly understood but highly variable, and coupled with small sample sizes (numbers of studies). Copas does not go further than sensitivity analyses (I believe).
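
    As a toy illustration (my own sketch, not Copas's actual selection model), here is roughly what such a sensitivity analysis could look like: assume different chances that a non-significant study gets published and see how far a naive pooled estimate drifts from the truth.

    ```python
    # A toy simulation of sensitivity to publication selection (my own sketch,
    # not Copas's model).  Significant studies are always "published"; vary the
    # chance that a non-significant study is published and watch the naive
    # inverse-variance pooled estimate drift away from the true effect.
    import numpy as np

    rng = np.random.default_rng(1)
    true_effect = 0.2                                # true underlying effect
    n_studies = 2000                                 # studies actually conducted
    se = rng.uniform(0.05, 0.3, size=n_studies)      # per-study standard errors
    est = rng.normal(true_effect, se)                # per-study estimates
    significant = np.abs(est / se) > 1.96

    for p_publish_nonsig in (1.0, 0.5, 0.2, 0.05):
        # Selection rule: significant results always appear; non-significant
        # ones appear only with probability p_publish_nonsig.
        published = significant | (rng.random(n_studies) < p_publish_nonsig)
        w = 1 / se[published] ** 2
        pooled = np.sum(w * est[published]) / np.sum(w)
        print(f"P(publish | non-significant) = {p_publish_nonsig:.2f}: "
              f"pooled = {pooled:.3f} (truth = {true_effect})")
    ```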

    As for "try[ing] to do a good job with the study at hand", I did succeed in convincing one of my statistical mentors that this was impossible – at least without being able to prove the study at hand would never have been unselected especially before or even after it (or future studies) got to you( i.e. it is from a population of studies that is protected from selection. ) This is what makes replication from unselected future studies so important.

    Research is supposed to be self-correcting, and (past and) future reporting selection foils this – in a possibly very big way.

    Trial registration and easier publication of results will help.

    Keith

  4. I don't think the question suggests nonesuch science.

    The question is about how to apply statistical tools to model development. This could increase the rate of discovery and the rigor of modeling.

    Models are continuously under development, becoming more complex and running at finer resolutions.

    If output were collected throughout the course of development, data could be compared to output from multiple iterations of the model.

    This would allow modelers to determine the predictive power gained at each step in the model development, or to determine the appropriate resolution and complexity.

    Even published results may only represent a snapshot of a model that is undergoing continuous development and refinement, often along multiple lines by multiple groups at once.

    Fundamentally, the answer might be as simple as archiving model output and then comparing that output with data over time in a "simple" but comprehensive model-comparison context. The biggest challenge might not be the statistical framework so much as keeping track of the model versions and inputs. Such an archive might even find some value in runs that would otherwise be carelessly thrown out.
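
    As a bare-bones sketch (names and numbers invented, not tied to any particular modeling framework), the archive could be as little as a log of each run's model version, inputs, and output, plus a routine that scores every archived version against the same observations:

    ```python
    # A bare-bones sketch of archiving model runs so that later versions can be
    # compared against observations.  Each run records the model version, a hash
    # of its inputs, the inputs themselves, and the output trajectory.
    import hashlib
    import json

    import numpy as np


    def archive_run(path, version, inputs, output):
        """Append one model run (version, input hash, inputs, output) to a JSON-lines file."""
        record = {
            "version": version,
            "input_hash": hashlib.sha256(
                json.dumps(inputs, sort_keys=True).encode()
            ).hexdigest(),
            "inputs": inputs,
            "output": list(output),
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")


    def skill_by_version(path, observations):
        """Mean root-mean-square error of each archived model version against the observations."""
        obs = np.asarray(observations)
        scores = {}
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                pred = np.asarray(rec["output"])
                scores.setdefault(rec["version"], []).append(
                    float(np.sqrt(np.mean((pred - obs) ** 2)))
                )
        return {version: float(np.mean(errs)) for version, errs in scores.items()}


    # Hypothetical usage: two versions of a toy "model", scored against made-up observations.
    obs = [1.0, 1.2, 1.5, 1.9]
    archive_run("runs.jsonl", "v0.1", {"co2_ppm": 400}, [0.9, 1.0, 1.3, 1.6])
    archive_run("runs.jsonl", "v0.2", {"co2_ppm": 400}, [1.0, 1.2, 1.4, 1.8])
    print(skill_by_version("runs.jsonl", obs))
    ```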

  5. Granger Morgan and David Keith have explored this in some detail with regard to climate models. They find that if you ask climate experts to characterize their subjective best guess as to the distribution of key climate change parameters, you observe far more variance, both within individual experts' PDFs and across the experts, than you observe when you look at the distribution of all the climate model outputs.

    Their initial work was M. Granger Morgan and David Keith, "Subjective Judgments by Climate Experts," Environmental Science & Technology, 29(10), 468-476, October 1995, and they have followed it up with other publications.

  6. Marc Levy

    > They find that if you ask climate experts to characterize their subjective best guess as to the distribution of key climate change parameters, you observe far more variance … than you observe when you look at the distribution of all the climate model outputs.

    This is to be expected, because no one would represent *any* model as perfectly describing reality – if it were perfect, it would no longer be a model. Only pure mathematics has the benefit of being able to switch the analysis to a proven isomorphism that is easier to compute. Every model is an adequate simplification over some domain, and one hopes its failure modes are understood so the model is not misused. But the option of "proving" the model a perfect representation of reality is not available.

    Useful scientific models typically give sharp results – sharper than field readings, even accounting for input precision, rounding during iteration, etc. The models are useful *because* they give sharp results – otherwise you would have the perverse consequence of being able to improve the usefulness of a model by adding slop to it to increase the variance. A bound on the error is useful to track, but no one would actually mix slop into a model to force the variance wider, even if the model's variance doesn't match field readings.

    Any expert would know very well all the possible failure modes and other limitations of a particular model, so their subjective guessed distribution would have greater variance than the model under consideration because of that knowledge. The scientist possesses what humans value as knowledge; the documented model cannot (and so scientists cannot be replaced with the models of their creation). Why else might the variance be greater? Perhaps the scientist is in possession of what they consider to be a better model, not yet published. Or perhaps the scientist is simply aware of the possibility of a better model.

    The relatively uncontroversial models of satellite orbits are informative. Their predictions are tighter because they, of course, consider fewer particles than Mother Nature is able to consider; that is also why they can consider events in the future, running faster than reality and on economically available hardware. No one would consider it surprising that their variance is tighter than the variance of observatory readings, much less consider it a failure of the model. The only real problem would be a misrepresentation of the best knowledge of the model's error bound, failure modes, or applicable domain, and even then it would not be a failure of the model but a misapplication by a human agent.

    Can I note that the goalposts have been moved? The original issue was "stabilizing feedback" and the original question contained the assertion "If the prediction of a climate model is very much outside the consensus predictions, it is not likely to be published." There are other interesting issues to consider, but only after the parties admit that the goalposts have been moved and the focus of the argument shifted.

  7. "If the prediction of a climate model is very much outside the consensus predictions, it is not likely to be published."

    I think this could be more realistically expressed as:

    "If the prediction of a climate model is very much outside the consensus predictions, it very likely to be published in a prominent journal such as Nature, and even more likely to be wrong."

    Two non-contentious examples from recent years are the supposed shutdown of the thermohaline circulation, and large-scale methane production from trees.

    (That is, they are non-contentious in the sense that everyone accepts they are wrong.)

  8. There are constraints on conformity.

    First, for climate, there's an active skeptic community with very strong incentives to find a plausible alternative hypothesis. Their most spectacular failures are surely due to the fact that many are cranks, but not all of them are, and yet no plausible alternative has emerged.

    Also, for physical systems more than social systems, conservation laws dictate that there are a limited number of options to explore, which means that as time progresses the likelihood of a correct new idea should go down. Observing conservation laws and the like also reduces the chance that an unscrutinized plausible error can persist.

    It does seem that the greater likelihood of persistence of errors that yield plausible results, or are easily rationalized, is a good argument for transparent models and computer codes.

    As I recall, Arrhenius' model was "right for the wrong reasons" and a modern, simple energy balance model would be different, though having roughly the same climate sensitivity.

    There's an interesting model of the dynamics of scientific paradigm succession at
    http://web.mit.edu/jsterman/www/SDG/self.html
