No problem, we’ll adjust the data to fit the model

“…it has become standard in climate science that data in contradiction to alarmism is inevitably ‘corrected’ to bring it closer to alarming models. None of us would argue that this data is perfect, and the corrections are often plausible. What is implausible is that the ‘corrections’ should always bring the data closer to models.” – Richard Lindzen, MIT Professor of Meteorology

Background:

Back in 2002, researchers at NASA published a paper entitled “Evidence for large decadal variability in the tropical mean radiative energy budget” (Wielicki et al., Science, 295:841-844, 2002). The paper reported data from a satellite that measures solar radiation headed towards earth, and reflected and radiated energy headed away from earth, and thereby measures the difference between incident and outgoing energy. The data reported in the paper showed that outgoing energy climbed measurably in the late 1990s, in contradiction to predictions from climate models that assume positive or near-zero “climate feedback.”

One of the people who wasn’t surprised by these results was Richard Lindzen, one of the best-credentialed of the anthropogenic global warming (AGW) skeptics. Lindzen has always believed, or at least long believed, that the climate is much less sensitive to greenhouse gases than most researchers assume. Lindzen and a colleague analyzed the temperature data, in conjunction with satellite data showing that the cooling could not be attributed to decreased solar radiation, and published a paper (Chou and Lindzen, Comments on “Examination of the Decadal Tropical Mean ERBS Nonscanner Radiation Data for the Iris Hypothesis”, J. Climate, 18:2123-2127, 2005) that demonstrated that the results imply a strong “negative feedback” in the climate. That is, the greenhouse effect from carbon dioxide — an effect that no credible scientist, including Lindzen, denies — is almost entirely counteracted by some unknown effect. As Lindzen said in a guest article in early 2009 on the AGW skeptic blog Watts Up With That?, “the results imply a strong negative feedback regardless of what one attributes this to.”

Around the time Chou and Lindzen were working on their paper demonstrating negative feedback, a NASA scientist named Josh Willis was analyzing data from a large array of autonomous underwater robots (called Argo) that measure ocean temperatures. In 2006, Willis reported that the oceans worldwide had cooled quite a bit from 2003 to 2005. AGW skeptics naturally claimed that this report, too, showed that AGW is overstated: if the oceans are cooling when they’re “supposed” to be warming, obviously there are large negative feedbacks that are not included in the models.

So, as of early 2006, both satellite data and ocean temperature data seemed to indicate that the oceans were cooling. But then, aha, the “corrections” began.

First, the authors of the Science paper about the satellite data published a new paper, “Reexamination of the observed decadal variability of the earth radiation budget using altitude-corrected ERBE/ERBS nonscanner WFOV data” (Wong et al., Journal of Climate 19:4028-4040, 2006). This paper corrected for a previously unrecognized (or perhaps just unaccounted-for) effect in the satellite data: the satellite moved about 20 km closer to earth during the 1990s. The main measurement instrument “sees” the entire earth plus a ring of space around it; as the satellite moved closer to the earth, it intercepted more of the earth’s radiation and saw less of the space around it, and thus saw more outgoing radiation…not because radiation from the earth had increased, but because the satellite was intercepting more of it. After correcting for this effect, there was no increase in outgoing energy in the late 1990s. The apparent “negative feedback” was an artifact of bad data.
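
To get a feel for why a 20 km change in altitude matters for a wide-field-of-view instrument, here is a rough geometry sketch. Only the “about 20 km” figure comes from the paper; the roughly 600 km baseline altitude is my assumption for illustration.

```python
import math

R_EARTH = 6371.0  # mean Earth radius, km

def earth_half_angle(altitude_km):
    """Half-angle (degrees) subtended by the Earth's disk as seen from a
    satellite at the given altitude."""
    return math.degrees(math.asin(R_EARTH / (R_EARTH + altitude_km)))

# Illustrative altitudes: roughly 600 km early in the mission, and about
# 20 km lower (the figure reported in the correction) by the late 1990s.
for h in (600.0, 580.0):
    print(f"altitude {h:.0f} km: Earth fills +/- {earth_half_angle(h):.2f} degrees of the view")
```

The Earth's disk fills a slightly larger share of the instrument's field of view at the lower altitude, so an uncorrected wide-field measurement registers slightly more outgoing radiation even if the Earth itself hasn't changed.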

And as for the cooling of the oceans, well, there was a “correction” for that, too. It started with a close look at the ocean temperature data, which were not only hard to explain but also seemed to contradict other data sources. It’s true that the huge array of floating robots was intended to be the best single source of worldwide ocean temperature data, but it’s not the only source, and other sources didn’t see the cooling. As long as the satellite data suggested that cooling could have occurred, it was possible to believe, barely, that it had. But once the satellite data were corrected and showed that the earth was gaining, not losing, heat, the ocean temperature data looked increasingly wrong. Eventually the original NASA investigator, Willis, had to agree that something was wrong: as data continued to pour in, month after month, some parts of the oceans, especially in the Atlantic, were cooling very quickly. So quickly, in fact, that it seemed physically impossible to account for the missing heat. By early 2007, Willis was convinced: his data were wrong, and the ocean cooling he had reported earlier may not have occurred at all. You can read the story on a NASA website; it’s pretty interesting. In short, although most of the thermometers on the 3000 undersea robots were reporting accurate data, a small number were reporting temperatures far too low…so low that even the limited number of such measurements was enough to substantially underestimate the average temperature. What’s more, some of the older measurements, from before the start of Argo, were found to be too high. When the older measurements were adjusted downward, and the more recent measurements were adjusted upward, the result was an ocean warming trend that was consistent with the satellite measurements…which, remember, were themselves adjusted in a way that removed a cooling trend that had initially been reported.
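
To see how a small number of bad thermometers can drag down a fleet-wide average, here is a back-of-the-envelope sketch. The 2% failure rate and the 5 °C cold bias are made-up numbers for illustration, not the actual Argo error statistics.

```python
# Hypothetical numbers, only to illustrate the arithmetic of a biased mean.
n_floats = 3000
bad_fraction = 0.02      # assumed fraction of floats with a cold bias
cold_bias = -5.0         # assumed size of the error, degrees C

n_bad = int(n_floats * bad_fraction)
mean_bias = bad_fraction * cold_bias
print(f"{n_bad} faulty floats out of {n_floats}")
print(f"bias in the fleet-wide mean temperature: {mean_bias:+.2f} C")
```

A tenth of a degree is large compared with the year-to-year changes being looked for, so a handful of faulty instruments really can dominate the apparent trend.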

Lindzen, the AGW skeptic, was apparently unaware of these corrections/adjustments when he wrote his article on Watts Up With That. In that article, he repeated the key result from his 2005 paper, saying “The earth’s climate (in contrast to the climate in current climate GCMs [General Circulation Models, i.e. computer models of the earth’s climate]) is dominated by a strong net negative feedback.” But a month after making that post, Lindzen sent a letter to Watts Up With That, acknowledging the corrections to the energy data and agreeing that they would change his results. In that letter, he made the statement that leads off this blog entry, including this: “What is implausible is that the ‘corrections’ should always bring the data closer to models.” Lindzen doesn’t claim that any particular data adjustment or correction is incorrect, and in fact, he seems to agree that initial data are sometimes wrong and that corrections are therefore necessary. But he suggests that if those corrections are always in the same direction — always supporting “alarming” models — then something is fishy.

There are several reasons “corrections” to data can tend to make the data agree better with models, not just in climate science but in any field. Here are two of the most common:

1. When you see data that disagree with your model, you take a close look for ways in which the data could be wrong — especially for errors that would act in the direction of the misfit. If there are several adjustments that could or should be applied (like correcting a measurement for instrument drift, pressure, frequency response, etc.), you might only think of, or only apply, the ones that act in a favorable direction. If you work this way, your adjustments will always lead to data that are better fit by your model.

2. When you see data that disagree with your model, you find ways to reject the data. “We had trouble with the instrument that day,” “Oh, I remember thinking at the time that that experimental sample looked funny,” “that patient shouldn’t really have been in the study anyway, they snuck through a loophole in the selection protocol.” If you discard poorly-fit data this way, but you don’t apply the same standards to the data that are fit well, then your adjustments will always lead to better agreement between data and model.

In the real world, effects like these occur all the time. Item 2 is perhaps more widely recognized — accusations of “cherry-picking” the data are common in many areas of science — but item 1 occurs too. Presumably Lindzen’s comment about the implausibility of corrections always leading to better fit with “alarming” models indicates his conviction that either or both of the effects above are the explanation. Actually, by agreeing that adjustments are necessary but saying it’s implausible that the adjustments should always lead to better model fit, he’s implicitly plumping for reason 1.
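
To see how effect 1 can operate even when every individual adjustment is legitimate, here is a minimal simulation sketch. Everything in it is invented for illustration (the numbers, and the idea of three candidate corrections of unknown sign): the model is wrong by one unit, the candidate adjustments are unbiased, and the only sin is choosing which adjustments to apply based on whether they move the data toward the model.

```python
import random

random.seed(0)

def analysed_value(raw, model, adjustments, one_sided=True):
    """Apply candidate adjustments to a raw measurement. A one-sided analyst
    applies only the adjustments that move the value toward the model
    prediction; an even-handed analyst applies all of them."""
    x = raw
    for adj in adjustments:
        moves_toward_model = abs((x + adj) - model) < abs(x - model)
        if moves_toward_model or not one_sided:
            x += adj
    return x

model = 0.0      # what the model predicts
truth = 1.0      # the model is actually wrong by one unit
n = 10000

totals = {True: 0.0, False: 0.0}
for _ in range(n):
    raw = truth + random.gauss(0, 0.5)
    # three plausible corrections (say drift, pressure, calibration), each
    # legitimate and unbiased, but of unknown sign before it is worked out
    adjustments = [random.gauss(0, 0.5) for _ in range(3)]
    for one_sided in (True, False):
        totals[one_sided] += analysed_value(raw, model, adjustments, one_sided)

for one_sided, label in ((True, "one-sided analyst"), (False, "even-handed analyst")):
    print(f"{label:20s} mean after corrections: {totals[one_sided] / n:+.2f}"
          f"   (truth {truth:+.2f}, model prediction {model:+.2f})")
```

The even-handed analyst ends up near the truth; the one-sided analyst ends up partway toward the model, without ever applying an "incorrect" adjustment.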

But there is a third reason adjustments can systematically lead to data that are better fit by models:

3. The models are close to being correct. In this case, gross discrepancies between data and models will indicate problems with the data. Fixing those problems will lead to data that are in better agreement with the models.

For instance, I was once the teaching assistant for a physics lab class in which one of the experiments involved timing a small metal ball as it fell from different heights, and recording and plotting the results. Even with this simple exercise, several things could go wrong, including a stuck switch that the ball was supposed to trigger at the bottom, or a student mis-recording a time. None of the data from that lab could possibly have convinced me that there was a problem with using d = 1/2 * g * t^2 to calculate the distance for that experiment. (The balls were very dense, so air resistance was low.)
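
For completeness, here is that lab's sanity check in a few lines; the drop heights are made-up values, but the formula and g are the standard ones.

```python
import math

g = 9.8  # m/s^2

# d = 1/2 * g * t^2, so the expected fall time is t = sqrt(2 * d / g)
for d in (0.5, 1.0, 2.0):   # illustrative drop heights, metres
    t = math.sqrt(2 * d / g)
    print(f"drop from {d:.1f} m: expected fall time {t:.2f} s")
```

A stuck switch or a transcription slip stands out as a point far off this curve.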

Except in rare cases like that simple physics lab experiment, it’s a mistake to assume that when data and models disagree, it’s the data that are the problem. In fact, when the data are simple and the models are complicated, as is often the case, it’s almost always the models that are wrong. But when the data are complicated — by which I mean, there are many different effects that must be accounted for in order to interpret the raw data as a measurement of a parameter of interest — then it’s not necessarily a surprise to find problems with the data, and to find that when those problems are fixed the result is better agreement with a model.

It’s good to have a healthy suspicion of “adjustments” or “corrections” to data, especially if (a) those corrections are made only after disagreement with a model has been noticed, (b) corrections are made to completely different datasets (as with the energy balance satellite data and the ocean temperature data from the robots), and (c) the corrections change the data to fit the model rather than the other way around. Be suspicious. Give the data and the corrections extra scrutiny, absolutely. Be on the lookout for biases introduced by effects 1 and 2 above, because those really do occur.

But a healthy suspicion can be taken too far. Researchers can’t be expected to keep using data that are known to have systematic errors, so one has to allow corrections to be made. And if, in fact, there is a model that correctly captures the behavior being measured, those corrections are going to lead to better agreement with the model.

In the cases of the energy balance measurements and the ocean temperature measurements discussed above, the corrections are necessary. And if you make the corrections, you find that the oceans did not cool, and the energy balance of the earth did not shift in a way that implies strong “negative feedback.”

========

A few notes:
1. People who don’t think anthropogenic global warming is occurring or is of practical significance take umbrage at being called “deniers” — insulting, dismissive, yada yada — but apparently some or many of them are happy to use their own insulting or dismissive terms. Lindzen refers to estimates of moderate climate sensitivity as “alarmism.” I find this irritating.

2. For an example of the complexities of calibrating satellite data, here’s an interesting short write-up that I came across while preparing this post.

29 thoughts on “No problem, we’ll adjust the data to fit the model”

  1. Phil Price, scientist at Lawrence Berkeley National Laboratory (which is not the same as Lawrence Livermore National Laboratory).

  2. 'And as for the cooling of the oceans, well, there was a "correction" for that, too. It started with a close look at the ocean temperature data, which were not only hard to explain, they seemed to contradict other data sources.'

    Why not just use the other data sources?

  3. Part of Lindzen's point, I think, is that there are a lot more people interested in looking for corrections which make the data fit the models than there are people who are looking to make corrections the other way around.

    If the error is random, the fact that it is, in practice, occurring in only one direction says something about the error detection mechanism.

  4. You should also think about what happens when the data agrees with your model. Nothing. Who looks closely at data when it fits? That means there will always be a bias in "legitimate data discrepancies found" towards those that didn't agree with the model. This can be ameloriated to a certain extent by peer review, but the original researchers are almost always better positioned to find certain types of problems with the data (especially when gathered by instruments). I find it completely plausible that data corrections are almost always in the direction of better model fit.

  5. William Ockham:

    > You should also think about what happens when the data agrees with your model. Nothing. Who looks closely at data when it fits?

    Lindzen, for one. That name wasn't particularly hard to find, in this particular case. Mendel's pea experiments have been scrutinized for fitting the theory *too* well. Were you counting on people not being able to come up with the cases that refute your statement?

    > This can be ameloriated [sic] to a certain extent by peer review, but the original researchers are almost always better positioned to find certain types of problems with the data (especially when gathered by instruments).

    Are you asserting two specific weaknesses in peer review?: (1) because it is not perfect, it has weak effect (2) self-review trumps peer-review (especially for instrument based observation) to such a point as to prejudice against peer-review out of hand. Please give supporting arguments for both, because they are controversial, to say the least. Or disown them, and scavenge for presentation whatever is left of your statement.

    > I find it completely plausible that data corrections are almost always in the direction of better model fit.

    Meaningless, because fame and tenure follow strong unique confounding results. That is why we remember the Michelson-Morley experiment. "Corrections in the direction of better model fit", regardless of number, enjoy neither such fame nor such infamy. Please demonstrate that human value and interest is primarily placed on the *quantity* of corrections.

    The simplest explanation (ha!) for your frothy comments is that you are a concern troll. Sharpen your statements so I am not striking against mere suds.

  6. @Cody, "Why not use the other data sources," I guess it depends on what you're using them for. The intent of Argo (3000 robots sampling at all levels of the oceans!) is to give really good spatial and depth coverage to measure temperatures. No other data source has coverage like that. But one example of contradictory data is temperature measurements taken by lowering instruments from ships, which contradicted the Argo data at some locations; that was enough to suggest the Argo data were wrong. Only suggestive, though, at least at first, because it's always possible that the ships just happened to miss an upwelling of cold water, or happened to take extra measurements from a warm current. Another example of contradictory data isn't temperature measurements at all, it's sea level measurements: Argo said the seas were cooling, but sea level measurements still showed the seas rising. The only way that could happen is if thermal contraction of the oceans was offset (and more) by an enormous influx of water from melting ice, which nobody thought was happening. But sea level is really a blunt instrument for measuring ocean heat. What you really want to do is fix the Argo data, which they think they have done now.

    @William, your style is amusing, interesting, and irritating, but irritating is starting to win out.

    @Bill, and I assure you, the reason I'm listed only as "Phil" isn't because I'm trying to hide in any way. Hmm, I think the solution would be to add a link for me in the "more info on our research" section…but this would require me to create a page that gives info about my research, and the closest link I have like that hasn't been updated in six years and discusses only a subset of my research. OK, I'll create the necessary page and get Andrew to link to it, and the problem will be solved.

  7. Believe William's point is important –

    when researchers see results they were hoping for
    – the rush for publication is understandable

    when researchers see results they think somehow might be wrong (hopefully) – the rush to double-check data and calculations is understandable

    Good scientific process tries to mitigate these understandable but highly biasing tendencies. But peer review is severely limited, especially by lack of access to the data and the details of the modeling process – as mentioned on this blog more than once before

    Also – no one has yet pointed out that the prior (model) always adjusts data (likelihood) towards the prior (to get the posterior)?

    K

  8. Oops, my comment about the irritating style should have been directed at manuelg, not at William! manuelg's post started with "William Ockham" and I took that as the signature. William, you're okay!

  9. (If my style is not irritating to the degree of overwhelming awareness of all other qualities, I must be coming down with a cold.

    But I want the full force of castigation to fall on my head alone, so I will take greater care when quoting.)

    Quoting "K? O'Rourke":

    > Good scientific process tries to mitigate these understandable but highly biasing tendencies.

    I would say that "good scientific process" is *actually successful* at mitigating bias. As a process that embraces self-correction, it is hard to come up with examples of errors that science has allowed to stand, for all time. I can't think of any.

    [Aside: The closest I can think of is the denying of the possibility of a numerical intelligence quotient in the polite company of scientists, even though it is at least as well established as the Big Five Personality model, which is uncontroversial. But does that even count? Nobody is barred from publishing and the truth is available to the motivated. I am genuinely curious – Are there any examples of errors that science has allowed to stand, for all time?]

    Obviously, that doesn't mean that correction could not or should not take place faster. But you took an unusually negative tone for such a successful system. I cannot think of a more successful system.

    > But peer review is severely limited especially by no access to the data and details of modeling process

    OK. There should be a standard of reporting of data that makes replication of results not merely possible, but highly likely to be successful for other groups. But what *exactly* can be scavenged from "Ockham"'s point(s)? The motivation is supplied – notoriety for one's research that confounds current understanding, and the barriers are not shown to be insurmountable or prohibitively looming compared to the motivation. "To the victor go the spoils" is a good system.

    > no one has yet pointed out that the prior (model) always adjusts data (likelihood) towards the prior ( to get the posterior)

    This only applies to those who are arguing for a model. Consider the "data correction" that was the negative result of the Michelson-Morley experiment. They were not barred from publishing until they supplied a replacement model for the ether they disproved – the negative result came before the compatible model.

  10. I was counting on people here having honest intent and a decent level of reading comprehension. Based on your style of argumentation, I'm going to assume that your selective quoting of my comment was intentional (normally I would assume that yours was a simple misunderstanding).

    I think you knew that I was referring to the perspective of a researcher evaluating her own data, and I'm sure that you knew that Lindzen and the critics of Mendel didn't fit that category. If you tried to argue against one of the points I had actually made, perhaps it would be more of a challenge. Straw men are rarely worthy targets.

    As for the rest of your interpretation of my comments, I'll just point you to K? O'Rourke's follow-up comment which elucidated the points I was trying to make.

  11. I don't wish to drown this thread in the castor oil of my combative writing style, so I will keep it brief(er) [being honest]. [I am readily available in other forums, so abuse can be directed at me without boring Phil's readers.]

    I plead innocent to the charge of "selective quoting". The *only* thing I didn't quote was the sentence "That means there will always be a bias in "legitimate data discrepancies found" towards those that didn't agree with the model.", which is an assertion claiming to be supported by what surrounds. I thought quoting and dismantling *everything else* you said would be sufficient. I quoted your short comment in three different places, for goodness sake.

    Forgive me if I don't see fit to quote the totality of your final reply… would be what I would say, if I didn't wish to end on a gentlemanly note.

  12. We're clearly having a communication failure. Let me re-phrase my original point with a simple hypothetical:

    Let's assume that you, manuelg, have conducted an experiment to test one of your own models and the experiment's results match your model. With no other information, would you re-examine the data to identify any measurement errors you made? Would your first thought be, I might have made a measurement error? Or would you take the results at face value?

    On the other hand, if the results contradicted your model, would you check the data to see if there was a measurement error?

    I have no idea how you will answer those questions, but I'm asserting that most people will be more likely to re-check the data when it didn't match their model. I thought this would be a fairly non-controversial view.

    Do you agree that people are more likely to check data when it contradicts their model? If that's true, then I think it is fairly obvious that more measurement errors will be found in those cases than in cases when the data agreed with the model (but only due to measurement error).

  13. http://www.stat.columbia.edu/~cook/movabletype/ar

    If the topic is "finding measurement errors", I don't see the issue as being "quantity of corrections done privately"; I see the issue as being "quality of corrections done in the whole of the published literature, over time".

    For the same reason I wouldn't rifle through the desk drawers of local mathematicians looking for errors made in computing restaurant tips to make a judgement about the quality of the theorems of the last decade of modern mathematics.

    Ockham: "Do you agree that people are more likely to check data when it contradicts their model?"

    I agree. And I claim, in the long run, no such bias can be sustained in the published literature.

    Ockham: "it is fairly obvious that more measurement errors will be found in those cases than in cases when the data agreed with the model (but only due to measurement error)"

    I agree. And I claim, in the long run, no such bias can be sustained in the published literature.

    I developed my claim in my earlier comments, and no argument has yet been provided in opposition.

    In your first comment you made the statement about peer review: "This can be [ameliorated] to a certain extent by peer review…"

    The qualifier "to a certain extent" was objected to, with reasons given, and no argument has yet been provided in opposition.

    If you are dropping any claim about the inability of peer review to eventually correct for confirmation bias with respect to models, then we have no argument. But what is left then? Only a prosaic observation about how humans can sometimes act irrationally, since I don't dispute that scientists are human.

    Ockham: "Based on your style of argumentation, I'm going to assume that your selective quoting of my comment was intentional (normally I would assume that yours was a simple misunderstanding)"

    What percentage of your verbiage must I quote to escape the charge of "selective quoting"? If five nines, then I am afraid I must disappoint.

  14. This is the 17th comment, but the word "outlier" hasn't been mentioned yet.

    If you make your living with data, you see a lot of outliers — including in your dreams.

    Yet outliers are outliers only with reference to a model. Maybe not an elegant model, but some model.

    Therefore, as Ockham argues above, there's an inherent bias to outlier detection.

    Not being a climate scientist, I can't really wade into the climate discussion. I would guess that most measurements of climate variables are subject to potential measurement error, recording error, instrument error, etc.

  15. manuelg – I totally agree that in the long run all errors will be corrected, if the long run means – as C.S. Peirce meant it – an unending scientific community of inquiry (one that accepts risking criticism).

    But as Keynes pointed out (along with George W. Bush), we are all dead by then, and so most of us will be concerned with the economy of inquiry – that errors be found and corrected as soon as is best.

    I did once work with a group that tried to check especially hard for errors when they got what they hoped for. Part of that process was to keep me blind to the treatment group coding and to whether success was coded as 0 or 1.

    I believe we can just hope to encourage more people to make the effort to be less wrong less often. Unfortunately it does sometimes require them to risk losing funding and positions…

    K

  16. zbicyclist: and John Nelder seemed to go as far as to suggest the model be changed until there were no outliers at all – in his paper "There are no outliers in the stackloss data"

    K

  17. Aleks: I guess I should not say that I recall that was a not-too-uncommon practice in PhD theses

    or should I?

    K

  18. Cute story in principle, but of course in reality it's just the sort of biased crap that one would expect from Lindzen. The corrections run the other way when the data are found to have errors in that direction. A prominent example is the rapid shutdown of the North Atlantic overturning circulation, which had supposedly been observed in ocean data but was found to be bogus.

  19. James,
    You're right, the assertion that all of the corrections lead to better agreement with "alarming" models is not true. In fact, originally I had some additional material about data that have been corrected the other direction, including very recent temperature anomaly data. I decided to leave that out because the post was already pretty long, and because it seemed to weaken my main point, which is that even if all of the corrections do lead to better agreement with a model, it doesn't mean something fishy is going on.

    –Phil

  20. Phil – with a bit of flippancy – it depends on whether you embrace my principles or my ex-girlfriend.

    Depends on whether the model is less wrong than the data, or more wrong.

    Either can happen (which I think is your point), and David Cox once put this as ruining a good prior with bad data versus ruining good data with a bad prior.

    K
    p.s. of course the preface should have been
    (with apologies to the original quote)
    How will this blog seal Phil's fate? – Rotting in prison or from a sexually transmitted disease?

  21. If data is very sparse and a model with many degrees of freedom agrees with it, then there is a significant likelihood that the model gets the right answer for the wrong reasons.

    If data is very rich and a model is highly constrained, when the data and the model agree then you are done. The likelihood of getting the right answer for the wrong reason is vanishingly small.

    In the case of climate models, the data is very rich, and very high-order organization appears in the models in ways that match reality and cannot be entirely due to chance. That said, agreement is imperfect and the model is necessarily incomplete, so the above argument is somewhat qualitative, but the agreements we already have are sufficient that they are clearly not entirely due to chance.

    It has been the case on several occasions that climate models had better fidelity than observational data. (It is important to understand that observational data themselves, especially remote observations, require a complex modeling process to "invert" the sensor signals to physical quantities; it is just as likely that the one model is flawed as the other.) In those cases the modeling process has ended up correcting the data.

    This is easily mocked (as in the title of the present article) but it's quite reasonable in fact. The key weapon of science is coherence. Once various streams of rich evidence agree, it is unlikely that you are on the wrong track. So, indeed, you move on at that point. In practice this is not a major cause of problems, at least in the climate field.

  22. The first two paragraphs of your comment are a very nice summary for sorting out false positive model results, and a good heuristic for when it makes sense to even use a statistical model for a given data set and situation.

    In the fourth paragraph, what does "fidelity" mean? I'm more of a descriptive statistics practitioner, not a bona fide statistian, so I google'd it and found only a single reference to fidelity statistics. However, I don't think that was what you meant?
    – Ellie K.

  23. My prior comment and question was intended for @Michael Tobis. Also I, ummm… this is embarrassing, mispelled "statistics".
    – Ellie K.

  24. Hi all,

    in order to answer an evil referee, I need to cite some scientific article that claims (or, even better, proves, beyond the general "common sense" reasoning) specifically what is said in this post: adjusting data to fit a model is WRONG.

    Any idea? Should be something grand, and, of course, peer reviewed :-)
