Sometimes the raw numbers are better than a percentage

Posted on June 24, 2010 1:18 PM by Phil

A NY Times Environment blog entry summarizes an article in Proceedings of the National Academy of Sciences that looks into whether there really is a “scientific consensus” that humans are substantially changing the climate. There is. That’s pretty much “dog bites man” as far as news is concerned. But although the results of the study don’t seem noteworthy, I was struck by this paragraph in the blog writeup, which is pretty much a quote of the PNAS article:

For example, of the top 50 climate researchers identified by the study (as ranked by the number of papers they had published), only 2 percent fell into the camp of climate dissenters. Of the top 200 researchers, only 2.5 percent fell into the dissenter camp. That is consistent with past work, including opinion polls, suggesting that 97 to 98 percent of working climate scientists accept the evidence for human-induced climate change.

Two percent of the top 50, that’s one person. And 2.5 percent of the top 200, that’s five people. As a general rule, when the numerator in count data is very small, or when the denominator is fairly small, I prefer to see the numerator and denominator separately rather than a percentage. If someone says “this guy has been making more than 80% of his free throws this post season,” I want to know if that is 6/7 or 17/20. So I think they should say 1/50 and 5/200, rather than 2% of the top 50 and 2.5% of the top 200. Yes, I understand these are mathematically identical, but I betcha a lot of people see “2% of 50” and don’t realize that’s one person, even though they’d realize it if they thought about it for just a second.
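
To make that concrete, here is a minimal Python sketch. Nothing in it comes from the PNAS article: the function names `count_from_percent` and `wilson_interval` are hypothetical, and the Wilson score interval is just one convenient way (my choice for illustration) to show how much less 6/7 tells you than 17/20, even though both round to "more than 80%."

```python
import math


def count_from_percent(percent, denominator):
    """Recover the numerator implied by a reported percentage."""
    return round(percent / 100 * denominator)


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (illustrative choice)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# The two percentages quoted from the blog writeup, turned back into people.
print(count_from_percent(2, 50))     # 1
print(count_from_percent(2.5, 200))  # 5

# The free-throw example: similar percentages, very different amounts of evidence.
for made, attempts in [(6, 7), (17, 20)]:
    lo, hi = wilson_interval(made, attempts)
    print(f"{made}/{attempts} = {made / attempts:.0%}, rough 95% interval {lo:.2f} to {hi:.2f}")
```

The interval for 6/7 comes out noticeably wider than the one for 17/20, which is the information the bare percentage throws away.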

By the way, one guy (I assume it’s a guy, most of them are guys) has over 850 climate-related publications. Fourteen people have over 500 publications each (one of these is the “dissenter” among the top 50). Jeez. Andrew might attain this sort of level if he keeps up at his current rate, but for most of the rest of us this is ridiculous. Quite a selection effect, too: it’d be pretty much impossible to be considered as one of their “top 50” climate experts if you are under 45 years old.

11 thoughts on “Sometimes the raw numbers are better than a percentage”

Patrick McCann on June 24, 2010 at 12:28 pm said:

There is also a selection effect in that climate journal editors and referees are typically not dissenters. There has been recent evidence of an organized effort by these folks to prevent the funding and/or publication of their peers with opposing views. This would make the fact that a top x publications list was sparse on dissenters self-fulfilling.

http://www.guardian.co.uk/environment/2010/feb/02…
Patrick McCann on June 24, 2010 at 12:34 pm said:

Here is the evidence in one example: a climate scientist emails another asking for reasons to reject a paper he can't find flaws in, so that the consensus will not be refuted.

http://www.eastangliaemails.com/emails.php?eid=32…
K? O'Rourke on June 24, 2010 at 2:53 pm said:

Phil – you might even be understating the misinterpretation hazards.

These are my slides to simply display the risks of combining percentages versus raw numbers.

• Why percentages are truly confusing

– (maybe just to me)

• Front Acreage – percent male horses

– left side 80% > right side 70%

• Back Acreage – percent male horses

– left side 30% > right side 20%

• Higher percentage on left side than right?

•

• Front Acreage – male/total horses

– left side 4/5 > right side 7/10

• Back Acreage – male/total horses

– left side 3/10 > right side 1/5

• Higher percentage on left side than right?

• (4+3)/(5 + 10) > (7+1)/(10 + 5)?

• 7/15 > 8/15?

• (Need to stratify – Simpson's Paradox)
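
The arithmetic on the slides can be checked with a short Python sketch. The counts are the ones on the slides; the dictionary layout and the helper names `rate` and `pooled` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# (male horses, total horses) for each side of each acreage, as on the slides.
horses = {
    "front": {"left": (4, 5), "right": (7, 10)},
    "back": {"left": (3, 10), "right": (1, 5)},
}


def rate(males, total):
    """Exact proportion of male horses."""
    return Fraction(males, total)


def pooled(side):
    """Proportion of male horses on one side, combining both acreages."""
    males = sum(horses[acreage][side][0] for acreage in horses)
    total = sum(horses[acreage][side][1] for acreage in horses)
    return Fraction(males, total)


# Within each acreage the left side has the higher proportion...
for acreage, sides in horses.items():
    print(acreage, rate(*sides["left"]) > rate(*sides["right"]))  # True, then True

# ...but after combining the acreages the ordering reverses.
print(pooled("left"), pooled("right"))  # 7/15 8/15
```

Averaging the percentages instead of adding the counts would hide the reversal, since the percentages alone carry no denominators.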

K?
noahpoah on June 24, 2010 at 3:48 pm said:

What counts as substantial change? Also, what counts as consensus? Not trying to be snarky, just nit-picky, which in this case seems worthwhile.
george on June 24, 2010 at 11:00 pm said:

Simpson's paradox is nothing to do with small samples vs large samples – which was where Phil started. You could multiply all the counts in your bullet points by a million, and not change your conclusion.

Also, where do the horses and Front/Back acreage fit in? Surely they're nothing to do with climate researchers, or free throws.
Jim on June 25, 2010 at 12:03 am said:

I hate to repeat such a worn out phrase, but follow the money. How many grants are awarded to scientists hypothesizing that a phenomenon does not exist? I'm not sure the application would be very compelling. If there's any money in that arena I could be a grant-receiving researcher in a number of fields.
Andrew Gelman on June 25, 2010 at 12:08 am said:

Cool–I know Ed Cook! I saw the following, which is maybe what you were pointing to: "Note that I had to pull out the Mongolia data set. I would love to give you it, but Gordon would go nuts if he found out." That's the kind of thing I might write in an email and then feel horrible if it came out. Or maybe "Gordon" understands and wouldn't be bothered, I dunno.

Someone once gave me the advice not to send anything in email that I wouldn't want posted to the world. I try to follow this principle–not so hard, really, given that I already have a blog–but occasionally I slip up and send an email that I wouldn't want broadcast. I hope nobody ever hacks the Columbia University email system in this way!
K? O'Rourke on June 25, 2010 at 9:57 am said:

george: Most if not all posts here seem to me to have both methodological and domain content.

I was focusing on the methodological content, which to me here is the hazards of dimensionality reduction.

Here (x,n) -> (x/n,n) -> (x/n).

The transformation is non-invertible and hence there is a necessary loss of information that may lead to confusion.

Phil pointed out the loss of the x information (recoverable by (x/n)*n)

I pointed out the difficulties of correctly anticipating averaging (combining) and contrasting with the loss of n,s.

Pearl has written about arguably bigger problems regarding causal interpretation when the n,s are replaced!

When I present the first slide to groups of highly educated, intelligent people – most realize it's a trick question but are queasy about why "obviously yes" is not the right answer …

You can lead a horse (on either side) to water …

K?
george on June 25, 2010 at 9:09 pm said:

The "confusion" in Simpson's paradox is that the reversal happens – which is explainable in a picture. The reversal/paradox happens even when you have all the data, so I don't see that consequent "confusion" or "queasy"ness can be blamed on dimension reduction.

On your final comment, perhaps you can lead a horse to water but you can't make them define what the heck they mean by "n,s"?
A. Zarkov on June 26, 2010 at 3:57 pm said:

Astrophysicist Nir Shaviv debunks the PNAS article here. All it takes is some common sense. He too takes note of the huge number of publications ascribed to some climate researchers. At the other end of the scale, this and other European and American blog sites claim that in AR4 the IPCC relied on one, and only one, publication (whose author is very attractive) to support their claim that solar activity has a minimal effect on global temperature rise. In other words, the consensus is a consensus of one.

All of this supports Phil's recommendation that we get both the numerator and denominator behind a fraction.
Phil on June 27, 2010 at 8:16 pm said:

I certainly agree with the suggestions that the article that prompted this post is kinda ridiculous. They could have covered the main point in two paragraphs. But then some reviewer would have complained that they hadn't addressed "statistical significance." I wish I could say I was kidding, but I'm not.
