I can’t think of a good title for this one.

Andrew Lee writes:

I recently read in the MIT Technology Review about some researchers claiming to remove “bias” from the wisdom of crowds by focusing on those more “confident” in their views.

I [Lee] was puzzled by this result/claim because I always thought that people who (1) are more willing to reassess their priors and (2) are “foxes” rather than “hedgehogs” were more accurate forecasters.

I clicked through to the article and noticed this line: “tasks such as to estimate the length of the border between Switzerland and Italy, the correct answer being 734 kilometers.”

Ha! Haven’t they ever read Mandelbrot?
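In case the reference isn’t familiar: Mandelbrot’s point is that the measured length of a wiggly coastline or border keeps growing as the measuring stick shrinks, roughly following Richardson’s relation L(g) = M g^(1 - D) with fractal dimension D > 1. Here’s a toy calculation in R; the dimension D = 1.25 is Mandelbrot’s rough figure for the west coast of Britain, and the constant M is arbitrary, so nothing here is specific to the Swiss-Italian border:

```r
# Richardson/Mandelbrot scaling: the measured length depends on the ruler.
# D = 1.25 is the rough fractal dimension Mandelbrot quotes for the west
# coast of Britain; M is an arbitrary constant chosen for illustration.
D <- 1.25
M <- 1000
ruler_km  <- c(100, 50, 10, 1)
length_km <- M * ruler_km^(1 - D)
round(length_km)  # 316, 376, 562, 1000: the "length" grows as the ruler shrinks
```

So a single number like 734 kilometers is only meaningful relative to a particular measurement scale.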

[Image: the coastline of Britain measured with a 100 km ruler, illustrating Mandelbrot’s coastline paradox]

Estimating discontinuity in slope of a response function

Peter Ganong sends me a new paper (coauthored with Simon Jager) on the “regression kink design.” Ganong writes:

The method is a close cousin of regression discontinuity and has gotten a lot of traction recently among economists, with over 20 papers in the past few years, though less among statisticians.

We propose a simple placebo test based on constructing RK estimates at placebo policy kinks. Our placebo test substantially changes the findings from two RK papers (one which is revise and resubmit at Econometrica by David Card, David Lee, Zhuan Pei and Andrea Weber and another which is forthcoming in AEJ: Applied by Camille Landais). If applied more broadly, I think it is likely to change the conclusions of other RK papers as well.

Regular readers will know that I have some skepticism about certain regression discontinuity practices, so I’m sympathetic to this line from Ganong and Jager’s abstract:

Statistical significance based on conventional p-values may be spurious.

I have not read this new paper in detail but, just speaking generally, I’d imagine it would be difficult to estimate a change in slope. It seems completely reasonable to me that slopes will be changing all the time—that’s just nonlinearity!—but unless the changes are huge, they’ve gotta be hard to estimate from data, and I’d think the estimates would be supersensitive to whatever else is included in the model.
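To illustrate the point with a toy simulation (this is my own sketch, not Ganong and Jager’s actual procedure): if the true response is a smooth curve with no kink anywhere, a piecewise-linear “kink” fit at an arbitrary placebo point will still pick up the curvature as a statistically significant slope change.

```r
# Toy illustration: data from a smooth, kink-free curve, fit with a model
# that allows a slope change at the placebo kink x = 0. The curvature of
# the true function loads onto the kink term, so the conventional p-value
# rejects far more often than the nominal 5%.
set.seed(123)
n_sims <- 1000
n      <- 500
pvals  <- replicate(n_sims, {
  x   <- runif(n, -1, 1)
  y   <- exp(x) + rnorm(n, sd = 0.1)   # smooth nonlinear response, no kink
  fit <- lm(y ~ x + pmax(x, 0))        # slope change allowed at x = 0
  summary(fit)$coefficients["pmax(x, 0)", "Pr(>|t|)"]
})
mean(pvals < 0.05)   # essentially 1, not 0.05
```

That is what I mean by slopes changing all the time: unless the model captures the nonlinearity, a kink estimator will find kinks everywhere.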

The Ganong and Jager paper looks interesting to me. I hope that someone will follow it up with a more model-based approach focused on estimation and uncertainty rather than hypothesis testing and p-values. Ultimately I think there should be kinks and discontinuities all over the place.

What does CNN have in common with Carmen Reinhart, Kenneth Rogoff, and Richard Tol: They all made foolish, embarrassing errors that would never have happened had they been using R Markdown

Rachel Cunliffe shares this delight:

[Image: CNN on-screen graphic of Scottish independence poll results, with percentages that do not add up to 100%]

Had the CNN team used an integrated statistical analysis and display system such as R Markdown, nobody would’ve needed to type in the numbers by hand, and the above embarrassment never would’ve occurred.
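For concreteness, here is a minimal sketch of what I mean, with made-up vote counts and variable names standing in for the real data: in an R Markdown document the displayed percentages are computed from the raw counts inside the document itself, so there is no step where someone reads numbers off one screen and types them into another.

````markdown
```{r, echo=FALSE}
# Hypothetical vote counts, standing in for numbers pulled straight from
# the results file; nothing downstream is typed in by hand.
votes <- c(Yes = 12345, No = 23456)
pct   <- round(100 * votes / sum(votes), 1)
```

Of the `r format(sum(votes), big.mark = ",")` votes counted,
`r pct[["Yes"]]`% were Yes and `r pct[["No"]]`% were No.
````

If the underlying numbers change, the text and any graphs rebuild from the same source, so they cannot drift apart.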

And CNN should be embarrassed about this: it’s much worse than a simple typo, as it indicates they don’t have control over their data. Just like those Rasmussen pollsters whose numbers add up to 108%. I sure wouldn’t hire them to do a poll for me!

I was going to follow this up by saying that Carmen Reinhart and Kenneth Rogoff and Richard Tol should learn about R Markdown—but maybe that sort of software would not be so useful to them. Without the possibility of transposing or losing entire columns of numbers, they might have a lot more difficulty finding attention-grabbing claims to publish.

Ummm . . . I better clarify this. I’m not saying that Reinhart, Rogoff, and Tol did their data errors on purpose. What I’m saying is that their cut-and-paste style of data processing enabled them to make errors which resulted in dramatic claims which were published in leading journals of economics. Had they done smooth analyses of the R Markdown variety (actually, I don’t know if R Markdown was available back in 2009 or whenever they all did their work, but you get my drift), it wouldn’t have been so easy for them to get such strong results, and maybe they would’ve been a bit less certain about their claims, which in turn would’ve been a bit less publishable.

To put it another way, sloppy data handling gives researchers yet another “degree of freedom” (to use Uri Simonsohn’s term) and biases claims to be more dramatic. Think about it. There are three options:

1. If you make no data errors, fine.

2. If you make an inadvertent data error that works against your favored hypothesis, you look at the data more carefully and you find the error, going back to the correct dataset.

3. But if you make an inadvertent data error that supports your favored hypothesis (as happened to Reinhart, Rogoff, and Tol), you have no particular motivation to check, and you just go for it.

Put these together and you get a systematic bias in favor of your hypothesis.
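A quick simulation makes the asymmetry concrete (entirely made-up numbers, just to illustrate the logic of the three options above): the true effect is zero, a fraction of analyses contain a data error, and errors get caught and corrected only when they push the estimate away from the favored hypothesis.

```r
# Toy simulation of asymmetric error-checking. The true effect is zero; data
# errors occur at random, but only the unfavorable ones get noticed and fixed.
set.seed(456)
n_studies <- 10000
clean_est <- rnorm(n_studies)              # error-free estimates, centered at 0
has_error <- runif(n_studies) < 0.2        # 20% of analyses contain a data error
error     <- rnorm(n_studies, sd = 2)
buggy_est <- clean_est + ifelse(has_error, error, 0)
# Errors that hurt the favored (positive) hypothesis are found and corrected;
# errors that help it go unchecked.
reported  <- ifelse(has_error & buggy_est < clean_est, clean_est, buggy_est)
mean(clean_est)   # about 0
mean(reported)    # clearly above 0: systematic bias toward the hypothesis
```

No individual researcher has to be acting in bad faith for the reported estimates to drift toward the favored hypothesis.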

Science is degraded by looseness in data handling, just as it is degraded by looseness in thinking. This is one reason that I agree with Dean Baker that the Excel spreadsheet error was worth talking about and was indeed part of the bigger picture.

Reproducible research is higher-quality research.

P.S. Some commenters write that, even with Markdown or some sort of integrated data-analysis and presentation program, data errors can still arise. Sure. I’ll agree with that. But I think the three errors discussed above are all examples of cases where an interruption in the data flow caused the problem, with the clearest example being the CNN poll, where, I can only assume, the numbers were calculated using one computer program, then someone read the numbers off a screen or a sheet of paper and typed them into another computer program to create the display. This would not have happened using an integrated environment.

Shamer shaming

This post is by Phil Price.

I can’t recall when I first saw “shaming” used in its currently popular sense. I remember noting “slut shaming” and “fat shaming” but did they first become popular two years ago? Three? At any rate, “shaming” is now everywhere…and evidently it’s a very bad thing.

When I first saw the term, I agreed with the message it was trying to convey: it is bad to try to make people feel ashamed of being fat, or of wanting to have sex. Indeed, I’d say it’s bad to try to make people feel ashamed of anything that isn’t unethical or morally wrong or at least irritating. Down with slut shaming! Down with fat shaming! Down with gay shaming!

But somehow all criticism seems to have become “shaming.” A few days ago I posted a message to my neighborhood listserv, reminding people that (1) we are in a severe drought (I live in California), (2) washing one’s car with a hose uses a lot of water, and indeed is a fineable offense if you don’t use a nozzle that shuts off the water when you release it, (3) all commercial car washes in our area recycle their water, and (4) our storm drains empty directly into a creek. The next day I got an angry email from a neighbor: how dare I shame him for washing his car on the street?

On this blog, Andrew has frequently posted about researchers doing shameful things, such as plagiarizing, and refusing to admit to major mistakes in their published work. (There’s nothing shameful about making a mistake, at least not if you’ve tried hard to get it right, but it is shameful to refuse to admit it). And, sure enough, some people have complained that Andrew is “shaming” these people.

Plagiarist-shaming, academic fraud-shaming, hack journalist-shaming, all of those are evidently in the same unacceptable category as fat-shaming and slut-shaming. There is nothing shameful in the world, except trying to make somebody feel ashamed. Shamer-shaming is the only kind of shaming that is OK.

Palko’s on a roll

I just wanted to interrupt our scheduled stream of posts to link to a bunch of recent material from Mark Palko:

At least we can all agree that ad hominem and overly general attacks are bad: A savvy critique of the way in which opposition of any sort can be dismissed as “ad hominem” attacks. As a frequent critic (and subject of criticism), I agree with Palko that this sort of dismissal is a bad thing.

Wondering where the numbers come from — Rotten Tomatoes: These numbers really are rotten. Palko writes:

This figure indicates a “Good” rating. How does that translate to “Rotten”? . . . this is pretty clearly a glitch and it’s a glitch in the easy part of review aggregation . . . This brings up one of my [Palko's] problems with data-driven journalism. Reporters and bloggers are constantly barraging us with graphs and analyses and, of course, narratives looking at things like Rotten Tomatoes rankings. All too often, though, their process starts with the data as given. They spend remarkably little time asking where the data came from or whether it’s worth bothering with.

I’ll just throw in the positive message that criticism can improve the numbers. After seeing this post, maybe the people at the website in question will be motivated to clean their data a bit.

Shifting alliances:

The education reform movement has never lent itself to the standard left/right axis. Not only was its support bipartisan; it was often the supporters on the left who were quickest to embrace privatization, deregulation and market-based solutions. Zephyr Teachout may be a sign that anomaly is ending.

I’d also be interested in seeing poll data on this (if it’s possible to get good data, given the low salience of this issue for many voters). My guess is that, even if many leaders on the left were supportive of privatization, etc., these policies were not so popular among rank-and-file, lower-income left-leaning voters.

In any case, I’m fascinated by this topic for several reasons, including its inherent importance and the compelling stories of various education-reform scams and scandals (well relayed by Palko over the past few years). Also, from a political-science perspective, I’ve always been interested in issues that don’t line up with the usual partisan divide.

Driverless Cars and Uneasy Riders: Dramatic claims are being made about the potential fuel and economic-efficiency gains from the use of driverless cars. Palko is skeptical, and so am I.

Another story that needs to be on our radar — ECOT: Yet another education reform scam that should be a scandal. Eternal vigilance etc.

I know I go on about ignoring Canada’s education system: Palko links to, and criticizes, a report that’s so bad in so many dimensions that it probably deserves its own post here or at the Monkey Cage.

Selection effects on steroids: I’m not such a fan of the expression “on steroids”—to me it’s a bit of a journalism cliché that should’ve died along with the 80s—but the statistical, and policy, point is important. Selection bias is one of those things that we statisticians have known about and talked about forever, but even so we probably don’t talk about it enough. As a researcher and as a teacher, I feel the challenge is to go beyond criticism and move to adjustment. But criticism is often a necessary first step.

Support your local journalists: Yup.

I know I pick on Netflix a lot: “the way flacks and hacks force real stories into standard narratives”

The essential distinction between charter schools and charter school chains:

The charter school sector is highly diverse. It ranges from literal mom and pop operations to nation-wide corporations. The best of these schools get good results, genuinely care about their students and can fill an important educational niche. The worst aggressively cook data to conceal mediocre results and gouge the taxpayers.

If current trends hold, I don’t think charter schools will stay nearly as diverse, and I’m not optimistic about who the winners will be.

As Steven Levitt would say, incentives matter.

What do you do to visualize uncertainty?

Howard Wainer writes:

What do you do to visualize uncertainty?
Do you only use static methods (e.g. error bounds)?
Or do you also make use of dynamic means (e.g. have the display vary over time proportional to the error, so you don’t know exactly where the top of the bar is, since it moves while you’re watching)?

Have you any thoughts on this topic?
I assume that since a Bayesian generates a posterior dist’n the output should not be a point but rather a dist’n; and, you being the most prolific Bayesian I know, I assume you’ve got three or four old papers that you’ve written on it.

OK, sure, when you put it that way, my collaborators and I do have a few papers on the topic:

Visualization in Bayesian data analysis

Visualizing distributions of covariance matrices

Multiple imputation for model checking: completed-data plots with missing and latent data

A Bayesian formulation of exploratory data analysis and goodness-of-fit testing

All maps of parameter estimates are misleading

But I don’t really have much else to say right now. Dynamic graphics seem like a good idea but I’ve never programmed them myself. In many settings it will work to display point estimates, but sometimes this can create big problems (as discussed in some of the above-linked papers) because Bayesian point estimates will tend to be too smooth—less variable—compared to the variation in the underlying parameters being modeled.
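For what it’s worth, here is a minimal sketch in base R of the sort of static display I have in mind, with simulated draws standing in for whatever posterior your model actually produces: point estimates with 50% and 95% intervals, plus a thin cloud of the draws themselves so the eye sees the full variation rather than an over-smooth row of points.

```r
# Fake posterior draws for illustration: rows are simulation draws,
# columns are eight parameters or groups.
set.seed(789)
draws <- sapply(1:8, function(j) rnorm(4000, mean = j / 2, sd = runif(1, 0.2, 1)))

est  <- apply(draws, 2, median)
lo95 <- apply(draws, 2, quantile, probs = 0.025)
hi95 <- apply(draws, 2, quantile, probs = 0.975)
lo50 <- apply(draws, 2, quantile, probs = 0.25)
hi50 <- apply(draws, 2, quantile, probs = 0.75)

# Point estimates with 50% (thick) and 95% (thin) posterior intervals.
plot(1:8, est, pch = 19, ylim = range(draws),
     xlab = "parameter", ylab = "estimate")
segments(1:8, lo95, 1:8, hi95)
segments(1:8, lo50, 1:8, hi50, lwd = 3)

# Overlay a subset of the draws themselves.
points(jitter(rep(1:8, each = 200), amount = 0.1),
       as.vector(draws[1:200, ]), pch = ".", col = "grey50")
```

A dynamic version along the lines Howard describes could redraw the plot from a different posterior draw each frame, so the display itself jitters in proportion to the uncertainty.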

So I’m kicking this one out to the commenters to see if they can offer some useful suggestions.

They know my email but they don’t know me

This came (unsolicited) in the inbox today (actually, two months ago; we’re on a delay, as you’re probably aware), subject line “From PWC – animations of CEO opinions for 2014”:

Good afternoon,

I wanted to see if the data my colleague David sent to you was of any interest. I have attached here additional animated Gifs from PwC’s CEO survey. Let me know if you would be interested in featuring these pieces or in a guest post by PwC.

Best,
**

** on behalf of **

Attached were two infographics which you can bet I’m not including here.

P.S. Just to be clear: I don’t think unsolicited emails are so horrible; I myself send emails to strangers all the time. Nor am I offended by the content. I just think it’s funny that there are people out there who think I’m interested in publishing animated chartjunk.

More bad news for the buggy-whip manufacturers

In a news article regarding difficulties in using panel surveys to measure the unemployment rate, David Leonhardt writes:

The main factor is technology. It’s a major cause of today’s response-rate problems – but it’s also the solution.

For decades, survey research has revolved around the telephone, and it’s worked very well. But Americans’ relationship with their phones has radically changed. It’s no surprise that survey research will have to as well. . . .

In the future, we are unlikely to live in a country in which information is scant. We are certain to live in one in which information is collected in different ways. The transition is under way, and the federal government is among those institutions that will need to adapt.

Let’s hope that the American Association for Public Opinion Research can adapt too.

[Image: postcard of a racing buggy in Torrington, Connecticut, circa 1910]

On deck this week

Mon: More bad news for the buggy-whip manufacturers

Tues: They know my email but they don’t know me

Wed: What do you do to visualize uncertainty?

Thurs: Sokal: “science is not merely a bag of clever tricks . . . Rather, the natural sciences are nothing more or less than one particular application — albeit an unusually successful one — of a more general rationalist worldview”

Fri: Question about data mining bias in finance

Sat: Estimating discontinuity in slope of a response function

Sun: I can’t think of a good title for this one.

Six quotes from Kaiser Fung