Our first Daily Beast column is here.
Nathan Lemoine writes:
I’m an ecologist, and I typically work with small sample sizes from field experiments, which have highly variable data. I analyze almost all of my data now using hierarchical models, but I’ve been wondering about my interpretation of the posterior distributions. I’ve read your blog, several of your papers (Gelman and Weakliem, Gelman and Carlin), and your excellent BDA book, and I was wondering if I could ask your advice/opinion on my interpretation of posterior probabilities.
I’ve thought of 95% posterior credible intervals as a good way to estimate effect size, but I still see many researchers use them in something akin to null hypothesis testing: “The 95% interval included zero, and therefore the pattern was not significant”. I tend not to do that. Since I work with small sample sizes and variable data, it seems as though I’m unlikely to find a “significant effect” unless I’m vastly overestimating the true effect size (Type M error) or unless the true effect size is enormous (a rarity). More often than not, I find ‘suggestive’, but not ‘significant’ effects.
In such cases, I calculate one-tailed posterior probabilities that the effect is positive (or negative) and report that along with estimates of the effect size. For example, I might say something like
“Foliar damage tended to be slightly higher in ‘Ambient’ treatments, although the difference between treatments was small and variable (Pr(Ambient>Warmed) = 0.86, CI95 = 2.3% less – 6.9% more damage).”
By giving the probability of an effect as well as an estimate of the effect size, I find this to be more informative than simply saying ‘not significant’. This allows researchers to make their own judgements on importance, rather than defining importance for them by p < 0.05. I know that such one-tailed probabilities can be inaccurate when using flat priors, but I place weakly informative priors ( N(0,1) or N(0,2) ) on all parameters in an attempt to avoid such overestimates unless strongly supported by my small sample sizes.
I was wondering if you agree with this philosophy of data reporting and interpretation, or if I’m misusing the posterior probabilities. I’ve done some research on this, but I can’t find anyone that’s offered a solid opinion on this. Based on my reading and the few interactions I’ve had with others, it seems that the strength of posterior probabilities compared to p-values is that they allow for such fluid interpretation (what’s the probability the effect is positive? what’s the probability the effect > 5? etc.), whereas p-values simply tell you “if the null hypothesis is true, there’s a 70 or 80% chance I could observe an effect as strong as mine by chance alone”. I prefer to give the probability of an effect bounded by the CI of the effect to give the most transparent interpretation possible.
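For readers who want to see the kind of summary described in the email, here is a minimal sketch. It assumes you already have posterior draws of a treatment difference from a fitted model; the draws below are simulated stand-ins, and the function name `summarize_effect` is made up for illustration, not taken from any package.

```python
import random

def summarize_effect(draws, level=0.95):
    """Return (Pr(effect > 0), lower, upper) computed from posterior draws."""
    n = len(draws)
    # One-tailed posterior probability: the share of draws above zero.
    prob_positive = sum(d > 0 for d in draws) / n
    # Central interval from the empirical quantiles of the sorted draws.
    ordered = sorted(draws)
    tail = (1 - level) / 2
    lower = ordered[int(tail * n)]
    upper = ordered[int((1 - tail) * n) - 1]
    return prob_positive, lower, upper

# Hypothetical stand-in for MCMC output: 4000 draws of a treatment difference.
rng = random.Random(1)
draws = [rng.gauss(2.0, 2.0) for _ in range(4000)]

prob_positive, lower, upper = summarize_effect(draws)
print(f"Pr(effect > 0) = {prob_positive:.2f}, "
      f"95% interval = [{lower:.1f}, {upper:.1f}]")
```

Reporting the probability next to the interval keeps the size of the effect in view, which is the point of the email.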
My short answer is that this is addressed in this post:
If you believe your prior, then yes, it makes sense to report posterior probabilities as you do. Typically, though, we use flat priors even though we have pretty strong knowledge that parameters are close to 0 (this is consistent with the fact that we see lots of estimates that are 1 or 2 se’s from 0, but very few that are 4 or 6 se’s from 0). So, really, if you want to make such a statement I think you’d want a more informative prior that shrinks to 0. If, for whatever reason, you don’t want to assign such a prior, then you have to be a bit more careful about interpreting those posterior probabilities.
In your case, since you’re using weakly informative priors such as N(0,1), this is less of a concern. Ultimately I guess the way to go is to embed any problem in a hierarchical meta-analysis so that the prior makes sense in the context of the problem. But, yeah, I’ve been using N(0,1) a lot myself lately.
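To see how much the prior matters for these one-tailed probabilities, here is a rough sketch using the standard normal-normal conjugate update. The assumptions are loud ones: a single estimate with a known standard error, a normal likelihood, a prior centered at 0, and N(0,1) read as a prior standard deviation of 1 on a scale where that is sensible. The numbers (estimate 1.0, standard error 1.0) are hypothetical, and `posterior_summary` is an illustrative name.

```python
from statistics import NormalDist

def posterior_summary(estimate, se, prior_sd=None):
    """Normal-normal posterior for an effect with prior N(0, prior_sd).

    prior_sd=None stands for a flat prior, in which case the posterior
    is simply N(estimate, se).
    """
    if prior_sd is None:
        post_mean, post_sd = estimate, se
    else:
        # Precisions (inverse variances) add; the prior mean is 0.
        precision = 1 / se**2 + 1 / prior_sd**2
        post_mean = (estimate / se**2) / precision
        post_sd = precision ** -0.5
    post = NormalDist(post_mean, post_sd)
    prob_positive = 1 - post.cdf(0)
    interval = (post.inv_cdf(0.025), post.inv_cdf(0.975))
    return post_mean, prob_positive, interval

# Hypothetical estimate that sits one standard error from zero.
flat = posterior_summary(1.0, 1.0)
weak = posterior_summary(1.0, 1.0, prior_sd=1.0)

for label, (mean, prob, (lo, hi)) in [("flat prior", flat), ("N(0,1) prior", weak)]:
    print(f"{label}: mean = {mean:.2f}, Pr(effect > 0) = {prob:.2f}, "
          f"95% interval = [{lo:.2f}, {hi:.2f}]")
```

With these made-up inputs the flat prior gives a probability of about 0.84 that the effect is positive, and the N(0,1) prior pulls the posterior mean halfway to zero and the probability down to about 0.76. That is the shrinkage toward 0 discussed above, in miniature.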
“Faith means belief in something concerning which doubt is theoretically possible.” — William James (again)
Eric Tassone writes:
So, here’s a Bill James profile from late-ish 2014 that I’d missed until now. It’s baseball focused, which was nice — so many recent articles about him are non-baseball stuff. Here’s an extended excerpt of a part I found refreshing, though it’s probably just that my expectations have gotten pretty low of late w/r/t articles about him. What is going on in this passage? … an evolving maturity for him? … merely exchanging one set of biases for another?
Anyway, surprisingly I enjoyed the article. I hope you enjoy it too. Here’s an excerpt:
But [James] wonders if the generation of baseball fans he inspired have expanded their skepticism to the point where it has crowded out other things like wonder and tolerance and a healthy understanding of our own limited understanding.
Right now, Bill James thinks this sort of arrogance can be dangerous in the sabermetric community. There is more baseball data available now than ever before, and the data grows exponentially. “Understanding cannot keep up with the data,” he says. “It will take many years before we fully understand, say, some of the effects of PITCHf/x (which charts every pitch thrown). It’s important not to skip steps.”
He groans whenever he hears people discount leadership or team chemistry or heart because they cannot find such things in the data. He has done this himself in the past … and regrets it.
“I have to take my share of responsibility for promoting skepticism about things that I didn’t understand as well as I might have,” he says. “What I would say NOW is that skepticism should be directed at things that are actually untrue rather than things that are difficult to measure.
“Leadership is one player having an effect on his teammates. There is nothing about that that should invite skepticism. People have an effect on one another in every area of life. … We all affect another’s work. You just can’t really measure that in an individual-accounting framework.”
The young Bill James rather famously wrote that he could not find any evidence that certain types of players could consistently hit better in the clutch – he still has not found that evidence. But unlike his younger self, he will not dismiss the idea of clutch hitting. He has been a consultant for the Red Sox for more than a decade, and he has watched David Ortiz deliver so many big hits in so many big moments, and he finds himself unwilling to deny that Big Papi does have an ability in those situations others don’t have. He wrote an essay with this thought in mind, suggesting that just because we have not found the evidence is not a convincing argument that the evidence does not exist.
“I think I had limited understanding of these issues and wrote about them — little understanding and too-strong opinions,” he says. “And I think I damaged the discussion in some ways when I did this. … these sorts of effects (leadership and clutch-hitting and how players interact) CAN be studied. You just need to approach the question itself, rather than trying to back into it beginning with the answer.”
I responded: Interesting . . . but I wonder if part of this is that James is such an insider now that he’s buying into all the insider tropes.
Yep, exactly . . . especially since one is about his guy, Ortiz!
Psychologists speak of “folk psychology” or “folk physics” as the intuitive notions we have about the world, which typically describe some aspects of reality but ultimately are gross oversimplifications.
I encountered a good example of “folk genetics” the other day after following the clickbait link to “22 Things We Learned Hanging Out With Sam Smith”:
1. He wasn’t afraid to speak up for equality at his Catholic school.
“From what I can remember, they believe that you can be homosexual, but you just can’t practice it, which is ridiculous,” he says. “I would just say, ‘I am proof that it’s genetic. It has to be, because it wasn’t a choice.’ And that’s it. That’s my only argument, you know? You love who you love, and I can’t help that I like guys.”
The fallacy, of course, is to think that everything is a choice or is genetic. Just google *identical twins one straight one gay* if you want to learn more. I followed these links myself and found that genetic essentialism on this topic is not restricted to the pro-gay side. From the other direction is the homophobic “nooneisborngay.com” that announces “How is it possible that identical twins with identical DNA have different sexual attractions? Simple. No one is born gay. . . . if people are born homosexual, then all identical twin pairs should have identical sexual attractions, 100% of the time . . .” Nope nope nope. A lot can happen between conception and birth!
Anyway, this all gives me some sympathy for the ill-fated genetic essentialism of Nicholas Wade. If a Grammy-winning artist and the savants at nooneisborngay.com can get this wrong, what hope is there for a New York Times science writer?
P.S. It’s still OK to like Sam Smith, right? I think he can’t really be overexposed until his second album comes out. The backlash starts then.
P.P.S. I originally titled this post, “An amusing window into folk genetics.” How boring can you get??? So I retitled as above. Like John Updike, I have difficulty coming up with good titles.
Kaiser Fung and I have a new weekly column for the Daily Beast. After much deliberation, we gave it the title Statbusters (the runner-up choice was Dirty Data; my personal preference was Statboyz in the Hood, but, hey, who ever listens to me on anything?).
The column will appear every Saturday, and Kaiser and I are planning to alternate. Should be fun. The first column will go online this Saturday.
P.S. It didn’t go online until Sunday. It’s here.
I was bothered by a recent post on the sister blog. The post was by political scientist David Fortunato and it was called, Would “concealed carry” have stopped Dylann Roof’s church shooting spree?.
What bugged me in particular was this sentence:
On its face, the claim that increasing the number of gun carriers would reduce crime seems logical (at least to an economist). If more people carry guns, then criminals would understand that the likelihood of their victims defending themselves with a gun is higher and would therefore be less likely to commit crime. In simple economic terms, easing concealed carry seeks to increase the cost assailants pay to commit a crime, so they choose not to, we hope.
On its face, I think the above argument is ridiculous in its naive division of the population into “criminals” and “their victims,” with no mention of the idea that the presence of the gun could lead to escalation. Also the silly idea that “a criminal” decides whether to “commit a crime,” with no sense that events develop in unexpected ways. The above argument is “simple” indeed, but I think it does no service to economists to consider it as logical.
This sort of thing bothers me about a lot of op-ed style writing, that the author has a certain flow in mind (in this case, it seems that Fortunato wanted to say that evidence on gun control is weak, and so he decided to lead up to it in this way). Also a problem with social science that everything has to be a “puzzle.” Fortunato writes:
But scientific research on the ability of concealed carry to reduce crime has yielded mixed results. A few studies suggest these policies are effective, but even more suggest that making concealed carry easier does not reduce crime and may even increase instances of firearm injury. Why is this the case?
Why indeed? It hardly seems a mystery that if more people are carrying guns, then they might use them accidentally or on purpose to shoot people.
Again I see a problem with Fortunato’s theoretical framework when he writes: “More people choose to carry firearms . . . Criminals observe (or infer) that more people are carrying firearms,” as if there is some division between “people” and “criminals.”
I don’t mean this as a slam on Fortunato’s research, and I’m not saying that the obvious answers to his question tell the whole story. Of course I have a huge respect for research on topics that laypeople consider to be obvious. But his post really bothers me as representing some of the unfortunate tendencies of journalism and social science to take a counterintuitive idea (giving out guns is a way to reduce gun crime!) and normalize it so much as to make it the default explanation.
From the Monkey Cage perspective, I don’t think much can be done here—in the old days I would’ve done a follow-up post but I know we don’t really do these anymore—but maybe we can all be aware of the pitfalls of taking a counterintuitive framework as the norm. I think the content of Fortunato’s post is excellent but I’m disturbed by how it’s framed, as if it’s some sort of surprise that a wacky counterintuitive bank-shot theory of the world doesn’t actually seem to be borne out by the data.
P.S. I elaborate here in comments:
When someone focuses on an indirect effect instead of the first-order effect, that’s what I call a counterintuitive argument. Indeed, this might be the core of many or even all counterintuitive arguments: The first-order effect is obviously in one direction (increase the number of guns, make it more normal to carry guns as a way of life, and people like Dylann Roof are more likely to go shoot people), but there’s this second-order effect (Dylann Roof might not shoot if he thinks the people in the church are armed too).
The counterintuitive claim is that the second-order effect outweighs the first-order effect. It’s counterintuitive for the same general reason that an echo is typically not as loud as the original sound, or for the same general reason that elasticities are typically less than 1.
Some counterintuitive claims are true. But it hardly seems puzzling when a bank-shot, counterintuitive claim is not supported by the data.
Logo design by Michael Betancourt and Stephanie Mannheim.
P.S. Some commenters suggested the top of the S above is too large, but I wonder if that’s just because I’ve posted the logo in a large format. On the screen it would typically be smaller, something like this, which appears a bit more tasteful:
Can Candan writes:
I have scraped horse racing data from a web site in Turkey and would like to try some models for predicting the finishing positions of future races. What models would you suggest for that?
There is one recent paper on the subject that seems promising, which claims to change the SMO algorithm of support vector regression to work with race-based stratification, but no details are given, and I don’t understand what to modify in the SMO algorithm.
This builds on the above one and improves on it with NDCG-based model tuning of least squares SVR.
There’s a conditional logistic regression approach which I tried to implement, but I couldn’t get the claimed improvement over the public odds of winning; maybe I’m doing something wrong here.
I’m quite comfortable with R; any books, pointers, or code snippets are greatly appreciated.
My reply: Sorry, this one is too far away from my areas of expertise!
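For what it’s worth, the conditional logistic regression idea mentioned in the email can be sketched in a few lines. This is only a toy illustration of the general technique, not the method of any of the papers referred to: each race is treated as its own stratum, the probability that a horse wins is a softmax of a linear score over the horses in that race, and the coefficients are fit by plain gradient ascent. The single feature and the four races below are invented for the example.

```python
import math

def win_probabilities(features, beta):
    """Softmax over the horses in one race: P(i wins) ∝ exp(x_i · beta)."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in features]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(races, beta):
    """races is a list of (features, winner_index) pairs, one per race."""
    return sum(math.log(win_probabilities(f, beta)[w]) for f, w in races)

def fit(races, n_features, steps=500, rate=0.1):
    """Gradient ascent on the conditional log-likelihood."""
    beta = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for features, winner in races:
            probs = win_probabilities(features, beta)
            for k in range(n_features):
                # Gradient: winner's feature minus its expected value in the race.
                expected = sum(p * row[k] for p, row in zip(probs, features))
                grad[k] += features[winner][k] - expected
        beta = [b + rate * g / len(races) for b, g in zip(beta, grad)]
    return beta

# Invented data: one hypothetical rating per horse, plus the winner's index.
races = [
    ([[1.0], [0.0], [-1.0]], 0),
    ([[0.5], [1.5], [0.0]], 1),
    ([[1.0], [0.2], [0.8]], 2),  # an upset: the top-rated horse loses
    ([[0.0], [1.0]], 1),
]

beta = fit(races, n_features=1)
print("fitted coefficient:", round(beta[0], 3))
print("win probabilities, race 1:", [round(p, 3) for p in win_probabilities(races[0][0], beta)])
```

Because the probabilities are normalized within each race, fields of different sizes are handled automatically, which is the sense in which the model is stratified by race.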
CDC should know better.
P.S. In comments, Zachary David supplies this correctly-scaled version:
It would be better to label the lines directly than to use a legend, and the y-axis is off by a factor of 100, but I can hardly complain given that he just whipped this graph up for us.
The real point is that, once the x-axis is scaled correctly, the shapes of the curves change! So that original graph really was misleading, in that it incorrectly implies a ramping up in the 3-10 year range.
P.P.S. Zachary David sent me an improved version:
Ideally the line labels would be colored so there’d be no need for the legend at all, but at this point I really shouldn’t be complaining.
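To make the x-axis point concrete, here is a small made-up example (the numbers are hypothetical, not the CDC’s). The values grow at a constant rate per year, but the time points are unevenly spaced. Treating the time points as equally spaced categories, which is what a mis-scaled axis does, makes the curve appear to ramp up at the right end.

```python
# Hypothetical time points (in years) and values that grow linearly with time.
years = [1, 2, 3, 5, 10]
values = [2.0 * y for y in years]

def slopes(xs, ys):
    """Rise over run between each pair of consecutive points."""
    return [(y1 - y0) / (x1 - x0)
            for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])]

# Mis-scaled axis: each time point is drawn one unit from the last.
categorical = slopes(list(range(len(years))), values)

# Correctly scaled axis: positions are the actual years.
scaled = slopes(years, values)

print("apparent slopes, equal spacing:", categorical)
print("actual slopes per year:       ", scaled)
```

On the equally spaced axis the last two segments look four and ten units steep, while the per-year slope is 2 throughout.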
Mon: Hey, what’s up with that x-axis??
Tues: A question about race based stratification
Wed: Our new column in the Daily Beast
Thurs: Irwin Shaw: “I might mistrust intellectuals, but I’d mistrust nonintellectuals even more.”
Fri: An amusing window into folk genetics
Sat: “Faith means belief in something concerning which doubt is theoretically possible.” — William James (again)
Sun: Interpreting posterior probabilities in the context of weakly informative priors