
Upcoming Stan-related talks

If you’re in NYC or Sydney, there are some Stan-related talks in the next few weeks.


New York



Sydney

  • 4 March. Bob Carpenter: The Benefits of a Probabilistic Model of Data Annotation. Macquarie Uni Computer Science Dept.
  • 10 March, 2–3 PM. Bob Carpenter: Stan: Bayesian Inference Made Easy. Macquarie Uni Statistics Dept. Building E4A, room 523.
  • 11 March, 6 PM, Mitchell Theatre, Level 1 at SMSA (Sydney Mechanics’ School of Arts). Bob Carpenter: RStan: Bayesian Inference Made Easy. Register Now: Sydney Users of R Forum (SURF) (Meetup)



“A small but growing collection of studies suggest X” . . . huh?

Lee Beck writes:

I’m curious if you have any thoughts on the statistical meaning of sentences like “a small but growing collection of studies suggest [X].” That exact wording comes from this piece in the New Yorker, but I think it’s the sort of expression you often see in science journalism (“small but mounting”, “small but growing”, etc.). A post on your own blog quotes a New York Times piece using the phrase, “a growing body of science suggesting [X]” but the post does not address the expression itself.

For Bayesians the weight of evidence available now should be all that matters, right? How the weight of evidence has changed with respect to time would seem to offer no additional information. If anything, trends in research should themselves be based on the evidence already revealed, so it seems like double-counting to include growth-in-evidence as evidence itself.

Maybe there is a more complicated justification. For example, if researchers have both unpublished evidence and (weak) published evidence and their research agenda is determined by both, then the very fact that the number of such studies is “growing” more quickly than would seem to be justified by the (weak) published evidence could itself be an indicator that the unpublished evidence bolsters the (weak) published evidence. That seems way too convoluted to be what the journalist or reader could have had in mind, though!

So I’m curious whether you think “growing evidence” is a statistical howler? There are over 700,000 google hits for the phrase “growing evidence,” so if it really means nothing, that will be news to a lot of writers and editors.

Interesting question. How would we model this process? Sometimes it does seem to happen that a new hypothesis arises and the evidence becomes stronger and stronger in its favor (for example, global warming); other times there’s a new hypothesis and the evidence just doesn’t seem to be there (for example, cold fusion). Still other times the evidence seems to simmer along at a sort of low boil, with a continuing supply of evidence but nothing completely convincing (for example, stereotype threat). Ultimately, though, we like to think of the evidence as increasing toward one conclusion or another.

So, maybe the phrase “growing evidence” is ok. But this only works if we accept that sometimes the evidence isn’t growing.

To see this, shift away from the press and go into the lab. It is natural to take inconclusive evidence and think of it as the first step on the road to success. Suppose, for example, you have some data and you get an estimate of 2.0 with a standard error of 1.4. This is not statistically significant, but it’s close! And it’s easy to think that, if you just double your sample size, you’ll get success: double your sample size, the standard error goes down by a factor of sqrt(2), and you get a standard error of 1.0, so the estimate will be 2 standard errors away from 0. But that’s incorrect because there’s no reason to assume that the estimate will stay fixed at 2.0. Indeed, under a prior in which small effects are more likely than large effects, the estimate is more likely to go down than up once more data come in.
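Here is a minimal sketch of that arithmetic in Python. It rests on assumptions that are not in the example above: the true effect gets a normal prior centered at zero with scale tau = 1 (an arbitrary illustrative choice), and the second half of the data is an independent batch with the same standard error of 1.4. The function name and the returned keys are made up for this illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def doubled_sample_forecast(estimate=2.0, se=1.4, tau=1.0):
    """Forecast the pooled estimate after doubling the sample size.

    Assumption (hypothetical, for illustration only): the true effect
    theta has a Normal(0, tau^2) prior, and the first batch of data
    gave `estimate` with standard error `se`.
    """
    # Posterior for theta after the first batch (normal-normal model).
    shrink = tau**2 / (tau**2 + se**2)
    post_mean = shrink * estimate
    post_var = shrink * se**2

    # Predictive distribution for the second batch's estimate:
    # Normal(post_mean, post_var + se^2).
    pred_sd = math.sqrt(post_var + se**2)

    # With two equally precise batches, the pooled estimate is their
    # average, and its standard error is se / sqrt(2).
    pooled_se = se / math.sqrt(2.0)
    pooled_mean = 0.5 * (estimate + post_mean)
    pooled_sd = 0.5 * pred_sd

    return {
        "pooled_se": pooled_se,
        # The naive forecast holds the estimate fixed at its current value.
        "naive_z": estimate / pooled_se,
        "expected_pooled_estimate": pooled_mean,
        "expected_z": pooled_mean / pooled_se,
        # Chance the pooled estimate lands below the original estimate.
        "p_lower_than_original": normal_cdf((estimate - pooled_mean) / pooled_sd),
        # Chance the pooled estimate ends up 2+ standard errors above zero.
        "p_two_se_above_zero": 1.0
        - normal_cdf((2.0 * pooled_se - pooled_mean) / pooled_sd),
    }


if __name__ == "__main__":
    for name, value in doubled_sample_forecast().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the expected pooled estimate falls below 2.0, and the chance of reaching 2 standard errors is well under one half. Making tau very large (a nearly flat prior) moves the chance of the estimate going lower back toward 50%, so the pull toward zero comes entirely from the prior favoring small effects.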

So, in that sense, I agree with Lee Beck that the frame of “small and growing evidence” can be misleading, in that it encourages a mode of thinking in which we first extrapolate from what we see, then we implicitly condition on these potential data that haven’t occurred yet, in order to make our conclusions stronger than they should be.

And then you end up with renowned biologist James D. Watson saying in 1998, “Judah is going to cure cancer in two years.” There was a small but mounting pile of evidence.

It’s 2015. Judah did a lot of things in his time, but cancer is still here.

Martin Luther King (2) vs. Sigmund Freud

We didn’t get any great comments yesterday, so I’ll have to go with PKD on the grounds that he was the presumptive favorite, and nobody made any good case otherwise.


And today we have the second seed among the Religious Leaders vs. an unseeded entry in the Founders of Religions category. Truly a classic matchup. MLK perhaps has the edge here because he can talk about plagiarism; on the other hand, Freud is an expert in unfalsifiable research theories. I imagine that either one would be an amazingly compelling speaker. King would have a lot to say about Middle East wars, globalization, and economic and social inequality; and Freud could wittily diagnose all of society’s problems. I’d love to have them both—but that’s not allowed. So who’s it gonna be?

P.S. As always, here’s the background, and here are the rules.

“Unbiasedness”: You keep using that word. I do not think it means what you think it means. [My talk tomorrow in the Princeton economics department]


The talk is tomorrow, Tues 24 Feb, 2:40-4:00pm in 200 Fisher Hall:

“Unbiasedness”: You keep using that word. I do not think it means what you think it means.

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

Minimizing bias is the traditional first goal of econometrics. In many cases, though, the goal of unbiasedness can lead to extreme claims that are both substantively implausible and not supported by data. We illustrate with several examples in areas ranging from public opinion to social psychology to public health, using methods including regression discontinuity, hierarchical models, interactions in regression, and data aggregation. Methods that purport to be unbiased, aren’t, once we carefully consider inferential goals and select on the analyses that are actually performed and reported. The implication for econometrics research: It’s best to be aware of all sources of error, rather than to focus narrowly on reducing bias with respect to one particular aspect of your model.

This work reflects collaboration with Guido Imbens and others. Here are the slides, and people can read the following papers for partial background:

Why high-order polynomials should not be used in regression discontinuity designs. (Andrew Gelman and Guido Imbens)

[2015] Evidence on the deleterious impact of sustained use of polynomial regression on causal inference. Research and Politics. (Andrew Gelman and Adam Zelizer)

[2014] Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9, 641-651. (Andrew Gelman and John Carlin)

On deck this week

Mon: “Unbiasedness”: You keep using that word. I do not think it means what you think it means. [My talk tomorrow in the Princeton economics department]

Martin Luther King (2) vs. Sigmund Freud

Tues: “A small but growing collection of studies suggest X” . . . huh?

Aristotle (3) vs. Stewart Lee

Wed: The axes are labeled but I don’t know what the dots represent.

Abraham (4) vs. Jane Austen

Thurs: In criticism of criticism of criticism

Richard Pryor (1) vs. Karl Popper

Fri: “The harm done by tests of significance” (article from 1994 in the journal, “Accident Analysis and Prevention”)

William Shakespeare (1) vs. Karl Marx

Sat: Forget about pdf: this looks much better, it makes all my own papers look like kids’ crayon drawings by comparison

Friedrich Nietzsche (4) vs. Alan Bennett

Sun: Time-release pedagogy??

Buddha (3) vs. John Updike

Philip K. Dick (2) vs. Jean Baudrillard

For yesterday, I was gonna go with Vincent, based on X’s comment:

In addition to his unique painting style and very special life, van Gogh was highly literate, as shown through the 844 letters from him that are available today.

X also made a missing-body-part joke, which I generally don’t think is so cool but, if anyone’s allowed to get away with that sort of humor, it’s X.

Anyway, now I was curious so I googled *Vincent Van Gogh letters* and found this site. I clicked through and looked at a few letters and they seemed like nothing special.

So, given that this was the best argument in favor and it wasn’t so great, I’ll have to call it for Grandma Moses, boring as she sounds.


Today we have Horselover Fat vs. the self-parodying intellectual. Dick would seem to be the easy winner here. But Baudrillard did write this:

Decidedly, joggers are the true Latter Day Saints and the protagonists of an easy-does-it Apocalypse. Nothing evokes the end of the world more than a man running straight ahead on a beach, swathed in the sounds of his walkman, cocooned in the solitary sacrifice of his energy, indifferent even to catastrophes since he expects destruction to come only as the fruit of his own efforts, from exhausting the energy of a body that has in his own eyes become useless. Primitives, when in despair, would commit suicide by swimming out to sea until they could swim no longer. The jogger commits suicide by running up and down the beach. His eyes are wild, saliva drips from his mouth. Do not stop him. He will either hit you or simply carry on dancing around in front of you like a man possessed.

I think we can safely say this is a contest between two guys who did not spend much time at the gym.

P.S. As always, here’s the background, and here are the rules.

“Academics should be made accountable for exaggerations in press releases about their own work”

Fernando Martel Garcia points me to this news article by Ben Goldacre:

For anyone with medical training, mainstream media coverage of science can be an uncomfortable read. It is common to find correlational findings misrepresented as denoting causation, for example, or findings in animal studies confidently exaggerated to make claims about treatment for humans. But who is responsible for these misrepresentations?

In the linked paper (doi:10.1136/bmj.g7015) Sumner and colleagues found that much of the exaggeration in mainstream media coverage of health research—statements that went beyond findings in the academic paper—was already present in the press release sent out to journalists by the academic institution itself.

Sumner and colleagues identified all 462 press releases on health research from 20 leading UK universities over one year. They traced 668 associated news stories . . .

The story is pretty much as you’d predict: a lot of the exaggeration comes in the press release.

I remarked that this makes sense. I agree. Of course, this is just a start, as I’m sure a lot of academics would be happy to put their names on various exaggerated claims! See, for example, here, where the researchers in question were very active with the publicity, and in which they dramatically overstated the implications on individual-level behavior that could be drawn from their state-level analysis. The lead researcher in this case was just a law professor, but still, we’d like to see better.

As this example illustrates, the problem is not necessarily any sort of conscious exaggeration or hype: I assume that the researchers in question really believe that their claims are supported by their data. For that matter, I assume that disgraced primatologist Marc Hauser really believes his theories.

To put it another way: be skeptical of press releases, not because they’re written by sleazy public relations people, but because they’re written by, or with the collaboration of, researchers who know enough to make a superficially convincing case but not enough to recognize the flaws in their reasoning.

Vincent van Gogh (3) vs. Grandma Moses

In yesterday’s battle of the religions, the strongest argument against Mother Teresa was given by Paul, who related that she was friends with all sorts of nasty politicians and that she’s been accused of spending money that came from questionable sources. But if that’s all you can say about her, it won’t cut much ice with Sun Myung Moon, who also was friends with various unsavory characters and scammed all sorts of money. So Moon is more of a badass. But this isn’t a contest about who’s the toughest guy; we’re looking for a seminar speaker here, not a rogue sociologist.

I’ll have to go with Mother Teresa. I doubt we’ll see any faith healings but I’m persuaded by Ken’s negative case against Moon:

Moon’s wikipedia page describes him as a “religious leader, businessperson, political activist, and media mogul.” In my experience people with “rock star” status like this make for bad seminar speakers because they tend to be full of anecdotes and fluff, and light on rigorous empirical evidence.

Good point. The last thing we need here is a goddam Ted talk.


And today we have a contest between two artists! Vincent is more famous and would certainly be the bigger draw, but I wouldn’t be surprised if Grandma could give a more coherent lecture. On the other hand, according to wikipedia, “she was a Society of Mayflower Descendants and Daughters of the American Revolution member.” And that sounds pretty duuuulllllllll. It’s up to all of you to make the strongest and wittiest arguments on both sides.

P.S. As always, here’s the background, and here are the rules.

Bayes and doomsday

Ben O’Neill writes:

I am a fellow Bayesian statistician at the University of New South Wales (Australia).  I have enjoyed reading your various books and articles, and enjoyed reading your recent article on The Perceived Absurdity of Bayesian Inference.  However, I disagree with your assertion that the “doomsday argument” is non-Bayesian; I think if you read how it is presented by Leslie you will see that it is at least an attempt at a Bayesian argument.  In any case, although it has enough prima facie plausibility to trick people, the argument is badly flawed, and not a correct application of Bayesian reasoning.  I don’t think it is a noose around the Bayesian neck.

Anyway, I’m just writing because I thought you might be interested in a recent paper on this topic in the Journal of Philosophy.  The paper is essentially a Bayesian refutation of the doomsday argument, pointing out how it goes wrong, and how it is an incorrect application of Bayesian inference.  (And also how a correct application of Bayesian inference leads to sensible conclusions.)  Essentially, the argument confuses total series length with remaining series length, and sneaks information from the data into the prior in a way which is invalid.  Once this is corrected the absurd conclusions of the doomsday argument evaporate.

I don’t really have anything more to say on this topic (here’s my argument from 2005 as to why I think the doomsday argument is clearly frequentist and not particularly Bayesian) but I thought some of you might be interested, hence the pointer.

The bracket so far

Thanks to the Excel stylings of Paul Davidson:


Our competition is (approximately) 1/4 done!

And I’ve been thinking about possible categories for next year’s tourney:

New Jersey politicians
Articulate athletes
People named Greg or Gregg
Vladimir Nabokov and people connected to him
. . .

Ummm, we need 3 more categories. Any suggestions? Real people only, please. In some future year we can have an all-fictional category.