
William Shakespeare (1) vs. Karl Marx

For yesterday’s winner, I’ll follow the reasoning of Manuel in the comments:

Popper. We would learn more from falsifying the hypothesis that Popper’s talk is boring than what we would learn from falsifying the hypothesis that Richard Pryor’s talk is uninteresting.

And today we have the consensus choice for greatest writer vs. the notorious political philosopher. Marx is unseeded in the Founders of Religions category but he’s had lots of influence on the world. Both these guys are pretty quotable. So who’s it gonna be, the actor or the radical?

P.S. No Groucho jokes, please. And no need for reminders that lots of bad things were done in the name of Marxism. We’re choosing a seminar speaker here, that’s all. We’re not endorsing a philosophy.

P.P.S. As always, here’s the background, and here are the rules.

“The harm done by tests of significance” (a 1994 article in the journal Accident Analysis and Prevention)

Ezra Hauer writes:

In your January 2013 Commentary (Epidemiology) you say that “…misunderstanding persists even in high-stakes settings.” Attached is an older paper illustrating some such.

“It is like trying to sink a battleship by firing lead shot at it for a long time”—well put!

Richard Pryor (1) vs. Karl Popper


The top-seeded comedian vs. an unseeded philosopher. Pryor would be much more entertaining, that’s for sure (“Arizona State Penitentiary population: 80 percent black people. But there are no black people in Arizona!”). But Karl Popper laid out the philosophy that is the foundation for modern science. His talk, even if it is dry, might ultimately be more interesting.

What do you think?

P.S. As always, here’s the background, and here are the rules.

Psych journal bans significance tests; stat blogger inundated with emails

OK, it’s been a busy email day.

From Brandon Nakawaki:

I know your blog is perpetually backlogged by a few months, but I thought I’d forward this to you in case it hadn’t hit your inbox yet. A journal called Basic and Applied Social Psychology is banning null hypothesis significance testing in favor of descriptive statistics. They also express some skepticism of Bayesian approaches, but are not taking any action for or against it at this time (though the editor appears opposed to the use of noninformative priors).

From Joseph Bulbulia:

I wonder what you think about the BASP’s decision to ban “all vestiges of NHSTP (P-values, t-values, F-values, statements about “significant” differences or lack thereof and so on)”?

As a corrective to the current state of affairs in psychology, I’m all for bold moves. And the emphasis on descriptive statistics seems reasonable enough — even if more emphasis could have been placed on visualising the data, more warnings could have been issued around the perils of un-modelled data, and more value could have been placed on obtaining quality data (as well as quantity).

My major concern, though, centres on the author’s timidness about Bayesian data analysis. Sure, not every Bayesian analysis deserves to count as a contribution, but nor is it the case that Bayesian methods should be displaced while descriptive methods are given centre stage. We learn by subjecting our beliefs to evidence. Bayesian modelling merely systematises this basic principle, so that adjustments to belief/doubt are explicit.

From Alex Volfovsky:

I just saw this editorial from Basic and Applied Social Psychology:

Seems to be a somewhat harsh take on the question, though it gets at the frequently arbitrary choice of “p < .05” being important...

From Jeremy Fox:

Psychology journal bans inferential statistics: As best I can tell, they seem to have decided that all statistical inferences from sample to population are inappropriate.

From Michael Grosskopf:

I thought you might find this interesting if you hadn’t seen it yet. I imagine it is mostly the case of a small journal trying to make a name for itself (I know nothing of the journal offhand), but still is interesting.

From the Reddit comments on a thread that led me to the article:
“They don’t want frequentist approaches because you don’t get a posterior, and they don’t want Bayesian approaches because you don’t actually know the prior.”

From John Transue:

Null Hypothesis Testing BANNED from Psychology Journal: This will be interesting.

From Dominik Papies:

I assume that you are aware of this news, but just in case you haven’t heard, one journal from psychology issued a ban on NHST (see editorial, attached). While I think that this is a bold move that may shake things up nicely, I feel that they may be overshooting, as not the technique per se, but rather its use seems the real problem to me. The editors also state they will put more emphasis on sample size and effect size, which sounds like good news.

From Zach Weller:

One of my fellow graduate students pointed me to this article (posted below) in the Basic and Applied Social Psychology (BASP) journal. The article announces that hypothesis testing is now banned from BASP because the procedure is “invalid”. Unfortunately, this has caused my colleague’s students to lose motivation for learning statistics. . . .

From Amy Cohen:

From the Basic and Applied Social Psychology editorial this month:

The Basic and Applied Social Psychology (BASP) 2014 Editorial emphasized that the null hypothesis significance testing procedure (NHSTP) is invalid, and thus authors would be not required to perform it (Trafimow, 2014). However, to allow authors a grace period, the Editorial stopped short of actually banning the NHSTP. The purpose of the present Editorial is to announce that the grace period is over. From now on, BASP is banning the NHSTP. With the banning of the NHSTP from BASP, what are the implications for authors?

From Daljit Dhadwal:

You may already have seen this, but I thought you could blog about this: the journal “Basic and Applied Social Psychology” is banning most types of inferential statistics (p-values, confidence intervals, etc.).

Here’s the link to the editorial:

John Kruschke blogged about it as well:

The comments on Kruschke’s blog are interesting too.

OK, ok, I’ll take a look. The editorial article in question is by David Trafimow and Michael Marks. Kruschke points out this quote from the piece:

The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist. The Laplacian assumption is that when in a state of ignorance, the researcher should assign an equal probability to each possibility.

Huh? This seems a bit odd to me, given that I just about always work on continuous problems, so that the “possibilities” can’t be counted and it is meaningless to talk about assigning probabilities to each of them. And the bit about “generating numbers where none exist” seems to reflect a misunderstanding of the distinction between a distribution (which reflects uncertainty) and data (which are specific). You don’t want to deterministically impute numbers where the data don’t exist, but it’s ok to assign a distribution to reflect your uncertainty about such numbers. It’s what we always do when we do forecasting; the only thing special about Bayesian analysis is that it applies the principles of forecasting to all unknowns in a problem.
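As a toy illustration of that distinction, here is a minimal sketch with a made-up dataset containing one missing value; the Normal model and its standard deviation are illustrative assumptions of mine, not anything from the editorial or the post:

```python
import numpy as np

rng = np.random.default_rng(1)

# Four observed measurements; suppose a fifth value is missing.
y_obs = np.array([1.2, 0.8, 1.5, 0.9])

# Deterministic imputation "generates a number where none exists":
# it invents a single fake data point with no uncertainty attached.
y_point = y_obs.mean()

# The alternative is to represent the unknown by a distribution.
# Here: draws from an assumed Normal(mean of y_obs, sd = 0.5) model;
# both the model and the sd are illustrative choices.
y_draws = rng.normal(y_obs.mean(), 0.5, 10_000)

# Any downstream summary now carries uncertainty about the missing
# value, instead of a false certainty.
print(y_point, round(y_draws.mean(), 2), round(y_draws.std(), 2))
```

The point estimate and the draws have essentially the same center; the difference is that the draws also carry a spread, which is exactly the uncertainty a forecast should propagate.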

I was amused to see that, when they were looking for an example where Bayesian inference is OK, they used a book by R. A. Fisher!

Trafimow and Marks conclude:

Some might view the NHSTP [null hypothesis significance testing procedure] ban as indicating that it will be easier to publish in BASP [Basic and Applied Social Psychology], or that less rigorous manuscripts will be acceptable. This is not so. On the contrary, we believe that the p < .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research. We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking.

I’m with them on that. Actually, I think standard errors, p-values, and confidence intervals can be very helpful in research when considered as convenient parts of a data analysis (see chapter 2 of ARM for some examples). Standard errors etc. are helpful in giving a lower bound on uncertainty. The problem comes when they’re considered as the culmination of the analysis, as if “p less than .05” represents some kind of proof of something. I do like the idea of requiring that research claims stand on their own without requiring the (often spurious) support of p-values.

Abraham (4) vs. Jane Austen

Yesterday’s is a super-tough call. I’d much rather hear Stewart Lee than Aristotle. I read one of Lee’s books, and he’s a fascinating explicator of performance. Lee gives off a charming David Owen vibe—Phil, you know what I’m saying here—he’s an everyman, nothing special, he’s just been thinking really hard lately and wants to share his insights with all of us.

Aristotle, though, I couldn’t care less.

But the commenters mostly favored Aristotle, basically on the grounds that he invented science. And, as Keith put it, “being scientific is absolutely no defense against being wrong, but rather just an acceleration of the process of getting less wrong.” And Aristotle is probably a good seminar speaker—seminars are what they did all day back then, right?

Ultimately I’ll have to go with Patrick:

Stewart Lee.

I can’t see Aristotle presenting a seminar on his biggest philosophical mistakes.

But I can see Lee spending a seminar on his least funny jokes, and getting a few laughs at the same time.

And today’s match is a forfeit. Abraham (listed as #4 in the Founders of Religions category) does not belong in this contest. The other 63 people in the bracket are real; Abraham is the only fictional character here, and he just doesn’t belong. So Jane will advance, uncontested, to the next round.


P.S. As always, here’s the background, and here are the rules.

The axes are labeled but I don’t know what the dots represent.


John Sukup writes:

I came across a chart recently posted by Boston Consulting Group on LinkedIn and wondered what your take on it was. To me, it seems to fall into the “suspicious” category but thought you may have a different opinion.

I replied that this one baffles me cos I don’t know what the dots represent! This is typical of graphs: the axes are labeled, but I don’t know what it is that’s being labeled.

Sukup replied:

Indeed. The axes are labeled strangely, and the years used to calculate the CAGR are also not the same. I’d have expected more from BCG—but this type of data visualization error seems pervasive these days!

Aristotle (3) vs. Stewart Lee

Yesterday’s winner is a tough one. Really, these two guys could’ve met in the final.

Some arguments in the comments in favor of Freud: From Huw, “he has the smirks, knowing looks, and barely missed sidelong glances.” And Seth points out the statistical connection: “Some people might say that theory is getting lost in the identification revolution. Freud didn’t have that problem.” And Manoel picks up on an old line from this blog and writes that “I think we should pick Freud as the typical economist . . . which are under-represented in this contest. Arguably, both have a silly theory of human action and a huge impact on our society nonetheless.”

In favor of King, Zbicyclist recommends stopping Freud’s ideas from spreading further, and he adds: “this sets up a possible Gandhi vs King match two rounds further.” That’s a good argument but I’ll have to go with Freud, because he inspired so much more enthusiasm, positive and negative, in the comment thread.

And, today, the third-best philosopher vs. the 41st Best Stand-up Ever.

I don’t know what to think about Aristotle. On one hand, he invented science. On the other, he’s most famous for being wrong. Whether the topic is slavery, or the laws of motion, or how many teeth are in the mouths of men and women—you name it, Aristotle’s on the wrong end of the stick.

On the other hand, if he truly is an empiricist, Aristotle might give a good talk in which he re-evaluates his philosophy in response to learning about all these famous errors.

Stewart Lee is more of a known quantity. You can check out his DVDs.

P.S. As always, here’s the background, and here are the rules.

Upcoming Stan-related talks

If you’re in NYC or Sydney, there are some Stan-related talks in the next few weeks.


Sydney



  • 4 March. Bob Carpenter: The Benefits of a Probabilistic Model of Data Annotation. Macquarie Uni Computer Science Dept.
  • 10 March, 2–3 PM. Bob Carpenter: Stan: Bayesian Inference Made Easy. Macquarie Uni Statistics Dept. Building E4A, room 523.
  • 11 March, 6 PM, Mitchell Theatre, Level 1 at SMSA (Sydney Mechanics’ School of Arts). Bob Carpenter: RStan: Bayesian Inference Made Easy. Register Now: Sydney Users of R Forum (SURF) (Meetup)



“A small but growing collection of studies suggest X” . . . huh?

Lee Beck writes:

I’m curious if you have any thoughts on the statistical meaning of sentences like “a small but growing collection of studies suggest [X].” That exact wording comes from this piece in the New Yorker, but I think it’s the sort of expression you often see in science journalism (“small but mounting”, “small but growing”, etc.). A post on your own blog quotes a New York Times piece using the phrase, “a growing body of science suggesting [X]” but the post does not address the expression itself.

For Bayesians the weight of evidence available now should be all that matters, right? How the weight of evidence has changed with respect to time would seem to offer no additional information. If anything, trends in research should themselves be based on the evidence already revealed, so it seems like double-counting to include growth-in-evidence as evidence itself.

Maybe there is a more complicated justification. For example, if researchers have both unpublished evidence and (weak) published evidence and their research agenda is determined by both, then the very fact that the number of such studies is “growing” more quickly than would seem to be justified by the (weak) published evidence could itself be an indicator that the unpublished evidence bolsters the (weak) published evidence. That seems way too convoluted to be what the journalist or reader could have had in mind, though!

So I’m curious whether you think “growing evidence” is a statistical howler? There are over 700,000 google hits for the phrase “growing evidence,” so if it really means nothing, that will be news to a lot of writers and editors.

Interesting question. How would we model this process? Sometimes it does seem to happen that a new hypothesis arises and the evidence becomes stronger and stronger in its favor (for example, global warming); other times there’s a new hypothesis and the evidence just doesn’t seem to be there (for example, cold fusion). Still other times the evidence seems to simmer along at a sort of low boil, with a continuing supply of evidence but nothing completely convincing (for example, stereotype threat). Ultimately, though, we like to think of the evidence as increasing toward one conclusion or another.

So, maybe the phrase “growing evidence” is ok. But this only works if we accept that sometimes the evidence isn’t growing.

To see this, shift away from the press and go into the lab. It is natural to take inconclusive evidence and think of it as the first step on the road to success. Suppose, for example, you have some data and you get an estimate of 2.0 with a standard error of 1.4. This is not statistically significant—but it’s close! And it’s easy to think that doubling your sample size will bring success: the standard error goes down by a factor of sqrt(2), to 1.0, and the estimate will then be 2 standard errors away from 0. But that’s incorrect, because there’s no reason to assume the estimate will stay fixed at 2.0. Indeed, under a prior in which small effects are more likely than large effects, it’s more likely the estimate will go lower rather than higher, once more data come in.
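This shrinkage is easy to check by simulation. The sketch below conditions on a first estimate near 2.0 (with standard error 1.4, as in the example) and then doubles the sample; the Normal(0, 1) prior on the effect is an illustrative assumption of mine, since the post only says that small effects are more likely than large ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Illustrative prior under which small effects are more likely
# than large ones: theta ~ Normal(0, 1).
theta = rng.normal(0.0, 1.0, n)

# First study: unbiased estimate of theta with standard error 1.4.
est1 = theta + rng.normal(0.0, 1.4, n)

# Keep the simulations where the first estimate came out near 2.0.
keep = np.abs(est1 - 2.0) < 0.1

# Double the sample: the second half of the data gives another
# estimate with SE 1.4; pooling the two halves the variance,
# for a combined SE of 1.4/sqrt(2), roughly 1.0.
est2 = theta[keep] + rng.normal(0.0, 1.4, keep.sum())
combined = (est1[keep] + est2) / 2

# The combined estimate tends to land well below 2.0, because
# the first estimate of 2.0 was partly noise.
print(round(combined.mean(), 2))
```

The combined estimate averages around 1.3 rather than staying at 2.0, which is the point: the “just double the sample and you’ll clear 2 standard errors” reasoning conditions on the estimate not regressing toward zero, and under any prior favoring small effects it will.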

So, in that sense, I agree with Lee Beck that the frame of “small and growing evidence” can be misleading, in that it encourages a mode of thinking in which we first extrapolate from what we see, then we implicitly condition on these potential data that haven’t occurred yet, in order to make our conclusions stronger than they should be.

And then you end up with renowned biologist James D. Watson saying in 1998, “Judah is going to cure cancer in two years.” There was a small but mounting pile of evidence.

It’s 2015. Judah did a lot of things in his time, but cancer is still here.

Martin Luther King (2) vs. Sigmund Freud

We didn’t get any great comments yesterday, so I’ll have to go with PKD on the grounds that he was the presumptive favorite, and nobody made any good case otherwise.


And today we have the second seed among the Religious Leaders vs. an unseeded entry in the Founders of Religions category. Truly a classic matchup. MLK perhaps has the edge here because he can talk about plagiarism; on the other hand, Freud is an expert in unfalsifiable research theories. I imagine that either one would be an amazingly compelling speaker. King would have a lot to say about Middle East wars, globalization, and economic and social inequality; and Freud could wittily diagnose all of society’s problems. I’d love to have them both—but that’s not allowed. So who’s it gonna be?

P.S. As always, here’s the background, and here are the rules.