
Buddha (3) vs. John Updike

Yesterday’s winner is Friedrich Nietzsche. I don’t really have much to say here: there was lots of enthusiasm about the philosopher and none at all for the cozy comedian. Maybe Jonathan Miller would’ve been a better choice.


Now for today’s battle. Buddha is seeded #3 among founders of religions. Updike is the unseeded author of the classic Rabbit, Run, and dozens of memorable short stories, but is detested by Helen DeWitt and various commenters on this blog.

Who’d be a better speaker? Updike is more of a Harvard guy but I guess he’d give a talk at Columbia if we asked, right?

P.S. As always, here’s the background, and here are the rules.

“Precise Answers to the Wrong Questions”

Our friend K? (not to be confused with X) seeks pre-feedback on this talk:

Can we get a mathematical framework for applying statistics that better facilitates communication with non-statisticians and also helps statisticians avoid getting “precise answers to the wrong questions”*?

Applying statistics involves communicating with non-statisticians so that we grasp their applied problems and they understand how the methods we propose address our (incomplete) grasp of their problems. Statistical theory, on the other hand, involves communicating with oneself and other qualified statisticians about statistical models that embody theoretical abstractions, and one would be foolish to limit mathematical approaches in this task. However, as put in Kass, R. (2011), Statistical Inference: The Big Picture – “Statistical procedures are abstractly defined in terms of mathematics but are used, in conjunction with scientific models and methods, to explain observable phenomena. … When we use a statistical model to make a statistical inference [address applied problems] we implicitly assert … the theoretical world corresponds reasonably well to the real world.” Drawing on clever constructions by Francis Galton and insights into science and mathematical reasoning by C.S. Peirce, this talk will discuss an arguably mathematical framework (in Peirce’s sense of diagrammatic reasoning) that might be better.

*“An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.” – John Tukey.

P.S. from Andrew: Here’s my article from 2011, Bayesian Statistical Pragmatism, a discussion of Rob Kass’s article on statistical pragmatism.

Key quote from my article:

In the Neyman–Pearson theory of inference, confidence and statistical significance are two sides of the same coin, with a confidence interval being the set of parameter values not rejected by a significance test. Unfortunately, this approach falls apart (or, at the very least, is extremely difficult) in problems with high-dimensional parameter spaces that are characteristic of my own applied work in social science and environmental health.

In a modern Bayesian approach, confidence intervals and hypothesis testing are both important but are not isomorphic [emphasis added]; they represent two different steps of inference. Confidence statements, or posterior intervals, are summaries of inference about parameters conditional on an assumed model. Hypothesis testing—or, more generally, model checking—is the process of comparing observed data to replications under the model if it were true.
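To make that duality concrete, here’s a minimal sketch (my illustration, not something from the article) for the simplest case: a normal mean with known standard deviation, where the classical 95% interval is exactly the set of hypothesized values that a two-sided z-test does not reject at the 5% level.

```python
# Minimal sketch, not from the article: the 95% confidence interval recovered
# by inverting a two-sided z-test for a normal mean with known sd (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sd, n = 2.0, 50
y = rng.normal(loc=1.2, scale=sd, size=n)
ybar, se = y.mean(), sd / np.sqrt(n)

# Classical 95% interval
ci = (ybar - 1.96 * se, ybar + 1.96 * se)

# Same interval as the set of hypothesized means mu0 not rejected at the 5% level
grid = np.linspace(ybar - 4 * se, ybar + 4 * se, 2001)
pvals = 2 * stats.norm.sf(np.abs(ybar - grid) / se)
not_rejected = grid[pvals > 0.05]

print(ci)
print(not_rejected.min(), not_rejected.max())  # matches ci up to grid resolution
```

In one dimension the inversion is trivial; in the high-dimensional problems mentioned above, it’s exactly this step that becomes unworkable.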

Friedrich Nietzsche (4) vs. Alan Bennett

William Shakespeare had the most support yesterday; for example, from David: “I vote for Shakespeare just to see who actually shows up.” The best argument of the serious variety came from Babar, who wrote, “I would vote for WS. Very little is known about the man. I care very little about Marx’s mannerisms but I’d like to know if WS had modern day actor mannerisms. I’d like to see how he moved – is he a physical actor, or just a writer?” That’s an excellent point. Of all the seminar speaker candidates we’ve considered, Shakespeare’s the only one with a physical dimension in that way. It would be great to see the movements of an old-time actor.

But the funniest argument came from Jonathan:

As near as I can figure, Shakespeare was nothing more than a guy who could string a bunch of famous phrases together and make a play out of them. It’s a talent, to be sure, but a fairly minor one. Plus, if he’s in love with Gwyneth Paltrow, I’m out.

Ouch! Willie got zinged, so Karl’s in.

And, today it’s “God is dead” vs. the ultimate cozy comedian. Amazingly enough, it’s been more than 40 years since the first performance of “Forty Years On.”

P.S. As always, here’s the background, and here are the rules.

Bertrand Russell goes to the IRB

Jonathan Falk points me to this genius idea from Eric Crampton:

Here’s a fun one for those of you still based at a university.

All of you put together a Human Ethics Review proposal for a field experiment on Human Ethics Review proposals.

Here is the proposal within my proposal.

Each of you would propose putting together a panel of researchers at different universities. You would propose that each of your panel members – from diverse fields, seniority levels, ethnicities and such – would submit a proposal to his or her ethics review board or Institutional Review Board for approval, and each of the panellists would track the time it took to get the proposal approved, which legitimate ethical issues were flagged, which red herring issues also held things up, and how long and onerous the whole ordeal was.

Still in your proposal, you would then propose gathering the data from your panellists and drawing some conclusions about what sorts of schools have better or worse processes. Specific hypotheses to be tested would be whether universities with medical schools were worse than others because medical ethicists would be on the panel, and whether universities with faculty-based rather than centralised IRBs would have better approval processes.

You would note that members of your panels could ask their University’s HR advisers to get data on the people who are on the IRBs – race, gender, ethnicity, area of study, rank, age, experience, time on panel, number of children, marital status, and sexual orientation (though not all of those would be in each place’s HR database); you’d propose using these as control variables but also to test whether a panel’s experience made any difference and whether having a panel member from your home Department made any difference. It would also be interesting to note whether the gender, seniority, ethnicity and home department of the submitter made any difference to the application.

End of the proposal-within-the-proposal.

Now for the fun part: each one of you reading this is a potential member of a panel for a study for which nobody has ever sought ethical approval, but which will be self-approving in a particularly distributed fashion: The IRB proposal to be tested is the one I’ve just outlined. Whichever of you first gets ethical approval is the lead author on the paper, is a data point, and already has the necessary ethics approval. Everybody else, successful or not, is a data point.

This is just the greatest. You can only do this sort of study if you have IRB approval, but the only way to get IRB approval is . . . to do the study!

This is related to other paradoxes such as: I can do nice (or mean) things to people and write about what happens, but call it “research” and all of a sudden we’re in big trouble if we don’t get permission. Crampton’s idea is beautiful because it wraps the problem in itself. Russell, Cantor, and Gödel would be proud.

William Shakespeare (1) vs. Karl Marx

For yesterday’s winner, I’ll follow the reasoning of Manuel in comments:

Popper. We would learn more from falsifying the hypothesis that Popper’s talk is boring than what we would learn from falsifying the hypothesis that Richard Pryor’s talk is uninteresting.

And today we have the consensus choice for greatest writer vs. the notorious political philosopher. Marx is unseeded in the Founders of Religions category but he’s had lots of influence on the world. Both these guys are pretty quotable. So who’s it gonna be, the actor or the radical?

P.S. No Groucho jokes, please. And no need for reminders that lots of bad things were done in the name of Marxism. We’re choosing a seminar speaker here, that’s all. We’re not endorsing a philosophy.

P.P.S. As always, here’s the background, and here are the rules.

“The harm done by tests of significance” (article from 1994 in the journal Accident Analysis and Prevention)

Ezra Hauer writes:

In your January 2013 Commentary (Epidemiology) you say that “…misunderstanding persists even in high-stakes settings.” Attached is an older paper illustrating some such.

“It is like trying to sink a battleship by firing lead shot at it for a long time”—well put!

Richard Pryor (1) vs. Karl Popper


The top-seeded comedian vs. an unseeded philosopher. Pryor would be much more entertaining, that’s for sure (“Arizona State Penitentiary population: 80 percent black people. But there are no black people in Arizona!”). But Karl Popper laid out the philosophy that is the foundation for modern science. His talk, even if it is dry, might ultimately be more interesting.

What do you think?

P.S. As always, here’s the background, and here are the rules.

Psych journal bans significance tests; stat blogger inundated with emails

OK, it’s been a busy email day.

From Brandon Nakawaki:

I know your blog is perpetually backlogged by a few months, but I thought I’d forward this to you in case it hadn’t hit your inbox yet. A journal called Basic and Applied Social Psychology is banning null hypothesis significance testing in favor of descriptive statistics. They also express some skepticism of Bayesian approaches, but are not taking any action for or against it at this time (though the editor appears opposed to the use of noninformative priors).

From Joseph Bulbulia:

I wonder what you think about the BASP’s decision to ban “all vestiges of NHSTP (P-values, t-values, F-values, statements about “significant” differences or lack thereof and so on)”?

As a corrective to the current state of affairs in psychology, I’m all for bold moves. And the emphasis on descriptive statistics seems reasonable enough — even if more emphasis could have been placed on visualising the data, more warnings could have been issued around the perils of un-modelled data, and more value could have been placed on obtaining quality data (as well as quantity).

My major concern, though, centres on the author’s timidness about Bayesian data analysis. Sure, not every Bayesian analysis deserves to count as a contribution, but nor is it the case that Bayesian methods should be displaced while descriptive methods are given centre stage. We learn by subjecting our beliefs to evidence. Bayesian modelling merely systematises this basic principle, so that adjustments to belief/doubt are explicit.

From Alex Volfovsky:

I just saw this editorial from Basic and Applied Social Psychology: http://www.tandfonline.com/doi/pdf/10.1080/01973533.2015.1012991

Seems to be a somewhat harsh take on the question, though it gets at the frequently arbitrary choice of “p < .05” being important...

From Jeremy Fox:

Psychology journal bans inferential statistics: As best I can tell, they seem to have decided that all statistical inferences from sample to population are inappropriate.

From Michael Grosskopf:

I thought you might find this interesting if you hadn’t seen it yet. I imagine it is mostly the case of a small journal trying to make a name for itself (I know nothing of the journal offhand), but still is interesting.

http://www.tandfonline.com/doi/pdf/10.1080/01973533.2015.1012991

From the Reddit comments on a thread that led me to the article:

“They don’t want frequentist approaches because you don’t get a posterior, and they don’t want Bayesian approaches because you don’t actually know the prior.”

http://www.reddit.com/r/statistics/comments/2wy414/social_psychology_journal_bans_null_hypothesis/

From John Transue:

Null Hypothesis Testing BANNED from Psychology Journal: This will be interesting.

From Dominik Papies:

I assume that you are aware of this news, but just in case you haven’t heard, one journal from psychology issued a ban on NHST (see editorial, attached). While I think that this is a bold move that may shake things up nicely, I feel that they may be overshooting, as it is not the technique per se but rather its use that seems to be the real problem to me. The editors also state they will put more emphasis on sample size and effect size, which sounds like good news.

From Zach Weller:

One of my fellow graduate students pointed me to this article (posted below) in the Basic and Applied Social Psychology (BASP) journal. The article announces that hypothesis testing is now banned from BASP because the procedure is “invalid”. Unfortunately, this has caused my colleague’s students to lose motivation for learning statistics. . . .

From Amy Cohen:

From the Basic and Applied Social Psychology editorial this month:

The Basic and Applied Social Psychology (BASP) 2014 Editorial emphasized that the null hypothesis significance testing procedure (NHSTP) is invalid, and thus authors would be not required to perform it (Trafimow, 2014). However, to allow authors a grace period, the Editorial stopped short of actually banning the NHSTP. The purpose of the present Editorial is to announce that the grace period is over. From now on, BASP is banning the NHSTP. With the banning of the NHSTP from BASP, what are the implications for authors?

From Daljit Dhadwal:

You may already have seen this, but I thought you could blog about this: the journal “Basic and Applied Social Psychology” is banning most types of inferential statistics (p-values, confidence intervals, etc.).

Here’s the link to the editorial:

http://www.tandfonline.com/doi/full/10.1080/01973533.2015.1012991

John Kruschke blogged about it as well:

http://doingbayesiandataanalysis.blogspot.ca/2015/02/journal-bans-null-hypothesis.html

The comments on Kruschke’s blog are interesting too.

OK, ok, I’ll take a look. The editorial article in question is by David Trafimow and Michael Marks. Kruschke points out this quote from the piece:

The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist. The Laplacian assumption is that when in a state of ignorance, the researcher should assign an equal probability to each possibility.

Huh? This seems a bit odd to me, given that I just about always work on continuous problems, so that the “possibilities” can’t be counted and it is meaningless to talk about assigning probabilities to each of them. And the bit about “generating numbers where none exist” seems to reflect a misunderstanding of the distinction between a distribution (which reflects uncertainty) and data (which are specific). You don’t want to deterministically impute numbers where the data don’t exist, but it’s ok to assign a distribution to reflect your uncertainty about such numbers. It’s what we always do when we do forecasting; the only thing special about Bayesian analysis is that it applies the principles of forecasting to all unknowns in a problem.
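Here’s a minimal sketch of that distinction, with made-up numbers: a deterministic imputation hands you a single number for a future observation, while the Bayesian forecast carries a distribution (here a posterior predictive, under a flat prior and known standard deviation) that expresses the uncertainty.

```python
# Minimal sketch (made-up model and data): a single imputed number vs. a
# distribution that reflects uncertainty about a new observation.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=1.0, size=20)  # observed data; sd assumed known (= 1)

# Deterministic "fill in a number": one value, no uncertainty attached
point_forecast = y.mean()

# Bayesian version: posterior draws for the mean (flat prior, known sd),
# then posterior predictive draws for a new observation
n = len(y)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=10_000)
y_new_draws = rng.normal(mu_draws, 1.0)  # one predictive draw per posterior draw

print(point_forecast)
print(np.percentile(y_new_draws, [2.5, 50, 97.5]))  # a forecast interval, not a single number
```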

I was amused to see that, when they were looking for an example where Bayesian inference is OK, they used a book by R. A. Fisher!

Trafimow and Marks conclude:

Some might view the NHSTP [null hypothesis significance testing procedure] ban as indicating that it will be easier to publish in BASP [Basic and Applied Social Psychology], or that less rigorous manuscripts will be acceptable. This is not so. On the contrary, we believe that the p < .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research. We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking.

I’m with them on that. Actually, I think standard errors, p-values, and confidence intervals can be very helpful in research when considered as convenient parts of a data analysis (see chapter 2 of ARM for some examples). Standard errors etc. are helpful in giving a lower bound on uncertainty. The problem comes when they’re considered as the culmination of the analysis, as if “p less than .05” represents some kind of proof of something. I do like the idea of requiring that research claims stand on their own without requiring the (often spurious) support of p-values.
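For what it’s worth, here’s the kind of low-key use I have in mind, with made-up data: report the estimated difference together with its standard error and interval as summaries of the analysis, not as a pass/fail verdict at p < .05.

```python
# Minimal sketch (made-up data): standard error and interval as plain summaries
# of a two-group comparison, reported without a significance verdict.
import numpy as np

rng = np.random.default_rng(2)
treat = rng.normal(0.5, 1.0, size=40)
control = rng.normal(0.0, 1.0, size=40)

diff = treat.mean() - control.mean()
se = np.sqrt(treat.var(ddof=1) / len(treat) + control.var(ddof=1) / len(control))

print(f"estimated difference: {diff:.2f} (se {se:.2f}), "
      f"approx 95% interval [{diff - 1.96 * se:.2f}, {diff + 1.96 * se:.2f}]")
```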

Abraham (4) vs. Jane Austen

Yesterday’s matchup was a super-tough call. I’d much rather hear Stewart Lee than Aristotle. I read one of Lee’s books, and he’s a fascinating explicator of performance. Lee gives off a charming David Owen vibe—Phil, you know what I’m saying here—he’s an everyman, nothing special, he’s just been thinking really hard lately and wants to share his insights with all of us.

Aristotle, though, I couldn’t care less about.

But the commenters mostly favored Aristotle, basically on the grounds that he invented science. And, as Keith put it, “being scientific is absolutely no defense against being wrong, but rather just an acceleration of the process of getting less wrong.” And Aristotle is probably a good seminar speaker—seminars are what they did all day back then, right?

Ultimately I’ll have to go with Patrick:

Stewart Lee.

I can’t see Aristotle presenting a seminar on his biggest philosophical mistakes.

But I can see Lee spending a seminar on his least funny jokes, and getting a few laughs at the same time.

And today’s match is a forfeit. Abraham (listed as #4 in the Founders of Religions category) does not belong in this contest. The other 63 people in the bracket are real; Abraham is the only fictional character here, and he just doesn’t belong. So Jane will advance, uncontested, to the next round.


P.S. As always, here’s the background, and here are the rules.

The axes are labeled but I don’t know what the dots represent.


John Sukup writes:

I came across a chart recently posted by Boston Consulting Group on LinkedIn and wondered what your take on it was. To me, it seems to fall into the “suspicious” category but thought you may have a different opinion.

I replied that this one baffles me cos I don’t know what the dots represent! This is typical of graphs: the axes are labeled, but I don’t know what it is that’s being labeled.

Sukup replied:

Indeed. The axes are also labeled strangely, and the years used to calculate the CAGR are not the same. I’d have expected more from BCG—but this type of data visualization error seems pervasive these days!
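The fix is usually cheap. Here’s a sketch with hypothetical companies and numbers (not the BCG data): say in the title or a legend what each dot is, label the points, and state which years the growth rate covers.

```python
# Sketch with hypothetical data (not the BCG chart): make explicit what each dot represents.
import matplotlib.pyplot as plt

companies = ["A Corp", "B Inc", "C Ltd", "D Co"]  # hypothetical entities
revenue_cagr = [0.04, 0.12, 0.07, 0.02]           # made-up values
margin = [0.15, 0.08, 0.22, 0.11]

fig, ax = plt.subplots()
ax.scatter(revenue_cagr, margin)
for name, x, y in zip(companies, revenue_cagr, margin):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(4, 4))  # label each dot

ax.set_xlabel("Revenue CAGR, 2010-2014 (hypothetical)")  # state which years the CAGR covers
ax.set_ylabel("Operating margin, 2014 (hypothetical)")
ax.set_title("Each dot is one company")  # the title answers "what are the dots?"
plt.show()
```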