
Round 3 begins! Mark Twain (4) vs. Mary Baker Eddy

After yesterday’s John Waters victory, we’re now in the Round of 16:


Thanks again to Paul Davidson for providing the bracket. Remaining are 4 authors, 3 comedians, 3 cult figures, 1 founder of religion, 2 French intellectuals, 2 philosophers, 1 religious leader, and 0 artists.

Today’s lucha is a classic grudge match—Twain and Eddy have a feud going way back. So, let the mud be slung!

P.S. As always, here’s the background, and here are the rules.

Paul Meehl continues to be the boss

Lee Sechrest writes:

Here is a remarkable paper, not well known, by Paul Meehl. My research group is about to undertake a fresh discussion of it, which we do about every five or ten years. The paper is now more than a quarter of a century old but it is, I think, dramatically pertinent to the “soft psychology” problems of today. If you have not read it, I think you will find it enlightening, and if you have read it, your blog readers might want to be referred to it at some time.

The paper is in a somewhat obscure journal with not much of a reputation as “peer reviewed.” (The journal’s practices should remind us that peer review is not a binary (yes-no) process.) I reviewed a few papers for them, including two or three of Meehl’s. I asked Paul once why he published in such a journal. He replied that he was late in his career, and he did not have the time nor patience to deal with picky reviewers who were often poorly informed. He called my attention to the works of several other well-known, even eminent, psychologists who felt the same way and who published in the journal. So the obscurity of the publication should not deter us. The paper has been cited a few hundred times, but, alas, it has had little impact.

I agree. Whenever I read Meehl, I’m reminded of that famous passage from The Catcher in the Rye:

What really knocks me out is a book that, when you’re all done reading it, you wish the author that wrote it was a terrific friend of yours and you could call him up on the phone whenever you felt like it. That doesn’t happen much, though.

Meehl’s article is from 1985 and it begins:

Null hypothesis testing of correlational predictions from weak substantive theories in soft psychology is subject to the influence of ten obfuscating factors whose effects are usually (1) sizeable, (2) opposed, (3) variable, and (4) unknown. The net epistemic effect of these ten obfuscating influences is that the usual research literature review is well nigh uninterpretable. Major changes in graduate education, conduct of research, and editorial policy are proposed.

Meehl writes a lot about things that we’ve been rediscovering, and talking a lot about, recently. Including, for example, the distinction between scientific hypotheses and statistical hypotheses. I think that, as a good Popperian, Meehl would agree with me completely that null hypothesis significance testing wears the cloak of falsificationism without actually being falsificationist.

And it makes me wonder how it is that we (statistically-minded social scientists, or social-science-minded statisticians) have been ignoring these ideas for so many years.

Even if you were to step back only ten years, for example, you’d find me being a much more credulous consumer of quantitative research claims than I am now. I used to start with some basic level of belief and then have to struggle to find skeptical arguments. For me, I guess it started with the Kanazawa papers, but then I started to see a general pattern. But it’s taken a while. Even as late as 2011, when that Bem paper came out, I at first subscribed to the general view that his ESP work was solid science and he just had the bad luck to be working in a field where the true effects were small. A couple years later, under the influence of E. J. Wagenmakers and others, it was in retrospect obvious that Bem’s paper was full of serious, serious problems, all in plain view for anyone to see.

And who of a certain age can forget that Statistical Science in 1994 published a paper purporting to demonstrate statistical evidence in favor of the so-called Bible Code? It took a couple of years for the message to get out, based on the careful efforts of computer scientist Brendan McKay and others, that the published analysis was wrong. In retrospect, though, it was a joke—if I (or, for that matter, a resurrection of Paul Meehl) were to see an analysis today that was comparable to that Bible Code paper, I think I’d see right away how ridiculous it is, just as I could right away see through the ovulation-and-voting paper and all the other “power = .06” studies we’ve been discussing here recently.

So here’s the puzzle. It’s been obvious to me for the past three or so years, obvious to E. J. Wagenmakers and Uri Simonsohn for a bit longer than that—but there was Paul Meehl, well-respected then and still well-remembered now, saying all this thirty and forty years ago, yet we forgot. (“We” = not just me, not just Daniel Kahneman and various editors of Psychological Science, but quantitative social scientists more generally.)

It’s not that quants haven’t been critical. We’ve been talking forever about correlation != causation, and selection bias, and specification searches. But these all seemed like little problems, things to warn people about. And, sure, there’s been a steady drumbeat (as the journalists say) of criticism of null hypothesis significance testing. But, but . . . the idea that the garden of forking paths and the statistical significance filter are central to the interpretation of statistical studies, that’s new to us (though not to Meehl).
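To make the statistical significance filter concrete, here is a minimal sketch, assuming an unbiased, normally distributed estimate with known standard error. The function names and the example numbers (a true effect of 2 with a standard error of 8.1) are illustrative assumptions picked to give power near .06; they are not taken from any of the studies mentioned above. The sketch computes the power, the chance that a statistically significant estimate has the wrong sign, and how much a significant estimate overstates the true effect on average.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def significance_filter(true_effect: float, std_error: float, z_crit: float = 1.96):
    """Return (power, wrong_sign_prob, exaggeration_ratio).

    Assumes estimate ~ Normal(true_effect, std_error) and true_effect != 0.
    All three quantities condition on |estimate| > z_crit * std_error.
    """
    shift = true_effect / std_error
    p_hi = 1.0 - normal_cdf(z_crit - shift)  # significant and positive
    p_lo = normal_cdf(-z_crit - shift)       # significant and negative
    power = p_hi + p_lo

    # Probability that a significant estimate has the opposite sign of the truth.
    wrong_sign = (p_lo if shift > 0 else p_hi) / power

    # E|z| restricted to the significant region (truncated-normal means),
    # in standard-error units.
    abs_hi = shift * p_hi + normal_pdf(z_crit - shift)
    abs_lo = -shift * p_lo + normal_pdf(-z_crit - shift)
    exaggeration = ((abs_hi + abs_lo) / power) / abs(shift)
    return power, wrong_sign, exaggeration


if __name__ == "__main__":
    power, wrong_sign, ratio = significance_filter(true_effect=2.0, std_error=8.1)
    print(f"power = {power:.3f}")
    print(f"P(wrong sign | significant) = {wrong_sign:.2f}")
    print(f"exaggeration ratio = {ratio:.1f}")
```

Under these assumptions, any estimate that clears the significance threshold has to be several times larger than the true effect, which is why the filter is more than a side warning.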

I really don’t know what to say about our forgetfulness. I wish I could ask Meehl his opinion of what happened.

Maybe one reason we can feel more comfortable criticizing the classical approach is that now we have a serious alternative—falsificationist Bayes. As they say in politics, you can’t beat something with nothing. And now that we have a something (albeit in different flavors; E.J.’s falsificationist Bayes is not quite the same as mine), this might help us move forward.

On deck this week

Mon: Paul Meehl continues to be the boss

Tues: Adiabatic as I wanna be: Or, how is a chess rating like classical economics?

Wed: Define first, prove later

Thurs: Another disgraced primatologist . . . this time featuring “sympathetic dentists”

Fri: Imagining p<.05 triggers increased publication

Sat: The publication of one of my pet ideas: Simulation-efficient shortest probability intervals

Sun: “Bayesian and Frequentist Regression Methods”

Also the continuation of the “Greatest Seminar Speaker” competition each day.

Judy Garland (4) vs. John Waters (1); Carlin advances

Not a lot of action on yesterday’s post, so I don’t think the winner will advance any farther . . . But, in any case, I’ll call it for Carlin based on Jonathan’s amusing babble of postmodernist commentary.


As for today: What can you say? A great pairing to close out the second round of our competition. I’m guessing Waters would be a better speaker—who knows what condition Garland would be in?

But then there’s this, from an interview with Waters:

I remember the funniest thing, seeing Judy Garland walking down the street with ten thousand gay people following her like the Pied Piper. She went into the little A-House. She was dead drunk, in bad shape, having fun, wearing a big hat. It was like the Virgin Mary appearing, a Miracle. Imagine: Judy Garland LIVE ON COMMERCIAL STREET!

Imagine: Judy Garland LIVE ON 116 STREET!

P.S. As always, here’s the background, and here are the rules.

Why I don’t use the terms “fixed” and “random” (again)

A couple months ago we discussed this question from Sean de Hoon:

In many cross-national comparative studies, mixed effects models are being used in which a number of slopes are fixed and the slopes of one or two variables of interest are allowed to vary across countries. The aim is often then to explain the varying slopes by referring to some country-level characteristic.

My question is whether it is possible that the estimation of these random slopes (the interesting ones) is affected by the fact that the slopes of other (uninteresting) variables are fixed (even though they may actually vary over countries) and if so, how it may be affected? This question is inspired by many studies examining men’s wages which, for example, include a control for level of education that does not have a random slope, while I doubt whether education will have the same effect across countries.

Do you think the decision not to include many random slopes is predominantly methodologically informed? And do you think Bayesian analyses can provide a better solution for the kind of situations where many slopes should be allowed to vary?

My response is at the link. But here let me use the above email to point out the difficulty of referring to slopes or “effects” as “random” or “fixed.”

What is meant by “random”? That’s easy: a slope or effect is called “random” if (a) it varies by group, and (b) this variation is estimated using a probability model. Thus, “randomness” is a property both of the data model and of how this variation is estimated.

What is meant by “fixed”? This is not so clear: in the context above, a slope or effect is called “fixed” if it does not vary by group. But, in other contexts (notably in econometrics), “fixed effects” refers to a model where coefficients vary by group but where this variation is not estimated using a probability model.

In multilevel modeling terms, in de Hoon’s email, “fixed effects” are equal to each other and estimated using complete pooling, whereas in econometric terminology, “fixed effects” vary by group and are estimated using no pooling. Completely different models.

Which is one reason (but not the only reason) I prefer to avoid the terms “random” and “fixed” in describing models. Instead I like to separate the modeling decision (modeling a coefficient as varying or non-varying) and the inference decision (in Bayesian terms, what prior are we using; in multilevel modeling terms, how much pooling are we doing).
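The pooling language can be illustrated with a minimal sketch, assuming the simplest case of group means rather than slopes, with the within-group and between-group standard deviations (`sigma_y` and `tau`) treated as known and the grand mean plugged in for the population mean. All names here are hypothetical, and a real multilevel model would estimate these quantities rather than take them as given.

```python
from statistics import mean


def pooling_estimates(groups, sigma_y, tau):
    """Compare three estimates of each group's mean.

    groups  : dict mapping group name -> list of observations
    sigma_y : assumed within-group standard deviation
    tau     : assumed between-group standard deviation

    Returns a dict mapping group name -> (complete, none, partial).
    """
    all_obs = [y for ys in groups.values() for y in ys]
    complete = mean(all_obs)  # complete pooling: one value shared by every group

    estimates = {}
    for name, ys in groups.items():
        no_pool = mean(ys)  # no pooling: each group estimated on its own
        # Partial pooling: precision-weighted average of the group's own
        # mean and the overall mean.
        w_data = len(ys) / sigma_y ** 2
        w_prior = 1.0 / tau ** 2
        partial = (w_data * no_pool + w_prior * complete) / (w_data + w_prior)
        estimates[name] = (complete, no_pool, partial)
    return estimates


if __name__ == "__main__":
    toy = {"A": [1.0, 3.0], "B": [10.0]}
    for name, (c, n, p) in pooling_estimates(toy, sigma_y=2.0, tau=2.0).items():
        print(f"{name}: complete={c:.2f}  none={n:.2f}  partial={p:.2f}")
```

In this sketch the modeling decision is whether the coefficient gets its own value per group at all, and the inference decision is the choice of `tau`: a very small `tau` reproduces complete pooling and a very large `tau` reproduces no pooling.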

George Carlin (2) vs. Jacques Derrida; Updike advances

Yesterday’s best comment comes from Zbicyclist, who wrote:

My wife would prefer I not go to a talk by someone who wrote so extensively about adultery.

But of course that would rule out both John Updike and Bertrand Russell. We could use “number of wives” as a tiebreaker, but instead I’ll go with Updike based on Brian’s comment:

If updike brought even a tenth of the genius to writing his lecture as he brought to the line ‘his hands were like wild birds’ we’d be in for a real treat.

Grease-gray and kind of coiled.

Also I think the whole Russell’s-paradox thing is getting old, and that keeping Updike in the game will make next round’s battle against Nietzsche more fun. And, one more thing: Updike’s got the basketball connection. Can you imagine Bertrand Russell playing hoops? I don’t think so.

Hey, how did Derrida even make it into the second round?? There must’ve been some mistake.

P.S. As always, here’s the background, and here are the rules.

“How the Internet Scooped Science (and What Science Still Has to Offer)”

Brian Silver pointed me to this post from Andrew Lindner:

This week, my manuscript, co-authored by Melissa Lindquist and Julie Arnold, “Million Dollar Maybe? The Effect of Female Presence in Movies on Box Office Returns” was published online by Sociological Inquiry. It will appear in print later this year.

So far, no surprises. A researcher promotes his work online. I do this all the time, myself.

The topic of the paper in question is the Bechdel test for women in movies, which has come up on this blog a couple times.

When we last discussed the Bechdel test, several years ago, we got this comment from Paul:

The Bechdel test isn’t a tool for evaluating individual movies. In my experience film quality is almost an orthogonal dimension. But it’s informative in aggregate.

Essentially, [the] point about giving the audience what they want is spot on. When the Bechdel test is consistently failing, that means people are being drawn to movies with poor characterization of female characters, and that raises the question of “why?”. This is also a push/pull phenomena: since we have finite choices in movies, and directors have imperfect knowledge of the public’s desires, it’s almost certain that some of this can be explained by the consumer, but some of it is also being pushed by the producers.

Lindner was thinking along similar lines:

I wondered whether the underrepresentation of women in film was due to audiences disliking movies featuring women or Hollywood under-funding Bechdel movies. I cooked up the idea to link a content analysis of whether movies pass the Bechdel Test with data on the movies’ box office performance, production budget, and critical appraisal. That fall and winter, two wonderful students, Melissa Lindquist and Julie Arnold, and I collected the data. In short, we found that Bechdel movies earn less at the box office, but it’s because [sic] they have smaller production budgets, not because [sic] audiences reject them. A simple study, but, I think, an interesting one.

We’ll just let those “becauses” stand for now, as causal inference is not today’s topic of conversation.

Anyway, Lindner gives the submission history of his article:


I assume that “Top Two Journal” is the American Journal of Sociology? I say this because ASR and AJS are considered the top two journals in sociology, but I think the American Sociological Review is the top journal, so if he’d submitted it there, he would’ve just said “Top Journal.” Actually, I don’t know why he didn’t just give the names of the journals—it’s not like there’s any requirement to keep it secret—but I guess those details don’t really matter.

As Lindner writes, “there is nothing abnormal about this story.” Indeed, a couple years ago I published my own story of when the American Sociological Review published an article with serious statistical flaws and then followed up by refusing to publish my letter pointing out those flaws.

As Lindner might say, nothing sinister here, just standard operating procedure.

It’s a struggle to get a paper into a journal—but when it is accepted, it’s inside the castle walls and is protected by a mighty army.

John Updike vs. Bertrand Russell; Nietzsche advances

In yesterday’s bout, another founder of religion falls, thanks to this comment by Zbicyclist:

Do we want an audience full of would-be Ubermensches, or an audience of the proletariat?

Considering Columbia is an Ivy League school, I guess we have to go with the Ubermensches.

And today’s contest features the eminently sane conservative vs. the madman who went to jail to protest a war.

Updike was the most logical, reasonable man around, but his favorite topics were infidelity and religion, two loci of irrationality. Russell was a master of logic and reason, yet personally he was anything but reasonable.

So this is an excellent, excellent matchup.

P.S. As always, here’s the background, and here are the rules.

Bayesian models, causal inference, and time-varying exposures

Mollie Wood writes:

I am a doctoral student in clinical and population health research. My dissertation research is on prenatal medication exposure and neurodevelopmental outcomes in children, and I’ve encountered a difficult problem that I hope you might be able to advise me on.

I am working on a problem in which my main exposure variable, triptan use, can change over time—e.g., a woman may take a triptan during first trimester, not take one during second trimester, and then take one again during third trimester, or multiple permutations thereof. I am particularly concerned about time-varying confounding of this exposure, as there are multiple other medications (such as acetaminophen or opioids) whose use also changes over time, and so are both confounders and mediators.

I’m fairly familiar with the causal inference literature, and have initially approached this using marginal structural models and stabilized inverse probability of treatment weights (based mainly on Robins’ and Hernan’s work). I am interested in extending this approach using a Bayesian model, especially because I would like to be able to model uncertainty in the exposure variable. However, I have had little luck finding examples of such an approach in the literature. I’ve encountered McCandless et al’s work on Bayesian propensity scores, in which the PS is modeled as a latent variable, but as of yet have not encountered an example that considered time-varying treatment and confounding. In principle, I don’t see any reason why an MSM/weighting approach would be inadvisable… but then, I’m a graduate student, and I hear we do unwise things all the time.

My reply:

My short answer is that, while I recognize the importance of the causal issues, I’d probably model things in a more mechanistic way, not worrying so much about causality but just modeling the output as a function of the exposures, basically treating it as a big regression model. If there is selection (for example, someone not taking the drug because of negative side effects that are correlated with the outcome of interest), this can bias your estimates, but my guess would be that a straightforward model of all the data (not worrying about propensity scores, weighting, etc) might work just fine. That is, if the underlying phenomenon can be described well by some sort of linear model and there’s not big selection in the nonresponse, you can just model the data directly and just interpret the parameter estimates as is.

To which Wood continued:

I’m mainly hesitant to trust the results of a multivariable-adjusted model because there’s some evidence from the bit of my dissertation I’m working on now that there is some amount of selection happening. Previously, I’ve fit a marginal structural model and compared the results with the MV-adjusted model, and the parameter estimates change by 10-20%, depending on which outcome measure I’m looking at. (I’m interpreting it as selection; I realize one can’t directly compare the results of the MSM and regression model.)

I see what she’s saying, and from this perspective, yes it makes sense to include selection in the model. In theory, my preference would be to model the selection directly by adding to the model a latent variable defined so that exposure could be taken as ignorable given available information plus this latent variable. That said, I’ve never actually fit such a model, it just seems to me to be the cleanest approach. For the usual Bayesian reasons, I’d generally not be inclined to use weights based on the probability of treatment (or exposure), but, again, I can see how such methods could be useful in practice if applied judiciously.

P.S. On a related topic, Adan Becerra writes:

Do you or readers of the blog know of anyone currently working on Bayesian or penalized likelihood estimation techniques for marginal structural models? I know MSMs are not a Bayesian issue but I don’t see why you couldn’t estimate the inverse probability weights within a Bayesian framework.

Regular readers will know that I don’t think inverse probability weights make much sense in general (see the “struggles” paper for more on the topic), but maybe one of you in the audience can offer some help on this one.
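For readers who have not seen the weights under discussion, here is a minimal sketch of stabilized inverse probability of treatment weights, assuming the simplest possible setting: a single time point, one binary confounder, and probabilities estimated by raw frequencies. The toy data and all function names are made up for illustration; the time-varying case in the emails above would multiply such ratios across time points, which this sketch does not attempt.

```python
from collections import Counter


def stabilized_weights(records):
    """records: list of (confounder, treated, outcome) with treated in {0, 1}.

    Each weight is P(A = a) / P(A = a | L = l), estimated by frequencies.
    """
    n = len(records)
    n_treat = Counter(a for _, a, _ in records)
    n_conf = Counter(l for l, _, _ in records)
    n_joint = Counter((l, a) for l, a, _ in records)
    weights = []
    for l, a, _ in records:
        marginal = n_treat[a] / n
        conditional = n_joint[(l, a)] / n_conf[l]
        weights.append(marginal / conditional)
    return weights


def weighted_difference(records, weights):
    """Weighted mean outcome among treated minus weighted mean among controls."""
    totals = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    for (_, a, y), w in zip(records, weights):
        totals[a][0] += w * y
        totals[a][1] += w
    return totals[1][0] / totals[1][1] - totals[0][0] / totals[0][1]


def toy_records():
    """Noise-free toy data: outcome = 1 * treated + 2 * confounder."""
    records = []
    for l, n_treated, n_control in [(0, 2, 6), (1, 6, 2)]:
        records += [(l, 1, 1.0 + 2.0 * l)] * n_treated
        records += [(l, 0, 2.0 * l)] * n_control
    return records


if __name__ == "__main__":
    data = toy_records()
    naive = weighted_difference(data, [1.0] * len(data))
    adjusted = weighted_difference(data, stabilized_weights(data))
    print(f"unweighted difference: {naive:.2f}")
    print(f"weighted difference:   {adjusted:.2f}")
```

In this toy example the treated units are concentrated in the high-confounder stratum, so the unweighted comparison overstates the built-in effect of 1 and the weighted comparison recovers it. A regression that adjusts for the confounder would do the same here, which is the direct-modeling alternative discussed in the reply above.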

Karl Marx vs. Friedrich Nietzsche (4); Austen advances

For yesterday, I was strongly rooting for Popper. I read several of his books about thirty years ago and they had a huge effect on me (and on a lot of social scientists, I think). But the best comment was about Austen. Here’s Dalton with the comment:

“A woman, especially if she has the misfortune of knowing anything, should conceal it as well as she can.” – Austen in Northanger Abbey

Sounds to me like somebody would NOT be presenting data.

Jane for the win. The topic: selection bias.

And now on to today’s March Madness battle. It’s funny how the random assignments sometimes create some apt pairings, as with this matchup between two angry 19th-century Germans.

If only we had George Orwell to judge this one.

Hey, this suggests another category for the next contest: My Heroes. It could include George Orwell, Stanislaw Ulam, A. J. Liebling, Imre Lakatos, Pierre-Simon Laplace, Orson Welles, Ed Wegman, ummmm, I guess you could throw in Abraham Lincoln, but that seems a bit silly, since he’s everybody’s hero . . . in any case, this isn’t so good, my heroes are all white men! I guess that tells you something about me, huh?

P.S. As always, here’s the background, and here are the rules.