“How the Internet Scooped Science (and What Science Still Has to Offer)”

Brian Silver pointed me to this post from Andrew Lindner:

This week, my manuscript, co-authored by Melissa Lindquist and Julie Arnold, “Million Dollar Maybe? The Effect of Female Presence in Movies on Box Office Returns” was published online by Sociological Inquiry. It will appear in print later this year.

So far, no surprises. A researcher promotes his work online. I do this all the time, myself.

The topic of the paper in question is the Bechdel test for women in movies, which has come up on this blog a couple times.

When we last discussed the Bechdel test, several years ago, we got this comment from Paul:

The Bechdel test isn’t a tool for evaluating individual movies. In my experience film quality is almost an orthogonal dimension. But it’s informative in aggregate.

Essentially, [the] point about giving the audience what they want is spot on. When the Bechdel test is consistently failing, that means people are being drawn to movies with poor characterization of female characters, and that raises the question of “why?”. This is also a push/pull phenomenon: since we have finite choices in movies, and directors have imperfect knowledge of the public’s desires, it’s almost certain that some of this can be explained by the consumer, but some of it is also being pushed by the producers.

Lindner was thinking along similar lines:

I wondered whether the underrepresentation of women in film was due to audiences disliking movies featuring women or Hollywood under-funding Bechdel movies. I cooked up the idea to link a content analysis of whether movies pass the Bechdel Test with data on the movies’ box office performance, production budget, and critical appraisal. That fall and winter, two wonderful students, Melissa Lindquist and Julie Arnold, and I collected the data. In short, we found that Bechdel movies earn less at the box office, but it’s because [sic] they have smaller production budgets, not because [sic] audiences reject them. A simple study, but, I think, an interesting one.

We’ll just let those “becauses” stand for now, as causal inference is not today’s topic of conversation.

Anyway, Lindner gives the submission history of his article:


I assume that “Top Two Journal” is the American Journal of Sociology? I say this because ASR and AJS are considered the top two journals in sociology, but I think the American Sociological Review is the top journal, so if he’d submitted it there, he would’ve just said “Top Journal.” Actually, I don’t know why he didn’t just give the names of the journals—it’s not like there’s any requirement to keep it secret—but I guess those details don’t really matter.

As Lindner writes, “there is nothing abnormal about this story.” Indeed, a couple years ago I published my own story of when the American Sociological Review published an article with serious statistical flaws and then followed up by refusing to publish my letter pointing out those flaws.

As Lindner might say, nothing sinister here, just standard operating procedure.

It’s a struggle to get a paper into a journal—but when it is accepted, it’s inside the castle walls and is protected by a mighty army.

John Updike vs. Bertrand Russell; Nietzsche advances

In yesterday’s bout, another founder of religion falls, thanks to this comment by Zbicyclist:

Do we want an audience full of would-be Ubermensches, or an audience of the proletariat?

Considering Columbia is an Ivy League school, I guess we have to go with the Ubermensches.

And today’s contest features the eminently sane conservative vs. the madman who went to jail to protest a war.

Updike was the most logical, reasonable man around, but his favorite topics were infidelity and religion, two loci of irrationality. Russell was a master of logic and reason, yet personally he was anything but reasonable.

So this is an excellent, excellent matchup.

P.S. As always, here’s the background, and here are the rules.

Bayesian models, causal inference, and time-varying exposures

Mollie Wood writes:

I am a doctoral student in clinical and population health research. My dissertation research is on prenatal medication exposure and neurodevelopmental outcomes in children, and I’ve encountered a difficult problem that I hope you might be able to advise me on.

I am working on a problem in which my main exposure variable, triptan use, can change over time—e.g., a woman may take a triptan during first trimester, not take one during second trimester, and then take one again during third trimester, or multiple permutations thereof. I am particularly concerned about time-varying confounding of this exposure, as there are multiple other medications (such as acetaminophen or opioids) whose use also changes over time, and so are both confounders and mediators.

I’m fairly familiar with the causal inference literature, and have initially approached this using marginal structural models and stabilized inverse probability of treatment weights (based mainly on Robins’ and Hernan’s work). I am interested in extending this approach using a Bayesian model, especially because I would like to be able to model uncertainty in the exposure variable. However, I have had little luck finding examples of such an approach in the literature. I’ve encountered McCandless et al’s work on Bayesian propensity scores, in which the PS is modeled as a latent variable, but as of yet have not encountered an example that considered time-varying treatment and confounding. In principle, I don’t see any reason why an MSM/weighting approach would be inadvisable… but then, I’m a graduate student, and I hear we do unwise things all the time.

My reply:

My short answer is that, while I recognize the importance of the causal issues, I’d probably model things in a more mechanistic way, not worrying so much about causality but just modeling the output as a function of the exposures, basically treating it as a big regression model. If there is selection (for example, someone not taking the drug because of negative side effects that are correlated with the outcome of interest), this can bias your estimates, but my guess would be that a straightforward model of all the data (not worrying about propensity scores, weighting, etc) might work just fine. That is, if the underlying phenomenon can be described well by some sort of linear model and there’s not big selection in the nonresponse, you can just model the data directly and just interpret the parameter estimates as is.

To which Wood continued:

I’m mainly hesitant to trust the results of a multivariable-adjusted model because there’s some evidence from the bit of my dissertation I’m working on now that there is some amount of selection happening. Previously, I’ve fit a marginal structural model and compared the results with the MV-adjusted model, and the parameter estimates change by 10-20%, depending on which outcome measure I’m looking at. (I’m interpreting it as selection; I realize one can’t directly compare the results of the MSM and regression model.)

I see what she’s saying, and from this perspective, yes it makes sense to include selection in the model. In theory, my preference would be to model the selection directly by adding to the model a latent variable defined so that exposure could be taken as ignorable given available information plus this latent variable. That said, I’ve never actually fit such a model, it just seems to me to be the cleanest approach. For the usual Bayesian reasons, I’d generally not be inclined to use weights based on the probability of treatment (or exposure), but, again, I can see how such methods could be useful in practice if applied judiciously.
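For readers who have not seen them, the stabilized inverse-probability-of-treatment weights Wood refers to are usually written as below. This is the textbook form from the marginal structural model literature, not something taken from her message, and the notation is an assumption added here for illustration: A_{it} is person i's exposure in period t (in her example, triptan use in trimester t, with T = 3), L_{it} is the set of time-varying covariates such as other medication use, and an overbar denotes history up to that period.

```latex
SW_i \;=\; \prod_{t=1}^{T}
  \frac{\Pr\!\left(A_{it} = a_{it} \,\middle|\, \bar{A}_{i,t-1} = \bar{a}_{i,t-1}\right)}
       {\Pr\!\left(A_{it} = a_{it} \,\middle|\, \bar{A}_{i,t-1} = \bar{a}_{i,t-1},\; \bar{L}_{it} = \bar{l}_{it}\right)}
```

The denominator is the probability of the exposure actually received given both exposure and covariate history; the numerator drops the covariates, which is what keeps the weights from becoming too variable. Both probabilities have to be estimated, so any uncertainty in those fitted models carries into the weights.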

P.S. On a related topic, Adan Becerra writes:

Do you or readers of the blog know of anyone currently working on Bayesian or penalized likelihood estimation techniques for marginal structural models? I know MSMs are not a Bayesian issue but I don’t see why you couldn’t estimate the inverse probability weights within a Bayesian framework.

Regular readers will know that I don’t think inverse probability weights make much sense in general (see the “struggles” paper for more on the topic), but maybe one of you in the audience can offer some help on this one.

Karl Marx vs. Friedrich Nietzsche (4); Austen advances

For yesterday, I was strongly rooting for Popper. I read several of his books about thirty years ago and they had a huge effect on me (and on a lot of social scientists, I think). But the best comment was about Austen. Here’s Dalton with the comment:

“A woman, especially if she has the misfortune of knowing anything, should conceal it as well as she can.” – Austen in Northanger Abbey

Sounds to me like somebody would NOT be presenting data.

Jane for the win. The topic: selection bias.

And now on to today’s March Madness battle. It’s funny how the random assignments sometimes create some apt pairings, as with this matchup between two angry 19th-century Germans.

If only we had George Orwell to judge this one.

Hey, this suggests another category for the next contest: My Heroes. It could include George Orwell, Stanislaw Ulam, A. J. Liebling, Imre Lakatos, Pierre-Simon Laplace, Orson Welles, Ed Wegman, ummmm, I guess you could throw in Abraham Lincoln, but that seems a bit silly, since he’s everybody’s hero . . . in any case, this isn’t so good, my heroes are all white men! I guess that tells you something about me, huh?

P.S. As always, here’s the background, and here are the rules.

March Madness!

Ummm . . . this one’s gonna really irritate all the subscription-cancelers . . .

Paul Davidson updated the brackets (as of a couple days ago):

Bracket v1

And here’s a version showing the survivors among each of the eight categories. The artists are all gone, and only one religious leader is left, but the other categories are still going strong.

Bracket v2

Say what you want about Bruno Latour, he inspired an excellent time-waster!

What do CERN, the ISS, and Stephen Fry have in Common?


You’ll have to read the New Yorker article on Richard M. Stallman and The GNU Manifesto by Maria Bustillos to find out!

And what’s up with Tim O’Reilly’s comments about the Old Testament vs. New Testament? That’s an ad hominem attack of the highest order, guaranteed to get the Judeo-Christians even more riled up than computer scientists debating GPL vs. BSD. On the plus side, it did remind me of Dana Fradon’s side-splitting New Yorker cartoon about the God of the Old Testament.

This is all strong evidence that Andrew missed an opportunity by not putting Stallman in the “founders of religions” bracket along with Freud.

I knew we’d hit critical mass with Stan when rms (I hear that’s what the cool kids call him) wrote to us about the Stan license. I pretty much steamrolled the BSD license to maximize our user base. Allen Riddell, on the other hand, decided to copyleft PyStan. Reasonable people can disagree. Of course, R is GPL-ed, so the combination of RStan and R has to be copylefted, too.

So let’s take a little poll, in the spirit of recent posts by Andrew and all this focus on seminar speakers.

1. Which is tastier, free beer or free speech?

  • [ ] Beer
  • [ ] Speech
  • [ ] It’s political season and I need a drink with my speech.
  • [ ] Beer and speeches are both too bitter.

Jane Austen vs. Karl Popper; Lee advances

For yesterday’s contest I’ll have to go with this comment by Nuthin:

This series of posts is so tedious that I’m considering removing this blog from my RSS feed altogether.

Stewart Lee is a master of hecklers. In a lot of his work he pretty much invites people to heckle, he antagonizes his audience, etc. So, just in case someone like this commenter shows up to the seminar, it would be good to have Lee on hand to shoot him down, or maybe to acknowledge the truth of the heckle, or maybe to discuss it in his next show. Either way it’s a win.

Today we have the ultimate book-club author of ur-chick-lit, up against the dominant figure in modern philosophy of science.

It is a truth universally acknowledged, that a scientist in possession of a good discovery must be in want of a philosophy.

However little known the feelings or views of such a scientist may be on his first entering a laboratory, this truth is so well fixed in the minds of the surrounding researchers, that he is considered as the rightful property of some one or other of their theories.

P.S. As always, here’s the background, and here are the rules.

New time unit needed!

We need a time unit that’s bigger than a minute but smaller than an hour.

I thought of it when writing this comment in which I referred to “2100 valuable minutes of classroom time” during the semester (that’s 75 minutes per class, twice a week, for 14 weeks).

A minute of class time is pretty useless. You can’t do much in a minute. Or, you can do a bit in a minute, but not much, and it will take another minute to organize it and another to recover. So, for the purposes of teaching, or working, a minute is not a good unit of time.

But an hour is too long. I could refer to “35 valuable hours of classroom time” but that would be misleading, as you can do lots of stuff in an hour. Time really drags if nothing much is going on.

So . . . a minute is too short, an hour is too long. We need a new unit. 5 minutes, maybe? Then a 75-minute class is 15 intervals long. That’s about right. I can do 15 little things in 75 minutes. An interval of 5 minutes is pretty much the smallest indivisible unit of time.

Or maybe the interval should be 10 minutes? Then we’d be talking a lot about half-intervals but maybe that’s ok.
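The arithmetic above is easy to check for any candidate unit length. Here is a minimal sketch in Python; the class schedule comes from the post, while the function names and the idea of passing the unit length as a parameter are illustrative assumptions.

```python
# Figures taken from the post: 75-minute classes, twice a week, for 14 weeks.
MINUTES_PER_CLASS = 75
CLASSES_PER_WEEK = 2
WEEKS_PER_SEMESTER = 14


def semester_minutes():
    """Total classroom minutes in one semester."""
    return MINUTES_PER_CLASS * CLASSES_PER_WEEK * WEEKS_PER_SEMESTER


def in_units(minutes, unit_minutes):
    """Express a duration in a hypothetical unit lasting `unit_minutes` minutes."""
    return minutes / unit_minutes


if __name__ == "__main__":
    total = semester_minutes()
    # Compare the existing units (1 and 60 minutes) with the two candidates.
    for unit in (1, 5, 10, 60):
        per_class = in_units(MINUTES_PER_CLASS, unit)
        per_semester = in_units(total, unit)
        print(f"unit = {unit:>2} min: class = {per_class:g}, semester = {per_semester:g}")
```

With a 10-minute unit a single class comes out to a non-integer number of units, which is the half-interval issue mentioned above.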

The name “interval” is no good, though. What should we call it?

So here we go

OK, we have 2 decisions to make:

First, what should be the length of our new time unit? I’m liking 5 minutes but I can see the argument for 10.

Second, what should it be called?

Once we’re settled on both these things, we’re good to go.

Sigmund Freud vs. Stewart Lee; Dick advances

Yesterday’s thread was won by Slugger, who wrote:

I accidentally swallowed a stelazine capsule and have seen that Grandma Moses is in fact a reptilian lifeform without the ability to vocalize. My vote goes to PKD.

Where did that light switch come from, anyway? I could’ve sworn it was a pull cord. . . .

In today’s match, neither of the two contenders (Freud in the “founders of religion” category and Lee in the “comedians” category) is seeded, but both are formidable contenders. I think either could give a wonderful speech and also deal well with the inevitable hecklers. Freud’s got the fame; on the other hand, Lee’s theories are probably more falsifiable.

P.S. As always, here’s the background, and here are the rules.

The 1980 Math Olympiad Program: Where are they now?

Brian Hunt: He was the #1 math team kid in our team (Montgomery County, Maryland). I think he came in first place in the international olympiad the next year (yup, here’s the announcement). We carpooled once or twice to county math team practices, and I remember that his mom would floor it rather than slow down when she came to a yellow light. I looked Brian up, and he is now a math professor at the University of Maryland. On Google scholar, his most cited paper is “Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter.”

Benji Fisher: He was quite a character, larger than life in some ways. He came in first place in the regional math competition (this required getting 8 problems out of 8 correct, which was difficult; maybe I got 4 correct that year, or maybe only 2?). I remember him (perhaps incorrectly) as a big guy with long hair in a ponytail. When I came to Columbia in 1996, I noticed that he was teaching in the math department. I gave him a call (“Hi, Benji, you probably don’t remember me, my name is Andy Gelman . . .”) and suggested getting together. He told me he was leaving Columbia to teach at the Bronx High School of Science. We never did get together, nor did we speak again. I googled just now and here he is, mentioned in the NYT in 1981 (it says he could do the Rubik’s cube in 2 1/2 minutes, which was really the least of his talents at the time), then I found a LinkedIn page that says that he only taught high school for one year, and now he’s a web developer in Boston.

Jack Brennen: He and Noam were the two youngest kids in the Olympiad program. I and some others were 15, a bunch of the other kids were 16 or 17, Jack and Noam were only 14. Jack felt a bit of rivalry with Noam which was unfortunate because Noam was obviously the best of all of us. I did some googling and it appears that Jack is now a full-time software engineer, or at least he was as of 2011.

Andrew Gelman: I’ve written about my olympiad experiences before (see also here). I was probably about the 20th best out of 24. Had I practiced, I think I could’ve been 10th or 15th (then again, if everyone had practiced, maybe I would’ve been 23rd or 24th), but the important thing was that I realized there were other people better than me at this. I feel very lucky that I came to this realization and didn’t hit a dead end later on. At the time I was disappointed not to make it to the international olympiad but in retrospect it all worked out just right.

Gregg Patruno: A super-nice guy. That’s all I remember about Gregg: he was one of the top kids in the program and he was super-nice, very friendly. I was pretty shy at the time, I was younger than most of the other kids and mostly tagged along with the 3 others from Montgomery County. So I appreciated when some of the older kids were friendly. I googled and it appears that Gregg is a musician. More googling yields this page: it appears that Gregg is “a Vice President in the Fixed Income Division of Goldman Sachs.” That’s too bad.

Also I remember there were 2 or 3 participants in the training program who were from the Boston area. 2 guys and a girl, I think. Or maybe 1 guy and 2 girls. All 3 were, like me, near the bottom of the pack. Anyway, I remember that one of those Bostonians was really funny. Near the end of the four weeks, this guy was really stuck on one of the homework problems and he asked Gregg for help. He wrote up Gregg’s solution and then, at the end, put a long footnote with a citation to “Patruno, G., Solution to Problem 11 of Mathematical Olympiad Program,” etc. This joker (unfortunately, I can’t remember his name) was also amused by Gregg’s near-palindromic name and suggested that he change it to Grerg.

David Yuen: I don’t remember him well but I do remember his name, so that’s something. Here he is—he’s a professor of math and computer science. Most-cited paper appears to be “Linear dependence among Siegel modular forms.”

David Wollen: I remember him as a very comfortable guy, with lots of friends, I was envious of how at-ease he was. In the middle of the pack in terms of success on math olympiad problems. I can’t find him on the web but I do seem to be finding a David Wolland so maybe I’m misremembering the name. I remember that he went to Hunter College High School. In any case, Wollen or Wolland, I have no idea what he’s up to now.

Ken Zeger: He was the kid who was interested in engineering. All the rest of us cared only for pure math, he wanted to solve engineering problems. And here he is, he’s a professor of Electrical and Computer Engineering at the University of California. How cool is that? His most-cited paper appears to be “Closest point search in lattices.”

Dougin Walker: My roommate. Also from Montgomery County. Like me, a silly kid, also another guy who was trying his best but was no star. His distinctive name makes him easy to google! He appears to be a triathlete, and it looks like he’s a “principal” of the Watermark Group—a hedge fund?

Stephen Mark: A really mellow guy. The rest of us really cared where we ranked. Brian was a top scorer and that was important to him, Benji was a big shot and that came from his success solving math problems, Noam—well, Noam was not competitive with the rest of us, exactly, but he knew how good he was—and I cared too. It’s all well for me to say, now, that I was lucky not to be among the best in the group. But, at the time, I did want to be the best. Not that I had any plan of how to get there, I just wanted to be #1. Or at least #2. Or #5, whatever. But Stephen (no, I can’t remember if he went by Steve, although I can only assume he did) just didn’t seem to care in that way, he was more aloof (in a good sense). The other thing I remember is that at one of the tournaments he wrote the anagram Peter Shmank on his nametag. And he was also from Montgomery County. And . . . that’s about it. I can’t find Stephen Mark on Google.

Hmmm, let me be clear on one thing. I’ve written how my mediocre performance on the olympiad training program pushed me to a new view of where I was heading in life, how I realized that (a) there were others who were better at math than I was, which led me gradually to the realization (b) that there were other things I could do. But this didn’t happen right away—even step (a) took awhile. I was in that program as a 10th grader and so I figured, sure, I’ll just get better each year. Every time I did a math competition I fully expected to get a perfect score and come in first place. The next year, I was pretty bummed when, after taking the national math test (from which the highest scorers got picked to do the olympiad), I didn’t score so high. Maybe I was in the top 100 in the country, maybe not, I can’t remember, but not in the top 8 or even the top 24, that’s for sure. And then the same thing happened in 12th grade. That was ok, I still did math team and was on the county math team at the regional competition, I still looked forward to these things. Which is fine: school sports would fall apart if only the very best kids participated (and of course there were lots of kids on the team who weren’t at my level, either). It’s funny in retrospect, though, to think that I kept sort of expecting I’d do better. And, again, I’m glad I didn’t do better. If I had, maybe I’d’ve gotten a math Ph.D. and now I’d be working at a hedge fund. Ulp.

Leonid Fridman, Zachary Franco: I remember only their names, and I think they were nice enough, also in that middle-of-pack range in math-problem-solving ability. Some googling appears to show that Zachary is now a medical researcher and does volunteer math coaching. I’m not sure what Leonid is doing.

Jeremy Primer: He was, like Patruno, one of the top kids but not the very top. I ran into him from time to time in grad school at Harvard; he was studying math while I was in statistics. My impression was that he was not so thrilled with it. What’s he doing now? Hmmm . . . “Jeremy Primer” shouldn’t be hard to Google . . . uhhhh, “Head of Research & Chief Risk Officer” at Tilden Park, looks like another hedge fund. He also worked at Goldman Sachs. Uh oh. I wonder if he stays in touch with Gregg? According to one online source, Jeremy was a “Harvard-educated maths genius whose computer models alerted the bank to how small levels of defaults would quickly turn apparently sound assets into junk,” leading Goldman to start selling off at the end of 2006. OK, whatever.

Dan Scales: Am I misremembering this name? I can’t find anything on Google.

Noam Elkies: Obviously the most talented math kid in the group (and thus, by implication, in the country); in retrospect, the most successful mathematician as well, maybe the top mathematician of our generation (not counting Stephen Wolfram ha ha ha). At the time, I don’t recall that we saw him as so brilliant; pretty much we saw him as being really weird. But, what can you say, we were all pretty weird and he had a lot going on in his brain. I ran into him a couple times in grad school. He’s now a math professor at Harvard. I haven’t looked him up in a long time. I guess I could—I could just walk over to his office one day. Maybe I should, although I don’t really know that we’d have much to talk about. His most-cited paper: “Alternating-sign matrices and domino tilings (Part I).” Hey, that’s got a bit of a Mel Brooks feel to it! And here’s Noam’s paper with the most math-team-like title: “On A^4+B^4+C^4=D^4,” which begins, “We use elliptic curves to find infinitely many solutions to A^4+B^4+C^4=D^4 in coprime natural numbers A,B,C, and D, starting with 2682440^4+15365639^4+18796760^4=20615673^4. We thus disprove the n=4 case of Euler’s conjectured generalization of …” It’s like a really really hard olympiad problem!
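The identity quoted from Noam’s abstract is the sort of thing that can be confirmed directly, since Python integers have arbitrary precision. A minimal sketch: the four numbers are copied from the quotation above, and the helper names are illustrative assumptions rather than anything from the paper.

```python
from functools import reduce
from math import gcd

# The four values quoted from the abstract of "On A^4+B^4+C^4=D^4".
A, B, C, D = 2682440, 15365639, 18796760, 20615673


def is_quartic_solution(a, b, c, d):
    """Return True if a^4 + b^4 + c^4 == d^4, using exact integer arithmetic."""
    return a**4 + b**4 + c**4 == d**4


def are_coprime(*values):
    """Return True if the values share no common factor greater than 1."""
    return reduce(gcd, values) == 1


if __name__ == "__main__":
    print("satisfies the equation:", is_quartic_solution(A, B, C, D))
    print("coprime:", are_coprime(A, B, C, D))
```

Checking one solution is the easy part, of course; the paper’s contribution is finding it and showing there are infinitely many.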

Sam Greitzer: The old guy who ran the mathematical olympiad program. A cranky, mean old man. On the other hand, I suppose he was doing it all on a volunteer or quasi-volunteer basis, and maybe you had to be a mean guy to keep a bunch of teenage boys under control. Still, he was not a pleasant person by any means. According to Wikipedia, he was born on August 10, 1905, so he was already 74 years old when I met him. A cranky old man indeed. I’ll give him a break. When I’m 74, I’ll probably have difficulty relating to 15-year-olds too.

Murray Klamkin: The second banana. Not so young himself, he was 59 when I knew him. More mild-mannered. According to Wiki, Klamkin “worked at AVCO, taught at SUNY Buffalo, and served as the Principal Research Scientist at Ford Motor Company,” among other things.

Mike Larsen: A former olympiad competitor who was serving as a coach. I think he was about 18. I don’t remember much about him, I suppose he (wisely) spent most of his time coaching the top kids, as the goal was to improve the national team, not to bring the laggards up to par. And, hey, here he is on Wikipedia! He teaches math at Indiana University. His most recent published paper: “Deformation theory and finite simple quotients of triangle groups I.” That, I’ll have to say, is the kind of thing we all imagined doing when we grew up. Pure math. You can’t get much purer than this.

OK, those are the names I can remember. But this is frustrating: I can only remember 15 kids out of 24. The closest to an official site I could find was this, which seems to be from 2000 or 2001 and is on the Mathematical Association of America website. It has many omissions: it does not include me or most of my friends listed above!

P.S. I sent the above to Jordan Ellenberg, who added the following:

I [Jordan] have memories of lots of these people, though they’re a MOP generation older than me. Some disorganized thoughts.

Brian Hunt — I remember him as captain of the Montgomery County Math Team! I think 1980 or 1981 was the first time I went to ARML. I haven’t seen him in years and years even though we’re both in math academia.

Benji Fisher — left academic math a long time ago but wrote a very beautiful paper with Sol Friedberg that presents questions I still strongly feel need answers…. “long hair in a ponytail” fits my memory too. But was he also from Montgomery County? Weird that I would forget that!

Jack Brennen — remember the name, nothing else.

Gregg Patruno — he was the director of MOP by the time I went, when I was in high school! He was already working in the financial industry at that time. But great guy, “intense in a low-key way” if that makes sense; he really made me feel it was worth it to become ultra-strong at contest problems. I remember at one point the whole team went to Gregg’s house in Staten Island, the one and only time I’ve ever been there.

Leonid Fridman — he was a Harvard grad student when I was an undergrad. He founded a group called “The Society of Nerds and Geeks” and had an op/ed in the New York Times about it and was very briefly a national expert on nerds.

Noam: That A^4+B^4+C^4=D^4 paper was something he wrote in high school, I think! And people were like WHA-A-A-A-A-T? He is still an elliptic curves master (and master of many other things as well.)

Mike Larsen is a really good mathematician, who like Noam is very broad; I guess I would call him a group theorist but you could as well call him a number theorist, algebraic geometer, or many other things. He and I wrote a paper together about motives.