
Further evidence that creativity and innovation are stimulated by college sports: Evidence from a big regression

Kevin Lewis sent along this paper from the Creativity Research Journal:

Further Evidence that Creativity and Innovation are Inhibited by Conservative Thinking: Analyses of the 2016 Presidential Election

The investigation replicated and extended previous research showing a negative relationship between conservatism and creative accomplishment. Conservatism was estimated, as in previous research, from voting patterns. The voting data used here were from the 2016 US Presidential election. The number of patents granted per county in the United States was used as estimate of creative and innovative accomplishment. Using a 2-level multilevel approach, in which state-level influences are taken into consideration, various control variables were tested, including socioeconomic status (SES), education, income, and diversity. The results confirmed a negative relationship between conservatism and the number of patents granted. Therefore, in counties and states with high conservatism, fewer patents were granted, even after controlling for SES and population. Patents were positively related to racial diversity and education. Practical implications include the benefits of liberal thinking outside of the political arena. Liberal thinking is very likely associated with flexibility, tolerance, and openness, and according to the present results, creative accomplishment. Limitations of the research and future directions are discussed.

I’d really like to think this is a parody, but it may well be serious. I wonder what Susan T. Fiske would think of it. On one hand, it’s ridiculous. On the other hand, it’s a peer-reviewed publication with p less than 0.05, so it’s got to be true. It’s a tough call.

Meanwhile, I have an idea that, outside of certain big cities, the number of patents in a county is associated with the presence of college sports teams. I conjecture that the presence of college sports stimulates the sort of creative thinking that leads to patents. Somebody get a p-value on that, ok?

Chess records page

Chess records page (no, not on the first page, or the second page, or the third page, of a google search of *chess records*).

There’s lots of good stuff here, enough to fill much of a book if you so desire. As we’ve discussed, chess games are in the public domain so if you take material on chess games from an existing book or website without crediting the person who compiled this material, you’re not actually plagiarizing.

Getting the right uncertainties when fitting multilevel models

Cesare Aloisi writes:

I am writing you regarding something I recently stumbled upon in your book Data Analysis Using Regression and Multilevel/Hierarchical Models which confused me, in hopes you could help me understand it. This book has been my reference guide for many years now, and I am extremely grateful for everything I learnt from you.

On page 261, a 95% confidence interval for the intercept in a certain group (County 26) is calculated using only the standard error of the “random effect” (the county-level error). The string is as follows:

coef(M1)$county[26,1] + c(-2,2)*se.ranef(M1)$county[26]

My understanding is that, since the group-level prediction (call it y.hat_j = coef(M1)$county[26,1]) is a linear combination of a global average and a group-level deviation from that average (y.hat_j = beta_0 + eta_j), the variance of y.hat_j should include the variances of beta_0 and eta_j and their covariance, not just the variance of eta_j, as the code on page 261 seems to imply. In other words:

Var(y.hat_j) = Var(beta_0) + Var(eta_j) + 2Cov(beta_0, eta_j)

Admittedly, lme4 does not provide an estimate for the last term, the covariance between “fixed” and “random” effects. Was the code used in the book to simplify the calculations, or was there some deeper reason to it that I failed to grasp?

My reply: The short answer is that it’s difficult to get this correct in lmer but very easy when using stan_lmer() in the rstanarm package. That’s what I recommend, and that’s what we’ll be doing in the 2nd edition of our book.
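To see why dropping the covariance term matters, here’s a toy simulation (all numbers are made up for illustration, not taken from the book’s radon example). With stan_lmer() you’d form the same kind of sum from the joint posterior draws, e.g. via as.matrix(fit), so the covariance is handled automatically:

```r
# Fake joint posterior draws for the global intercept beta0 and the
# county-26 deviation eta26, with negative posterior correlation (typical,
# since the data pin down their sum better than either term alone).
set.seed(123)
n_draws <- 1e5
beta0 <- rnorm(n_draws, 1.50, 0.10)
eta26 <- -0.6 * (beta0 - 1.50) + rnorm(n_draws, 0, 0.08)
alpha26 <- beta0 + eta26                 # the county-26 intercept
# Interval using the group-level se alone, as in the se.ranef() code:
mean(alpha26) + c(-2, 2) * sd(eta26)
# Interval from the joint draws, which includes Var(beta0) and the covariance:
quantile(alpha26, c(0.025, 0.975))
# Here sd(eta26) = 0.10 but sd(alpha26) = sqrt(.01 + .01 - 2*.006) ~ 0.089,
# so ignoring the covariance gives an interval that's too wide in this case.
```

Depending on the sign of the posterior correlation, the naive interval can be too wide (as here) or too narrow; only the joint draws get it right.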

Stan Weekly Roundup, 22 September 2017

This week (and a bit from last week) in Stan:

  • Paul-Christian Bürkner’s paper on brms (a higher-level interface to RStan, which preceded rstanarm and is still widely used and recommended by our own devs) was just published as a JStatSoft article. If you follow the link, the abstract explains what brms does.

  • Ben Goodrich and Krzysztof Sakrejda have been working on standalone functions in RStan. The bottleneck has been random number generators. As an application, users want to write Stan functions that they can use for efficient calculations inside R; it’s easier than C++ and we have a big stats library with derivatives backing it up.

  • Ben Goodrich has also been working on multi-threading capabilities for RStan.

  • Sean Talts has been continuing the wrestling match with continuous integration. Headaches continue from new compiler versions, Travis timeouts, and fragile build scripts.

  • Sean Talts has been working with Sebastian on code review for MPI. The goal is to organize the code so that it’s easy to test and maintain (the two go together in well-written code, along with readability and crisp specs for the API boundaries).

  • Sean Talts has been working on his course materials for Simulation-Based Algorithmic Calibration (SBAC), the new name for applying the diagnostic of Cook et al.

  • Bob Carpenter has been working on writing a proper language spec for Stan looking forward to tuples, ragged and sparse structures, and functions for Stan 3. That’s easy; the denotational semantics will be more challenging as it has to be generic in terms of types and discuss how Stan compiles to a function with transforms.

  • Bob Carpenter has also been working on getting variable declarations through the model concept. After that, a simple base class and const correctness for the model class to make it easier to use outside of Stan’s algorithms.

  • Michael Betancourt and Sean Talts have been prepping their physics talks for the upcoming course at MIT. There are post-it notes at metric heights in our office, and they filmed themselves dropping a ball next to a phone’s stopwatch (clever! hope that’s not too much of a spoiler for the course).

  • Michael Betancourt is also working on organizing the course before StanCon that’ll benefit NumFOCUS.

  • Jonah Gabry and T.J. Mahr released bayesplot 1.4 with some new visualizations from Jonah’s paper.

  • Jonah Gabry is working on Mac/C++ issues with R; it’s gotten so deep that he’s had to communicate with the R devs themselves.

  • Lauren Kennedy has joined the team at Columbia, taking over for Jonah Gabry at the population research center in the School of Social Work; she’ll be focusing on population health. She’ll also be working with us (specifically with Andrew Gelman) on survey weighting and multilevel regression and poststratification with rstanarm.

  • Dan Simpson has been working on sparse autodiff architecture and talking with Aki and me about parallelization.

  • Dan Simpson and Andrew Gelman have been plotting how to marginalize out random effects in multilevel linear regression.

  • Andrew Gelman has been working with Jennifer Hill and others on revising his and Jennifer’s regression book. It’ll come out in two parts, the first of which is (tentatively?) titled Regression and Other Stories. Jonah Gabry has been working on the R packages for it and beefing up bayesplot and rstanarm to handle it.

  • Andrew Gelman is trying to write all the workflow stuff down with everyone else including Sean Talts, Michael Betancourt, and Daniel Simpson. As usual, a simple request from Andrew to write a short paper has spun out into real developments on the computational and methodological front.

  • Aki Vehtari, Andrew Gelman and others are revising the expectation propagation (EP) paper; we’re excited about the data parallel aspects of this.

  • Aki Vehtari gave a talk on priors for GPs at the summer school for GPs in Sheffield. He reports there were even some Stan users there using Stan for GPs. Their lives should get easier over the next year or two. Videos of the talks: Priors and integration over hyperparameters for GPs and On Bayesian model selection and model averaging.

  • Aki Vehtari reports there are 190 students in his Bayes class in Helsinki!

  • Michael Betancourt, Dan Simpson, and Aki Vehtari wrote comments on a paper about frequentist properties of horseshoe priors. Aki’s revised horseshoe prior paper has also been accepted.

  • Ben Bales wrote some generic array append code and also some vectorized random number generators which I reviewed and should go in soon.

  • Bill Gillespie is working on a piecewise linear interpolation function for Stan’s math library; he already added it to Torsten in advance of the ACoP tutorial he’s doing next month. He’ll be looking at a 1D integrator as an exercise, picking up from where Marco Inacio left off (it’s based on some code by John Cook).

  • Bill Gillespie is trying to hire a replacement for Charles Margossian at Metrum. He’s looking for someone who wants to work on Stan and pharmacology, preferably with experience in C++ and numerical analysis.

  • Krzysztof Sakrejda started a postdoc working on statistical modeling for global scale demographics for reproductive health with Leontine Alkema at UMass Amherst.

  • Krzysztof Sakrejda has been working on simplifying our makefiles and inadvertently solved some of our clang++ compiler issues for CmdStan.

  • Matthijs Vákár got a pull request in for GLMs to speed up logistic regression by a factor of four or so by introducing analytic derivatives.

  • Matthijs Vákár is also working on higher-order imperative semantics for probabilistic programming languages like Stan.

  • Mitzi Morris finished the last changes for a pull request on the base expression type refactor (this will pave the way for tuples, sparse matrices, ragged arrays, and functional types—hence all the semantic activity).

  • Mitzi Morris is also refactoring the local variable type inference system to squash a meta-bug that surfaced with ternary operators and will simplify the code.

  • Charles Margossian is finishing a case study on the algebraic solver to submit for the extended StanCon deadline, all while knee-deep in first-year grad student courses in measure theory and statistics.

  • Breck Baldwin and others have been talking to DataCamp (hi, Rasmus!) and Coursera. We’ll be getting some Stan classes out over the next year or two. Coordinating with DataCamp is easy, Coursera plus Columbia less so.

(video links added by Aki)

Air rage update

So. Marcus Crede, Carol Nickerson, and I published a letter in PPNAS criticizing the notorious “air rage” article. (Due to space limitations, our letter contained only a small subset of the many possible criticisms of that paper.) Our letter was called “Questionable association between front boarding and air rage.”

The authors of the original paper, Katherine DeCelles and Michael Norton, published a response in which they concede nothing. They state that their hypotheses “are predicated on decades of theoretical and empirical support across the social sciences” and they characterize their results as “consistent with theory.” I have no reason to dispute either of these claims, but at the same time these theories are so flexible that they could predict just about anything, including, I suspect, the very opposite of the claims made in the paper. As usual, there’s a confusion between a general scientific theory and some very specific claims regarding regression coefficients in some particular fitted model.

Considering the DeCelles and Norton reply in a context-free sense, it reads as reasonable: yes, it is possible for the signs and magnitudes of estimates to change when adding controls to a regression. The trouble is that their actual data seem to be of low quality, and due to the observational nature of their study, there are lots of interactions not included in the model that are possibly larger than their main effects (for example, interactions of plane configuration with type of flight, interactions with alcohol consumption, nonlinearities in the continuous predictors such as number of seats and flight distance).

The whole thing is interesting in that it reveals the challenge of interpreting this sort of exchange from the outside: it shows how researchers can string together paragraphs that have the form of a logical argument in support of whatever claim they’d like to make. Of course, someone could say the same about us. . . .

One good thing about slogans such as “correlation does not imply causation” is that they get right to the point.

Will Stanton hit 61 home runs this season?

[edit: Juho Kokkala corrected my homework. Thanks! I updated the post. Also see some further elaboration in my reply to Andrew’s comment. As Andrew likes to say …]

So far, Giancarlo Stanton has hit 56 home runs in 555 at bats over 149 games. Miami has 10 games left to play. What’s the chance he’ll hit 61 or more home runs? Let’s make a simple back-of-the-envelope Bayesian model and see what the posterior event probability estimate is.

Sampling notation

A simple model that assumes a home run rate per at bat with a uniform (conjugate) prior:

\theta \sim \mbox{Beta}(1, 1)

The data we’ve seen so far is 56 home runs in 555 at bats, so that gives us our likelihood.

56 \sim \mbox{Binomial}(555, \theta)

Now we need to simulate the rest of the season and compute event probabilities. We start by assuming the number of at bats in the remaining 10 games is Poisson, with rate equal to games remaining times at bats per game so far.

\mathit{ab} \sim \mbox{Poisson}(10 \times 555 / 149)

We then take the number of home runs to be binomial given the number of at bats and the home run rate.

h \sim \mbox{Binomial}(\mathit{ab}, \theta)

Finally, we define an indicator variable that takes the value 1 if the total number of home runs is 61 or greater and the value of 0 otherwise.

\mbox{gte61} = \mbox{I}[h \geq (61 - 56)]

Event probability

The probability Stanton hits 61 or more home runs (conditioned on our model and his performance so far) is then the posterior expectation of that indicator variable,

\displaystyle \mbox{Pr}[h \geq 61 - 56] \ = \ \int_{\theta} \, \sum_{ab=0}^{\infty} \, \sum_{h=0}^{ab} \, \mbox{I}[h \geq 61 - 56] \, \times \, \mbox{Binomial}(h \mid ab, \theta) \, \times \, \mbox{Poisson}(ab \mid 10 \times 555 / 149) \, \times \, \mbox{Beta}(\theta \mid 1 + 56, 1 + 555 - 56) \ \mathrm{d}\theta.

Computation in R

The posterior for \theta is analytic because the prior is conjugate, letting us simulate \theta from its Beta posterior given the observed successes (56) and number of trials (555). The number of at bats is independent and easy to simulate with a Poisson random number generator. We then simulate the number of home runs in the remaining at bats as a binomial draw, and finally we report the fraction of simulations in which the simulated home runs put Stanton at 61 or more:

> sum(rbinom(1e5,
             rpois(1e5, 10 * 555 / 149),
             rbeta(1e5, 1 + 56, 1 + 555 - 56))
       >= (61 - 56)) / 1e5
[1] 0.34

That is, I’d give Stanton about a 34% chance conditioned on all of our assumptions and what he’s done so far.
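As a cross-check on the Monte Carlo estimate (this isn’t in the original post, just arithmetic), we can do the \theta integral analytically: marginalizing the Beta posterior out of the binomial gives a beta-binomial for the remaining home runs given the at bats, and the sum over the Poisson-distributed at bats can be truncated once the tail mass is negligible.

```r
# Exact (up to Poisson truncation) version of the simulation above.
a <- 1 + 56; b <- 1 + 555 - 56        # Beta posterior parameters for theta
lambda <- 10 * 555 / 149              # Poisson rate for remaining at bats
# Beta-binomial pmf: Binomial(h | n, theta) averaged over the Beta posterior.
dbetabinom <- function(h, n)
  exp(lchoose(n, h) + lbeta(h + a, n - h + b) - lbeta(a, b))
p_lt5 <- 0
for (ab in 0:200) {                   # Pr(ab > 200) is negligible (rate ~ 37)
  h <- 0:min(4, ab)
  p_lt5 <- p_lt5 + dpois(ab, lambda) * sum(dbetabinom(h, ab))
}
1 - p_lt5                             # agrees with the simulated 0.34
```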


The above is intended for recreational use only and is not intended for serious bookmaking.


You guessed it—code this up in Stan. You can do it for any batter, any number of games left, etc. It works for any counting statistic like this. It’d be even better done hierarchically with more players (that’ll pull the estimate for \theta down toward the league average). Finally, the event probability can be computed with an indicator variable in the generated quantities block.

The basic expression looks like it requires discrete random variables, but we only need them for the posterior predictive calculation, so we can use Stan’s random number generator functions to do the posterior simulations right in Stan’s generated quantities block.

“What we know and don’t know about the 2016 election—and beyond” (event at Columbia poli sci dept next Monday midday)

On Monday 25 Sep, 12:10-1:45pm, in the Playroom (707 International Affairs Bldg):

“What we know and don’t know about the 2016 election—and beyond”

(discussion led by Bob Shapiro, Bob Erikson, me, and other Columbia political science faculty)

It’s not enough to be a good person and to be conscientious. You also need good measurement. Cargo-cult science done very conscientiously doesn’t become good science, it just falls apart from its own contradictions.

Kevin Lewis points us to a biology/psychology paper that was a mix of reasonable null claims (on the order of, the data don’t give us enough information to say anything about XYZ) and some highly questionable noise mining supported by p-values and forking paths.

The whole thing is just so sad. The researchers are aware of the statistical problems of forking paths, but they still persist in doing noise-mining research, perhaps in response to the requirements of clueless reviewers. The thing that doesn’t always seem to be understood in this sort of work is that it’s not enough to be a good person and to be conscientious. You also need good measurement. Cargo-cult science done very conscientiously doesn’t become good science, it just falls apart from its own contradictions.

Again: you don’t have to be a good person to be a good scientist.

If you do happen to be a good person, the above sentence implies two things:
Continue reading ‘It’s not enough to be a good person and to be conscientious. You also need good measurement. Cargo-cult science done very conscientiously doesn’t become good science, it just falls apart from its own contradictions.’ »

Call for papers: Probabilistic Programming Languages, Semantics, and Systems (PPS 2018)

I’m on the program committee, and they say they’re looking to broaden their horizons this year to include systems like Stan. The workshop is part of POPL, the big programming language theory conference. Here’s the official link.

The submissions are two-page extended abstracts and the deadline is 17 October 2017; the workshop itself is in Los Angeles on 9 January 2018.

You can also see the program from last year. I would’ve liked to have seen Gordon Plotkin’s talk, but there aren’t even abstracts online; I see that despite the hype surrounding comp sci, he’s still modestly titling his contributions “Towards …”.

The workshop is the day before StanCon starts in Monterey (up the coast). That’s cutting it too close for me, so I won’t be at the workshop. I do hope to see you at StanCon.


Using black-box machine learning predictions as inputs to a Bayesian analysis

Following up on this discussion [Designing an animal-like brain: black-box “deep learning algorithms” to solve problems, with an (approximately) Bayesian “consciousness” or “executive functioning organ” that attempts to make sense of all these inferences], Mike Betancourt writes:

I’m not sure AI (or machine learning) + Bayesian wrapper would address the points raised in the paper. In particular, one of the big concepts that they’re pushing is that the brain builds generative/causal models of the world (they do a lot based on simple physics models) and then uses those models to make predictions outside the scope of the data it has previously seen. True out-of-sample performance is still a big problem in AI (they’re trying to make the training data big enough to make “out of sample” an irrelevant concept, but that’ll never really happen) and these kinds of latent generative/causal models would go a long way to improving that. Adding a Bayesian wrapper could identify limitations of an AI, but I don’t see how it could move towards this kind of latent generative/causal construction.

If you wanted to incorporate these AI algorithms into a Bayesian framework then I think it’s much more effective to treat the algorithms as further steps in data reduction. For example, train some neural net, treat the outputs of the net as the actual measurement, and then add the trained neural net to your likelihood. This is my advice when people want to/have to use machine learning algorithms but also want to quantify systematic uncertainties.

My response: yes, I give that advice too, and I’ve used this method in consulting problems. Recently we had a pleasant example in which we started by using the output from the so-called machine learning as a predictor, then we fit a parametric model to the machine-learning fit, and now we’re transitioning toward modeling the raw data. Some interesting general lessons here, I think. In particular, machine-learning-type methods tend to be crap at extrapolation and can have weird flat behavior near the edge of the data. So in this case when we went to the parametric model, we excluded some of the machine-learning predictions in the bad zone as they were messing us up.
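Here’s a minimal sketch of the “outputs as measurements” idea (the data, numbers, and measurement model are all made up for illustration, and a plain grid approximation stands in for a real Stan model): the black-box predictions z enter the likelihood through their own calibrated error model, rather than being treated as the truth.

```r
set.seed(1)
n <- 200
x <- rnorm(n, 2, 1)                 # latent quantities of interest
z <- x + rnorm(n, 0.3, 0.5)         # "machine learning" output: biased, noisy
# Measurement model after marginalizing out x_i:
# z_i ~ normal(mu + 0.3, sqrt(1 + 0.5^2)); in practice the bias and noise sd
# would be estimated from a held-out calibration set.
mu_grid  <- seq(0, 4, length.out = 1000)
log_post <- sapply(mu_grid, function(mu)
  sum(dnorm(z, mu + 0.3, sqrt(1 + 0.5^2), log = TRUE)))  # flat prior on mu
post <- exp(log_post - max(log_post))
post <- post / sum(post)
mu_hat <- sum(mu_grid * post)       # posterior mean of mu, near the true 2
```

The point of the sketch is the structure, not the toy normal model: the trained predictor becomes part of the likelihood, so its bias and noise propagate into the posterior uncertainty instead of being ignored.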

Betancourt adds:

It could also be spun into a neurological narrative: our sensory organs and lower brain functions operate as AI, reducing raw inputs into more abstract, informative features, from which the brain can then go all Bayesian and build the generative/causal models advocated in the paper.

p less than 0.00000000000000000000000000000000 . . . now that’s what I call evidence!

I read more carefully the news article linked to in the previous post, which describes a forking-pathed nightmare of a psychology study, the sort of thing that was routine practice back in 2010 or so but which we’ve mostly learned to at least try to avoid.

Anyway, one thing I learned is that there’s something called “terror management theory.” Not as important as embodied cognition, I guess, but it seems to be a thing: according to the news article, it’s appeared in “more than 500 studies conducted over the past 25 years.”

I assume that each of these separate studies had p less than 0.05, otherwise they wouldn’t’ve been published, and I doubt they’re counting unpublished studies.

So that would make the combined p-value less than 0.05^500.

Ummm, what’s that in decimal numbers?

> 500*log10(0.05)
[1] -650.515
> 10^(-0.515)
[1] 0.3054921

OK, so the combined result is p less than 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000031.
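Or, keeping the same arithmetic in scientific notation rather than counting zeros:

```r
# 0.05^500 on the log10 scale: mantissa x 10^exponent.
log10_p  <- 500 * log10(0.05)     # -650.515
mantissa <- 10^(log10_p %% 1)     # R's %% returns a value in [0, 1)
exponent <- floor(log10_p)
c(mantissa, exponent)             # about 3.05 x 10^-651
```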

I guess terror management theory must be real, then.

The news article, from a Pulitzer Prize-winning reporter (!), concludes:

Score one for the scientists.

That’s old-school science writing for ya.

As I wrote in my previous post, I feel bad for everyone involved in this one. Understanding of researcher degrees of freedom and selection bias has only gradually percolated through psychology research, and it stands to reason that there are still lots of people, young and old, left behind, still doing old-style noise-mining, tea-leaf-reading research. I can only assume these researchers are doing their best, as is the journalist reporting these results, with none of them realizing that they’re doing little more than shuffling random numbers.

The whole thing is funny, but it’s also sad; still, I hope we’re moving forward. The modern journalists are getting clued in, and I expect the traditional science journalists will follow. There remains the problem of selection bias, that the credulous reporters write up these stories while the skeptics don’t bother. But I’m hoping that, one by one, reporters will figure out what’s going on.

After all, nobody wants to be the last one on the sinking ship. I guess you wouldn’t be completely alone, as you’d be accompanied by the editor of Perspectives on Psychological Science and the chair of the Association for Psychological Science publications board. But who really wants to hang out with them all day?

Stan Course in Newcastle, United Kingdom!

(this post is by Betancourt)

The growth of Stan has afforded the core team many opportunities to give courses, to both industrial and academic audiences and at venues across the world. Regrettably we’re not always able to keep up with demand for new courses, especially outside of the United States, due to our already busy schedules. Fortunately, however, some of our colleagues are picking up the slack!

In particular, Jumping Rivers is hosting a two-day introductory RStan course at the University of Newcastle in the United Kingdom from Thursday, December 7 to Friday, December 8. The instructor is my good friend Sarah Heaps, who is not only an excellent statistician and avid Stan user but also attended one of the first RStan courses I gave!

If you are on the other side of the Atlantic and interested in learning RStan then I highly recommend attending (and checking out Newcastle’s surprisingly great Chinatown during lunch breaks).

And if you are interested in organizing a Stan course with any members of the core team then don’t hesitate to contact me to see if we might be able to arrange something.

As if the 2010s never happened

E. J. writes:

I’m sure I’m not the first to send you this beauty.

Actually, E. J., you’re the only one who sent me this!

It’s a news article, “Can the fear of death instantly make you a better athlete?”, reporting on a psychology experiment:

For the first study, 31 male undergraduates who liked basketball and valued being good at it were recruited for what they were told was a personality and sports study. The subjects were asked to play two games of one-on-one basketball against a person they thought was another subject but who was actually one of the researchers.

In between the two games, the participants were asked to fill out a questionnaire. Half of the subjects were randomly assigned questions that probed them to think about a neutral topic (playing basketball); the other half were prompted to think about their mortality with questions such as, “Please briefly describe the thoughts and emotions that the thought of your own death arouses in you” . . .

That’s right, priming! What could be more retro than that?

The news article continues:

The researchers hypothesized that according to terror management theory, those who answered the mortality questions should show an improvement in their second game. When the results of the experiment, which was videotaped, were analyzed, the researchers found out the subjects’ responses exceeded their expectations: The performance in the second game for those who had received a memento mori increased 40 percent, while the other group’s performance was unchanged.

They quoted one of the researchers as saying, “What we were surprised at was the magnitude of the effect, the size in which we saw the increases from baseline.”

I have a feeling that nobody told them about type M errors.

There’s more at the link, if you’re interested.

I feel bad for everyone involved in this one. Understanding of researcher degrees of freedom and selection bias has only gradually percolated through psychology research, and it stands to reason that there are still lots of people, young and old, left behind, still doing old-style noise-mining, tea-leaf-reading research. I can only assume these researchers are doing their best, as is the journalist reporting these results, with none of them realizing that they’re doing little more than shuffling random numbers.

One recommendation that’s sometimes given in these settings is to do preregistered replication. I don’t always like to advise this because, realistically, I expect that the replication won’t work. But preregistration can help to convince. I refer you to the famous 50 shades of gray study.

Maybe this paper is a parody, maybe it’s a semibluff

Peter DeScioli writes:

I was wondering if you saw this paper about people reading Harry Potter and then disliking Trump, attached. It seems to fit the shark attack genre.

In this case, the issue seems to be judging causation from multiple regression with observational data, assuming that control variables are enough to narrow down to causality (or that it’s up to a critic to find the confounds). It speaks to a bigger issue about how researchers interpret multiple regression in causal terms.

Any thoughts on this, or obvious/good references critiquing causal interpretations of multiple regression? (like to assign to my PhD students)

My reply: Hi, yes, I saw this paper months ago. I suspected it was a parody but someone told me that it was actually supposed to be serious. I still think it’s a kind of half-parody, it’s what social scientists might call a “fun” result, and it’s published in a non-serious journal, so I doubt the author takes it completely seriously. Kinda like this: you find an interesting pattern in data, it’s probably no big deal, but who knows, so get it out there and people can make of it what they will.

Twenty years ago, social scientists could do this and it would be no problem; nowadays, with all this stuff on shark attacks, college football, power pose, contagion of obesity, etc., it seems that people have more difficulty putting such speculations into perspective: any damn data pattern they see, they want to insist it’s a big deal, from data analysis to publication to Ted talk and NPR. In some sense this Harry Potter paper is a throwback, and it would probably be best to interpret it the way we’d have taken it twenty or thirty years ago.

It’s impossible for me to tell whether the author, Diana Mutz, is writing this paper as a parody. Intonation is notoriously difficult to convey in typed speech. It’s a funny thing: if the paper’s not a parody and I say it is, then I’m kinda being insulting. But if the paper is a parody and I take it seriously, then I’m not getting the joke. So there’s no safe interpretation here! (I could ask Mutz directly but that’s not much of a general solution; I’d rather think of a published article and its implications as standing on their own and not requiring typically unobtainable “meta-data” on authorial intentions.)

DeScioli responded:

It does have some whimsical passages so maybe it is half-parody.

And I continued:

Yeah, there’s this genre of research which is not entirely serious but not entirely a joke, kinda what in poker we’d call a semibluff. Back in the good old days before Gladwell, PPNAS, NPR hype, etc., it was reasonable for researchers to try out some of these ideas; they were long shots but had some appeal as part of the mix of science. For a while, though, it seemed like this sort of open-ended speculation backed by statistically significant p-values had become most of social science, and this has reduced all of our patience for this sort of thing. Which is kinda too bad. Another example is the observation that several recent presidents were left-handed. It seems like it should be possible to point to such data patterns, and even run some statistics on them, without making large claims.

DeScioli followed up:

Seems it could still be as fun and interesting to look for these types of correlations without claiming causality. I was just surprised to see the paper double down on the causal interpretation with the argument that the analysis controlled for everything they could think of. (My assumption is that observational data has countless confounding correlations that no one could think of.) I don’t think this paper is worse in over-interpreting than many others I’ve seen. It’s just easier to notice because of the whimsical topic.

What’s the lesson for avoiding this for a more serious-sounding theory? I typically restrict causal judgments to experimental manipulations. But maybe that is too restrictive? The only other thing I can think of is if a researcher knew so much about their subject that they could boil down the possible causes to a handful. Then maybe multiple regression with controls could help sort between them. If so, the issue with Harry Potter is that it’s one of millions of similar cultural influences that are all hopelessly tangled and so can’t be untangled with observational methods.

Where does the discussion go?

Jorge Cimentada writes:

In this article, Yascha Mounk is saying that political scientists have failed to predict unexpected political changes such as the Trump nomination and the sudden growth of populism in Europe, because, he argues, of the way we’re testing hypotheses.

By that he means the quantitative aspect behind scientific discovery. He goes on to talk about the historical shift from qualitative to modern quantitative analysis, which he argues hinders the capacity of scholars to study ‘less common’ or infrequent situations, such as the ones outlined above.

I [Cimentada] am pretty sure there’s some truth behind that, but still, I think that the capacity to predict is not entirely based on the frequency of things. Another thing he fails to distinguish is that specific questions require specific designs. Depending on your research question, you might need to use qualitative over quantitative approaches.

If you have some time, I’d like to hear your stand on this. I thought this might be something which could fit in one of your blog entries which is why I contacted you.

I leave you with one paragraph that summarizes his main point pretty well:

It is easier to amass high-quality data, and therefore to make “rigorous” causal claims, about the economy than about culture; in part for that reason, the social sciences now favor economic over cultural modes of explanation. Similarly, it is easier to amass high-quality data, and to test causal hypotheses, about frequent events that are easy to count and categorize, like votes in Congress, than about rare and intractable events, like political revolutions; in part for that reason, the social sciences now tend to focus more on the business-as-usual of the recent past than on the great turning points of more distant times.

My reply: these are good questions that are worth considering. It’s kinda funny that they appeared in an opinion article in the Chronicle of Higher Education rather than in a political science journal, but I guess that journals are not so important for communication anymore. Nowadays journals are all about academic promotion and tenure. When people want to have scholarly discussions, they turn to newspapers, blogs, etc.

Extended StanCon 2018 Deadline!

(this post is by Betancourt)

We received an ensemble of exciting submissions for StanCon 2018, but some of our colleagues requested a little bit of extra time to put the finishing touches on their submissions. Being the generous organizers that we are, we have decided to extend the submission deadline for everyone by two weeks.

Contributed submissions will be accepted until September 29, 2017 5:00:00 AM GMT (that’s midnight on the east coast of the States, for those who aren’t fans of the meridian time).  We will do our best to review and send decisions out before the early registration deadline, but the sooner you submit the more likely you will hear back before then.  For more details on the submission requirements and how to submit see the Submissions page.

Early registration ends on Friday November 10, 2017, after which registration costs increase significantly.  Registration for StanCon 2018 is in two parts: an initial information form followed by payment and accommodation reservation at the Asilomar website.

Type M errors in the wild—really the wild!

Jeremy Fox points me to this article, “Underappreciated problems of low replication in ecological field studies,” by Nathan Lemoine, Ava Hoffman, Andrew Felton, Lauren Baur, Francis Chaves, Jesse Gray, Qiang Yu, and Melinda Smith, who write:

The cost and difficulty of manipulative field studies makes low statistical power a pervasive issue throughout most ecological subdisciplines. . . . In this article, we address a relatively unknown problem with low power: underpowered studies must overestimate small effect sizes in order to achieve statistical significance. First, we describe how low replication coupled with weak effect sizes leads to Type M errors, or exaggerated effect sizes. We then conduct a meta-analysis to determine the average statistical power and Type M error rate for manipulative field experiments that address important questions related to global change: global warming, biodiversity loss, and drought. Finally, we provide recommendations for avoiding Type M errors and constraining estimates of effect size from underpowered studies.

As with the articles discussed in the previous post, I haven’t read this article in detail, but of course I’m supportive of the general point, and I have every reason to believe that type M errors are a big problem in a field such as ecology where measurement is difficult and variation is high.
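To make the idea concrete, here is a minimal simulation sketch (my own illustration, not from the Lemoine et al. paper) of the Type M, or exaggeration, phenomenon: when power is low, the estimates that happen to cross the significance threshold must be much larger in magnitude than the true effect. The true effect, standard error, and threshold below are arbitrary assumptions chosen to mimic a small, noisy field experiment.

```python
import random
import statistics

def type_m_simulation(true_effect=0.2, se=0.5, n_sims=100_000,
                      z_crit=1.96, seed=1):
    """Simulate repeated noisy studies of a small true effect.

    Returns (power, exaggeration ratio), where the exaggeration ratio
    is the average |estimate| among statistically significant results
    divided by the true effect.
    """
    rng = random.Random(seed)
    # Each study yields one normally distributed estimate of the effect.
    estimates = (rng.gauss(true_effect, se) for _ in range(n_sims))
    # Keep only estimates that reach two-sided significance at z_crit.
    significant = [est for est in estimates if abs(est) / se > z_crit]
    power = len(significant) / n_sims
    exaggeration = statistics.mean(abs(e) for e in significant) / true_effect
    return power, exaggeration

power, exaggeration = type_m_simulation()
print(f"power = {power:.2f}, exaggeration ratio = {exaggeration:.1f}")
```

With these assumed numbers, power comes out well under 10%, and the statistically significant estimates overstate the true effect several-fold, which is exactly the pattern the paper warns about in underpowered field studies.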

P.S. Steven Johnson sent in the above picture of a cat who is not in the wild, but would like to be.

Type M errors studied in the wild

Brendan Nyhan points to this article, “Very large treatment effects in randomised trials as an empirical marker to indicate whether subsequent trials are necessary: meta-epidemiological assessment,” by Myura Nagendran, Tiago Pereira, Grace Kiew, Douglas Altman, Mahiben Maruthappu, John Ioannidis, and Peter McCulloch.

From the abstract:

Objective To examine whether a very large effect (VLE; defined as a relative risk of ≤0.2 or ≥5) in a randomised trial could be an empirical marker that subsequent trials are unnecessary. . . .

Data sources Cochrane Database of Systematic Reviews (2010, issue 7) with data on subsequent large trials updated to 2015, issue 12. . . .

Conclusions . . . Caution should be taken when interpreting small studies with very large treatment effects.

I’ve not read the paper and so can’t evaluate these claims but they are in general consistent with our understanding of type M and type S errors. So, just speaking generally, I think it’s good to see this sort of study.

Along similar lines, Jonathan Falk pointed me to this paper, “On the Reproducibility of Psychological Science,” by Val Johnson, Richard Payne, Tianying Wang, Alex Asher, and Soutrik Mandal. I think their model (in which effects are exactly zero or else are spaced away from zero) is goofy and I don’t like the whole false-positive, false-negative thing or the idea that hypothesis tests in psychology experiments correspond to “scientific discoveries,” but I’m guessing they’re basically correct in their substantive conclusions, as this seems similar to what I’ve heard from other sources.

New Zealand election polling

Llewelyn Richards-Ward writes:

Here is a forecaster apparently using a simulated (?Bayesian) approach and smoothing over a bunch of poll results in an attempt to guess the end result. I looked but couldn’t find his methodology but he is at University of Auckland, if you want to track him down…

As a brief background, we have a very centrist government position and this has been so for many years since the neo-liberalist earthquake of the 80s hit us all. Really, the two main parties, Labour and National, are a metre away from each other compared to the gulf between similar parties overseas. We have an MMP system, with 120 seats. Usually there is a coalition government, which has given us very stable growth, positive social outcomes and reasonable taxation.

This election is one where the swings in polls and voter options have been startling, for a small place like ours. First, the Green Party co-leader admitted to welfare fraud, actually told us all about it, hoping to garner sympathy for their social issues. She was a goner after public outcry. We all hate cheaters. Then the leader of a stabilising minor party decided to quit, probably as he was polling lower than usual. That party is probably now in that place called oblivion as, without an electorate seat or 5% polling, no ticket to parliament is handed out.

Then the Labour party, lately rather staid and uninspiring, killed off yet another leader and a new, female, young face appeared. This has been dubbed “jacinda-mania”. Lots of energy, she has a PR background, and appeals to the young chic voters (most of whom never actually vote). Labour, after leaping from the low teens to 43% (or something, in some polls) as a result of a fresh face (same policies and other candidates), is now slipping back again. Today, after stupidly promising a tax working group (which we all know means one with people who will work towards their preferred outcomes), they have U-turned and now say they will come up with ideas and put it to the electorate next election. The lesson again is don’t ask voters to write blank cheques and don’t threaten middle-NZ with property taxes whilst large corporates are paying very little.

NZ, given mostly we are doing well, is not a place easily swayed into change when the status quo seems to be working. Who knows what tomorrow brings — I personally will be voting early to reduce the tension of it all.

My take is that all the pollsters are off-centre because of poll variability. It seems that the above issues have induced real changes of preference in the public, rather than the variability simply being so-called noise.

I don’t know enough about New Zealand to comment on this one. I did read an interesting book about New Zealand politics back when I visited the country in, ummm, 1992 I think it was. But I haven’t really thought about the country since, except briefly when doing our research project on representation and government spending in subnational units. New Zealand is (or was) one of the few countries in the world with a unitary government in which no power was devolved to states or provinces.

Anyway, regarding Richards-Ward’s final paragraph above, I would expect that (a) much of the apparent fluctuation in the polls is really explained by fluctuations in differential nonresponse, but (b) you’ll see some instability in a multi-candidate election that you wouldn’t see in a (nearly) pure two-party system such as we have in the United States.

P.S. I feel like I should make some sort of Gandalf joke here but I’m just not really up for it.

American Democracy and its Critics

I just happened to come across this article of mine from 2014: it’s a review published in the American Journal of Sociology of the book “American Democracy,” by Andrew Perrin.

My review begins:

Actually-existing democracy tends to have support in the middle of the political spectrum but is criticized on the two wings.

I like Perrin’s book, and I like my review too, so I’m sharing it with you here.

P.S. Turnabout is fair play; here’s Perrin’s review from a few years back of Red State Blue State and several other books.