
This year’s Atlantic Causal Inference Conference: 20-21 May

Dylan Small writes:

The conference will take place May 20-21 (with a short course on May 19th) and the web site for the conference is here. The deadline for submitting a poster title for the poster session is this Friday. Junior researchers (graduate students, postdoctoral fellows, and assistant professors) whose poster demonstrates exceptional research will also be considered for the Thomas R. Ten Have Award, which recognizes “exceptionally creative or skillful research on causal inference.” The two award winners will be invited to speak at the 2016 Atlantic Causal Inference Conference.

We held the first conference in this series ten years ago at Columbia, and I’m glad to see it’s still doing well.

Statistical analysis on a dataset that consists of a population

This is an oldie but a goodie.

Donna Towns writes:

I am wondering if you could help me solve an ongoing debate?

My colleagues and I are discussing (disagreeing) on the ability of a researcher to analyze information on a population. My colleagues are sure that a researcher is unable to perform statistical analysis on a dataset that consists of a population, whereas I believe that statistical analysis is appropriate if you are testing future outcomes. For example, a group of inmates in a detention centre receive a new program. As withholding it would contravene ethics, all offenders receive the program. Therefore, a researcher would need to compare them with a group of inmates from before the introduction of the program. Assuming, or after confirming, that these two populations are similar, are we able to apply statistical analysis to compare the outcomes of these two populations (such as time to return to detention)? If so, what would be the methodologies used? Do you happen to know of any articles that discuss this issue?

I replied with a link to this post from 2009, which concludes:

If you set up a model including a probability distribution for these unobserved outcomes, standard errors will emerge.
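The point can be made concrete with a small simulation. This is a minimal sketch with entirely made-up cohorts of 250 inmates each: the superpopulation view treats even a complete census of each cohort as draws from an outcome-generating process, and the standard error falls out of that probability model.

```python
# Minimal sketch of the "superpopulation" view: even if the data cover every
# inmate in each cohort, we can model the outcomes as draws from a process
# that also generates future outcomes. A standard error for the difference
# then follows from the probability model. All numbers here are simulated.
import random
import statistics

random.seed(1)

# Hypothetical months-to-return for complete pre- and post-program cohorts
pre = [random.expovariate(1 / 10.0) for _ in range(250)]
post = [random.expovariate(1 / 12.0) for _ in range(250)]

diff = statistics.mean(post) - statistics.mean(pre)

# Standard error of the difference in means, treating each cohort as draws
# from its own outcome-generating process
se = (statistics.variance(pre) / len(pre)
      + statistics.variance(post) / len(post)) ** 0.5

print(f"estimated difference: {diff:.2f} months (se {se:.2f})")
```

Once the model is written down, the "but it's a population" objection dissolves: the uncertainty refers to the process that will generate future outcomes, not to sampling from a finite list.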

Statistical significance, practical significance, and interactions

I’ve said it before and I’ll say it again: interaction is one of the key underrated topics in statistics.

I thought about this today (OK, a couple months ago, what with our delay) when reading a post by Dan Kopf on the exaggeration of small truths. Or, to put it another way, statistically significant but not practically significant.

I’ll give you Kopf’s story and then explain how everything falls into place when we think about interactions.

Here’s Kopf:

Immediately after an album is released, the critics descend. The period from the first to last major review for an album typically falls between one and six months. As time passes, the average review gets slightly worse. Assuming my methodology is appropriate and the data is accurate and representative, this is very likely a statistical truth.

But is this interesting? . . .

My result about album reviews worsening over the review period is “statistically significant.” The p-value is so small it risks vanishing. My initial response to the finding was excitement, and I began armchair psychologizing about what could be causing this. I even wrote an extensive article on my speculations.

But I [Kopf] was haunted by this image:


Each point is a review. Red ones are above average for that album, and green ones below. . . . With so many data points, it can be difficult for the human eye to determine correlation, but your eyes don’t deceive you. There is not much going on here. Only 1% of the variation of an album’s rating is explained by knowing when in the order of reviews an album fell. . . .

Is 1% worth considering? It depends on the subject matter, but in this case, it’s probably not. When you combine the large sample sizes that come with big data and the speed of modern computing, it is relatively easy to find patterns in data that are statistically significant . . . . But many of these patterns will be uninteresting and/or meaningless for decision making. . . .

But here’s the deal. What does it mean for the pattern to be tiny but still statistically significant? There are lots of albums that get reviewed. Each set of reviews has a time trend. Some trends go up, some go down. Is the average trend positive or negative? Who cares? The average trend is a mixture of + and – trends, and whether the avg is + or – for any given year depends on the population of albums for that year.

So I think the answer is the secret weapon (or, to do it more efficiently, a hierarchical model). Slice the data a bunch of ways. If the trend is negative for every album, or for 90% of albums, then this is notable, if puzzling: how exactly would that be, that the trend is almost always negative, but the aggregate pattern is so weak?

More likely, the trend is positive for some, negative for others, and you could try to understand that variation.

The key is to escape from the trap of trying to estimate a single parameter. Also to point out the near-meaninglessness of statistical significance in the context of varying patterns.
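One way to see what the secret weapon would look like here: estimate the review-order trend separately for each album, then look at the distribution of per-album slopes instead of one pooled coefficient. This is a sketch on simulated data with invented parameter values, not Kopf’s actual dataset.

```python
# The "secret weapon": fit the same small model to each slice of the data
# (here, one least-squares trend per album) and examine the distribution of
# estimates. All data below are simulated; the trend mean and spread are
# invented for illustration.
import random

random.seed(7)

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Simulate 200 albums whose true trends vary around a slightly negative mean,
# so most of the variation is between albums, not in the aggregate trend
slopes = []
for _ in range(200):
    true_trend = random.gauss(-0.02, 0.10)
    xs = list(range(12))                                # review order
    ys = [true_trend * x + random.gauss(0, 1.0) for x in xs]
    slopes.append(slope(xs, ys))

neg = sum(s < 0 for s in slopes)
print(f"mean slope {sum(slopes)/len(slopes):+.3f}; "
      f"{neg}/200 albums trend downward")
```

If the slopes split something like 60/40 between negative and positive, the tiny aggregate trend is exactly the mixture-of-signs story above; a hierarchical model would do the same thing more efficiently by partially pooling the per-album estimates.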

Political Attitudes in Social Environments

Jose Duarte, Jarret Crawford, Charlotta Stern, Jonathan Haidt, Lee Jussim, and Philip Tetlock wrote an article, “Political Diversity Will Improve Social Psychological Science,” in which they argued that the field of social psychology would benefit from the inclusion of more non-liberal voices (here I’m using “liberal” in the sense of current U.S. politics). Duarte et al. argue that “one key type of viewpoint diversity is lacking in academic psychology in general and social psychology in particular: political diversity . . . Increased political diversity would improve social psychological science by reducing the impact of bias mechanisms such as confirmation bias, and by empowering dissenting minorities to improve the quality of the majority’s thinking . . .”

Their article is scheduled to be published in Behavioral and Brain Sciences with several discussions, including one by Neil Gross and myself.

Here’s our abstract:

We agree with the authors that it is worthwhile to study professions’ political alignments. But we have seen no evidence to support the idea that social science fields with more politically diverse workforces generally produce better research. We also think that when considering ideological balance, it is useful to place social psychology within a larger context of the prevailing ideologies of other influential groups within society, such as military officers, journalists, and business executives.

And here’s the rest of our discussion:

Although we appreciate several things about the Duarte et al. essay, “Political Diversity Will Improve Social Psychological Science,” including its insistence that social scientists should work to minimize the impact of their political views on their research and its sensitivity to political threats to social science funding, we find their central argument unpersuasive. Contrary to the assertion of the authors, we have seen no evidence that social science fields with more politically diverse workforces have higher evidentiary standards, are better able to avoid replication failures, or generally produce better research. As there are no standardized ways to measure these outcomes at the disciplinary or subdisciplinary level, and as reliable data on researcher politics at the disciplinary and subdisciplinary level are scarce, there have never been—to our knowledge—any systematic attempts to examine the relationship between epistemic quality and variation in the political composition of the social-scientific community. The authors are thus calling for major changes in policy and practice based on sheer speculation. The authors cite some evidence of the benefits of “viewpoint diversity” in collaboration, but there is a scale mismatch between these studies (of small groups) and the field-level generalizations the authors make. In point of fact, research on the history and sociology of social science suggests that scientific/intellectual movements that bundle together political commitments and programs for research—movements of the sort the authors believe to have weakened social and personality psychology—have arisen under a wide range of political conditions, as have countermovements calling for greater objectivity. Until we know more about these and related dynamics, it would be premature to tinker with organizational machineries for knowledge production in the social sciences, however much one may worry, alongside the authors, about certain current trends.

In addition we think it is helpful to consider the Duarte et al. argument in a broader context by considering other fields that lean strongly to the left or to the right. The cleanest analogy, perhaps, is between college professors (who are disproportionately liberal Democrats) and military officers (mostly conservative Republicans; see the research of political scientist Jason Dempsey, 2009). In both cases there seems to be a strong connection between the environment and the ideology. Universities have (with some notable exceptions) been centers of political radicalism for centuries, just as the military has long been a conservative institution in most places (again, with some exceptions). And this is true even though many university professors are well-paid, live well, and send their children to private schools, and even though the U.S. military has been described as one of the few remaining bastions of socialism in the 21st century. Another example of a liberal-leaning profession is journalism (with its frequently cited dictum to “comfort the afflicted and afflict the comfortable”; again, the relative liberalism of that profession has been confirmed by polls of journalists, for example Weaver et al., 2003), while business executives represent an important, and influential, conservative group in American society. There has been some movement to balance out the liberal bias of journalism in the United States, but it is not clear what would be done to balance political representation among military officers or corporate executives.

In short, we applaud the work of Duarte et al. in exploring the statistics and implications of political attitudes among social researchers. The psychology profession is, like the military, an all-volunteer force, and it is not clear to us that the purported benefits of righting the ideological balance among social psychologists (or among military officers, or corporate executives) are worth the efforts that would be involved in such endeavors. But these sorts of ideological what-ifs make interesting thought experiments.

Regular readers of this blog will know that I have problems with a lot of the social psychology research that gets published and publicized. And I certainly feel that political conservatives should feel free to contribute to this field. It’s not at all clear to me that a change in the mix of political attitudes among psychology researchers has much to do, one way or another, with scientific reform in this area. But it’s a question worth raising, just as it’s worth raising in the context of journalism, business, the military, and other institutions within our society.

A message from the vice chairman of surgery at Columbia University: “Garcinia Camboja. It may be the simple solution you’ve been looking for to bust your body fat for good.”


Should Columbia University fire this guy just cos he says things like this:

“You may think magic is make believe but this little bean has scientists saying they’ve found the magic weight loss cure for every body type—it’s green coffee extract.”

“I’ve got the No. 1 miracle in a bottle to burn your fat. It’s raspberry ketones.”

“Garcinia Camboja. It may be the simple solution you’ve been looking for to bust your body fat for good.”

Probably not. Exaggerating or even lying, trading off your university affiliation, I don’t think that’s a firing offense. Even the possibly “outrageous conflicts of interest,” maybe there’s no hard evidence there. And it might be that in the classes he sticks to the more standard material, or labels his speculations as such.

Or maybe they should just reduce his salary and give him a very tiny office in a faraway building, and schedule his classes for Sundays at 3 in the morning? I have no idea.

Having this sort of joker on the faculty is embarrassing for Columbia, sure, but firing or even reprimanding him could be even worse. After all, where do you draw the line? Should faculty be canned for plagiarizing, or for making up interviews in ethnographic studies, or for expressing noxious political or legal opinions, or for refusing to retract or correct the errors in their published work?

Probably Columbia has to just take the reputational hit, which means they have to continue seeing this sort of thing in the press:

Astoundingly, Dr. Oz is the vice chairman and professor of surgery at Columbia University College of Physicians and Surgeons.

Astoundingly, indeed.

Just like Cornell with Daryl Bem: it’s all an embarrassment, but Bem’s Cornell affiliation is a currency of diminishing value. When his study first got publicity, Bem benefited from the Ivy League affiliation, but now his work is evaluated on its own terms.

Dr. Oz is different, maybe, because he remains in the news. If Columbia does decide they want to get rid of the guy, I don’t think they’d fire him. They’d just make his working conditions worse and worse until he quits of his own accord.

Or maybe Columbia will go on the offensive and fight for the Vice-Chairman’s right to party—ketones style!

But not just any ketones. It’s gotta be raspberry ketones.

Hey, I eat celery almost every day but I don’t go all TV about it.

P.S. I’m thinking we should add Oz to the scripts for “Second Chance U” and “The New Dirty Dozen”. And, hey, graphics designers: I’d still like some movie posters for these!

Instead of worrying about multiple hypothesis correction, just fit a hierarchical model.


Pejman Mohammadi writes:

I’m concerned with a problem in multiple hypothesis correction and, despite having read your article [with Jennifer and Masanao] on not being concerned about it, I was hoping I could seek your advice.

Specifically, I’m interested in the multiple hypothesis testing problem in cases when the test is done with a discrete finite distribution. For example, when doing many tests using the binomial distribution. This is an important problem as it appears in more and more places in bioinformatics nowadays, such as differential gene expression testing, allele-specific expression testing, and pathway enrichment analysis.

What seems to be clear is that the current correction methods are too conservative for such tests, and it’s also straightforward to show that such finite test distributions produce fewer false positives than one would expect under the null distribution. My understanding is that there’s no clear way to correct for multiple hypotheses in this type of situation. I was wondering if I could have your advice on the issue.

My response:

Instead of picking one comparison and doing a multiple comparisons correction, I suggest you fit a hierarchical model including all comparisons; then there will be no need for such corrections.

Mohammadi followed up:

I’m not sure if making a hierarchical model would be a possibility in all cases, and anyway most of these methods are done in a frequentist way. At the moment I work around it by correcting for unique tests only, but that seems not necessarily a good idea.

To which I replied:

“Frequentist” is a word for the way in which inferences are evaluated. It is fine to do a hierarchical model from a frequentist perspective.
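As a rough illustration of what partial pooling does in the many-binomial-tests setting, here is a sketch in which each group’s raw proportion is shrunk toward the overall rate. The counts and the prior strength are made up, and the beta-binomial shrinkage is written out by hand; a real analysis would estimate the hierarchical model jointly, e.g. by MCMC in Stan.

```python
# Sketch of partial pooling for many binomial comparisons: instead of testing
# each proportion separately and correcting the p-values, shrink each group's
# raw rate toward the overall rate. The counts are invented, and the prior
# strength is fixed by hand; in practice it would be estimated from the data.
successes = [3, 10, 2, 40, 7, 0, 15]
trials    = [20, 50, 10, 100, 30, 5, 60]

overall = sum(successes) / sum(trials)

# Hypothetical beta prior centered at the overall rate
prior_strength = 20.0
alpha = overall * prior_strength
beta  = (1 - overall) * prior_strength

# Posterior mean for each group: a weighted average of the raw rate s/n and
# the overall rate, with small groups pulled harder toward the overall rate
shrunk = [(s + alpha) / (n + alpha + beta) for s, n in zip(successes, trials)]
for s, n, est in zip(successes, trials, shrunk):
    print(f"raw {s}/{n} = {s/n:.2f} -> partially pooled {est:.2f}")
```

The shrunken estimates can then be compared directly; because the pooling already accounts for the fact that many groups are being estimated at once, no separate multiplicity correction is needed.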

The feather, the bathroom scale, and the kangaroo


Here’s something I wrote in the context of one of those “power = .06” studies:

My criticism of the ovulation-and-voting study is ultimately quantitative. Their effect size is tiny and their measurement error is huge. My best analogy is that they are trying to use a bathroom scale to weigh a feather—and the feather is resting loosely in the pouch of a kangaroo that is vigorously jumping up and down.

At some point, a set of measurements is so noisy that biases in selection and interpretation overwhelm any signal and, indeed, nothing useful can be learned from them. I assume that the underlying effect size in this case is not zero—if we were to look carefully, we would find some differences in political attitude at different times of the month for women, also different days of the week for men and for women, and different hours of the day, and I expect all these differences would interact with everything—not just marital status but also age, education, political attitudes, number of children, size of tax bill, etc etc. There’s an endless number of small effects, positive and negative, bubbling around.

I like the weighing-a-feather-while-the-kangaroo-is-jumping analogy. It includes measurement accuracy and also the idea that there are huge biases that are larger than the size of the main effect.
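The feather-and-kangaroo situation can be simulated directly: when the true effect is tiny relative to the noise, the estimates that happen to reach statistical significance are large exaggerations, often with the wrong sign (type M and type S errors). The effect size and standard error below are invented for illustration.

```python
# Simulated illustration of the feather-and-kangaroo problem: a tiny true
# effect measured with huge noise. Conditional on "statistical significance,"
# the estimates wildly exaggerate the effect and often get the sign wrong.
# The effect size and standard error are invented for illustration.
import random

random.seed(3)

true_effect = 0.1   # the feather
se = 1.0            # the jumping kangaroo

# Keep only the replications that reach the conventional 5% threshold
sig = [est for est in (random.gauss(true_effect, se) for _ in range(100_000))
       if abs(est) > 1.96 * se]

exaggeration = (sum(abs(e) for e in sig) / len(sig)) / true_effect
wrong_sign = sum(e < 0 for e in sig) / len(sig)
print(f"significant results exaggerate by ~{exaggeration:.0f}x; "
      f"{wrong_sign:.0%} have the wrong sign")
```

With these numbers the significant estimates overstate the true effect by an order of magnitude or more, and a large fraction point in the wrong direction, which is exactly why "we found a statistically significant effect" carries so little information in this regime.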

Online predictions from ipredict

Following up on our post on PredictWise, Richard Barker points to this fun site of market-based predictions. It’s subtitled, “Buy and sell stocks in future political and economic events.”

It’s based in New Zealand so you can bet on wacky propositions such as, “David Carter to be next High Commissioner from New Zealand to the United Kingdom.” They also have political events in the U.S. and other countries and science items such as, “NASA to announce the discovery of extraterrestrial life before 1 Jan 2017.” Whaaa….? They give this one a 9% chance of happening. Could that really be? Hmmm, maybe I should bet a few thousand on that. But I guess that’s the point, that the payoff would be low: betting $10,000 to win $1,000 isn’t so exciting. Also I’m guessing the market is not so liquid, so I probably couldn’t get much of a bet on this one in any case.
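For the record, here’s the arithmetic behind “betting $10,000 to win $1,000”: if the market prices the event at 9%, a $1 contract against it costs 91 cents, so the stake dwarfs the potential profit. The 9% price is from the post; the $10,000 stake is a hypothetical.

```python
# Arithmetic of betting against a 9%-priced event on a prediction market.
# The 9% price is quoted in the post; the stake is hypothetical.
price_yes = 0.09
stake = 10_000

# Buying "no" costs (1 - price_yes) per $1 contract
contracts = stake / (1 - price_yes)
profit_if_no = contracts - stake

print(f"risk ${stake:,.0f} to win ${profit_if_no:,.0f}")
```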

New Book: Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan

Fränzi and Tobias‘s book is now real:

Fränzi Korner-Nievergelt, Tobias Roth, Stefanie von Felten, Jérôme Guélat, Bettina Almasi, and Pius Korner-Nievergelt (2015) Bayesian Data Analysis in Ecology Using Linear Models with R, BUGS, and Stan. Academic Press.

This is based in part on the in-person tutorials that they and the other authors have been giving on statistical modeling for ecology.

The book starts at the beginning with an introduction to R, regression and ANOVA, discusses maximum likelihood estimation, then generalized linear models including “mixed effects” models, and then proceeds to Bayesian modeling with MCMC computation for inference, and winds up with some case studies involving BUGS and Stan. Everything works up from simple “hello world” type programs through real examples, which I really appreciate myself in computational examples.

Stan’s primarily showcased in three fully worked out examples (which I also really appreciate as a reader), all of which appear in Chapter 14, “Advanced Ecological Models”:

(14.2) zero-inflated Poisson mixed model for analyzing breeding success,

(14.3) occupancy model to measure species distribution, and

(14.5) analyzing survival based on mark-recapture data.

On deck this week

Mon: New book on Bayesian analysis in ecology using Stan

Tues: The feather, the bathroom scale, and the kangaroo

Wed: Instead of worrying about multiple hypothesis correction, just fit a hierarchical model.

Thurs: Political Attitudes in Social Environments

Fri: Statistical significance, practical significance, and interactions

Sat: Statistical analysis on a dataset that consists of a population

Sun: An amusing window into folk genetics