Archive of posts filed under the Multilevel Modeling category.

What I missed on fixed effects (plural).

In my [Keith] previous post that criticised a published paper, the first author commented that they wanted some time to respond, and I agreed. I also suggested that if the response came in after most readers had moved on, I would re-post it as a new post pointing back to the previous one. So here we are. […]

Using Mister P to get population estimates from respondent driven sampling

From one of our exams: A researcher at Columbia University’s School of Social Work wanted to estimate the prevalence of drug abuse problems among American Indians (Native Americans) living in New York City. From the Census, it was estimated that about 30,000 Indians live in the city, and the researcher had a budget to interview […]

The Publicity Factory: How even serious research gets exaggerated by the process of scientific publication and reporting

The starting point is that we’ve seen a lot of talk about frivolous science, headline-bait such as the study that said that married women are more likely to vote for Mitt Romney when ovulating, or the study that said that girl-named hurricanes are more deadly than boy-named hurricanes, and at this point some of these […]

Does traffic congestion make men beat up their wives?

Max Burton-Chellew writes: I thought this paper and news story (links fixed) might be worthy of your blog? I’m no stats expert, far from it, but this paper raised some alarms for me. If the paper is fine then sorry for wasting your time, if it’s terrible then sorry for ruining your day! Why alarms […]

No tradeoff between regularization and discovery

We had a couple recent discussions regarding questionable claims based on p-values extracted from forking paths, and in both cases (a study “trying large numbers of combinations of otherwise-unused drugs against a large number of untreatable illnesses,” and a salami-slicing exercise looking for public opinion changes in subgroups of the population), I recommended fitting a […]

Beyond forking paths: using multilevel modeling to figure out what can be learned from this survey experiment

Under the heading, “Incompetent leaders as a protection against elite betrayal,” Tyler Cowen linked to this paper, “Populism and the Return of the ‘Paranoid Style’: Some Evidence and a Simple Model of Demand for Incompetence as Insurance against Elite Betrayal,” by Rafael Di Tella and Julio Rotemberg. From a statistical perspective, the article by Tella […]

Partial pooling with informative priors on the hierarchical variance parameters: The next frontier in multilevel modeling

Ed Vul writes: In the course of tinkering with someone else’s hairy dataset with a great many candidate explanatory variables (some of which are largely orthogonal factors, but the ones of most interest are competing “binning” schemes of the same latent elements), I wondered about the following “model selection” strategy, which you may have alluded […]

“Do statistical methods have an expiration date?” My talk at the University of Texas this Friday 2pm

Fri 6 Oct at the Seay Auditorium (room SEA 4.244): Do statistical methods have an expiration date? Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University. There is a statistical crisis in science, particularly in psychology where many celebrated findings have failed to replicate, and where careful analysis has revealed that many […]

Response to some comments on “Abandon Statistical Significance”

The other day, Blake McShane, David Gal, Christian Robert, Jennifer Tackett, and I wrote a paper, Abandon Statistical Significance, that began: In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only […]

The “fish MRI” of international relations studies.

Kevin Lewis pointed me to this paper by Stephen Chaudoin, Jude Hays and Raymond Hicks, “Do We Really Know the WTO Cures Cancer?”, which begins: This article uses a replication experiment of ninety-four specifications from sixteen different studies to show the severity of the problem of selection on unobservables. Using a variety of approaches, it […]

Getting the right uncertainties when fitting multilevel models

Cesare Aloisi writes: I am writing you regarding something I recently stumbled upon in your book Data Analysis Using Regression and Multilevel/Hierarchical Models which confused me, in hopes you could help me understand it. This book has been my reference guide for many years now, and I am extremely grateful for everything I learnt from […]

Causal inference using data from a non-representative sample

Dan Gibbons writes: I have been looking at using synthetic control estimates for estimating the effects of healthcare policies, particularly because for say county-level data the nontreated comparison units one would use in say a difference-in-differences estimator or quantile DID estimator (if one didn’t want to use the mean) are not especially clear. However, given […]

Job openings at online polling company!

Kyle Dropp of online polling firm Morning Consult says they are hiring a bunch of mid-level data scientists and software engineers at all levels: About Morning Consult: We are interviewing about 10,000 adults every day in the U.S. and ~20 countries, we have worked with 150+ Fortune 500 companies and industry associations and we are […]

How to design and conduct a subgroup analysis?

Brian MacGillivray writes: I’ve just published a paper that draws on your work on the garden of forking paths, as well as your concept of statistics as being the science of defaults. The article is called, “Characterising bias in regulatory risk and decision analysis: An analysis of heuristics applied in health technology appraisal, chemicals regulation, […]

Causal identification + observational study + multilevel model

Sam Portnow writes: I am attempting to model the impact of tax benefits on children’s school readiness skills. Obviously, benefits themselves are biased, so I am trying to use the doubling of the maximum allowable additional child tax credit in 2003 to get an unbiased estimate of benefits. I was initially planning to attack this […]

The Pandora Principle in statistics — and its malign converse, the ostrich

The Pandora Principle is that once you’ve considered a possible interaction or bias or confounder, you can’t un-think it. The malign converse is when people realize this and then design their studies to avoid putting themselves in a position where they have to consider some potentially important factor. For example, suppose you’re considering some policy […]

What explains my lack of openness toward this research claim? Maybe my cortex is just too damn thick and wrinkled

Diana Senechal writes: Yesterday Cari Romm reported that researchers had found a relation between personality traits and cortex shape: “People who scored higher on openness tended to have thinner and smoother cortices, while those who scored high on neuroticism had cortices that were thicker and more wrinkled.” I [Senechal] looked up the study itself ( […]

Multilevel modeling: What it can and cannot do

Today’s post reminded me of this article from 2005: We illustrate the strengths and limitations of multilevel modeling through an example of the prediction of home radon levels in U.S. counties. . . . Compared with the two classical estimates (no pooling and complete pooling), the inferences from the multilevel models are more reasonable. . […]
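To make the contrast between no pooling, complete pooling, and the multilevel compromise concrete, here is a minimal sketch of the standard precision-weighted formula for a varying-intercept model with known variances. The function name and all the numbers are hypothetical and chosen only for illustration; they are not taken from the radon example, and a real analysis would estimate the variance parameters rather than plug them in.

```python
def partial_pooling_estimate(ybar_j, n_j, mu, sigma_y, sigma_alpha):
    """Precision-weighted average of a group mean and the overall mean.

    ybar_j      : sample mean in group j (the no-pooling estimate)
    n_j         : number of observations in group j
    mu          : overall mean (the complete-pooling estimate)
    sigma_y     : within-group standard deviation (assumed known here)
    sigma_alpha : between-group standard deviation (assumed known here)
    """
    w_group = n_j / sigma_y**2       # precision of the group's own mean
    w_pop = 1.0 / sigma_alpha**2     # precision of the group-level distribution
    return (w_group * ybar_j + w_pop * mu) / (w_group + w_pop)


# Hypothetical inputs: two groups with the same raw mean but different sample sizes.
mu = 1.5
sigma_y, sigma_alpha = 0.8, 0.3

small = partial_pooling_estimate(ybar_j=2.5, n_j=2, mu=mu,
                                 sigma_y=sigma_y, sigma_alpha=sigma_alpha)
large = partial_pooling_estimate(ybar_j=2.5, n_j=200, mu=mu,
                                 sigma_y=sigma_y, sigma_alpha=sigma_alpha)

print(f"n_j = 2:   {small:.3f}")
print(f"n_j = 200: {large:.3f}")
```

The group with two observations is pulled most of the way toward the overall mean, while the group with two hundred stays close to its own average, which is the sense in which the multilevel estimate sits between the two classical ones.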

Adding a predictor can increase the residual variance!

Chao Zhang writes: When I want to know the contribution of a predictor in a multilevel model, I often calculate how much of the total variance is reduced in the random effects by the added predictor. For example, the between-group variance is 0.7 and residual variance is 0.9 in the null model, and by adding […]
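The arithmetic behind this kind of calculation can be sketched as follows. The null-model variances (0.7 and 0.9) come from the excerpt above; the values after adding a predictor are cut off in the excerpt, so the "after" numbers below are hypothetical placeholders, as are the function names.

```python
def intraclass_correlation(var_between, var_residual):
    """Share of total variance that lies between groups."""
    return var_between / (var_between + var_residual)


def proportional_reduction(var_null, var_new):
    """Fraction by which a variance component shrinks; negative if it grows."""
    return (var_null - var_new) / var_null


# Null-model values quoted in the excerpt.
between_null, residual_null = 0.7, 0.9
icc_null = intraclass_correlation(between_null, residual_null)

# HYPOTHETICAL post-predictor values, for illustration only.
between_new, residual_new = 0.4, 0.95

change_between = proportional_reduction(between_null, between_new)
change_residual = proportional_reduction(residual_null, residual_new)

print(f"ICC in null model:            {icc_null:.4f}")
print(f"reduction, between-group var: {change_between:.3f}")
print(f"reduction, residual var:      {change_residual:.3f}")
```

With these made-up numbers the between-group component shrinks while the residual component grows slightly, so its "reduction" comes out negative, which is the situation the post title describes.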

Sparse regression using the “ponyshoe” (regularized horseshoe) model, from Juho Piironen and Aki Vehtari

The article is called “Sparsity information and regularization in the horseshoe and other shrinkage priors,” and here’s the abstract: The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but has previously suffered from two problems. First, there has been no systematic way of specifying a prior for the global shrinkage […]