Archive of posts filed under the Multilevel Modeling category.

Wine + Stan + Climate change = ?

Pablo Almaraz writes: Recently, I published a paper in the journal Climate Research in which I used RStan to conduct the statistical analyses: Almaraz P (2015) Bordeaux wine quality and climate fluctuations during the last century: changing temperatures and changing industry. Clim Res 64:187-199.

Spatial models for demographic trends?

Jon Minton writes: You may be interested in a commentary piece I wrote early this year, which was published recently in the International Journal of Epidemiology, where I discuss your work on identifying an aggregation bias in one of the key figures in Case & Deaton’s (in)famous 2015 paper on rising morbidity and mortality in […]

Fitting multilevel models when predictors and group effects correlate

Ryan Bain writes: I came across your ‘Fitting Multilevel Models When Predictors and Group Effects Correlate‘ paper that you co-authored with Dr. Bafumi and read it with great interest. I am a current postgraduate student at the University of Glasgow writing a dissertation examining explanations of Euroscepticism at the individual and country level since the […]

Noisy, heterogeneous data scoured from diverse sources make his metanalyses stronger.

Kyle MacDonald writes: I wondered if you’d heard of Purvesh Khatri’s work in computational immunology, profiled in this Q&A with Esther Landhuis at Quanta yesterday. Elevator pitch is that he believes noisy, heterogeneous data scoured from diverse sources make his metanalyses stronger. The thing that gave me the woollies was this line: “We start with […]

What I missed on fixed effects (plural).

In my [Keith] previous post that criticised a published paper, the first author commented they wanted some time to respond and I agreed. I also suggested that if the response came in after most readers had moved on I would re-post their response as a new post pointing back to the previous one. So here we are. […]

Using Mister P to get population estimates from respondent driven sampling

From one of our exams: A researcher at Columbia University’s School of Social Work wanted to estimate the prevalence of drug abuse problems among American Indians (Native Americans) living in New York City. From the Census, it was estimated that about 30,000 Indians live in the city, and the researcher had a budget to interview […]

The Publicity Factory: How even serious research gets exaggerated by the process of scientific publication and reporting

The starting point is that we’ve seen a lot of talk about frivolous science, headline-bait such as the study that said that married women are more likely to vote for Mitt Romney when ovulating, or the study that said that girl-named hurricanes are more deadly than boy-named hurricanes, and at this point some of these […]

Does traffic congestion make men beat up their wives?

Max Burton-Chellew writes: I thought this paper and news story (links fixed) might be worthy of your blog? I’m no stats expert, far from it, but this paper raised some alarms for me. If the paper is fine then sorry for wasting your time, if it’s terrible then sorry for ruining your day! Why alarms […]

No tradeoff between regularization and discovery

We had a couple recent discussions regarding questionable claims based on p-values extracted from forking paths, and in both cases (a study “trying large numbers of combinations of otherwise-unused drugs against a large number of untreatable illnesses,” and a salami-slicing exercise looking for public opinion changes in subgroups of the population), I recommended fitting a […]

Beyond forking paths: using multilevel modeling to figure out what can be learned from this survey experiment

Under the heading, “Incompetent leaders as a protection against elite betrayal,” Tyler Cowen linked to this paper, “Populism and the Return of the ‘Paranoid Style’: Some Evidence and a Simple Model of Demand for Incompetence as Insurance against Elite Betrayal,” by Rafael Di Tella and Julio Rotemberg. From a statistical perspective, the article by Di Tella […]

Partial pooling with informative priors on the hierarchical variance parameters: The next frontier in multilevel modeling

Ed Vul writes: In the course of tinkering with someone else’s hairy dataset with a great many candidate explanatory variables (some of which are largely orthogonal factors, but the ones of most interest are competing “binning” schemes of the same latent elements), I wondered about the following “model selection” strategy, which you may have alluded […]

“Do statistical methods have an expiration date?” My talk at the University of Texas this Friday 2pm

Fri 6 Oct at the Seay Auditorium (room SEA 4.244): Do statistical methods have an expiration date? Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University. There is a statistical crisis in science, particularly in psychology, where many celebrated findings have failed to replicate, and where careful analysis has revealed that many […]

Response to some comments on “Abandon Statistical Significance”

The other day, Blake McShane, David Gal, Christian Robert, Jennifer Tackett, and I wrote a paper, Abandon Statistical Significance, that began: In science publishing and many areas of research, the status quo is a lexicographic decision rule in which any result is first required to have a p-value that surpasses the 0.05 threshold and only […]

The “fish MRI” of international relations studies.

Kevin Lewis pointed me to this paper by Stephen Chaudoin, Jude Hays and Raymond Hicks, “Do We Really Know the WTO Cures Cancer?”, which begins: This article uses a replication experiment of ninety-four specifications from sixteen different studies to show the severity of the problem of selection on unobservables. Using a variety of approaches, it […]

Getting the right uncertainties when fitting multilevel models

Cesare Aloisi writes: I am writing you regarding something I recently stumbled upon in your book Data Analysis Using Regression and Multilevel/Hierarchical Models which confused me, in hopes you could help me understand it. This book has been my reference guide for many years now, and I am extremely grateful for everything I learnt from […]

Causal inference using data from a non-representative sample

Dan Gibbons writes: I have been looking at using synthetic control estimates for estimating the effects of healthcare policies, particularly because for say county-level data the nontreated comparison units one would use in say a difference-in-differences estimator or quantile DID estimator (if one didn’t want to use the mean) are not especially clear. However, given […]

Job openings at online polling company!

Kyle Dropp of online polling firm Morning Consult says they are hiring a bunch of mid-level data scientists and software engineers at all levels: About Morning Consult: We are interviewing about 10,000 adults every day in the U.S. and ~20 countries, we have worked with 150+ Fortune 500 companies and industry associations and we are […]

How to design and conduct a subgroup analysis?

Brian MacGillivray writes: I’ve just published a paper that draws on your work on the garden of forking paths, as well as your concept of statistics as being the science of defaults. The article is called, “Characterising bias in regulatory risk and decision analysis: An analysis of heuristics applied in health technology appraisal, chemicals regulation, […]

Causal identification + observational study + multilevel model

Sam Portnow writes: I am attempting to model the impact of tax benefits on children’s school readiness skills. Obviously, benefits themselves are biased, so I am trying to use the doubling of the maximum allowable additional child tax credit in 2003 to get an unbiased estimate of benefits. I was initially planning to attack this […]

The Pandora Principle in statistics — and its malign converse, the ostrich

The Pandora Principle is that once you’ve considered a possible interaction or bias or confounder, you can’t un-think it. The malign converse is when people realize this and then design their studies to avoid putting themselves in a position where they have to consider some potentially important factor. For example, suppose you’re considering some policy […]