Open problem: How to make residual plots for multilevel models (or for regularized Bayesian and machine-learning predictions more generally)?

Adam Sales writes: I’ve got a question that seems like it should be elementary, but I haven’t seen it addressed anywhere (maybe I’m looking in the wrong places?). When I try to use binned residual plots to evaluate a multilevel … Continue reading
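
As a rough illustration of the mechanics (not an answer to the open problem, just the ordinary construction applied to a multilevel fit), here is a minimal R sketch of a binned residual plot for a multilevel logistic regression. The data frame d and the columns y, x, and group are hypothetical placeholders.

    # Minimal sketch: binned residual plot for a multilevel logistic model.
    # Assumes a hypothetical data frame d with binary outcome y, predictor x,
    # and grouping factor group; requires the lme4 package.
    library(lme4)

    fit <- glmer(y ~ x + (1 | group), data = d, family = binomial)

    fitted_p <- fitted(fit)     # predicted probabilities, conditional on estimated group effects
    resid_y  <- d$y - fitted_p  # raw residuals on the probability scale

    # Bin by fitted value; plot the average residual per bin with rough +/- 2 SE bounds
    n_bins <- 20
    breaks <- unique(quantile(fitted_p, probs = seq(0, 1, length.out = n_bins + 1)))
    bins   <- cut(fitted_p, breaks = breaks, include.lowest = TRUE)
    bin_x  <- tapply(fitted_p, bins, mean)
    bin_r  <- tapply(resid_y, bins, mean)
    bin_se <- tapply(resid_y, bins, function(r) 2 * sd(r) / sqrt(length(r)))

    plot(bin_x, bin_r, pch = 16, xlab = "Average fitted probability",
         ylab = "Average residual", main = "Binned residual plot")
    lines(bin_x, bin_se, col = "gray")
    lines(bin_x, -bin_se, col = "gray")
    abline(h = 0, lty = 2)

Whether to condition on the estimated group effects (as fitted() does here) or to use population-level predictions is one of the choices that makes the multilevel case an open question.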

Sparse regression using the “ponyshoe” (regularized horseshoe) model, from Juho Piironen and Aki Vehtari

The article is called “Sparsity information and regularization in the horseshoe and other shrinkage priors,” and here’s the abstract: The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but has previously suffered from two problems. … Continue reading
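
For reference, the prior the paper studies has the form (roughly in the paper’s notation; see the article for the exact statement)

    \beta_j \mid \lambda_j, \tau, c \sim \mathrm{N}\!\big(0,\ \tau^2 \tilde\lambda_j^2\big),
    \qquad
    \tilde\lambda_j^2 = \frac{c^2 \lambda_j^2}{c^2 + \tau^2 \lambda_j^2},
    \qquad
    \lambda_j \sim \mathrm{C}^{+}(0, 1),

so that small coefficients (those with \tau^2\lambda_j^2 \ll c^2) are shrunk essentially as under the original horseshoe, while large coefficients are regularized toward a slab with scale c rather than left completely unshrunk. The paper also recommends choosing the global scale from a prior guess p_0 of the number of relevant coefficients, roughly \tau_0 = \frac{p_0}{D - p_0}\,\frac{\sigma}{\sqrt{n}} for D predictors and n observations.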

Avoiding boundary estimates using a prior distribution as regularization

For a while I’ve been fitting most of my multilevel models using lmer/glmer, which gives point estimates of the group-level variance parameters (maximum marginal likelihood estimate for lmer and an approximation for glmer). I’m usually satisfied with this: sure, point estimation understates … Continue reading
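
For concreteness, here is a minimal sketch of the kind of fix being discussed: put a weak prior on the group-level variance so the estimate can’t sit exactly at zero. The data frame d and columns y, x, and group are hypothetical, and blme’s default prior is just one convenient zero-avoiding choice (check its documentation for the exact default).

    # Minimal sketch: a weak prior on the group-level covariance keeps the
    # variance estimate off the zero boundary. Data frame d is hypothetical.
    library(lme4)
    library(blme)  # lme4 with priors on the variance parameters

    fit_ml    <- lmer(y ~ x + (1 | group), data = d)   # can estimate sd(group) as exactly 0
    fit_prior <- blmer(y ~ x + (1 | group), data = d)  # default prior on the covariance
                                                       # pulls the estimate off the boundary

    VarCorr(fit_ml)     # group-level sd, possibly exactly zero
    VarCorr(fit_prior)  # small but positive

Fully Bayesian alternatives (for example stan_lmer in rstanarm, or a model written directly in Stan) go further by averaging over the posterior rather than reporting a single regularized point estimate.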

Hey! Here’s a study where all the preregistered analyses yielded null results but it was presented in PNAS as being wholly positive.

Ryan Briggs writes: In case you haven’t seen this, PNAS (who else) has a new study out entitled “Unconditional cash transfers reduce homelessness.” This is the significance statement: A core cause of homelessness is a lack of money, yet few … Continue reading

The continuing challenge of poststratification when we don’t have full joint data on the population.

Torleif Halkjelsvik at the Norwegian Institute of Public Health writes: Norway has very good register data (education/income/health/drugs/welfare/etc.), but it is difficult to obtain complete tables at the population level. It is, however, easy to get independent tables from different registries … Continue reading
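
One standard workaround when only the margins are available is raking (iterative proportional fitting): take a joint table from whatever sample does have both variables and rescale its rows and columns until they match the registry margins. A minimal sketch, with made-up numbers, is below; it doesn’t touch the harder problems of three or more registries or of margins defined on slightly different populations.

    # Minimal sketch: rake a sample cross-tab so its margins match known
    # population margins. All numbers are made up for illustration.
    sample_tab <- matrix(c(20, 35, 15,
                           30, 60, 25,
                           10, 25, 30),
                         nrow = 3, byrow = TRUE,
                         dimnames = list(educ = c("low", "mid", "high"),
                                         age  = c("young", "middle", "old")))
    row_margin <- c(0.25, 0.45, 0.30)  # population education distribution (registry A)
    col_margin <- c(0.35, 0.40, 0.25)  # population age distribution (registry B)

    joint <- sample_tab / sum(sample_tab)
    for (iter in 1:50) {
      joint <- joint * (row_margin / rowSums(joint))              # match education margin
      joint <- sweep(joint, 2, col_margin / colSums(joint), `*`)  # match age margin
    }
    round(joint, 3)  # keeps the sample's association structure, matches both registry margins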

Bayesians moving from defense to offense: “I really think it’s kind of irresponsible now not to use the information from all those thousands of medical trials that came before. Is that very radical?”

Erik van Zwet, Sander Greenland, Guido Imbens, Simon Schwab, Steve Goodman, and I write: We have examined the primary efficacy results of 23,551 randomized clinical trials from the Cochrane Database of Systematic Reviews. We estimate that the great majority of … Continue reading

“You need 16 times the sample size to estimate an interaction than to estimate a main effect,” explained

This has come up before here, and it’s also in Section 16.4 of Regression and Other Stories (chapter 16: “Design and sample size decisions,” Section 16.4: “Interactions are harder to estimate than main effects”). But there was still some confusion … Continue reading
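
The arithmetic behind the title, roughly as in the book: say the main effect is estimated by comparing treatment and control groups of size n/2 each, and the interaction compares that treatment effect across two equal-sized subgroups (four cells of size n/4). Then

    \mathrm{se}_{\text{main}} = \sqrt{\tfrac{\sigma^2}{n/2} + \tfrac{\sigma^2}{n/2}} = \frac{2\sigma}{\sqrt{n}},
    \qquad
    \mathrm{se}_{\text{interaction}} = \sqrt{4 \cdot \tfrac{\sigma^2}{n/4}} = \frac{4\sigma}{\sqrt{n}} = 2\,\mathrm{se}_{\text{main}}.

Doubling the standard error means you need 4 times the sample size to get it back; and if, as is often reasonable, the interaction you expect is only half the size of the main effect, matching the same signal-to-noise ratio costs another factor of 4, for 16 in total.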

Springboards to overconfidence: How can we avoid . . .? (following up on our discussion of synthetic controls analysis)

Following up on our recent discussion of synthetic control analysis for causal inference, Alberto Abadie points to this article from 2021, “Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects.” Abadie’s paper is very helpful in that it lays out … Continue reading
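
For readers new to the method, the core of the synthetic control estimator can be written in a couple of lines: with X_1 the vector of pre-treatment characteristics of the treated unit and X_0 the corresponding matrix for the J untreated donor units, the weights solve

    W^*(V) = \arg\min_{w}\ (X_1 - X_0 w)'\, V\, (X_1 - X_0 w)
    \quad \text{subject to } w_j \ge 0,\ \textstyle\sum_{j} w_j = 1,

and the estimated effect at time t is Y_{1t} - \sum_j w_j^* Y_{jt}, the gap between the treated unit and its synthetic counterpart. Much of what Abadie’s paper covers is when this construction is credible: the choice of V, the donor pool, the quality of the pre-treatment fit, and the data requirements.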

How large is the underlying coefficient? An application of the Edlin factor to that claim that “Cash Aid to Poor Mothers Increases Brain Activity in Babies”

Often these posts start with a question that someone sends to me and continue with my reply. This time the q-and-a goes the other way . . . I pointed Erik van Zwet to this post, “I’m skeptical of that … Continue reading
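
For readers who haven’t seen the term: the Edlin factor is a multiplicative shrinkage applied to a published estimate before taking it at face value. In the simplest normal-normal version, an estimate y with standard error s, combined with a prior on the true effect centered at zero with scale \tau, gets pulled toward zero by

    E[\theta \mid y] = \frac{\tau^2}{\tau^2 + s^2}\, y,

so the noisier the estimate relative to what is plausible a priori, the smaller the fraction of the reported effect you should believe. That kind of adjustment, applied to the reported cash-aid coefficient, is what the post is about.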

Multilevel modeling to make better decisions using data from schools: How can we do better?

Michael Nelson writes: I wanted to point out a paper, “Stabilizing Subgroup Proficiency Results to Improve the Identification of Low-Performing Schools,” by Lauren Forrow, Jennifer Starling, and Brian Gill. The authors use Mr. P to analyze proficiency scores of students … Continue reading
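
As a minimal illustration of the stabilization idea (not the authors’ actual model, and leaving out the poststratification step of Mr. P, that is, multilevel regression and poststratification), a multilevel model partially pools noisy school-by-subgroup proficiency rates toward the overall pattern. The data frame d and columns prof, school, and subgroup are hypothetical.

    # Minimal sketch: partial pooling of school-by-subgroup proficiency rates.
    # Hypothetical student-level data frame d with binary proficiency indicator
    # prof and identifiers school and subgroup; requires lme4.
    library(lme4)

    fit <- glmer(prof ~ 1 + (1 | school) + (1 | subgroup) + (1 | school:subgroup),
                 data = d, family = binomial)

    # Partially pooled proficiency estimates for each school-by-subgroup cell:
    cells <- unique(d[, c("school", "subgroup")])
    cells$est <- predict(fit, newdata = cells, type = "response")

Small subgroups get pulled toward their school and subgroup averages instead of being reported (and acted on) as noisy raw proportions, which is the point of the stabilization.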