Archive of posts filed under the Multilevel Modeling category.

A mess with which I am comfortable

Having established that survey weighting is a mess, I should also acknowledge that, by this standard, regression modeling is also a mess, involving many arbitrary choices of variable selection, transformations and modeling of interaction. Nonetheless, regression modeling is a mess with which I am comfortable and, perhaps more relevant to the discussion, can be extended [...]

Displaying inferences from complex models

David Williams writes: I am completing my doctoral dissertation dealing with modeling adverse birth outcomes. The models are complex with 9 risk factors, 5 area level variables and 4 individual level variables. I used hierarchical logistic regression (SAS glimmix) to analyze the data. I am now faced with reporting the results. Can you please recommend [...]

Hierarchical array priors for ANOVA decompositions

Alexander Volfovsky and Peter Hoff write: ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that [...]

Fishing for cherries

Someone writes:

Why big effects are more important than small effects

The title of this post is silly but I have an important point to make, regarding an implicit model which I think many people assume even though it does not really make sense. Following a link from Sanjay Srivastava, I came across a post from David Funder saying that it’s useful to talk about the [...]

Correlation of 1 . . . too good to be true?

Alex Hoffman points me to this interview by Dylan Matthews of education researcher Thomas Kane, who at one point says, Once you corrected for measurement error, a teacher’s score on their chosen videos and on their unchosen videos were correlated at 1. They were perfectly correlated. Hoffman asks, “What do you think? Do you think [...]

What to read to catch up on multivariate statistics?

Henry Harpending writes: I am writing to ask you for a recommendation of something I can read to catch up on multivariate statistics. I am happy with random processes and linear algebra since they are important in population genetics. My last encounter with real statistics was several decades ago. Recently I have had to dip [...]

Heuristics for identifying ecological fallacies?

Greg Laughlin writes: My company just wrote a blog post about the ecological fallacy. There’s a discussion about it on the Hacker News message board. Someone asks, “How do you know [if a group-level finding shouldn't be used to describe individual level behavior]?” The best answer I had was “you can never tell without the [...]

Interaction-based feature selection and classification for high-dimensional biological data

Ilya Esteban writes: In your blog, your advice for performing regression in the presence of large numbers of correlated features has been to use composite scores and hierarchical modeling. Unfortunately, many problems don’t provide an obvious and unambiguous way of grouping features together (e.g. gene expression data). Are there any techniques that you would recommend [...]

Finite-population Anova calculations for models with interactions

Jim Thomson writes: I wonder if you could provide some clarification on the correct way to calculate the finite-population standard deviations for interaction terms in your Bayesian approach to ANOVA (as explained in your 2005 paper, and Gelman and Hill 2007). I understand that it is the SD of the constrained batch coefficients that is [...]

Is it really true that only 8% of people who buy Herbalife products are Herbalife distributors?

A reporter emailed me the other day with a question about a case I’d never heard of before, a company called Herbalife that is being accused of being a pyramid scheme. The reporter pointed me to this document which describes a survey conducted by “a third party firm called Lieberman Research”: Two independent studies took [...]

Prior Selection for Vector Autoregressions

Brendan Nyhan sends along this paper by Domenico Giannone, Michele Lenza, and Giorgio Primiceri: Vector autoregressions are flexible time series models that can capture complex dynamic interrelationships among macroeconomic variables. However, their dense parameterization leads to unstable inference and inaccurate out-of-sample forecasts, particularly for models with many variables. A solution to this problem is to [...]

Fixed effects, followed by Bayes shrinkage?

Stuart Buck writes: I have a question about fixed effects vs. random effects. Amongst economists who study teacher value-added, it has become common to see people saying that they estimated teacher fixed effects (via least squares dummy variables, so that there is a parameter for each teacher), but that they then applied empirical Bayes shrinkage [...]

Comparing people from two surveys, one of which is a simple random sample and one of which is not

Juli writes: I’m helping a professor out with an analysis, and I was hoping that you might be able to point me to some relevant literature… She has two studies that have been completed already (so we can’t go back to the planning stage in terms of sampling, unfortunately). Both studies are based around the [...]

Uri Simonsohn is speaking at Columbia tomorrow (Mon)

Noon in the stat dept (room 903 School of Social Work, at 122/Amsterdam). He’ll be talking about ways of finding fishy p-values. See here and here for background. This stuff is cool and important.

Ways of knowing

In this discussion from last month, computer science student and Judea Pearl collaborator Elias Bareinboim expressed an attitude that hierarchical Bayesian methods might be fine in practice but that they lack theory, that Bayesians can’t succeed in toy problems. I posted a P.S. there which might not have been noticed so I will put it [...]

Multilevel modeling and instrumental variables

Terence Teo writes: I was wondering if multilevel models can be used as an alternative to 2SLS or IV models to deal with (i) endogeneity and (ii) selection problems. More concretely, I am trying to assess the impact of investment treaties on foreign investment. Aside from the fact that foreign investment is correlated over time, [...]

D. Buggin

Joe Zhao writes: I am trying to fit my data using the scaled inverse-Wishart model you mentioned in your book, Data analysis using regression and hierarchical models. Instead of using a uniform prior on the scale parameters, I try to use a log-normal distribution prior. However, I found that the individual coefficients don’t shrink [...]

How I think about mixture models

Larry Wasserman refers to finite mixture models as “beasts” and jokes that they “should be avoided at all costs.” I’ve thought a lot about mixture models, ever since using them in an analysis of voting patterns that was published in 1990. First off, I’d like to say that our model was useful so I’d [...]

Some thoughts on survey weighting

From a comment I made in an email exchange: My work on survey adjustments has very much been inspired by the ideas of Rod Little. Much of my effort has gone toward the goal of integrating hierarchical modeling (which is so helpful for small-area estimation) with poststratification (which adjusts for known differences between sample [...]

Examples of the use of hierarchical modeling to generalize to new settings

In a link to our back-and-forth on causal inference and the use of hierarchical models to bridge between different inferential settings, Elias Bareinboim (a computer scientist who is working with Judea Pearl) writes: In the past week, I have been engaged in a discussion with Andrew Gelman and his blog readers regarding causal inference, selection [...]

Long discussion about causal inference and the use of hierarchical models to bridge between different inferential settings

Elias Bareinboim asked what I thought about his comment on selection bias in which he referred to a paper by himself and Judea Pearl, “Controlling Selection Bias in Causal Inference.” I replied that I have no problem with what he wrote, but that from my perspective I find it easier to conceptualize such problems in [...]

Hierarchical modeling as a framework for extrapolation

Phil recently posted on the challenge of extrapolation of inferences to new data. After telling the story of a colleague who flat-out refused to make predictions from his model of buildings to new data, Phil wrote, “This is an interesting problem because it is sort of outside the realm of statistics, and into some sort [...]

Value-added assessment: What went wrong?

Jacob Hartog writes the following in reaction to my post on the use of value-added modeling for teacher assessment: What I [Hartog] think has been inadequately discussed is the use of individual model specifications to assign these teacher ratings, rather than the zone of agreement across a broad swath of model specifications. For example, the [...]

Average predictive comparisons when changing a pair of variables

Jay Jones writes: I recently came across your paper on average predictive comparisons (Gelman and Pardoe, 2007) and can see many applications for this in my work (I’m an applied statistician working for Weyerhaeuser Company at our R&D center near Seattle). At the moment, I am using APC’s to help describe the results of a [...]