His varying slopes don’t seem to follow a normal distribution

Bruce Doré writes:

I have a question about multilevel modeling I’m hoping you can help with.

What should one do when random effects coefficients are clearly not normally distributed (i.e., coef(lmer(y ~ x + (x|id))))? Is this a sign that the model should be changed? Or can you stick with this model and infer that the assumption of normally distributed coefficients is incorrect?

I’m seeing strongly leptokurtic random slopes in a context where I have substantive interest in the shape of this distribution. That is, it would be useful to know if there are more individuals with “extreme” and fewer with “moderate” slopes than you’d expect of a normal distribution.

My reply: You can fit a mixture model, or even better you can have a group-level predictor that breaks up your data appropriately. To put it another way: What are your groups? And which are the groups that have low slopes and which have high slopes? Or which have slopes near the middle of the distribution and which have extreme slopes? You could fit a mixture model where the variance varies, but I think you’d be better off with a model using group-level predictors. Also I recommend using Stan which is more flexible than lmer and gives you the full posterior distribution.
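A quick way to see why a group-level predictor can soak up leptokurtosis: if the slopes really come from two subpopulations with different spreads, the pooled distribution has positive excess kurtosis even though each component is normal. A minimal simulation sketch (the group proportions and scales here are invented for illustration):

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)

# Hypothetical population: 80% "moderate" slopes, 20% "extreme" slopes.
# Each subgroup is normal on its own, but their scales differ.
moderate = rng.normal(0.0, 1.0, size=8000)
extreme = rng.normal(0.0, 4.0, size=2000)
slopes = np.concatenate([moderate, extreme])

# Fisher (excess) kurtosis is 0 for a normal distribution;
# this two-scale mixture comes out strongly positive (theory says ~6.75).
print(kurtosis(slopes))
```

With a group-level predictor identifying which subpopulation each person belongs to, each group could get its own normal distribution, and the leptokurtosis disappears within groups.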

Doré then added:

My groups are different people reporting life satisfaction annually surrounding a stressful life event (divorce, bereavement, job loss). I take it that the kurtosis is a clue that there are unobserved person-level factors driving this slope variability? With my current data I don’t have any person-level predictors that could explain this variability, but certainly it would be good to try to find some.

25 Comments

  1. Dean Eckles says:

    Besides using a finite mixture, one could use an infinite mixture (eg Dirichlet process mixture).

    Also, I’m looking forward to better support for discrete variables in Stan. Writing mixtures is currently quite cumbersome in Stan (compared with BUGS and JAGS).

  2. If you don’t have individual predictors that could help create an individual-level model, the best you can do is create a frequency model for the size of the effects in a population from which you have a representative sample.

    In Stan, you can provide a prior over the coefficients that is broad enough to cover the range of things that are reasonable, and then set each person’s coefficient to be from this common prior… Then you’ll have a posterior distribution over each person’s coefficient. But, pooling samples across all the people (integrating the probability over all the people) gives you a posterior distribution of coefficients for “a randomly selected person from my population” which need not at all be normally distributed, even if the prior over coefficients was, and even if the posterior over each person was.

    • In other words, this *is* a mixture model where you’re relying on the sampling of people to make the mixture have valid coverage for the population. If you know something about the individuals and can “poststratify,” you can assign each individual a weight. To get a sample from the corrected population, just repeatedly choose a person according to their poststratification weight, and then draw one posterior sample of their coefficient. Repeat until you have a large sample of coefficients.
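      That resampling procedure can be sketched directly; the weights and per-person posterior draws below are invented stand-ins, not output from a real fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented example: posterior draws of each person's slope (rows = people),
# plus a poststratification weight per person.
coef_draws = rng.normal(loc=[[0.5], [1.0], [3.0]], scale=0.2, size=(3, 4000))
weights = np.array([0.5, 0.3, 0.2])  # must sum to 1

# Repeatedly: pick a person by weight, then pick one of their posterior draws.
n = 10000
people = rng.choice(3, size=n, p=weights)
draws = coef_draws[people, rng.integers(0, coef_draws.shape[1], size=n)]

# `draws` approximates the poststratified population distribution of slopes;
# its mean should be close to 0.5*0.5 + 0.3*1.0 + 0.2*3.0 = 1.15.
print(draws.mean())
```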

    • Corey says:

      Right, so the issue here is that the coefficient population is being implicitly modelled as normal, the only uncertainty being in the overall mean and overall variance; we can never update out of the normal population model. This model forces the amount of shrinkage to depend on the ratio of the sampling variance and (estimated) population variance. (This is what lmer does, albeit in a non-fully-Bayes way.) If the underlying population is leptokurtic, then normal model shrinkage will over-shrink the tails and under-shrink the center. But we’re told that even with the over-shrinkage of the tails, the coefficient estimates are too leptokurtic to trust the normal population model — so yes, the model should be changed.
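      For the simple normal-normal case Corey describes, the partial-pooling estimate pulls each raw estimate toward the population mean by the factor tau² / (tau² + s²), the same factor for every unit, which is why genuinely extreme units get over-shrunk. A sketch with invented numbers:

```python
import numpy as np

# Normal-normal partial pooling: raw estimate y_j with sampling variance s2,
# population mean mu with (estimated) population variance tau2.
def shrink(y, mu, tau2, s2):
    b = tau2 / (tau2 + s2)   # shrinkage factor, identical for every unit
    return mu + b * (y - mu)

# Invented numbers: a central unit (0.5) and an extreme unit (10.0) are both
# pulled in by the same 50%, so a genuinely heavy tail gets over-shrunk.
print(shrink(np.array([0.5, 10.0]), mu=0.0, tau2=1.0, s2=1.0))  # 0.25 and 5.0
```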

      • In a Bayesian model where you say

        coefficient[i] ~ normal(0,1000);

        and then you have data that puts the likelihood of coefficient[1] concentrated around say 10, coefficient[2] concentrated around 12, coefficient[3] concentrated around 15, coefficient[4] concentrated around 300 (in a big tail)…

        the resulting mixture would be non-normal because although the plausibility you assign to coefficient[i] pre-data is normal, each one concentrates separately around whatever the data[i] suggests. It’s the prior that’s normal, and maybe it’s also the case that the individual posteriors are normal, but *not* the marginal posterior summed across the individual people.

        on the other hand, if you say

        coefficient[i] ~ normal(mu,sigma)

        and mu, sigma are themselves hyperparameters, then you’re going to have potentially weird distributions over mu and sigma, and the continuous normal mixture you get there could be non-normal. For example, a t distribution is a continuous mixture of normals with an inverse-gamma distribution on the variance, right?
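        Right: if the variance sigma² is drawn from an inverse-gamma(ν/2, ν/2) distribution and x | sigma ~ normal(0, sigma), the marginal distribution of x is Student-t with ν degrees of freedom. A quick simulation check (ν = 12 chosen arbitrarily):

```python
import numpy as np
from scipy.stats import invgamma, kurtosis

rng = np.random.default_rng(2)
nu = 12.0

# sigma^2 ~ Inv-Gamma(nu/2, scale=nu/2), then x | sigma ~ Normal(0, sigma):
sigma2 = invgamma.rvs(a=nu / 2, scale=nu / 2, size=200_000, random_state=rng)
x = rng.normal(0.0, np.sqrt(sigma2))

# The marginal should match Student-t with nu df:
# variance nu/(nu-2) = 1.2, excess kurtosis 6/(nu-4) = 0.75.
print(x.var(), kurtosis(x))
```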

        None of this is apropos of lmer, but in Bayesian calcs, the normality of the initial priors and of the individual likelihoods doesn’t imply normality of the finite mixture across the n coefficients. In fact, what you’re doing by fitting a gaussian to each person and then marginalizing, is in essence fitting a gaussian mixture model to the full population.

        I think when you say the model should be changed you’re suggesting to move from lmer to the full type of model I’m talking about here, or are you suggesting something else, for example that my suggested model has problems at the prior or hyper-prior level?

        • Corey says:

          When Doré writes:

          I’m seeing strongly leptokurtic random slopes in a context where I have substantive interest in the shape of this distribution. That is, it would be useful to know if there are more individuals with “extreme” and fewer with “moderate” slopes than you’d expect of a normal distribution.

          I take it to mean that what he cares about is not just the values of the coefficients for those individuals in his sample but rather the distribution of the coefficients in the general population of people who undergo stressful life events. If so, what he wants is the posterior predictive distribution of a new coefficient. Letting the index i refer to such a coefficient, then

          coefficient[i] ~ normal(0,1000)

          prevents any learning at all and

          coefficient[i] ~ normal(mu,sigma)

          necessarily has 0 excess kurtosis.

          • Corey says:

            Ack, stupid brainfart:

            coefficient[i] ~ normal(mu,sigma)

            doesn’t have 0 excess kurtosis; forgot that it gets mixed over the posterior of the hyperparameters. Dumb dumb dumb.

          • Sorry, I was just writing the priors, since I have no knowledge of what his likelihood looks like, so

            coefficient[i] ~ normal(0,1000) is the prior, and then presumably there is some

            data[i] ~ somedistribution(f(coefficient[i]))

            which is where the learning comes in. Posterior distribution over the coefficient[i] will be concentrated around the result for that individual.

            Then, the marginal distribution of ALL the coefficients (for all i) is the distribution he’s interested in (ie. the posterior predictive distribution of a new coefficient)

            and, yes, the normal(mu,sigma) mixes across the hyperparameters.

            • OK, I’m now seeing a good point, though. In the case where the prior is normal(0,1000) and the posterior is mixed over the N different subjects, you get a finite mixture model for the coefficients. If the number of subjects is small and the tails are heavy, you might not have a great model here: even with each subject’s posterior spread out, it might look like a bunch of little discrete lumps.

              On the other hand,

              normal(mu,sigma) with broad priors on mu, and sigma, after applying the likelihood to the data, will produce posterior distributions on mu,sigma which give a continuous mixture that is perfectly capable of having the appropriate heavy tail. Then if he uses an extra parameter “randomcoeff” and says:

              randomcoeff ~ normal(mu,sigma)

              and gets a sample of “randomcoeff” it will be a sample from the posterior predictive distribution for a randomly chosen person from the population using the continuous mixture model.
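              That last step can be sketched directly: drawing one randomcoeff per posterior draw of (mu, sigma) mixes over the hyperparameter uncertainty, so the result is heavier-tailed than any single normal(mu, sigma). The “posterior draws” below are invented stand-ins rather than output from a fitted model:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(3)
n = 100_000

# Stand-ins for posterior draws of the hyperparameters (not from a real fit):
mu_draws = rng.normal(0.0, 0.3, size=n)
sigma_draws = np.exp(rng.normal(0.0, 0.5, size=n))  # sigma varies a lot

# One predictive draw per posterior draw: randomcoeff ~ normal(mu, sigma)
randomcoeff = rng.normal(mu_draws, sigma_draws)

# Mixing over sigma inflates the tails relative to any single normal:
print(kurtosis(randomcoeff))  # positive excess kurtosis
```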

  3. I may be misinterpreting this, but it comes as no surprise to me that the slopes would be highly variable here and that their distribution would be strongly leptokurtic. It seems that the author is considering a wide range of stressful events at once. Some might have a long-term effect on life satisfaction; others, a large effect over a short time.

    A possible group-level predictor might be the effect of the event on one’s economic stability. That’s just a start. A job loss when you have plenty of money in the bank is different from such a loss when you’re steeped in debt.

    Another might be the nature of bereavement–in particular, its suddenness and unexpectedness.

    I could go on and on. It seems that the study may have overgeneralized stress and life satisfaction from the start. I may be wrong here; maybe the study does break down the various kinds of stressors and losses.

  4. Mark Benson says:

    Hi,
    I have a question as well. Something that seems very basic, but that I could never find (maybe it does not exist?).
    I would like to have a cheatsheet for finding a distribution.
    I am thinking of a decision tree.
    Something like this: Is your distribution discrete? => Yes => Are the events successes or failures, such as a coin toss? => Yes => Do you run a single experiment? => Yes => Your distribution is a Bernoulli… with an explanation of the distribution.
    It seems to me that something like this must already be out there somewhere, but I really can’t find it.
    Have you ever run into such a thing?
    Best,
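    The decision-tree idea is easy to prototype. A toy sketch covering only a few discrete cases (the questions and coverage are illustrative, not a real cheatsheet):

```python
# Toy decision tree for naming a distribution from yes/no answers.
# Coverage is illustrative only -- a real cheatsheet would go much deeper.
TREE = {
    "question": "Is the outcome a success/failure (like a coin toss)?",
    "yes": {
        "question": "Is it a single trial?",
        "yes": "Bernoulli",
        "no": "Binomial (fixed number of trials)",
    },
    "no": {
        "question": "Are you counting events in a fixed interval?",
        "yes": "Poisson",
        "no": "(keep asking...)",
    },
}

def walk(tree, answers):
    """Follow yes/no answers down the tree until a leaf (a name) is reached."""
    node = tree
    for a in answers:
        node = node["yes" if a else "no"]
        if isinstance(node, str):
            return node
    return node["question"]  # ran out of answers mid-tree

print(walk(TREE, [True, True]))   # Bernoulli
print(walk(TREE, [False, True]))  # Poisson
```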

  5. Nick Menzies says:

    I am surprised not to see a suggestion to use a more flexible distribution for the random effects. Would another option be to assume the random effects are distributed as Student’s t, with an uncertain hyperparameter for the df? I have seen this kind of model at the observation level but not at the group level, so I’m not sure if there is a catch.
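    One cheap way to check whether a t random-effects distribution would help, before building the full model, is to fit a Student’s t by maximum likelihood to the estimated slopes and look at the fitted degrees of freedom (small df = heavy tails). A sketch on simulated slopes (the true df of 5 is invented):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(4)

# Simulated "slopes" from a heavy-tailed population (true df = 5, invented).
slopes = t.rvs(df=5, size=5000, random_state=rng)

# Maximum-likelihood fit of a t distribution: a small fitted df means the
# slopes are clearly heavier-tailed than a normal would allow.
df_hat, loc_hat, scale_hat = t.fit(slopes)
print(df_hat)
```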
