Mixture models in Stan: you can use log_mix()

Posted on August 21, 2017 9:58 AM by Andrew

From the Stan manual:

log_mix() . . . I like it. Super-clean.
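
The manual excerpt the post refers to is not reproduced here. As a stand-in, here is a minimal sketch of a two-component normal mixture written with log_mix(). The normal components, the priors, and the variable names are assumptions made for illustration, not the manual's own example.

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real<lower=0, upper=1> theta;  // mixing proportion of component 1
  ordered[2] mu;                 // ordered locations, for identifiability
  vector<lower=0>[2] sigma;
}
model {
  mu ~ normal(0, 10);            // illustrative priors (assumed)
  sigma ~ normal(0, 5);
  for (n in 1:N) {
    // log(theta * exp(lp1) + (1 - theta) * exp(lp2)), computed stably
    target += log_mix(theta,
                      normal_lpdf(y[n] | mu[1], sigma[1]),
                      normal_lpdf(y[n] | mu[2], sigma[2]));
  }
}
```

The mixture has to be accumulated one observation at a time, because each y[n] gets its own draw of the latent component.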

21 thoughts on “Mixture models in Stan: you can use log_mix()”

G on August 21, 2017 at 10:18 am said:

Is there a way to implement a hierarchical mixture model longitudinally using Stan in this same framework (i.e., classifying subjects based on observations made over time)? I’m looking to use a Stan equivalent to something like proc traj in SAS.

I’m still a Stan novice, so I’m trying to think it out.

- Stephen Martin on August 21, 2017 at 10:28 am said:
  
  Yep, you can. You can basically mixture-ize anything in Stan.
  
  I tend to forgo the log_mix method and use, e.g., target += log_sum_exp(log(theta[1]) + lpdf(whatever | params), log(theta[2]) + lpdf(whatever | params)), etc., because it’s easier to see what is going on, it’s more programmable (you can construct each component in a loop and sum across the vector with log_sum_exp), and it extends to k > 2 mixtures.
  
  If you want two trajectory states to be defined, just define two separate sets of parameters, order them in some way for identifiability (intercept tends to work well enough, but it depends on the data), and split the likelihood into k likelihoods weighted by log(theta[k]).
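
  A minimal sketch of the loop-based construction described above, assuming normal components and illustrative priors (the data layout and all names are assumptions, not Stephen Martin's code). Each component's log weight plus log density goes into a vector, and log_sum_exp collapses the vector:

  ```stan
  data {
    int<lower=1> K;              // number of mixture components
    int<lower=1> N;
    vector[N] y;
  }
  parameters {
    simplex[K] theta;            // mixing proportions
    ordered[K] mu;               // ordered locations, for identifiability
    vector<lower=0>[K] sigma;
  }
  model {
    vector[K] log_theta = log(theta);  // computed once, reused for every n
    mu ~ normal(0, 10);                // illustrative priors (assumed)
    sigma ~ normal(0, 5);
    for (n in 1:N) {
      vector[K] lp;
      for (k in 1:K) {
        lp[k] = log_theta[k] + normal_lpdf(y[n] | mu[k], sigma[k]);
      }
      target += log_sum_exp(lp);
    }
  }
  ```

  For a trajectory model, the normal_lpdf term would be replaced by the log likelihood of a subject's whole series under component k's parameters.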
  
- Anonymous on August 21, 2017 at 9:16 pm said:
  
  I think mixture modeling is a terrific approach. However, I would caution against using proc traj in SAS. Research by Diallo and colleagues has shown how its assumptions can lead to misspecified models. This article in particular is really informative in this regard:
  
  Diallo, T. M., Morin, A. J., & Lu, H. (2016). Impact of misspecifications of the latent variance–covariance and residual matrices on the class enumeration accuracy of growth mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 507-531.
  
Björn on August 21, 2017 at 10:59 am said:

It would be a great feature to expand this beyond two mixture components (as far as I am aware, that is not yet something log_mix covers). With two mixture components the log_sum_exp solution is still not too unpleasant; it is for, say, 5-component mixtures that writing out all those nested log_sum_exps and all the brackets without any mistakes gets really painful. For that kind of situation a more general log_mix would be super-useful.

- Bob Carpenter on August 21, 2017 at 12:20 pm said:
  
  [OK, let’s try again. My login must have elapsed while writing the post.]
  
  That was in the original feature request, but nobody’s ever gotten to it. The more-than-two-component version would use a simplex to parallel what we already have.
  
  I’d also like to see a log-odds parameterization for both of these as we do with bernoulli_logit and categorical_logit. These are relatively easy functions to add to the Stan math library.
  
  Longer term, we’ll be looking at even better ways to do this with higher-order types. I want to add simple types and lambdas with closures to Stan. Mitzi’s already got the basic infrastructure plumbed through to let us start generalizing the type system. She’ll be adding tuples first, but we’ll also be looking at adding higher-order types in the future.
  
- Sebastian on August 21, 2017 at 2:33 pm said:
  
  … but there is log_sum_exp defined for vectors, 1d arrays and even matrices. So just do
  
  real lmix[5];
  
  lmix[1] = …;
  lmix[2] = …;
  …
  
  And then
  
  target += log_sum_exp(lmix);
  
  Will do it. Not as nice as a log_mix with multiple entries, but better than nesting.
  
  - Daniel Lakeland on August 21, 2017 at 7:31 pm said:
    
    This is exactly how I’m doing it for a 7-component mixture prior I have in one of my models: just make a local variable inside the loop to accumulate the individual components of the model, and then at the end call log_sum_exp(lmix).
    
    - Bioxin on August 21, 2017 at 9:13 pm said:
      
      How do you predict on lmix?
      I have some models where something like log_mix_lpdf
      would solve some issues.
    - Daniel Lakeland on August 21, 2017 at 10:49 pm said:
      
      You mean you want to generate from a mixture posterior? Just at each cycle, for each thing you want to generate, randomly select one of the mixtures with probabilities equal to the mixture probabilities, and generate from that.
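
      A minimal sketch of that recipe in a generated quantities block, assuming a K-component normal mixture (the components, the data layout, and the names z_rep and y_rep are assumptions for illustration). On each iteration it picks a component with categorical_rng and then draws from that component:

      ```stan
      data {
        int<lower=1> K;
        int<lower=1> N;
        vector[N] y;
      }
      parameters {
        simplex[K] theta;
        ordered[K] mu;
        vector<lower=0>[K] sigma;
      }
      model {
        for (n in 1:N) {
          vector[K] lp;
          for (k in 1:K) {
            lp[k] = log(theta[k]) + normal_lpdf(y[n] | mu[k], sigma[k]);
          }
          target += log_sum_exp(lp);
        }
      }
      generated quantities {
        // choose a component with probabilities theta, then simulate from it
        int<lower=1, upper=K> z_rep = categorical_rng(theta);
        real y_rep = normal_rng(mu[z_rep], sigma[z_rep]);
      }
      ```

      Because theta, mu, and sigma are posterior draws, the resulting y_rep values are draws from the posterior predictive distribution of a new observation.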
    - Sebastian on August 22, 2017 at 3:23 am said:
      
      Uhm… for prediction you should probably read the Stan manual. You will need the *posterior* probabilities of each mixture component which is given by the prior predictive of that component and its prior weight. You end up needing the log_sum_exp sum as a normalization constant and the weight of each component is given by each lmix… (so lmix[i]/log_sum_exp(lmix) is the weight for component i) but again, it’s well described in the manual.
    - Bob Carpenter on August 22, 2017 at 5:09 am said:
      
      Both Daniel Lakeland’s approach and Sebastian’s are presented in the manual. Where possible, the method Sebastian recommends will be a lot more accurate in the sense of computing posterior expectations with lower MCMC error (the Rao-Blackwell theorem provides the theory, but it’s easy to see in practice, as in the example in the manual).
      
      The first change-point example in the latent discrete parameters chapter explains how to do what Sebastian is recommending. Just be careful: because it’s on the log scale, it’s a subtraction, not a division as Sebastian wrote:
      
      log Pr[y = i] = lmix[i] - log_sum_exp(lmix)
    - Sebastian on August 22, 2017 at 9:15 am said:
      
      Thanks Bob for catching my mistake… of course I meant subtraction as we are on the log scale. Sorry.
    - Daniel Lakeland on August 22, 2017 at 11:04 am said:
      
      I’m not quite clear. I’m pretty sure “You will need the *posterior* probabilities of each mixture component which is given by the prior predictive of that component and its prior weight” has a typo somewhere, because prior predictive component * prior weight = prior model.
      
      If you’re trying to create a parameter that samples from the posterior of the mixture then you create a parameter “FakeData” and give it the same distribution as your real data.
      
      p(FakeData | Component = 1,other parameters…) p(Component=1) + … + p(FakeData | Component = n, other parameters….) p(Component=n)
      
      Since each component will be calculated as logarithms, remember that to multiply on the regular scale, you add logarithms, and to add things, you take the log of the sum of the exponentials: log_sum_exp.
      
      If p(Component=1) is itself an element of a simplex parameter, then you can just use it; if these parameters don’t already sum to 1, then you need the normalization described above.
      
      I hope that helped someone, at least maybe me. ;-)
    - Stephen Martin on August 22, 2017 at 12:22 pm said:
      
      You can predict in all sorts of ways.
      
      You can technically predict individual weights directly, with some prespecified allowable error. You can predict the posterior weights. You can predict the prior weights as well. All of these tend to work well, to be honest. I actually think most people predict log(p(theta_i)), and then compute the posterior p(theta_i|data) afterward. Seems like most mixture packages work that way.
  - Bob Carpenter on August 22, 2017 at 5:13 am said:
    This is the case where it would be nice to have the feature Aki requested, which is a vector-return log pdf. Then a k-mixture is a one-liner.
    
    int N;
    int K;
    simplex[K] theta;
    real y[N];
    ordered[K] mu;
    vector[K] sigma;
    ...
    vector[K] Pr[N];
    for (n in 1:N) {
      vector[K] lp = log(theta) + normal_vec_lpdf(y[n] | mu, sigma);
      real log_Z = log_sum_exp(lp);
      target += log_Z;
      Pr[n] = lp - log_Z;
    }
    
    and then Pr[n, k] gives the log probability that the n-th item is assigned to mixture component k (exponentiate it to get Pr[z[n] = k], if you think in terms of the latent responsibility parameter z[n] in 1:K).
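
    Since normal_vec_lpdf is a requested feature rather than an existing function, here is a minimal sketch of the same calculation with an explicit loop over components, assuming normal components (the data layout and the name log_Pr are assumptions for illustration). The membership log probabilities are computed in generated quantities by subtracting the log normalizer:

    ```stan
    data {
      int<lower=1> K;
      int<lower=1> N;
      vector[N] y;
    }
    parameters {
      simplex[K] theta;
      ordered[K] mu;
      vector<lower=0>[K] sigma;
    }
    model {
      for (n in 1:N) {
        vector[K] lp;
        for (k in 1:K) {
          lp[k] = log(theta[k]) + normal_lpdf(y[n] | mu[k], sigma[k]);
        }
        target += log_sum_exp(lp);
      }
    }
    generated quantities {
      // log_Pr[n, k] = log Pr[z[n] = k | y[n], theta, mu, sigma]
      matrix[N, K] log_Pr;
      for (n in 1:N) {
        vector[K] lp;
        for (k in 1:K) {
          lp[k] = log(theta[k]) + normal_lpdf(y[n] | mu[k], sigma[k]);
        }
        // subtraction on the log scale is division on the probability scale
        log_Pr[n] = (lp - log_sum_exp(lp))';
      }
    }
    ```

    Averaging exp(log_Pr[n, k]) over posterior draws then estimates the posterior probability that item n belongs to component k.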
    - Bioxin on August 23, 2017 at 5:31 am said:
      
      Thank you all for the replies.
      I’ll work on an example for the forum.
Alex on August 31, 2017 at 12:42 pm said:

For the first line of Stan code, there is an extra parenthesis at the end.

- Bob Carpenter on August 31, 2017 at 1:08 pm said:
  
  Thanks, I’ll fix that the next time I update the manual; here’s the issue comment link.
  
  https://github.com/stan-dev/stan/issues/2336#issuecomment-326360838
  
  In general, there’s always a next-manual issue where bugs in the manual may be reported.
  
Pavel Garcia on September 7, 2020 at 9:39 pm said:

I am trying to write a model to estimate the proportions of at least three food categories for 10 species of aquatic insects. How should I write the code to make the estimates for each species?

- Andrew on September 7, 2020 at 9:52 pm said:
  
  Pavel:
  
  I recommend you post your question on the Stan Forums.
  
  - Pavel Garcia on September 7, 2020 at 10:00 pm said:
    
    Thank you Andrew
    

Statistical Modeling, Causal Inference, and Social Science
