Mixture models in Stan: you can use log_mix()

From the Stan manual:

log_mix() . . . I like it. Super-clean.

1. G says:

Is there a way to implement a hierarchical mixture model longitudinally using stan in this same framework? (ie: classifying subjects based on observations made over time.) I’m looking to use a stan equivalent to something like proc traj in SAS.

I’m still a stan novice, so I’m trying to think it out.

• Stephen Martin says:

Yep, you can. You can basically mixture-ize anything in stan.

I tend to forgo the log_mix method and use, e.g., target += log_sum_exp(log(theta[1]) + pdf(whatever | params), log(theta[2] + pdf(whatever | params)) etc, because it’s easier to see what is going on, it’s more programmable (can construct each component in a loop and sum across the vector with log_sum_exp), and it extends to k > 2 mixtures.

If you want two trajectory states to be defined, just define two separate sets of parameters, order them in some way for identifiability (intercept tends to work well enough, but it depends on the data), and split the likelihood into k likelihoods weighted by log(theta[k]).

• Anonymous says:

I think mixture modeling is a terrific approach. However, I would caution against using proc traj in SAS. Research by Diallo and colleagues has shown how these assumptions can lead to misspecified models. This article in particular is really informative in this regard:

Diallo, T. M., Morin, A. J., & Lu, H. (2016). Impact of misspecifications of the latent variance–covariance and residual matrices on the class enumeration accuracy of growth mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 507-531.

2. Björn says:

It would be a great future to expan this beyond two mixture components (as far as I am aware that is not yet something log_mix covers). With two mixture components the log_sum_exp solution is still not too unpleasant, it is for, say, 5 component mixtures where that writing out all those nested log_sum_exps and all the brackets without any mistakes gets really painful. For that kind of situation a more general log_mix would be super-useful.

• [OK, let’s try again. My login must have elapsed while writing the post.]

That was in the original feature request, but nobody’s ever gotten to it. The more-than-two-component version would use a simplex to parallel what we already have.

I’d also like to see a log-odds parameterization for both of these as we do with bernoulli_logit and categorial_logit. These are relatively easy functions to add to the Stan math library.

Longer term, we’ll be looking at even better ways to do this with higher-order types. I want to add simple types and lambdas with closures to Stan. Mitzi’s already got the basic infrastructure plumbed through to let us start generalizing the type system. She’ll be adding tuples first, but we’ll also be looking at adding higher-order types in the future.

• Sebastian says:

… but there is log_sum_exp defined for vectors, 1d arrays and even matrices. So just do

real lmix[5];

lmix[1] = …;
lmix[2] = …;

And then

target += log_sum_exp(lmix);

Will do it. Not as nice as a log_mix with multiple entries, but better than nesting.

• This is exactly how I’m doing it for a 7 component mixture prior I have in one of my models, just make the local variable inside the loop to accumulate the individual components of the model, and then at the end log_sum_exp(lmix).

• Bioxin says:

How do you predict on lmix?
I have some models where something like log_mix_lpdf
would solve some issues.

• You mean you want to generate from a mixture posterior? Just at each cycle, for each thing you want to generate, randomly select one of the mixtures with probabilities equal to the mixture probabilities, and generate from that.

• Sebastian says:

Uhm… for prediction you should probably read the stan manual. You will need the *posterior* probabilities of each mixture component which is given by the prior predictive of that component and its prior weight. You end up needing the log_sum_exp sum as a normalization constant and the weight of each component is given by each lmix… (so lmix[i]/log_sum_exp(lmix) is the weight for component i) but again, it’s well described in the manual.

• Both the approach of Daniel Lakeland and Sebastian are presented in the manual. Where possible, the method Sebastian recommends will be a lot more accurate in the sense of computing posterior expectations with lower MCMC error (the Rao-Blackwell theorem provides the theory, but it’s easy to see in practice, as in the example in the manual).

The first change-point example in the latent discrete parameters explains how to do what Sebastian is recommending. Just be careful, because it’s a subtraction, not a division as Sebastian wrote, because it’s on the log scale:

```log Pr[y = i] = lmix[i] - log_sum_exp(lmix)
```
• Sebastian says:

Thanks Bob for catching my mistake… of course I meant subtraction as we are on the log scale. Sorry.

• I’m not quite clear, I’m pretty sure “You will need the *posterior* probabilities of each mixture component which is given by the prior predictive of that component and its prior weight” has a typo somewhere because prior predictive component * prior weight = prior model

If you’re trying to create a parameter that samples from the posterior of the mixture then you create a parameter “FakeData” and give it the same distribution as your real data.

p(FakeData | Component = 1,other parameters…) p(Component=1) + … + p(FakeData | Component = n, other parameters….) p(Component=n)

Since each component will be calculated as logarithms, remember that to multiply on regular scale, you add logarithms, and to add things, you take the log of the sum of the exponentials… log_sum_exp

if p(Component=1) is itself a simplex parameter then you can just use it, if these parameters don’t already sum to 1, then you need the normalization described above.

I hope that helped someone, at least maybe me. ;-)

• Stephen Martin says:

You can predict in all sorts of ways.

You can technically predict individual weights directly, with some prespecified allowable error. You can predict the posterior weights. You can predict the prior weights as well. All of these tend to work well, to be honest. I actually think most people predict log(p(theta_i)), and then compute the posterior p(theta_i|data) afterward. Seems like most mixture packages work that way.

• This is the case where it would be nice to have the feature Aki requested, which is a vector-return log pdf. Then a k-mixture is a one-liner.

```int N;
int K;
simplex[K] theta;
real y[N];
ordered[K] mu;
vector[K] sigma;
...
vector[K] Pr[N];
for (n in 1:N) {
vector[K] lp = log(theta) + normal_vec_lpdf(y[n] | mu, sigma);
real log_Z = log_sum_exp(lp);
target += log_Z;
Pr[n] = lp - log_Z;
}
```

and then Pr[n, k] gives the probability that the n-th item is assigned to mixture component k (it’s Pr[z[n] = k] if you think in terms of the latent responsibility parameter z[n] in 1:K).

• Bioxin says:

Thank you all for the replies.
I’ll work on an example for the forum.

3. Alex says:

For the first line of Stan code, there is an extra parenthesis at the end.