From the Stan users list:
I’m trying to fit a time series that can have 0 or 1 change point using the sample model from Ch. 11.2 of the manual.
To determine if the time series has a change-point or not, would I need to do a model comparison (via loo) between the 1-change model and a model with no change point? Or would s being close to 1 be sufficient?
You’re gonna hate this, but in general I don’t like models where you try to see if there is a change point. Instead I recommend allowing the size of the change to be a continuous parameter, so that small or zero changes are part of your model. Problem solved!
And then Bob followed up with a question to me:
I do hate it because it shows I don’t know where you want to apply model comparisons and where you don’t. There are chapters in BDA3 that cover LOO, chapters covering posterior predictive checks, and parts recommending continuous model expansion. I just can’t reconcile it all.
I think the overall suggestion is the same as mine—fit the change point model and see what the fit looks like. But do you then simplify the model if the change point isn’t doing anything?
How do you decide?
Bob is right that my philosophy (or, perhaps I should say, my practice) is incoherent. I don’t like model choice, model comparison, or model averaging—but then in practice I do choose models, I do have to compare the models I’m considering, and, much as I dislike model averaging, it’s gotta be better than choosing just one of several candidate models.
Another thing I often say is that I prefer continuous model expansion to discrete model averaging. But where do you draw the line? Any continuous class of models can be discretized, and then what do you do? So I think all I can really recommend is to go continuous as much as possible.
And it does turn out that many problems that are conventionally framed as discrete choices can instead be written continuously. An example is change point models.
I answered Bob’s question as follows:
There are times when we fit unrelated models, or we have models coming from different sources, and predictive model comparison is one way to evaluate these. Especially the new form where we get the contribution of loo (or waic) for each data point.
But if you have a model (in this case, the change-point model in which the size of the change is an unknown parameter) which includes the smaller model (in this case, the no-change-point model, which is theta=0) as a special case, I’d rather just fit the big model. If you want to say, hey, let’s use the smaller model because it’s simpler and easier to work with, then, sure, I do think that loo or waic can be a way to make this decision. But I definitely wouldn’t frame it as “To determine if the time series has a change-point or not.” The time series, whatever it is, has a change point at every time. The question might be, “Is a change point necessary to model these data?” That’s a question I could get behind.
I never understand specifically what Andrew’s suggesting.
I think the takeaway on model-non-comparison was this:
If you have a model (in this case, the change-point model in which the size of the change is an unknown parameter) which includes the smaller model (in this case, the no-change-point model, which is theta=0) as a special case, I’d rather just fit the big model.
My guess is that he has a whole suite of models of varying granularity in mind. Because he goes on to say
The time series, whatever it is, has a change point at every time.
This implies to me that Andrew wants to unask the question of whether the sequence has 0, 1, or up to T-1 change points. A sequence of T observations has, by definition, T-1 places where a change could occur. So it’s just a question of what you want to model.
But it’s not a simple modeling change.
The no change-point model is easy:

```stan
for (t in 1:T)
  y[t] ~ normal(mu, sigma);
```
Pardon the non-vectorization — it’s just trying to make it unambiguous there’s a single mu and sigma and T different y.
Allowing one change point, you need to marginalize out the discrete change-point location, as in the manual. Then it’s a question of how you model where the change occurs — for a proper prior on a latent continuous change point, you need the CDF differences (for the same reason as in our last discussion here on CDF diffs for rounding). Or you can just hack it up as uniform, as I did in the example in the manual, where the times are evenly spaced.
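That marginalization can be sketched roughly as follows, in the spirit of the manual’s example but simplified: the names `mu1` and `mu2` for the pre- and post-change means, and the uniform prior over change locations, are choices made here for illustration, not the manual’s exact code.

```stan
data {
  int<lower=2> T;
  vector[T] y;
}
parameters {
  real mu1;            // mean before the change
  real mu2;            // mean after the change
  real<lower=0> sigma;
}
model {
  // lp[s] accumulates the log joint density of y given a change
  // just after time s, starting from a uniform prior on s in 1:(T-1)
  vector[T - 1] lp = rep_vector(-log(T - 1.0), T - 1);
  for (s in 1:(T - 1))
    for (t in 1:T)
      lp[s] += normal_lpdf(y[t] | (t <= s ? mu1 : mu2), sigma);
  target += log_sum_exp(lp);  // marginalize out the discrete change point
}
```

As written the double loop is O(T^2); cumulative sums of the pointwise log likelihoods would bring it down to linear, at the cost of a less transparent model block.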
But when you allow T-1 change points, all of a sudden you’re back to something like the first model, with a time series:

```stan
for (t in 1:T)
  y[t] ~ normal(mu[t], sigma);
```
That assumes no stochastic volatility (i.e., a fixed sigma). Then you need a prior for mu, probably one based on autoregression.
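A minimal sketch of such a prior, taking the simplest autoregressive choice, a random walk (the innovation scale `tau` is a name introduced here for illustration):

```stan
data {
  int<lower=2> T;
  vector[T] y;
}
parameters {
  vector[T] mu;         // time-varying mean
  real<lower=0> sigma;  // fixed observation noise (no stochastic volatility)
  real<lower=0> tau;    // scale of period-to-period changes in the mean
}
model {
  // random-walk prior: each mean is centered at the previous one;
  // mu[1] is left with an implicit flat prior in this sketch
  mu[2:T] ~ normal(mu[1:(T - 1)], tau);
  y ~ normal(mu, sigma);
}
```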
And I clarified:
There are two issues here.
First is the distribution of change points over time. As Bob notes, I prefer to avoid change points entirely, as I think that every moment has its own changes. Something that looks roughly like discrete change points can be constructed using continuous changes with a long-tailed distribution on the amount of change.
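One way to sketch that construction: give the time-varying mean heavy-tailed innovations, so most steps are near zero while occasional steps are large, mimicking discrete jumps without any discrete parameter. The Student-t with 2 degrees of freedom is an arbitrary illustrative choice of long-tailed distribution:

```stan
data {
  int<lower=2> T;
  vector[T] y;
}
parameters {
  vector[T] mu;
  real<lower=0> sigma;
  real<lower=0> tau;    // typical size of a step in the mean
}
model {
  // heavy-tailed steps: small changes most of the time,
  // occasional large changes that look like "change points"
  mu[2:T] ~ student_t(2, mu[1:(T - 1)], tau);
  y ~ normal(mu, sigma);
}
```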
The second issue is that, if we do have a model with a discrete change point at just one time, we can avoid the question, “Is there a change point?” by saying that there definitely _is_ a change point, of unknown magnitude.
It was the second point that I was trying to make in my response to that earlier question. I think of this second model as a bit of a compromise. Yes, it assumes a particular change point. But it is still a continuous-parameter model, and I like that.
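A sketch of this compromise model, under the assumption that the change time `s` is fixed in advance and passed in as data; `theta` is the continuous change magnitude, so theta = 0 recovers the no-change-point model as a special case:

```stan
data {
  int<lower=2> T;
  int<lower=1, upper=T> s;  // assumed change time, fixed in advance
  vector[T] y;
}
parameters {
  real mu;
  real theta;               // size of the change; zero means no change
  real<lower=0> sigma;
}
model {
  theta ~ normal(0, 1);     // illustrative prior, shrinking toward "no change"
  for (t in 1:T)
    y[t] ~ normal(t < s ? mu : mu + theta, sigma);
}
```

The posterior for theta then answers “how big is the change?” directly, rather than forcing a discrete yes/no comparison between two models.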