## A couple of quick thoughts on statistical computing

I just read an article on a faster Gibbs/slice sampler algorithm and it sparked the following thoughts:

1. Faster convergence is not just about speed, it’s also about having confidence that your results are correct. The worst thing about not having a good algorithm is never knowing whether I’ve really fit the model I’m trying to fit. The next worst thing is that then I don’t have the confidence to go to the next, more complicated, model I’d like to try.

2. The researchers run things for 10,000 iterations. In statistical simulation research, 10,000 is a small number of iterations. But in practice, 10,000 can take a long time. If the algorithm is as efficient as claimed, wouldn’t 1000 be enough? This might sound silly but in big problems–or in routine analyses–you don’t necessarily want to wait for 10,000.

1. Corey says:

Could you post a reference or a link to the article?

2. Andrew says:

Corey,

I reviewed the article for a journal so I'm not supposed to post it. If the authors want to post it, that would be fine with me.

3. Jouni says:

Some more thoughts about Markov chain sampling.

I start first to think how many independent draws I would like to have. Say 1000. If I don't know anything about the convergence in the particular inferential situation (model+specific data set), I first run with thinning rate 1 and discard the first half of the simulations. If the chains seem to be running too slowly, I increase the thinning rate (doubling it) and rerun – and repeat until I get to approximate independence. I monitor the R-hat to decide whether we have good enough convergence to stop, after which I also want to take a quick look at the trace plots (because I like Jackson Pollock’s works).

This is an algorithm that can be automated for reasonably uncomplicated problems. It could of course happen that a model fit takes too much time, but then I stop and try to improve the convergence by reparameterization. I might get lucky and get good convergence at thinning rate 1 or then I could end up with thinning rate 16. But I’m not really interested how many iterations I ended up doing.

My point is that in general I don't want to start with the question that how many iterations I need: this is too much dependent on the choice of model+data itself. If I want to re-fit the model to a new set of data, my assumptions about the size of burn-in period and number of iterations may be not valid any more.

All I want is a sample of a certain size, not to "run 10,000 iterations". Perhaps it would be best to think, from the start, about the number of iterations (per chain) needed in terms of multiplies of the desired sample size (per chain)? That is, essentially, the thinning rate.

4. mike says:

which article is this? please provide a reference. thanks!

5. Andrew says:

Jouni,

I agree with you. I think that researchers in simulation methods simply assume large number of iterations without thinking about the question, because they're focused on research on efficiency (which requires lots of simulations to make really sure) rather than on applications.

But, more to the point, what's up with Autograph?

6. David says:

I've got a random question for you. This may sound like gobbledegook, so feel free to say that.

I have a dataset. I want to construct a model. If I do the usual thing, I arrive at some set of [polynomial|fourier|…] coefficients which I regard as 'exact', then talk about how the data is distributed with respect to the model [by talking about correlation coefficient|residuals|…].

How should I think about it if I want to characterize the uncertainty of the coefficients. I have a process where the underlying mechanics are not constant, but have some [polynomial|periodic] variation. I want to regard the data as exact, and talk about the distributions of model coefficients.

Likelihood jumps out as the first answer, but in general, it doesn't normalize [this is why we talk about likelihood ratios, right?]. The chi-squared test doesn't let me relate directly to the coefficients themselves. So, I'm stuck.

In the end, the graph of the temperatures showed a pretty strong periodicity over 400k years [annual samples], but at the end it's a little wonky. I want to see how out of whack those ~10k years are with respect the rest of they samples [so I need to understand the distribution of the possible amplitudes and frequencies].

7. David says:

Whoops. The last paragraph of my comment looks like it comes out of the blue. I was thinking, but failed to mention, about the Vostok ice core data, in particular the temperature reconstructions from them.