From a response on the Stan help list:

Yes, indeed, I think it would be a good idea to reduce the scale on priors of the form U(0,100) or N(0,100^2). This won’t solve all problems but it can’t hurt.

If the issue is that the variance parameter can be very small in the estimation, then yes, one approach would be to put in a prior that keeps the variance away from 0 (lognormal, gamma, whatever); another approach would be to use the Matt trick (a non-centered parameterization). Some mixture of these ideas might help.
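In Stan the "Matt trick" usually refers to a non-centered parameterization. Instead of treating a group-level effect theta ~ N(mu, tau^2) as the parameter, you treat a standardized eta ~ N(0, 1) as the parameter and set theta = mu + tau * eta. The geometry the sampler works in then no longer collapses as tau shrinks toward 0. The NumPy sketch below is an illustration under stated assumptions. The function names are hypothetical, and it uses plain Monte Carlo draws rather than Stan or MCMC. It only checks that the two parameterizations imply the same distribution for theta. The benefit for the sampler comes from which quantity is treated as the parameter, and that benefit does not appear in forward simulation like this.

```python
import numpy as np

def centered_draws(mu, tau, n, rng):
    # Centered (hypothetical helper): draw theta directly from N(mu, tau^2).
    return rng.normal(loc=mu, scale=tau, size=n)

def noncentered_draws(mu, tau, n, rng):
    # Non-centered, the "Matt trick" (hypothetical helper): draw a
    # standardized eta ~ N(0, 1), then shift and scale it deterministically.
    eta = rng.normal(loc=0.0, scale=1.0, size=n)
    return mu + tau * eta

rng = np.random.default_rng(0)
# A small group-level scale, the case where the centered form gives samplers trouble.
a = centered_draws(1.0, 0.05, 100_000, rng)
b = noncentered_draws(1.0, 0.05, 100_000, rng)
print(a.mean(), a.std(), b.mean(), b.std())
```

In a Stan program the same idea would mean declaring eta in the parameters block and computing theta in the transformed parameters block.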

And, by the way: when you do these things it might feel like an awkward bit of kluging to play around with the model to get it to converge properly. But the kluges of today are the textbook solutions of tomorrow. When it comes to statistical modeling, we’re living in beta-test world; we should appreciate the opportunities this gives us!


At first I thought it was just a typo; then I discovered the interesting distinction between kluge (US usage) and kludge (UK usage): http://en.wiktionary.org/wiki/kluge

Thanks! I was confused why we were appreciating a “hack”.

Thanks for this — I was in the same confused boat as Rahul. Actually, I initially thought that the title of this post was a warning, viz., “Get it right today because whatever example you set could be followed for way longer than you envision.” A sort of statistical “Think of the CHILDREN!”

David Cox often said that today’s adhockery is tomorrow’s good theory.

All the challenges of nuisance parameters do seem to be less well appreciated when they can simply be averaged over without much thought. Models (or representations) need to be purposeful rather than just practical (as C. S. Peirce would argue).