sigma = 0.1 + 1.9 * logistic(sigma_free)

sigma_free = logit((sigma - 0.1) / 1.9)

?

If you choose the bounds carefully, it shouldn’t matter to optimization. If you don’t, you should get some signal that the system is trying to fit at a boundary (a no-no with optimization, as things like the Laplace approximation for uncertainty break down at boundaries).
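Here is one way to see why the Laplace approximation struggles near a boundary. It replaces the posterior with a normal distribution centred at the mode, and that normal can put a large share of its mass outside the allowed interval. The sketch below uses made-up numbers: a parameter bounded in (0.1, 2.0) with a standard error of 0.1. The function names are hypothetical.

```python
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def laplace_mass_outside(mode: float, sd: float, lower: float, upper: float) -> float:
    """Probability that the Laplace (normal) approximation N(mode, sd^2) puts outside [lower, upper]."""
    inside = normal_cdf(upper, mode, sd) - normal_cdf(lower, mode, sd)
    return 1.0 - inside

# Hypothetical fits for a parameter bounded in (0.1, 2.0), each with standard error 0.1.
print(laplace_mass_outside(1.0, 0.1, 0.1, 2.0))   # mode well inside: negligible
print(laplace_mass_outside(1.98, 0.1, 0.1, 2.0))  # mode near the upper bound: about 0.42
print(laplace_mass_outside(2.0, 0.1, 0.1, 2.0))   # mode on the bound: half the mass is outside
```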

We’ve found normal priors to be more robust. They can also help guide computation slightly if the data aren’t very informative about values. If you go to the technically more robust priors like a low-degree-of-freedom Student-t, you can get strong tail effects when the data aren’t informative, especially if you go all the way to a Cauchy. If you make the interval priors wide enough, they won’t have much effect at all.
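You can quantify the tail behaviour by comparing log densities far from the centre. The sketch below writes out the normal and Student-t log densities directly instead of calling a statistics library. The scale of 5 and the choice of 3 degrees of freedom are arbitrary illustrative values.

```python
import math

def normal_lpdf(x: float, scale: float) -> float:
    """Log density of normal(0, scale)."""
    return -0.5 * (x / scale) ** 2 - math.log(scale * math.sqrt(2.0 * math.pi))

def student_t_lpdf(x: float, nu: float, scale: float) -> float:
    """Log density of a location-0 Student-t with nu degrees of freedom; nu = 1 is the Cauchy."""
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1.0) / 2.0 * math.log1p((x / scale) ** 2 / nu))

# Log density at increasing distances from zero, all with scale 5.
for x in (5.0, 25.0, 100.0):
    print(f"x={x:6.1f}  normal={normal_lpdf(x, 5.0):9.2f}  "
          f"t(3)={student_t_lpdf(x, 3.0, 5.0):7.2f}  cauchy={student_t_lpdf(x, 1.0, 5.0):7.2f}")
```

At 20 scales out, the normal log density has fallen by about 200 nats. The heavy-tailed priors fall by only a handful, so when the likelihood is flat they leave a lot of room for the posterior to wander into the tails.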

The deeper issue is that, in my view, someone’s prior knowledge is rarely in the form of a uniform interval constraint. In my experience, scientists usually know the rough range in which parameters are going to fall and say things like they expect a value between -5 and 5. When prodded, they’ll usually say things like they think it’ll be 1 and would be very surprised if it were 10. Then they’ll put down a uniform(-5, 5) prior, and it winds up having an effect other than what they intended, namely producing a different estimate than a uniform(-10, 10) prior would.
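To make the last point concrete, suppose (hypothetically) that the likelihood for the parameter is approximately normal with mean 4 and standard error 2. Under a uniform(a, b) prior, the posterior is that normal truncated to (a, b). Its mean has a closed form, so the two intervals can be compared directly:

```python
import math

def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_normal_mean(mu: float, sd: float, a: float, b: float) -> float:
    """Mean of N(mu, sd^2) restricted to (a, b): the posterior mean under a uniform(a, b) prior."""
    alpha, beta = (a - mu) / sd, (b - mu) / sd
    z = std_normal_cdf(beta) - std_normal_cdf(alpha)
    return mu + sd * (std_normal_pdf(alpha) - std_normal_pdf(beta)) / z

mu_hat, se = 4.0, 2.0  # hypothetical normal approximation to the likelihood
narrow = truncated_normal_mean(mu_hat, se, -5.0, 5.0)   # roughly 2.98
wide = truncated_normal_mean(mu_hat, se, -10.0, 10.0)   # roughly 3.99
print(narrow, wide)
```

The upper bound of 5 sits only half a standard error above the data, so it cuts off much of the likelihood and drags the posterior mean down by about one unit. Widening the interval to (-10, 10) removes almost all of that effect.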

Bouncing might work better. We haven’t tried it in Stan. A simple hand-tooled comparison probably wouldn’t be too hard (where you hand-code which parameters have which bounds in the solver rather than plumbing that through the whole framework). We also haven’t tried other alternative parameterizations. These would make great computational projects.

We also haven’t tried any kind of step-size variation other than jitter, which didn’t help with anything we tried. With Stan, shrinking the step size by a factor of x means roughly x times as much work to get to a U-turn point (at least until it hits the maximum tree depth, at which point you begin losing the advantage of HMC, as you haven’t moved as far as you could have from the starting point).
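A toy problem shows the 1/x scaling. The sketch below runs leapfrog on a one-dimensional standard normal target and counts steps until a simplified, one-sided U-turn check fires. NUTS checks both ends of the trajectory and builds it as a binary tree, so this only illustrates the scaling and is not Stan’s algorithm. The function name and starting point are arbitrary.

```python
def steps_to_u_turn(step_size: float, q0: float = 0.0, p0: float = 1.0,
                    max_steps: int = 100_000) -> int:
    """Leapfrog on U(q) = q^2 / 2 until (q - q0) * p < 0, i.e. the trajectory heads back."""
    q, p = q0, p0
    for n in range(1, max_steps + 1):
        p -= 0.5 * step_size * q   # half step for momentum (grad U(q) = q)
        q += step_size * p         # full step for position
        p -= 0.5 * step_size * q   # second half step for momentum
        if (q - q0) * p < 0:
            return n
    return max_steps

for eps in (0.1, 0.01, 0.001):
    print(eps, steps_to_u_turn(eps))
```

Each tenfold reduction in step size needs about ten times as many gradient evaluations to cover the same quarter-period of the oscillation.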

However, there may be a problem with people using uniform priors when they don’t actually believe them, because… Well, I don’t know why.

For the most part I think people treat priors (and even model specification) as hyperparameters that need to be tuned in order to get something done in a reasonable time. They are commonly left to sane defaults, at least at first, then become subject to the dark arts. The uniform prior is just a “sane default”.

I’ve been researching using Bayes to infer parameters for models of composite materials with hierarchical structure (we call these models multiscale instead of multilevel) from experimental data. This differs from many other data applications in a number of ways: we have deterministic relationships between scales derived from physical principles, as well as a pretty good understanding of the ballpark values of our parameters. In this setting, uniform priors over a specified interval tend to work pretty well if the bounds are chosen carefully.

The standard practice for this type of inference is to use optimization to match the model with experimental data, where the bounds for each parameter are prespecified, so the analogous prior is uniform over an interval. The optimization approach is riddled with pitfalls because experimental data is expensive and therefore often limited, and Bayes resolves a lot of these pitfalls pretty nicely. However, with uniform priors, incorrectly specified bounds remain a potential issue, so normal priors might be a more robust choice.
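Here is a minimal illustration of the bound-misspecification risk, using hypothetical numbers. The data suggest a value of about 1.5 (standard error 0.2), but the prior interval was set to (0, 1). A uniform prior pins the posterior mode at the upper bound. A normal prior centred on the same rough guess, with a scale chosen loosely to match the interval, lets the data pull the estimate past it. Both results have closed forms because the likelihood is taken to be normal.

```python
def uniform_prior_mode(y_hat: float, lo: float, hi: float) -> float:
    # With a normal likelihood and a uniform(lo, hi) prior, the posterior is the
    # likelihood truncated to (lo, hi), so its mode is y_hat clipped to the interval.
    return min(max(y_hat, lo), hi)

def normal_prior_mean(y_hat: float, se: float, prior_mean: float, prior_sd: float) -> float:
    # Conjugate normal-normal update: a precision-weighted average.
    w_data, w_prior = 1.0 / se ** 2, 1.0 / prior_sd ** 2
    return (w_data * y_hat + w_prior * prior_mean) / (w_data + w_prior)

y_hat, se = 1.5, 0.2  # hypothetical: the data point beyond the assumed upper bound of 1
print(uniform_prior_mode(y_hat, 0.0, 1.0))       # pinned at the bound
print(normal_prior_mean(y_hat, se, 0.5, 0.5))    # about 1.36: shrunk toward 0.5 but free to exceed 1
```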

In the equation “sigma_free = 0.1 + 1.9 * logit(sigma), where logit(u) = log(u / (1 - u))”, if you set sigma := 2, you get logit(sigma) = log(2 / (1 - 2)) = log(-2), which is undefined.
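The failure is easy to reproduce. In the formula as first written, logit receives sigma itself, and sigma lies outside (0, 1) for much of the allowed range. For sigma of 1 or more, the naive version is undefined: it divides by zero at exactly 1 and takes the log of a negative number above that. Note that sigma = 2 is also the excluded upper bound of (0.1, 2.0), so the check below uses 1.5 instead. The function names are illustrative.

```python
import math

def logit(u: float) -> float:
    return math.log(u / (1.0 - u))

def naive_free(sigma: float) -> float:
    # The formula as originally written: logit is applied to sigma itself.
    return 0.1 + 1.9 * logit(sigma)

def corrected_free(sigma: float) -> float:
    # Rescale to (0, 1) first, then take the logit.
    return logit((sigma - 0.1) / 1.9)

sigma = 1.5
try:
    naive_free(sigma)                      # logit(1.5) = log(1.5 / -0.5) = log(-3)
except ValueError as err:
    print("naive transform fails:", err)   # math.log raises on negative input
print("corrected:", corrected_free(sigma))
```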

Thanks for commenting! I have three responses:

1. I don’t generally think that a model is about belief, but, sure, if a uniform distribution is reasonable, I have no problem with it. For example, there are a lot of problems for which it can be reasonable to assign a uniform(0,1) prior distribution to a parameter that represents a probability. My problem (and Bob’s) is with uniform distributions applied when there is no logical or physical constraint. For example, if you have a parameter that you want to say is not going to be too large in absolute value, I don’t recommend a uniform(-10,10) prior; instead I’d prefer something like normal(0,5), because soft constraints generally make more statistical sense and also make the computation smoother (recall the folk theorem).

2. Another problem can arise because other aspects of a model are misspecified. You might think a certain coefficient just has to be positive, but it could be negative in the data because of a hidden interaction. For example, if you raise the price on an item, sales should go down, but price can be correlated with quality, and if you have observational data and quality is not appropriately accounted for in the model, you can end up with the “wrong” sign of the coefficient of price on sales. For this sort of reason I’m wary of hard constraints.

3. With HMC, it can make sense to bounce off boundaries. We didn’t put this bouncing into Stan because it complicates the algorithm, especially in multidimensional problems.
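A small simulation reproduces the price example in point 2. All the numbers are invented. Quality raises both price and sales, and price itself lowers sales with a true coefficient of -1. A regression of sales on price alone (omitting quality) recovers a positive slope. In expectation that slope is (-5 + 3 * 2) / 5 = 0.2. Partialling out quality (the Frisch-Waugh-Lovell construction) brings the coefficient back near -1.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(y, x):
    """Residuals of y after regressing it on x."""
    b, mx, my = ols_slope(x, y), mean(x), mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

rng = random.Random(1)
n = 5000
quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
price = [2.0 * q + rng.gauss(0.0, 1.0) for q in quality]  # better items cost more
sales = [-1.0 * p + 3.0 * q + rng.gauss(0.0, 1.0)         # true price effect is -1
         for p, q in zip(price, quality)]

naive = ols_slope(price, sales)  # quality omitted: "wrong" sign
adjusted = ols_slope(residualize(price, quality), residualize(sales, quality))  # quality controlled
print(naive, adjusted)
```

A hard positivity constraint on the price coefficient would hide this problem instead of exposing it.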

However, there may be a problem with people using uniform priors when they don’t actually believe them, because… Well, I don’t know why. We’re talking here about a one-dimensional parameter. If necessary, you can just sketch a density function by hand, scan your drawing, and after a bit of smoothing and normalization, use this completely custom representation of your prior belief. So why would you use something drastically wrong…?
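A hand-drawn prior is simple to implement for one parameter. The sketch below assumes the scanned drawing has already been reduced to a handful of (x, height) points, and the points themselves are made up. It interpolates linearly on a fine grid, smooths with a short moving average, and normalizes with the trapezoidal rule. The function name and the window size are hypothetical choices.

```python
def sketched_prior(points, n_grid=1001, window=9):
    """Turn hand-sketched (x, height) points into a normalized density on a grid.

    points must be sorted by x; the heights need not integrate to 1.
    """
    lo, hi = points[0][0], points[-1][0]
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]

    def interp(x):
        # Piecewise-linear interpolation between consecutive sketch points.
        for (x0, h0), (x1, h1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                return h0 + (h1 - h0) * (x - x0) / (x1 - x0)
        return 0.0

    raw = [interp(x) for x in grid]
    # Light smoothing: centred moving average, truncated at the ends.
    half = window // 2
    smooth = []
    for i in range(n_grid):
        seg = raw[max(0, i - half): i + half + 1]
        smooth.append(sum(seg) / len(seg))
    # Normalize with the trapezoidal rule so the density integrates to 1 on the grid.
    dx = grid[1] - grid[0]
    area = dx * (sum(smooth) - 0.5 * (smooth[0] + smooth[-1]))
    return grid, [h / area for h in smooth]

# "Probably around 1, would be very surprised at 10."
sketch = [(-2.0, 0.0), (0.0, 1.0), (1.0, 2.0), (4.0, 0.5), (10.0, 0.0)]
grid, density = sketched_prior(sketch)
```

Values outside the sketched range get zero density, which quietly reintroduces a hard bound. The tails of the drawing should therefore extend well past anything considered plausible.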

As for the computational issue, I think keeping the original parameterization and bouncing off the boundaries might work better than a logistic transformation. If you are transforming, though, you probably should do more than “jitter the step size a bit”. You need to randomly (or systematically) vary the step size by a lot. If you randomly choose amongst 10 step sizes, the penalty is at most a factor of 10, which is probably better than fixing a single (not always appropriate) step size. But you could reduce this penalty a lot using the “short-cut” method described in section 5.6 of my Hamiltonian MCMC review paper, http://www.mcmchandbook.net/HandbookChapter5.pdf
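Randomizing the step size over a wide range takes only a small change to a basic HMC loop. The sketch below targets a one-dimensional standard normal. The ten log-spaced step sizes and the fixed trajectory length are illustrative choices. This is plain HMC, not Stan’s NUTS and not the short-cut method from the review. Because the step size is drawn independently of the current state, each iteration is still a valid Metropolis update.

```python
import math
import random

def U(q: float) -> float:
    """Negative log density of a standard normal (up to a constant)."""
    return 0.5 * q * q

def grad_U(q: float) -> float:
    return q

def hmc_random_step(n_iter, step_sizes, traj_len=1.5, seed=0):
    """Plain HMC where each iteration draws its step size uniformly from step_sizes."""
    rng = random.Random(seed)
    q, samples, total_steps = 0.0, [], 0
    for _it in range(n_iter):
        eps = rng.choice(step_sizes)              # independent of the current state
        n_steps = max(1, round(traj_len / eps))   # smaller steps cost proportionally more
        p0 = rng.gauss(0.0, 1.0)
        q_new, p = q, p0
        for _step in range(n_steps):              # leapfrog integration
            p -= 0.5 * eps * grad_U(q_new)
            q_new += eps * p
            p -= 0.5 * eps * grad_U(q_new)
        total_steps += n_steps
        # Metropolis accept/reject on the change in total energy.
        delta_h = (U(q_new) + 0.5 * p * p) - (U(q) + 0.5 * p0 * p0)
        if rng.random() < math.exp(min(0.0, -delta_h)):
            q = q_new
        samples.append(q)
    return samples, total_steps

step_sizes = [0.02 * 1.5 ** k for k in range(10)]  # ten step sizes spanning about 0.02 to 0.77
samples, cost = hmc_random_step(10_000, step_sizes)
```

Suppose only one of the ten step sizes suits some region of the posterior. It is still drawn one time in ten, which is where the “at most a factor of 10” penalty comes from.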

I know there are variants of HMC for bounded parameter spaces (that do sensible things like reflecting the trajectory as it hits the boundary), which might do better for these types of situations.
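Reflecting the trajectory amounts to a modified position update inside leapfrog. The sketch below handles a single parameter constrained to [lo, hi]. When a position update overshoots a bound, the overshoot is mirrored back inside and the momentum flips sign, which keeps the integrator reversible. The function names and the example potential are hypothetical, and the code is not taken from any existing Stan feature.

```python
def reflect(q: float, p: float, lo: float, hi: float):
    """Mirror q back into [lo, hi], flipping the momentum at every bounce."""
    while q < lo or q > hi:
        if q < lo:
            q, p = 2.0 * lo - q, -p
        else:
            q, p = 2.0 * hi - q, -p
    return q, p

def reflective_leapfrog(q, p, grad_U, eps, n_steps, lo, hi):
    """Leapfrog for a parameter confined to [lo, hi], bouncing off the walls."""
    for _ in range(n_steps):
        p -= 0.5 * eps * grad_U(q)
        q, p = reflect(q + eps * p, p, lo, hi)
        p -= 0.5 * eps * grad_U(q)
    return q, p

# Example: a normal(0.5, 1) potential restricted to [0, 1].
grad = lambda q: q - 0.5
q1, p1 = reflective_leapfrog(0.2, 2.0, grad, 0.3, 20, 0.0, 1.0)
# Reversibility check: flip the momentum, integrate again, and return to the start.
q2, p2 = reflective_leapfrog(q1, -p1, grad, 0.3, 20, 0.0, 1.0)
print(q1, p1, q2, -p2)
```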

If you believe the mass can be exactly zero, you’ll have a slightly different computational problem. At the boundaries, derivatives are going to be infinite or not-a-number, and the sampler will reject such proposals.

The one I’ve come across is source apportionment for particulate air pollution. Your data Y are mass concentrations of p specific ‘species’ (in the simplest cases, elements, so micrograms of arsenic in particles per cubic metre of air), and your model is Y ~ FG + e, where F holds the percentage concentrations of each species in k sources (e.g., % arsenic in wood smoke), G holds the mass concentrations of the sources (micrograms of wood smoke per cubic metre), and e are measurement errors with approximately known distribution. The constraint that all the entries of F and G are non-negative is important; it’s where any identification in the model comes from. And some species are essentially absent from some sources, and some sources are essentially absent at some times, which is again critical to F and G being even partly identifiable.
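A small numerical illustration shows why the zeros matter for identification in Y ≈ FG. For any invertible k×k matrix R, the pair (FR, R⁻¹G) gives exactly the same product, so only the non-negativity constraint can rule such alternatives out. The matrices below are invented (3 species, 2 sources, 2 time points), and R is one particular shear chosen for illustration. The check rules out only this one alternative and does not prove identification in general.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def nonnegative(M):
    return all(x >= 0 for row in M for x in row)

F = [[0.1, 0.0],
     [0.3, 0.2],
     [0.0, 0.5]]        # species x sources (hypothetical fractions)
R = [[1.0, 0.5],
     [0.0, 1.0]]        # an invertible shear
R_inv = [[1.0, -0.5],
         [0.0, 1.0]]

def alternative_is_feasible(G):
    """Does (F R, R^-1 G) fit Y = F G equally well while staying non-negative?"""
    F_alt, G_alt = matmul(F, R), matmul(R_inv, G)
    same_fit = all(abs(a - b) < 1e-12
                   for ra, rb in zip(matmul(F, G), matmul(F_alt, G_alt))
                   for a, b in zip(ra, rb))
    return same_fit and nonnegative(F_alt) and nonnegative(G_alt)

G_busy = [[3.0, 2.0], [1.0, 1.0]]    # both sources present at both times
G_sparse = [[0.0, 2.0], [1.0, 1.0]]  # source 1 absent at time 1
print(alternative_is_feasible(G_busy))    # the shear gives an equally good non-negative fit
print(alternative_is_feasible(G_sparse))  # R^-1 G would need a negative entry
```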

I don’t think there’s a simple solution here, but fortunately it’s a fairly unusual problem.
