What are the trickiest models to fit?

John Salvatier writes:

What do you and your readers think are the trickiest models to fit? If I had an algorithm that I claimed could fit many models with little fuss, what kinds of models would really impress you? I am interested in testing different MCMC sampling methods to evaluate their performance and I want to stretch the bounds of their abilities.

I don’t know what’s the trickiest, but just about anything I work on in a serious way gives me some trouble. This reminds me that we should finish our Bayesian Benchmarks paper already.

7 thoughts on “What are the trickiest models to fit?”

  1. Hierarchical Bayes models of population variability (heterogeneity) are often the most troublesome for me, sometimes requiring very careful choice of hyperpriors and reparameterization to achieve convergence.
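
    To make the reparameterization point concrete, here is a minimal sketch (Python, with a made-up normal-normal hierarchy; hyperpriors and normalizing constants omitted) contrasting the centered and non-centered forms. The non-centered version breaks the tight coupling between the group-level scale and the group effects that tends to stall samplers when the scale is small.

        import numpy as np

        # Toy normal-normal hierarchy (hypothetical example; mu, log_tau scalars,
        # theta/eta/y numpy arrays of group effects and observations):
        #   centered:      theta_j ~ Normal(mu, tau),       y_j ~ Normal(theta_j, sigma)
        #   non-centered:  theta_j = mu + tau * eta_j,      eta_j ~ Normal(0, 1)
        # Hyperprior terms and constants are dropped.

        def centered_logpost(mu, log_tau, theta, y, sigma=1.0):
            tau = np.exp(log_tau)
            lp = -0.5 * np.sum(((theta - mu) / tau) ** 2) - theta.size * log_tau
            lp += -0.5 * np.sum(((y - theta) / sigma) ** 2)
            return lp

        def noncentered_logpost(mu, log_tau, eta, y, sigma=1.0):
            tau = np.exp(log_tau)
            theta = mu + tau * eta          # deterministic transform
            lp = -0.5 * np.sum(eta ** 2)    # standard-normal prior on eta
            lp += -0.5 * np.sum(((y - theta) / sigma) ** 2)
            return lp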

  2. I'd say that the short answer to this question is: any model with an exponential or factorial number of posterior modes. Many mixture models, for example, are easy to get good practical results with, but impossible to get a sampler to truly mix across all the modes in polynomial time. HMMs are worse than basic mixtures, and neural network/deep learning models can be worse than either. Most models of discrete data are also hopelessly multimodal.
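
    To make the combinatorial point concrete: a K-component mixture likelihood is invariant to permuting the component labels, so every mode comes with K! mirror images before you count any genuine multimodality. A toy check in Python (made-up numbers):

        import numpy as np
        from scipy.stats import norm

        def mixture_loglik(y, w, mu, sigma):
            # two-component Gaussian mixture log likelihood
            dens = w[0] * norm.pdf(y, mu[0], sigma[0]) + w[1] * norm.pdf(y, mu[1], sigma[1])
            return np.sum(np.log(dens))

        rng = np.random.default_rng(0)
        y = np.concatenate([rng.normal(-2, 1, 50), rng.normal(3, 1, 50)])

        # swapping the component labels leaves the log likelihood unchanged
        ll_a = mixture_loglik(y, [0.4, 0.6], [-2.0, 3.0], [1.0, 1.0])
        ll_b = mixture_loglik(y, [0.6, 0.4], [3.0, -2.0], [1.0, 1.0])
        print(ll_a, ll_b)  # identical: the labels carry no information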

    For many (most?) commonly used models even finding the point estimate that globally maximizes posterior probability is intractable (in human timescales). If you need to get unbiased samples from the true posterior, you're totally out of luck. (Except in a few pathological cases.)

    It all depends on what you mean by "fit," of course. If you can live with the fact that you may be ignoring huge high-mass swaths of the posterior (I usually can) then the question is quite different.

  3. That does sound terrifyingly difficult. Perhaps a better question is: what models are difficult, but possible to fit? In other words, what models take a lot of human effort to fit?

  4. Just the problem we (in a project with Andrew) are working on, so I'll add my two cents to what Matt's already said about multi-modality.

    Cent 1: It's also hard to fit models involving highly correlated parameters with something like BUGS or JAGS. Not necessarily multimodal, but fairly slow to mix under vanilla Gibbs samplers.

    That's why we've been exploring Hamiltonian Monte Carlo, which addresses this problem.
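
    For readers who haven't seen it, here is a stripped-down single HMC update in Python (illustration only; logpost and grad_logpost stand in for whatever model you plug in). The gradient steers whole-vector proposals, which is why correlated posteriors hurt far less than they do for one-coordinate-at-a-time Gibbs updates:

        import numpy as np

        def hmc_step(theta, logpost, grad_logpost, step_size=0.1, n_leapfrog=20, rng=np.random):
            # one Hamiltonian Monte Carlo update (sketch, not tuned for real use)
            momentum = rng.normal(size=theta.shape)
            theta_new, p = theta.copy(), momentum.copy()

            # leapfrog integration: this is where the gradient of the log posterior enters
            p += 0.5 * step_size * grad_logpost(theta_new)
            for _ in range(n_leapfrog - 1):
                theta_new += step_size * p
                p += step_size * grad_logpost(theta_new)
            theta_new += step_size * p
            p += 0.5 * step_size * grad_logpost(theta_new)

            # Metropolis accept/reject on the joint (position, momentum) energy
            current = logpost(theta) - 0.5 * momentum @ momentum
            proposed = logpost(theta_new) - 0.5 * p @ p
            if np.log(rng.uniform()) < proposed - current:
                return theta_new
            return theta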

    But it brings up another problem, which is efficiently computing gradients and dealing with bounds. For instance, with hierarchical models of covariance, the underlying correlation matrix must be positive definite. So we've been looking at some alternative densities for correlation matrices like Lewandowski et al.'s.
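
    To give a flavor of the constraint handling: one way to stay positive definite while sampling on an unconstrained scale is to push free parameters through tanh to get partial correlations and assemble a Cholesky factor from them. A rough Python sketch (just an illustration, not the parameterization we have settled on):

        import numpy as np

        def corr_cholesky(y):
            # y: unconstrained vector of length K*(K-1)//2 for a K x K correlation matrix
            K = int(round((1 + np.sqrt(1 + 8 * len(y))) / 2))
            z = np.tanh(y)                  # partial correlations in (-1, 1)
            L = np.zeros((K, K))
            L[0, 0] = 1.0
            idx = 0
            for i in range(1, K):
                remaining = 1.0             # unit row norm guarantees ones on the diagonal
                for j in range(i):
                    L[i, j] = z[idx] * np.sqrt(remaining)
                    remaining -= L[i, j] ** 2
                    idx += 1
                L[i, i] = np.sqrt(remaining)
            return L

        L = corr_cholesky(np.random.default_rng(1).normal(size=3))  # K = 3
        R = L @ L.T
        print(np.diag(R))                  # all ones
        print(np.linalg.eigvalsh(R) > 0)   # positive definite for any input y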

    Cent 2: Which brings me to point two: scale. The existing general systems like BUGS and JAGS don't scale well to either large numbers of parameters or large amounts of data.

    These two issues interact. For instance, if there aren't very many parameters, we could go from gradient-based Hamiltonian Monte Carlo to Hessian-based Monte Carlo of the type Girolami and Calderhead just introduced. But if we have lots of params, the matrix inversions become intractable.

    Bonus cent 3: It'd also be nice to roll in non-parametric models, like say Dirichlet processes or Pitman-Yor processes for discrete data.

    Bonus cent 4: It'd also be nice to roll in structure in the models, so that I could do HMMs or even more general undirected graphical models like CRFs, but these often require special-purpose algorithms for efficiency.

  5. @Bob: Have you considered ADMB (AD Model Builder: http://admb-project.org/) as your general purpose tool? I just discovered it a week ago, so I can't say much from first-hand usage, but I think it would be worth a look for many of Andrew's readers.

    It's a non-linear modeling tool that started in the fisheries world, and it's basically a fancy preprocessor that takes a template (which can include C++ code) and generates C++ code which it then compiles. (So you need to have a C++ compiler installed on your machine.) It uses automatic differentiation to calculate gradients (using the AUTODIF library) and supports MCMC and random effects models, and they have an example of converting a BUGS model to ADMB that seems reasonable. (In ADMB you have to code some of the machinery that BUGS/JAGS provides under-the-hood, of course.)

    It seems to be very fast, and they evidently have models in the fisheries realm that are simply not tractable in R or BUGS/JAGS. The downside is that errors can be hard to trace: you have the ADMB syntax, then the C++ compile, and if your error surfaces as a compile-time message it can be hard to figure out what you did wrong.

  6. Thanks for the pointer. This dovetails nicely with my earlier blog post on automatic differentiation, where I was asking for advice on which auto-dif software to use.

    ADMB is one of the tools I explored during my first pass. I originally dismissed it because their modeling language is too limited for our purposes. I think I also confused it with AMPL, which is another custom modeling language with an auto-dif sub-package — even the first author names are confusable (Fournier and Fourer).

    The AUTODIF library in ADMB looks like it's worth investigating. It's not quite as clean as David Gay's RAD package (now part of Sacado, which is part of Trilinos) in that you can't just write templated C++ code and plug in auto-dif. Instead, like CppAD, another widely used auto-dif package, you need to use the reverse AUTODIF types for matrices, vectors, etc. One plus is that they've actually implemented most of the matrix ops over the autodif vars that we care about, so I could skip trying to integrate a templated C++ matrix package like Eigen or Boost or Scythe. Another advantage is that you get more opportunity for optimizing pieces of the gradient within the overall function where they have simple analytic solutions. I really like that they used a BSD license, too — it makes the code much less toxic to non-academics than the GPL.
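
    For readers who haven't run into reverse-mode auto-dif, here is a toy Python sketch of the underlying idea (illustration only; the C++ libraries above do the same thing with templates, carefully managed expression graphs, and matrix operations):

        import math

        class Var:
            # a node in the expression graph: a value plus (parent, local partial) pairs
            def __init__(self, value, parents=()):
                self.value = value
                self.parents = parents
                self.grad = 0.0

            def __add__(self, other):
                other = other if isinstance(other, Var) else Var(other)
                return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

            def __mul__(self, other):
                other = other if isinstance(other, Var) else Var(other)
                return Var(self.value * other.value,
                           [(self, other.value), (other, self.value)])

        def log(x):
            return Var(math.log(x.value), [(x, 1.0 / x.value)])

        def backward(output):
            # topologically order the graph, then accumulate adjoints in reverse
            order, seen = [], set()
            def visit(node):
                if id(node) not in seen:
                    seen.add(id(node))
                    for parent, _ in node.parents:
                        visit(parent)
                    order.append(node)
            visit(output)
            output.grad = 1.0
            for node in reversed(order):
                for parent, local in node.parents:
                    parent.grad += node.grad * local

        # d/dx [ log(x) * (x + 3) ] at x = 2
        x = Var(2.0)
        y = log(x) * (x + 3.0)
        backward(y)
        print(y.value, x.grad)   # gradient = (x + 3)/x + log(x) = 2.5 + log 2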

  7. @Bob: Ah, I hadn't seen that posting. The comparison of BUGS and ADMB is in the Random Effects Module documentation. They claim results that were 25x faster than WinBUGS for a mixed logistic regression, and up to 50x faster on some problems. Seems like a great project.
