Burak Bayramli writes:
In this paper by Sungjin Ahn, Anoop Korattikara, and Max Welling and this paper by Welling and Yee Whye Teh, there are some arguments about big data and the use of MCMC. Both papers suggest improvements to speed up MCMC computations. I was wondering what your thoughts were, especially on this paragraph:
“When a dataset has a billion data-cases (as is not uncommon these days) MCMC algorithms will not even have generated a single (burn-in) sample when a clever learning algorithm based on stochastic gradients may already be making fairly good predictions. In fact, the intriguing results of Bottou and Bousquet (2008) seem to indicate that in terms of “number of bits learned per unit of computation”, an algorithm as simple as stochastic gradient descent is almost optimally efficient. We therefore argue that for Bayesian methods to remain useful in an age when the datasets grow at an exponential rate, they need to embrace the ideas of the stochastic optimization literature.”
My [Bayramli's] argument against this is that Bayesian models are more expressive, and, coupled with MCMC, they have allowed researchers to solve previously intractable problems. Performance issues can be solved in time, IMHO.
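The stochastic-gradient point in the quoted paragraph can be illustrated with a toy sketch: SGD on a least-squares problem makes useful progress after touching only a small fraction of the data. Everything here (the synthetic data, the step size, the number of updates) is a made-up assumption for illustration, not anything from the papers.

```python
import random

random.seed(0)

# Synthetic data: y = 3*x + 1 plus noise, "big" relative to the
# handful of updates we will actually perform.
n = 100_000
data = [(x, 3.0 * x + 1.0 + random.gauss(0, 0.1))
        for x in (random.uniform(-1, 1) for _ in range(n))]

w, b = 0.0, 0.0    # parameters, started far from the truth
lr = 0.1           # step size (a tuning assumption)

for step in range(2_000):        # touches only ~2% of the data
    x, y = random.choice(data)
    err = (w * x + b) - y        # residual for this single point
    w -= lr * err * x            # gradient of 0.5*err**2 w.r.t. w
    b -= lr * err                # ... and w.r.t. b

# (w, b) ends up close to the true (3.0, 1.0) after seeing only a
# tiny slice of the dataset.
print(w, b)
```

A full MCMC pass, by contrast, would need at least one sweep through all n points per iteration before producing even its first sample, which is the contrast the quoted paragraph is drawing.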
I glanced at the papers only quickly but the general idea makes sense. I’ve thought for a while that the Bayesian central limit theorem should allow efficient inference via data partitioning, but my only attempt was not particularly successful (which is why this 2005 paper with Zaiying Huang is unpublished; in fact I don’t even recall if we submitted it anywhere). So, just in general terms, I like what Ahn et al. are doing.
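The data-partitioning idea can be sketched in a minimal form: fit each shard of the data separately, approximate each subposterior as Gaussian (which is what the Bayesian central limit theorem licenses for large shards), and combine by multiplying the Gaussians, i.e. adding precisions and precision-weighting the means. For a normal-mean model with known variance and a flat prior the combination is exact; the numbers below are made up for illustration.

```python
import random

random.seed(1)
sigma = 2.0                                  # known noise sd (assumed)
data = [random.gauss(5.0, sigma) for _ in range(10_000)]

K = 10                                       # number of shards
shards = [data[k::K] for k in range(K)]

# Subposterior for the mean on shard k (flat prior):
# Normal(shard mean, sigma**2 / n_k), i.e. precision n_k / sigma**2.
prec, prec_times_mean = 0.0, 0.0
for s in shards:
    n_k = len(s)
    m_k = sum(s) / n_k
    p_k = n_k / sigma**2
    prec += p_k                              # precisions add
    prec_times_mean += p_k * m_k             # precision-weighted means add
combined_mean = prec_times_mean / prec

# Full-data posterior mean for comparison -- identical here, up to
# floating-point rounding.
full_mean = sum(data) / len(data)
print(combined_mean, full_mean)
```

For non-Gaussian models the combination is only approximate, which is presumably where an attempt like this can run into trouble in practice.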
I also feel warmly toward the idea of combining stochastic optimization with Hamiltonian dynamics and MCMC sampling, as this is what we are doing with NUTS.
Finally, it often seems that methodological advances come from solving applied problems that are in our way. I like the papers you link to because they appear to be motivated by new applications rather than being new methods applied to benchmark problems.