Running into a Stan Reference by Accident

We were talking about parallelizing MCMC and I came up with what I thought was a neat idea: sample with a fractional prior, then average the samples on a per-draw basis. But then I realized that, in a beta-binomial example, this approach could get the right posterior mean or the right posterior variance, but not both, depending on how the prior was divided. Then Aki told me it had already been done in a more general form in a paper of Scott et al., Bayes and Big Data, which was then used as the baseline in:

Willie Neiswanger, Chong Wang, and Eric Xing. 2013. Asymptotically Exact, Embarrassingly Parallel MCMC. arXiv 1311.4780.
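To make the fractional-prior averaging idea concrete, here is a minimal sketch for the beta-binomial case. It is an illustration under stated assumptions, not the experiment described in the post: the shard counts, the Beta(2, 2) prior, and the function names are all made up, and the prior is divided in just one of several possible ways, by raising it to the power 1/K. For a Beta(a, b) prior that gives Beta((a - 1)/K + 1, (b - 1)/K + 1) on each of K shards.

```python
import random
import statistics


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


def averaged_subposterior_draws(shards, a, b, num_draws, rng):
    """Draw from each shard's subposterior and average per draw.

    shards is a list of (successes, trials) pairs, one per machine.
    Assumption: the Beta(a, b) prior is divided by raising its density
    to the power 1/K, i.e. Beta((a - 1)/K + 1, (b - 1)/K + 1) per shard.
    """
    K = len(shards)
    a_frac = (a - 1) / K + 1
    b_frac = (b - 1) / K + 1
    draws = []
    for _ in range(num_draws):
        # One draw from each conjugate subposterior, then a plain average.
        per_shard = [
            rng.betavariate(a_frac + y, b_frac + n - y) for y, n in shards
        ]
        draws.append(sum(per_shard) / K)
    return draws


# Hypothetical data: four shards of 25 Bernoulli trials each.
rng = random.Random(1234)
a, b = 2.0, 2.0
shards = [(12, 25), (9, 25), (14, 25), (11, 25)]

# Exact full-data posterior by conjugacy, for comparison.
y_total = sum(y for y, _ in shards)
n_total = sum(n for _, n in shards)
exact_mean, exact_var = beta_mean_var(a + y_total, b + n_total - y_total)

draws = averaged_subposterior_draws(shards, a, b, 20000, rng)
approx_mean = statistics.fmean(draws)
approx_var = statistics.pvariance(draws)

print(f"exact    mean={exact_mean:.4f}  var={exact_var:.6f}")
print(f"averaged mean={approx_mean:.4f}  var={approx_var:.6f}")
```

Changing how the prior is split across shards (the `a_frac` and `b_frac` lines) and rerunning the comparison is a quick way to see how the averaged mean and variance move relative to the exact posterior.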

The Neiswanger et al. paper is a neat one, which Xi’an already blogged about months ago. But what really struck me was the following quote:

We use Stan, an automated Hamiltonian Monte Carlo (HMC) software package, to perform sampling for both the true posterior (for groundtruth and comparison methods) and for the subposteriors on each machine. One advantage of Stan is that it is implemented with C++ and uses the No-U-Turn sampler for HMC, which does not require any user-provided parameters.

It’s sort of like telling someone a story at a cocktail party and then having the story retold to you an hour later by a different person.

9 Comments

  1. Shira says:

    do you mean 2013 (not 2003)?

  2. Aki Vehtari says:

    Year in the reference should be 2013 (no Stan in 2003)

  3. Andrew Beam says:

    If you’re interested in sub-gradients or minibatches, this paper might be of interest as well:

    Stochastic Gradient Hamiltonian Monte Carlo
    http://arxiv.org/abs/1402.4102

  4. Dan Rice says:

    Bob Carpenter, before worrying about parallelizing MCMC, you need to ask why even do MCMC anymore at least for logistic regression problems? Bob has complained about my RELR patent in another post in March of last year here on Andrew’s blog, but it is clear that the RELR (Reduced Error Logistic Regression) method works very well to provide rapid optimal solutions (and yes it is mostly embarrassingly parallel) which have greatly reduced error regression coefficients compared to standard methods, as you can read in my peer-reviewed book Calculus of Thought. At the time the patent was awarded, we had provided similar evidence with numerous papers including proceedings papers reviewed by a peer and presented before a large audience at JSM, along with presentations at a number of other conferences. So, there was substantial evidence to support this patent.

    If it were a bogus method, why would you even be concerned about a patent? Obviously, you are concerned about the patent because you understand that it does exactly what it claims, which is to produce dramatically reduced error regression coefficients, especially with multicollinear and/or high-dimensional candidate features and small sample data, and to do so automatically without any arbitrary user parameters. Because of the much lower error and the lack of arbitrary or biased user choices, RELR models are much more likely to replicate, which overcomes one of the big problems in science today: the lack of replication in predictive models based upon observational data.

  5. Dan Rice says:

    P.S. I apologize for the off-topic comments, but I could not respond to the original posting about “Lame Statistics Patents”, as Andrew has already closed that discussion. One point made in that post that I acknowledge is that the patent did not give the mathematical or theoretical reasons why the error modeling in RELR works. It simply showed evidence that it does work. A patent does not need to be a theoretical treatise on a subject; it just needs to present a novel, useful, and non-obvious invention, which is true of the RELR method. However, the theoretical treatise on why it works mathematically is also now published, and that is the subject of my book recently published by Academic Press called Calculus of Thought.

    Software patents are here to stay, as made clear in this week’s Supreme Court hearing, and sophisticated, novel, and useful machine learning methods seem to be the very kind of software patents that the justices indicated this week that they strongly support (even the opposing attorneys who wished to limit software patents gave the example of mathematical algorithms for encryption as the kind of machine-based method that deserves patenting). Business methods like software that implements hedging likely will not be patentable in the future, though, as was also evident in this Supreme Court hearing.

    • Andrew says:

      Dan:

      Just to clarify: I did not specifically close that earlier discussion, it’s just that we’ve set the blog to automatically close old discussions after some length of time, to reduce the amount of spam that we get.

  6. Dan Rice says:

    Andrew, thanks for the response. I certainly did not think that you had done it with any malicious intent and understood that it was done automatically as a procedure on your blog. However, it would be nice if you could somehow re-open it. I was not aware that my patent was blatantly and publicly being characterized as “lame” until yesterday.

    Obviously giving a scientist the right to defend his work is not spam.

    Best Regards,

    Dan

  7. Dan Rice says:

    Andrew, if you are not going to allow me to defend against incredible charges from Bob Carpenter that an extremely powerful and stable method like RELR is “lame statistics” in the actual thread where he made the charges, I will have to point out right here why Bob Carpenter is actually the one who uses lame statistics.

    Bob, please correct me if I have misinterpreted this, but I just looked at your LingPipe logistic regression example, and the different folds where you compare regression coefficients for stability do not seem to be independent samples. Instead, they seem to be based upon sampling with replacement, as the LingPipe documentation that shows your example logistic regression model references the Wikipedia bootstrapping article. All regression coefficients look stable when there is substantial overlap across training samples in these very artificial and contrived bootstrap validation schemes (with 100% overlap there is 100% stability). Yet such predictive models seldom replicate in the scientific sense, where independent validation on truly independent data samples is required, as we now know from large quality control studies like MAQC-II and Ioannidis.

    Try to understand that the only reason that you see such stability is because you have artificially forced such stability across models built from different training samples because your logistic regression training samples are not independent samples and instead have substantial overlaps with one another. RELR, on the other hand, does show extremely good stability across true independent samples and anyone is free to look at the simple toy Excel models that we provide on the Elsevier companion website to Calculus of Thought to see this.

    Your logistic regression would have horrible stability in the real world scenario where someone else is going to replicate your classification models blindly and independently by pulling an independent sample of data. In fact, it is not unusual to see regression coefficients that have near zero correlations across true independent samples using standard logistic regression or shrinkage methods like L1 or L2 in the face of multicollinear data (see data I present in my book). When we look at the now famous quality control studies like MAQC-II or Ioannidis, which suggest that predictive models using the standard methods that Bob Carpenter advocates and sells through LingPipe seldom replicate, we have to look at the lame validation, and nothing is more lame than overlapping samples when you wish to assess stability of regression coefficients.

    Beyond the obvious problem with overlapping samples when stability is assessed, the lack of replication that is seen in these large quality control studies is more fundamentally related to problematic assumptions that the traditional predictive modeling methods make. Until all the high-minded talk on this blog about lack of replication starts to look at why traditional predictive modeling methods do not replicate including multi-level models that use MCMC where the controversial Random Effects assumption is usually not appropriate, there will be little real science here.
