Archive of posts filed under the Statistical computing category.

A fistful of Stan case studies: divergences and bias, identifying mixtures, and weakly informative priors

Following on from his talk at StanCon, Michael Betancourt just wrote three Stan case studies, all of which are must reads: Diagnosing Biased Inference with Divergences: This case study discusses the subtleties of accurate Markov chain Monte Carlo estimation and how divergences can be used to identify biased estimation in practice.   Identifying Bayesian Mixture […]

Advice when debugging at 11pm

Add one feature to your model and test and debug with fake data before going on. Don’t try to add two features at once.
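As a rough illustration of that advice, here is a minimal fake-data check in plain Python. It is only a sketch: the linear model, the parameter values, and the helper names `simulate` and `fit_line` are assumptions made up for this example, not anything from the post. The idea is to simulate data from known parameters, fit, and confirm the fit recovers them before adding the next feature.

```python
import random

def simulate(n, intercept, slope, sigma, seed=0):
    """Generate fake data from a known linear model (hypothetical example)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Known "true" values chosen for the simulation (assumed for illustration).
TRUE_INTERCEPT, TRUE_SLOPE, SIGMA = 1.0, 2.0, 0.5

xs, ys = simulate(2000, TRUE_INTERCEPT, TRUE_SLOPE, SIGMA)
est_intercept, est_slope = fit_line(xs, ys)

# If the estimates are far from the true values, the bug is in the code
# just added, not in the real data.
print(f"intercept: true={TRUE_INTERCEPT}, estimated={est_intercept:.3f}")
print(f"slope:     true={TRUE_SLOPE}, estimated={est_slope:.3f}")
```

Once this check passes, add the next feature (say, a second predictor) and repeat with a new simulation that includes it.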

Facebook’s Prophet uses Stan

Sean Taylor, a research scientist at Facebook and Stan user, writes: I wanted to tell you about an open source forecasting package we just released called Prophet:  I thought the readers of your blog might be interested in both the package and the fact that we built it on top of Stan. Under the hood, […]

Research fellow, postdoc, and PhD positions in probabilistic modeling and machine learning in Finland

Probabilistic modeling and machine learning are strong in Finland. Now is your opportunity to join us in this cool country! There are several postdoc and research fellow positions open in probabilistic machine learning in Aalto University and University of Helsinki (deadline March 19). Some of the topics are related also to probabilistic programming and Stan […]

Exposure to Stan has changed my defaults: a non-haiku

Now when I look at my old R code, it looks really weird because there are no semicolons
Each line of code just looks incomplete
As if I were writing my sentences like this
Whassup with that, huh
Also can I please no longer do <-
I much prefer =
Please

Lasso regression etc in Stan

[cat picture] Someone on the users list asked about lasso regression in Stan, and Ben replied: In the rstanarm package we have stan_lm(), which is sort of like ridge regression, and stan_glm() with family = gaussian and prior = laplace() or prior = lasso(). The latter estimates the shrinkage as a hyperparameter while the former […]

HMMs in Stan? Absolutely!

I was having a conversation with Andrew that went like this yesterday: Andrew: Hey, someone’s giving a talk today on HMMs (that someone was Yang Chen, who was giving a talk based on her JASA paper Analyzing single-molecule protein transportation experiments via hierarchical hidden Markov models). Maybe we should add some specialized discrete modules to […]

You can fit hidden Markov models in Stan (and thus, also in Stata! and Python! and R! and Julia! and Matlab!)

[cat picture] You can fit finite mixture models in Stan; see section 12 of the Stan manual. You can fit change point models in Stan; see section 14.2 of the Stan manual. You can fit mark-recapture models in Stan; see section 14.2 of the Stan manual. You can fit hidden Markov models in Stan; see […]
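Stan does not sample discrete parameters, so models like these are fit by summing the discrete states out of the likelihood. For a hidden Markov model that sum is the forward algorithm. The Python sketch below shows the computation on the log scale. The function names, the two-state example, and its probabilities are all assumptions for illustration; this is not code from the Stan manual.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def hmm_log_likelihood(log_init, log_trans, log_emit):
    """Marginal log-likelihood of an HMM via the forward algorithm.

    log_init[k]     : log probability of starting in state k
    log_trans[j][k] : log probability of moving from state j to state k
    log_emit[t][k]  : log probability of observation t given state k
    """
    num_states = len(log_init)
    # alpha[k] = log p(y_1..y_t, state_t = k), starting at t = 1
    alpha = [log_init[k] + log_emit[0][k] for k in range(num_states)]
    for t in range(1, len(log_emit)):
        alpha = [
            log_sum_exp([alpha[j] + log_trans[j][k] for j in range(num_states)])
            + log_emit[t][k]
            for k in range(num_states)
        ]
    # Sum over the final state to marginalize out the whole hidden path.
    return log_sum_exp(alpha)

# Hypothetical two-state example with three observations.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
emit = [[0.9, 0.2], [0.1, 0.8], [0.5, 0.5]]  # p(y_t | state) for each t

log_lik = hmm_log_likelihood(
    [math.log(p) for p in init],
    [[math.log(p) for p in row] for row in trans],
    [[math.log(p) for p in row] for row in emit],
)
print(log_lik)
```

In a Stan program the same recursion would go in the model block, with the result added to the log density. The cost grows linearly in the number of observations instead of exponentially in the number of possible state paths.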

Thanks for attending StanCon 2017!

Thank you all for coming and making the first Stan Conference a success! The organizers were blown away by how many people came to the first conference. We had over 150 registrants this year! StanCon 2017 Video The organizers managed to get a video stream on YouTube: https://youtu.be/DJ0c7Bm5Djk. We have over 1900 views since StanCon! (We lost […]

Come and work with us!

Stan is an open-source, state-of-the-art probabilistic programming language with a high-performance Bayesian inference engine written in C++. Stan has been successfully applied to modeling problems with hundreds of thousands of parameters in fields as diverse as econometrics, sports analytics, physics, pharmacometrics, recommender systems, political science, and many more. Research using Stan has been featured in […]

Stan is hiring! hiring! hiring! hiring!

[insert picture of adorable cat entwined with Stan logo] We’re hiring postdocs to do Bayesian inference. We’re hiring programmers for Stan. We’re hiring a project manager. How many people we hire depends on what gets funded. But we’re hiring a few people for sure. We want the best best people who love to collaborate, who […]

Stan JSS paper out: “Stan: A probabilistic programming language”

As a surprise welcome to 2017, our paper on how the Stan language works along with an overview of how the MCMC and optimization algorithms work hit the stands this week. Bob Carpenter, Andrew Gelman, Matthew D. Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. Stan: […]

“A Conceptual Introduction to Hamiltonian Monte Carlo”

Michael Betancourt writes: Hamiltonian Monte Carlo has proven a remarkable empirical success, but only recently have we begun to develop a rigorous understanding of why it performs so well on difficult problems and how it is best applied in practice. Unfortunately, that understanding is confined within the mathematics of differential geometry which has limited […]

Michael found the bug in Stan’s new sampler

Gotcha! Michael found the bug! That was a lot of effort, during which time he produced ten pages of dense LaTeX to help Daniel and me understand the algorithm enough to help debug (we’re trying to write a bunch of these algorithmic details up for a more general audience, so stay tuned). So what was […]

“The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling”

Here’s Michael Betancourt writing in 2015: Leveraging the coherent exploration of Hamiltonian flow, Hamiltonian Monte Carlo produces computationally efficient Monte Carlo estimators, even with respect to complex and high-dimensional target distributions. When confronted with data-intensive applications, however, the algorithm may be too expensive to implement, leaving us to consider the utility of approximations such as […]

Some U.S. demographic data at zipcode level conveniently in R

Ari Lamstein writes: I chuckled when I read your recent “R Sucks” post. Some of the comments were a bit … heated … so I thought to send you an email instead. I agree with your point that some of the datasets in R are not particularly relevant. The way that I’ve addressed that is […]

Deep learning, model checking, AI, the no-homunculus principle, and the unitary nature of consciousness

Bayesian data analysis, as my colleagues and I have formulated it, has a human in the loop. Here’s how we put it on the very first page of our book: The process of Bayesian data analysis can be idealized by dividing it into the following three steps: 1. Setting up a full probability model—a joint […]

Only on the internet . . .

I had this bizarrely escalating email exchange. It started with this completely reasonable message: Professor, I was unable to run your code here: https://www.r-bloggers.com/downloading-option-chain-data-from-google-finance-in-r-an-update/ Besides a small typo [you have a 1 after names (options)], the code fails when you actually run the function. The error I get is a lexical error: Error: lexical error: […]

Kaggle Kernels

Anthony Goldbloom writes: In late August, Kaggle launched an open data platform where data scientists can share data sets. In the first few months, our members have shared over 300 data sets on topics ranging from election polls to EEG brainwave data. It’s only a few months old, but it’s already a rich repository for […]

Stan Webinar, Stan Classes, and StanCon

This post is by Eric. We have a number of Stan related events in the pipeline. On 22 Nov, Ben Goodrich and I will be holding a free webinar called Introduction to Bayesian Computation Using the rstanarm R Package. Here is the abstract: The goal of the rstanarm package is to make it easier to use Bayesian […]