Archive of posts filed under the Statistical computing category.

How Many Mic’s Do We Rip

Yakir Reshef writes: Our technical comment on Kinney and Atwal’s paper on MIC and equitability has come out in PNAS along with their response. Similarly to Ben Murrell, who also wrote you a note when he published a technical comment on the same work, we feel that they “somewhat missed the point.” Specifically: one statistic […]

“A hard case for Mister P”

Kevin Van Horn sent me an email with the above title (ok, he wrote MRP, but it’s the same idea) and the following content: I’m working on a problem that at first seemed like a clear case where multilevel modeling would be useful. As I’ve dug into it I’ve found that it doesn’t quite fit […]

Cool new position available: Director of the Pew Research Center Labs

Peter Henne writes: I wanted to let you know about a new opportunity at Pew Research Center for a data scientist that might be relevant to some of your colleagues. I [Henne] am a researcher with the Pew Research Center, where I manage an international index on religious issues. I am also working with others […]

Stanny Stanny Stannitude

On the stan-users list, Richard McElreath reports: With 2.4 out, I ran a quick test of how much speedup I could get by changing my old non-vectorized multi_normal sampling to the new vectorized form. I get a 40% time savings, without even trying hard. This is much better than I expected. Timings with vectorized multi_normal: […]
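For the curious, here is a minimal sketch of what that change looks like in practice (a toy model of my own, not McElreath's; the data names N, K, and y are illustrative):

```r
library(rstan)
library(MASS)   # mvrnorm, to fake some multivariate normal data

set.seed(123)
N <- 200; K <- 3
y <- mvrnorm(N, mu = rep(0, K), Sigma = diag(K))

model_code <- "
data {
  int<lower=1> N;
  int<lower=1> K;
  vector[K] y[N];
}
parameters {
  vector[K] mu;
  cov_matrix[K] Sigma;
}
model {
  // old, non-vectorized form (one sampling statement per row):
  //   for (n in 1:N)
  //     y[n] ~ multi_normal(mu, Sigma);
  // vectorized form, new in Stan 2.4:
  y ~ multi_normal(mu, Sigma);
}
"
fit <- stan(model_code = model_code, data = list(N = N, K = K, y = y))
```

The savings come from sharing work across observations: the vectorized statement factors Sigma once per log-density evaluation instead of once per row.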

SciLua 2 includes NUTS

The most recent release of SciLua includes an implementation of Matt’s sampler, NUTS (link is to the final JMLR paper, which is a revision of the earlier arXiv version). According to the author of SciLua, Stefano Peluchetti: Should be quite similar to your [Stan’s] implementation with some differences in the adaptation strategy. If you have […]

Stan 2.4, New and Improved

We’re happy to announce that all three interfaces (CmdStan, PyStan, and RStan) are up and ready to go for Stan 2.4. As usual, you can find full instructions for installation on the Stan Home Page. Here are the release notes with a list of what’s new and improved: New Features: * L-BFGS optimization (now […]

NYC workshop 22 Aug on open source machine learning systems

The workshop is organized by John Langford (Microsoft Research NYC), along with Alekh Agarwal and Alina Beygelzimer, and it features Liblinear, Vowpal Wabbit, Torch, Theano, and . . . you guessed it . . . Stan! Here’s the current program: 8:55am: Introduction 9:00am: Liblinear by CJ Lin. 9:30am: Vowpal Wabbit and Learning to Search (John […]

Stan World Cup update

The other day I fit a simple model to estimate team abilities from World Cup outcomes. I fit the model to the signed square roots of the score differentials, using the square root on the theory that when the game is less close, it becomes more variable. 0. Background As you might recall, the estimated […]
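In case the transformation is hard to picture from the excerpt, it is just the signed square root, which compresses blowouts so a 4-0 win counts for only twice, not four times, a 1-0 win. A one-liner in R (function name mine):

```r
# Signed square root: keep the sign of the score differential,
# shrink its magnitude.
signed_sqrt <- function(d) sign(d) * sqrt(abs(d))

signed_sqrt(c(-4, -1, 0, 1, 4))
#> [1] -2 -1  0  1  2
```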

Stan goes to the World Cup

I thought it would be fun to fit a simple model in Stan to estimate the abilities of the teams in the World Cup, then I could post everything here on the blog, the whole story of the analysis from beginning to end, showing the results of spending a couple hours on a data analysis. […]

Useless Algebra, Inefficient Computation, and Opaque Model Specifications

I (Bob, not Andrew) doubt anyone sets out to do algebra for the fun of it, implement an inefficient algorithm, or write a paper where it’s not clear what the model is. But… Why not write it in BUGS or Stan? Over on the Stan users group, Robert Grant wrote: Hello everybody, I’ve just been […]

Comment of the week

This one, from DominikM: Really great, the simple random intercept – random slope mixed model I did yesterday now runs at least an order of magnitude faster after installing RStan 2.3 this morning. You are doing an awesome job, thanks a lot!

(Py, R, Cmd) Stan 2.3 Released

We’re happy to announce RStan, PyStan and CmdStan 2.3. Instructions on how to install at: http://mc-stan.org/ As always, let us know if you’re having problems or have comments or suggestions. We’re hoping to roll out the next release a bit quicker this time, because we have lots of good new features that are almost ready […]

Judicious Bayesian Analysis to Get Frequentist Confidence Intervals

Christian Bartels has a new paper, “Efficient generic integration algorithm to determine confidence intervals and p-values for hypothesis testing,” of which he writes: The paper proposes to do an analysis of observed data which may be characterized as doing a judicious Bayesian analysis of the data resulting in the determination of exact frequentist p-values and […]

Average predictive comparisons in R: David Chudzicki writes a package!

Here it is: An R Package for Understanding Arbitrary Complex Models As complex models become widely used, it’s more important than ever to have ways of understanding them. Even when a model is built primarily for prediction (rather than primarily as an aid to understanding), we still need to know what it’s telling us. For […]

My answer: Write a little program to simulate it

Brendon Greeff writes: I was searching for an online math blog and found your email address. I have a question relating to the draw for a sports tournament. If there are 20 teams in a tournament divided into 4 groups, and those teams are selected based on four “bands” (Band: 1-5 ranked teams, 6-10, 11-15, […]
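The excerpt cuts off before the full draw rules, so the sketch below fills in one plausible reading as a labeled assumption: each group first draws one team from each band of five, and the four leftover teams are then dealt out one per group at random. The point stands regardless of the details: code up the draw, run it many times, and count how often the event you care about occurs.

```r
# Simulate one draw: 20 teams ranked 1-20, in 4 bands of 5.
# ASSUMPTION (the excerpt is truncated): each group gets one team per
# band, then the 4 leftover teams are dealt out one per group at random.
simulate_draw <- function() {
  bands <- split(1:20, rep(1:4, each = 5))   # band 1 = ranks 1-5, etc.
  groups <- vector("list", 4)
  leftovers <- integer(0)
  for (b in bands) {
    picks <- sample(b)                       # shuffle the band
    for (g in 1:4) groups[[g]] <- c(groups[[g]], picks[g])
    leftovers <- c(leftovers, picks[5])
  }
  leftovers <- sample(leftovers)
  for (g in 1:4) groups[[g]] <- c(groups[[g]], leftovers[g])
  groups
}

# E.g., how often do the top two ranked teams land in the same group?
set.seed(1)
mean(replicate(1e4, {
  gs <- simulate_draw()
  any(sapply(gs, function(g) all(c(1, 2) %in% g)))
}))
```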

Stan is Turing Complete. So what?

This post is by Bob Carpenter. Stan is Turing complete! There seems to be a persistent misconception that Stan isn’t Turing complete [1, 2]. My guess is that it stems from Stan’s (not coincidental) superficial similarity to BUGS and JAGS, which provide directed graphical model specification languages. Stan’s Turing completeness follows from its support of array data […]
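A toy illustration of the point (my example, not Bob's): Stan's imperative language has conditionals and unbounded while loops, so it can express arbitrary iterative computation that a purely declarative graphical-model language cannot. Here the Collatz iteration runs in a transformed data block, written in the 2.4-era assignment syntax:

```r
# A Stan program whose transformed data block loops until the Collatz
# sequence from 27 reaches 1 -- ordinary computation, not a graphical model.
model_code <- "
transformed data {
  int n;
  int steps;
  n <- 27;
  steps <- 0;
  while (n != 1) {
    if ((n / 2) * 2 == n)    // even? integer division truncates
      n <- n / 2;
    else
      n <- 3 * n + 1;
    steps <- steps + 1;
  }
  print(\"Collatz steps from 27: \", steps);
}
model {
}
"
```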

Superfast Metrop using data partitioning, from Marco Banterle, Clara Grazian, and Christian Robert

Superfast not because of faster convergence but because they use a clever acceptance/rejection trick so that most of the time they don’t have to evaluate the entire target density. It’s written in terms of single-step Metropolis but I think it should be possible to do it in HMC or NUTS, in which case we could […]
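For intuition, here is my reading of the trick as a sketch, not the authors' code: write the target as a product of factors (say one per block of data) and test the Metropolis ratio factor by factor, each with its own independent uniform. A doomed proposal usually fails on an early factor, so most iterations never touch the full density, and the sequential test preserves detailed balance because the overall acceptance probability becomes the product of the factorwise ones.

```r
# One Metropolis step with a factored target (symmetric proposal assumed;
# otherwise fold the proposal ratio into the first factor).
# log_factors: list of functions whose values sum to log pi(x) + const.
metrop_partitioned_step <- function(x, propose, log_factors) {
  x_new <- propose(x)
  for (f in log_factors) {
    # independent uniform per factor: reject as soon as one test fails
    if (log(runif(1)) > f(x_new) - f(x)) return(x)
  }
  x_new   # accepted only if every factor's test passes
}

# Toy usage: standard normal target split into two half-precision factors.
log_factors <- list(function(x) -x^2 / 4, function(x) -x^2 / 4)
propose     <- function(x) x + rnorm(1, sd = 0.5)
x <- 0
draws <- numeric(5000)
for (i in seq_along(draws)) {
  x <- metrop_partitioned_step(x, propose, log_factors)
  draws[i] <- x
}
```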

Bayesian nonparametric weighted sampling inference

Yajuan Si, Natesh Pillai, and I write: It has historically been a challenge to perform Bayesian inference in a design-based survey context. The present paper develops a Bayesian model for sampling inference using inverse-probability weights. We use a hierarchical approach in which we model the distribution of the weights of the nonsampled units in the […]

WAIC and cross-validation in Stan!

Aki and I write: The Watanabe-Akaike information criterion (WAIC) and cross-validation are methods for estimating pointwise out-of-sample prediction accuracy from a fitted Bayesian model. WAIC is based on the series expansion of leave-one-out cross-validation (LOO), and asymptotically they are equal. With finite data, WAIC and cross-validation address different predictive questions and thus it is useful […]
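For anyone wanting to try this before it lands in a package, a minimal sketch of the WAIC computation from the paper's formulas, assuming you have saved pointwise log-likelihoods from your fit as an S x N matrix (S posterior draws, N data points; the variable name log_lik is illustrative):

```r
# WAIC from an S x N matrix of pointwise log-likelihoods.
#   lppd:   log pointwise predictive density
#   p_waic: effective number of parameters (posterior variance of the
#           pointwise log-likelihood, summed over data points)
# A robust version would use log-sum-exp to avoid underflow in exp().
waic <- function(log_lik) {
  lppd   <- sum(log(colMeans(exp(log_lik))))
  p_waic <- sum(apply(log_lik, 2, var))
  c(elpd_waic = lppd - p_waic,
    p_waic    = p_waic,
    waic      = -2 * (lppd - p_waic))
}
```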

An interesting mosaic of a data programming course

Rajit Dasgupta writes: I have been working on a website, SlideRule, that in its present state is a catalog of online courses aggregated from over 35 providers. One of the products we are building on top of this is something called Learning Paths, which are essentially sequences of Online Courses designed to help learners […]